How Prophet's Agent Works

A technical look at how our AI agent sees web pages, makes decisions, and automates browser tasks. Built on the Anthropic API's tool use feature and a custom agent loop.

Last updated: March 2026

The Agent Loop

Prophet's agent follows a continuous loop to understand and interact with web pages:

1. Observe - Take an accessibility tree snapshot

2. Think - Claude analyzes the page and the task

3. Act - Execute tool calls in the browser

4. Repeat - Continue until the task is complete

The agent can execute up to 10 tool calls per conversation to prevent runaway behavior. Each action is logged in the chat for full transparency.
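The loop and its tool call cap can be sketched in a few lines. This is a minimal illustration, not Prophet's actual code: the type names, `runAgentLoop`, and the injected `think` and `act` functions are hypothetical stand-ins for the Claude request and the browser tool executor, and the sketch is synchronous for brevity where a real loop would await both.

```typescript
// Hypothetical types and names, for illustration only.
type ToolCall = { name: string; input: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };

const MAX_TOOL_CALLS = 10; // the cap described above

function runAgentLoop(
  task: string,
  think: (task: string, observations: string[]) => ModelTurn, // stands in for a Claude request
  act: (call: ToolCall) => string, // stands in for executing a tool in the browser
): { log: string[]; toolCallsUsed: number; finished: boolean } {
  const log: string[] = [];
  const observations: string[] = [];
  let toolCallsUsed = 0;

  while (true) {
    const turn = think(task, observations);
    log.push(`assistant: ${turn.text}`);

    // A turn without tool calls means the model considers the task complete.
    if (turn.toolCalls.length === 0) {
      return { log, toolCallsUsed, finished: true };
    }

    for (const call of turn.toolCalls) {
      // Stop before exceeding the cap to prevent runaway behavior.
      if (toolCallsUsed >= MAX_TOOL_CALLS) {
        log.push("system: tool call limit reached");
        return { log, toolCallsUsed, finished: false };
      }
      const result = act(call);
      toolCallsUsed += 1;
      observations.push(result);
      log.push(`tool ${call.name}: ${result}`);
    }
  }
}
```

Because every action is appended to `log`, the same structure also covers the transparency requirement: the log can be rendered in the chat as the loop runs.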

How the Agent "Sees" Pages

Instead of raw HTML, Prophet uses the browser's accessibility tree - a structured representation of a page's interactive elements that a language model can read directly, without parsing markup.

Why Accessibility Tree?

• Focuses only on interactive elements (buttons, links, inputs)

• Filters out visual noise and decorative elements

• Provides semantic roles and names

• More stable than CSS selectors across page changes

• Same data structure screen readers use

Unique Identifier System

• Each element gets an 8-character UID

• UIDs injected as data-prophet-nodeid attributes

• Stable across snapshots for the same elements

• Agent uses UIDs to target specific elements

• UIDs are internal-only (never exposed to users)
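One way to get 8-character UIDs that stay stable across snapshots is to keep a registry keyed by a per-element identifier. The sketch below assumes such a numeric key exists (for example, a DOM node id reported by the browser). `UidRegistry`, `randomUid`, and `selectorFor` are hypothetical names, and the random lowercase alphanumeric format is an assumption. Only the UID length and the `data-prophet-nodeid` attribute come from the description above.

```typescript
const UID_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Generates a random identifier; the alphabet is an assumption.
function randomUid(length = 8): string {
  let uid = "";
  for (let i = 0; i < length; i++) {
    uid += UID_ALPHABET.charAt(Math.floor(Math.random() * UID_ALPHABET.length));
  }
  return uid;
}

// Hypothetical registry: the same element key always maps to the same UID,
// so a UID seen in one snapshot still targets the same element in the next.
class UidRegistry {
  private byNode = new Map<number, string>();
  private used = new Set<string>();

  uidFor(nodeKey: number): string {
    const existing = this.byNode.get(nodeKey);
    if (existing !== undefined) return existing;

    let uid = randomUid();
    while (this.used.has(uid)) uid = randomUid(); // regenerate on the rare collision
    this.byNode.set(nodeKey, uid);
    this.used.add(uid);
    return uid;
  }
}

// CSS selector that finds the element carrying a given UID attribute.
function selectorFor(uid: string): string {
  return `[data-prophet-nodeid="${uid}"]`;
}
```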

Chrome DevTools Protocol

Prophet uses the Chrome DevTools Protocol (CDP) - the same technology that powers Chrome DevTools. This provides low-level browser control.

• Direct DOM access and manipulation

• Mouse and keyboard event simulation

• Page navigation control

• Network request interception

• Accessibility tree inspection

• Tab management
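To make these capabilities concrete, here is a sketch of the command payloads involved. `Input.dispatchMouseEvent`, `Page.navigate`, and `Accessibility.getFullAXTree` are documented CDP methods; the helper functions and the `CdpCommand` type are hypothetical, and whether Prophet issues exactly these commands is an assumption. From an extension, payloads like these would typically be sent with `chrome.debugger.sendCommand`.

```typescript
// Hypothetical wrapper type for a CDP method call.
type CdpCommand = { method: string; params: Record<string, unknown> };

// A click is simulated as move, press, release at the same coordinates.
function clickAt(x: number, y: number): CdpCommand[] {
  const press = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { type: "mousePressed", ...press } },
    { method: "Input.dispatchMouseEvent", params: { type: "mouseReleased", ...press } },
  ];
}

// Page navigation control.
function navigateTo(url: string): CdpCommand {
  return { method: "Page.navigate", params: { url } };
}

// Accessibility tree inspection.
function fullAccessibilityTree(): CdpCommand {
  return { method: "Accessibility.getFullAXTree", params: {} };
}
```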

Agent Tools (19 Available)

Claude has access to 19 specialized tools for browser automation. Each tool is designed for a specific type of interaction:

Observation

take_snapshot

Captures the accessibility tree - how the agent "sees" the page with all interactive elements.

get_page_content

Extracts the cleaned text content of the current page.

search_snapshot

Searches the accessibility tree for specific elements by text.

get_page_info

Gets metadata about the current page (URL, title, viewport).
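As an illustration of what searching a snapshot by text can look like, the sketch below walks a simplified tree and collects nodes whose accessible name contains the query. The `SnapshotNode` shape, the case-insensitive substring match, and the sample data are assumptions made for this example, not Prophet's actual snapshot format.

```typescript
// Hypothetical, simplified snapshot node.
type SnapshotNode = {
  uid: string;
  role: string;
  name: string;
  children: SnapshotNode[];
};

// Depth-first search, returning matches in document order.
function searchSnapshot(root: SnapshotNode, query: string): SnapshotNode[] {
  const needle = query.trim().toLowerCase();
  const matches: SnapshotNode[] = [];
  if (needle === "") return matches; // an empty query matches nothing

  const visit = (node: SnapshotNode): void => {
    if (node.name.toLowerCase().includes(needle)) matches.push(node);
    for (const child of node.children) visit(child);
  };
  visit(root);
  return matches;
}

// Made-up sample data for illustration.
const exampleSnapshot: SnapshotNode = {
  uid: "r0a1b2c3",
  role: "RootWebArea",
  name: "Checkout",
  children: [
    { uid: "e7k2m9qa", role: "textbox", name: "Email address", children: [] },
    { uid: "b4x8n1zc", role: "button", name: "Submit order", children: [] },
    { uid: "l3p6v0td", role: "link", name: "Return to cart", children: [] },
  ],
};
```

Calling `searchSnapshot(exampleSnapshot, "submit")` returns the single button node, whose `uid` can then be passed to an interaction tool.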

Interaction

click_element_by_uid

Clicks buttons, links, and checkboxes using the UIDs from the snapshot.

fill_element_by_uid

Types into text inputs, textareas, and form fields.

hover_element_by_uid

Hovers over elements to reveal dropdowns and tooltips.

Navigation

navigate

Navigates the browser to a specific URL.

scroll_page

Scrolls the page in any direction to reveal more content.

go_back

Navigates back in browser history.

go_forward

Navigates forward in browser history.

reload_page

Reloads the current page.

Wait

wait_for_selector

Waits for an element matching a selector to appear, for dynamically loaded content (for example, in single-page apps built with React or Vue).

wait_for_navigation

Waits for page navigation to complete before proceeding.

wait_for_timeout

Pauses execution for a specified duration.

Tabs

list_tabs

Lists all open browser tabs.

switch_tab

Switches focus to a specific tab.

close_tab

Closes a specific tab.

open_new_tab

Opens a URL in a new tab.
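With the Anthropic API, each tool is declared to the model as a name, a description, and a JSON Schema for its input (`input_schema`). The sketch below shows what two of the tools listed above could look like in that format. The parameter names (`uid`, `value`) and the `missingParams` helper are assumptions for illustration, not Prophet's published schemas.

```typescript
// Shape of a tool definition as accepted by the Anthropic API's tools parameter,
// narrowed here to simple flat schemas.
type ToolDefinition = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
};

const clickElementByUid: ToolDefinition = {
  name: "click_element_by_uid",
  description: "Clicks buttons, links, and checkboxes using UIDs from the snapshot.",
  input_schema: {
    type: "object",
    properties: {
      uid: { type: "string", description: "UID of the target element (assumed parameter name)" },
    },
    required: ["uid"],
  },
};

const fillElementByUid: ToolDefinition = {
  name: "fill_element_by_uid",
  description: "Types into text inputs, textareas, and form fields.",
  input_schema: {
    type: "object",
    properties: {
      uid: { type: "string", description: "UID of the target field (assumed parameter name)" },
      value: { type: "string", description: "Text to type (assumed parameter name)" },
    },
    required: ["uid", "value"],
  },
};

// Hypothetical pre-execution check: which required parameters are absent?
function missingParams(tool: ToolDefinition, input: Record<string, unknown>): string[] {
  return tool.input_schema.required.filter((key) => !(key in input));
}
```

Checking inputs before execution lets the extension return a clear error to the model instead of failing partway through a browser action.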

Decision Making with Claude

Prophet uses Anthropic Claude as the reasoning engine. Users can select from three Claude models:

Haiku 4.5

Fast & efficient for simple tasks

Sonnet 4.6

Balanced performance & capability

Opus 4.6

Most capable for complex tasks

Here's how Claude makes decisions:

1. Context Analysis

Claude receives your message, conversation history, and the current accessibility tree snapshot. It analyzes what you want to accomplish and what's visible on the page.

2. Tool Selection

Based on the task, Claude chooses which tools to use. For example, to fill a form it might: take_snapshot → search_snapshot for the form → fill_element_by_uid → click_element_by_uid to submit.

3. Execution & Feedback

Each tool returns results (success or failure, new page content, etc.). Claude uses this feedback to decide the next action or determine if the task is complete.
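In the Anthropic API, this feedback travels as `tool_result` content blocks in a user message, each linked to the model's `tool_use` block by id. The sketch below shows that mapping. The block shapes follow the API's message format, while `ToolOutcome`, `toToolResult`, and `toUserMessage` are hypothetical names for the extension-side plumbing.

```typescript
// Block shapes from the Anthropic Messages API.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

// Hypothetical result of running a tool in the browser.
type ToolOutcome =
  | { status: "success"; output: string }
  | { status: "failure"; error: string };

// Converts a local outcome into the block Claude reads on the next turn.
function toToolResult(call: ToolUseBlock, outcome: ToolOutcome): ToolResultBlock {
  if (outcome.status === "success") {
    return { type: "tool_result", tool_use_id: call.id, content: outcome.output };
  }
  // Flagging the error lets the model choose a different approach.
  return { type: "tool_result", tool_use_id: call.id, content: outcome.error, is_error: true };
}

// Tool results are sent back in a user message.
function toUserMessage(results: ToolResultBlock[]): { role: "user"; content: ToolResultBlock[] } {
  return { role: "user", content: results };
}
```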

4. Error Recovery

If something fails (element not found, page changed unexpectedly), Claude can retry with different approaches, scroll to reveal content, or ask for clarification.
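The retry-then-ask pattern can be illustrated in plain code. In Prophet the choice of recovery step is made by Claude from tool feedback, so the fixed list of fallbacks below is only a deterministic stand-in, and `tryWithFallbacks` and `StepResult` are hypothetical names.

```typescript
type StepResult = { ok: boolean; detail: string };

// Runs an action, applying one recovery step (for example, scrolling or taking
// a fresh snapshot) before each retry. If every retry fails, the caller should
// ask the user for clarification.
function tryWithFallbacks(
  action: () => StepResult,
  recoverySteps: Array<() => void>,
): { result: StepResult; attempts: number; needsClarification: boolean } {
  let result = action();
  let attempts = 1;

  for (const recover of recoverySteps) {
    if (result.ok) break; // stop as soon as the action succeeds
    recover();
    result = action();
    attempts += 1;
  }

  return { result, attempts, needsClarification: !result.ok };
}
```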

Architecture Overview

Prophet uses a custom browser automation architecture built on the Anthropic API with tool use. The system has three main components:

1. Chrome Extension

Runs in your browser, manages the agent loop, and executes tools locally using Chrome DevTools Protocol.

2. Backend API

Handles authentication, rate limiting, and billing. Streams Claude's responses to the extension.

3. Anthropic API

Receives page context and returns intelligent actions (clicks, typing, navigation) for the browser.

Why Accessibility Tree?

Unlike screenshot-based approaches (Computer Use, Claude in Chrome), Prophet uses the accessibility tree:

• Fast - No image processing or vision models

• Deterministic - UIDs target exact elements

• Efficient - Fewer tokens than screenshots

• Reliable - Same approach as Playwright MCP

Why Custom Agent Loop?

Prophet implements its own agent loop instead of using the Claude Agent SDK:

• Browser Context - Tools run in your logged-in session

• No Dependencies - No Claude Code CLI required

• Full Control - Custom tool execution via CDP

• Security - Tool execution isolated from backend

Why Client-Side Tool Execution?

Prophet executes tools inside your browser (client-side) rather than on a server. This is a critical design choice that enables browser automation.

The Requirement

Browser automation tools need access to the Chrome DevTools Protocol (CDP) to:

• Control the browser (click, type, scroll)

• Read page state (accessibility tree, element properties)

• Manage tabs and navigation

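A backend server can drive its own browser instance over CDP, but it cannot reach the browser running on your machine. A Chrome extension can attach to your existing tabs and logged-in session, so the tools have to run there.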

The Benefits

Running tools in your browser means:

• Your session, your control - Automation happens in your logged-in browser, not a separate instance

• Security - Tool execution is isolated from the backend

• Privacy - Tools run on your machine; only the context needed for a task, such as the accessibility tree snapshot, is sent to Claude

• No dependencies - No separate browser instances needed

This architecture choice is what makes Prophet different from server-side tools like web scrapers or coding agents. For more details on when to use client-side vs server-side tool execution, see our Architecture Guide.

Prophet vs Claude in Chrome

Anthropic offers Claude in Chrome, its official browser extension. Here's how Prophet's approach differs:

Feature	Claude in Chrome	Prophet
How it "sees" pages	Screenshots (vision model)	Accessibility tree (structured data)
Speed	"Noticeably slower" - screenshot/analyze cycle	Fast - direct element targeting
Vision model	Required	Not needed
Element targeting	Coordinate-based (probabilistic)	UID-based (deterministic)
Token usage	High (images are expensive)	Low (structured text)
Infrastructure	Anthropic's servers	Your own backend (full control)
Billing	Claude subscription ($20-200/mo)	Pay-per-use credits

Key insight: Prophet's accessibility tree approach is the same method used by Playwright MCP, which states: "Rather than relying on screenshots, it generates structured accessibility snapshots... making interactions more deterministic and efficient."

Learn More

Anthropic API - Tool Use

Official documentation on how Claude processes and executes tools.

platform.claude.com →

Playwright MCP

Microsoft's MCP server using the same accessibility tree approach.

github.com/microsoft/playwright-mcp →

Claude in Chrome

Anthropic's official browser extension, which uses a screenshot-based approach.

anthropic.com →

Chrome DevTools Protocol

The low-level protocol Prophet uses for browser automation.

chromedevtools.github.io →

Ready to See It in Action?

Install Prophet and experience AI browser automation. Free plan available — no credit card required.
