Prophet

How Prophet's Agent Works

A technical look at how our AI agent sees web pages, makes decisions, and automates browser tasks. It is built on the Anthropic API's tool use feature and a custom agent loop.

The Agent Loop

Prophet's agent follows a continuous loop to understand and interact with web pages:

1. Observe: Take an accessibility tree snapshot
2. Think: Claude analyzes the page and the task
3. Act: Execute tool calls in the browser
4. Repeat: Continue until the task is complete

The agent can execute up to 10 tool calls per conversation to prevent runaway behavior. Each action is logged in the chat for full transparency.
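The loop and its 10-call cap can be sketched in TypeScript as below. This is a minimal illustration, not Prophet's source: `runAgentLoop`, `ModelTurn`, and the injected `think` function (which stands in for a Claude request) are hypothetical names, and the Observe step happens through a `take_snapshot` tool call, as the tool list later suggests.

```typescript
// Hypothetical shapes: a tool call requested by the model, and the model's
// reply for one turn (more tool calls, or a final answer with none).
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

interface ModelTurn {
  text: string;
  toolCalls: ToolCall[];
}

type Tool = (input: Record<string, unknown>) => Promise<string>;

// Assumed cap, mirroring the 10-tool-call limit described above.
const MAX_TOOL_CALLS = 10;

async function runAgentLoop(
  task: string,
  think: (history: string[]) => Promise<ModelTurn>, // stands in for a Claude request
  tools: Record<string, Tool>,
  log: (line: string) => void = console.log,
): Promise<string> {
  const history: string[] = [`user: ${task}`];
  let toolCallsUsed = 0;

  while (true) {
    // Think: ask the model what to do next, given everything observed so far.
    const turn = await think(history);
    history.push(`assistant: ${turn.text}`);

    // No tool calls means the model considers the task complete.
    if (turn.toolCalls.length === 0) return turn.text;

    // Act: run each requested tool and feed the result back as an observation.
    for (const call of turn.toolCalls) {
      if (toolCallsUsed >= MAX_TOOL_CALLS) {
        return "Stopped: tool call limit reached.";
      }
      toolCallsUsed++;
      log(`tool ${call.name} ${JSON.stringify(call.input)}`); // shown in the chat
      const tool = tools[call.name];
      const result = tool ? await tool(call.input) : `error: unknown tool ${call.name}`;
      history.push(`tool ${call.name}: ${result}`);
    }
  }
}
```

Counting tool calls rather than turns means a single reply that requests many actions still cannot exceed the cap.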

How the Agent "Sees" Pages

Instead of raw HTML, Prophet uses the browser's accessibility tree: a structured representation of the page's interactive elements that is well suited to AI understanding.

Why Accessibility Tree?

• Focuses only on interactive elements (buttons, links, inputs)

• Filters out visual noise and decorative elements

• Provides semantic roles and names

• More stable than CSS selectors across page changes

• Same data structure screen readers use
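As a rough sketch of what focusing on interactive elements can mean in practice, the TypeScript below filters the node list returned by the CDP method `Accessibility.getFullAXTree`. The `AXNode` type declares only a few of the protocol's fields, and the set of interactive roles is an assumption rather than Prophet's actual filter.

```typescript
// Minimal subset of the CDP Accessibility.AXNode type, as returned by
// Accessibility.getFullAXTree. Only the fields used below are declared.
interface AXValue {
  type: string;
  value?: unknown;
}

interface AXNode {
  nodeId: string;
  ignored: boolean;
  role?: AXValue;
  name?: AXValue;
  backendDOMNodeId?: number;
}

// Assumed set of "interactive" roles; Prophet's real list may differ.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "searchbox", "checkbox", "radio",
  "combobox", "listbox", "menuitem", "tab", "switch", "slider",
]);

interface InteractiveElement {
  role: string;
  name: string;
  backendDOMNodeId?: number;
}

function extractInteractive(nodes: AXNode[]): InteractiveElement[] {
  const result: InteractiveElement[] = [];
  for (const node of nodes) {
    // Nodes Chrome marks as ignored (decorative or hidden) are dropped.
    if (node.ignored) continue;
    const role = String(node.role?.value ?? "");
    // Structural roles such as "generic" or "RootWebArea" are skipped.
    if (!INTERACTIVE_ROLES.has(role)) continue;
    result.push({
      role,
      name: String(node.name?.value ?? ""),
      backendDOMNodeId: node.backendDOMNodeId,
    });
  }
  return result;
}
```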

Unique Identifier System

• Each element gets an 8-character UID

• UIDs injected as data-prophet-nodeid attributes

• Stable across snapshots for the same elements

• Agent uses UIDs to target specific elements

• UIDs are internal-only (never exposed to users)
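A minimal sketch of UID assignment, assuming UIDs are random lowercase alphanumeric strings. Reusing the `data-prophet-nodeid` attribute when it is already present is what keeps an element's UID stable across snapshots. `ensureUid` and `AttributeTarget` are hypothetical names; a real DOM `Element` satisfies the interface.

```typescript
// Anything with DOM-style attribute accessors; a DOM Element satisfies this.
interface AttributeTarget {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
}

const UID_ATTRIBUTE = "data-prophet-nodeid";
// Assumed alphabet: 36 characters, so 36^8 (about 2.8 trillion) possible UIDs.
const UID_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function randomUid(length = 8): string {
  let uid = "";
  for (let i = 0; i < length; i++) {
    uid += UID_ALPHABET[Math.floor(Math.random() * UID_ALPHABET.length)];
  }
  return uid;
}

// Reuse the UID if the element was tagged in an earlier snapshot, so the same
// element keeps the same identifier; otherwise generate and inject a new one.
function ensureUid(el: AttributeTarget): string {
  const existing = el.getAttribute(UID_ATTRIBUTE);
  if (existing) return existing;
  const uid = randomUid();
  el.setAttribute(UID_ATTRIBUTE, uid);
  return uid;
}
```

When the agent later targets a UID, the element can be found with a selector such as `document.querySelector('[data-prophet-nodeid="a1b2c3d4"]')`. A production version might also check new UIDs against those already on the page, although random collisions are unlikely.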

Chrome DevTools Protocol

Prophet uses the Chrome DevTools Protocol (CDP), the same protocol that powers Chrome DevTools. CDP provides low-level browser control:

  • Direct DOM access and manipulation
  • Mouse and keyboard event simulation
  • Page navigation control
  • Network request interception
  • Accessibility tree inspection
  • Tab management
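To illustrate mouse event simulation, here is one plausible way to click an element through CDP: scroll it into view with `DOM.scrollIntoViewIfNeeded`, read its content box with `DOM.getBoxModel`, then dispatch a press and a release at the box's center with `Input.dispatchMouseEvent`. The command sender is injected, and `clickNode` is a hypothetical helper; this is a sketch, not necessarily how Prophet implements `click_element_by_uid`.

```typescript
// A CDP command sender. In an extension this could wrap
// chrome.debugger.sendCommand({ tabId }, method, params); here it is injected.
type SendCommand = (method: string, params?: Record<string, unknown>) => Promise<any>;

// Center of a CDP Quad, given as [x1, y1, x2, y2, x3, y3, x4, y4].
function quadCenter(quad: number[]): { x: number; y: number } {
  const xs = [quad[0], quad[2], quad[4], quad[6]];
  const ys = [quad[1], quad[3], quad[5], quad[7]];
  const avg = (v: number[]) => v.reduce((a, b) => a + b, 0) / v.length;
  return { x: avg(xs), y: avg(ys) };
}

async function clickNode(send: SendCommand, backendNodeId: number): Promise<void> {
  // Bring the element on screen, then read where its content box is.
  await send("DOM.scrollIntoViewIfNeeded", { backendNodeId });
  const { model } = await send("DOM.getBoxModel", { backendNodeId });
  const { x, y } = quadCenter(model.content);
  // A click is a left-button press followed by a release at the same point.
  for (const type of ["mousePressed", "mouseReleased"]) {
    await send("Input.dispatchMouseEvent", { type, x, y, button: "left", clickCount: 1 });
  }
}
```

The `backendNodeId` here is the same `backendDOMNodeId` that accessibility tree nodes carry, which is one way a snapshot entry can be linked back to a clickable DOM node.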

Agent Tools (19 Available)

Claude has access to 19 specialized tools for browser automation. Each tool is designed for a specific type of interaction:

Observation

take_snapshot

Captures the accessibility tree - how the agent "sees" the page with all interactive elements.

get_page_content

Extracts the cleaned text content of the current page.

search_snapshot

Searches the accessibility tree for specific elements by text.

get_page_info

Gets metadata about the current page (URL, title, viewport).

Interaction

click_element_by_uid

Clicks buttons, links, and checkboxes, targeting them by the unique identifiers (UIDs) from the snapshot.

fill_element_by_uid

Types into text inputs, textareas, and form fields.

hover_element_by_uid

Hovers over elements to reveal dropdowns and tooltips.

Navigation

navigate

Navigates the browser to a specific URL.

scroll_page

Scrolls the page in any direction to reveal more content.

go_back

Navigates back in browser history.

go_forward

Navigates forward in browser history.

reload_page

Reloads the current page.

Wait

wait_for_selector

Waits for an element matching a selector to appear, which handles dynamically loaded content in single-page apps (e.g. React or Vue).

wait_for_navigation

Waits for page navigation to complete before proceeding.

wait_for_timeout

Pauses execution for a specified duration.

Tabs

list_tabs

Lists all open browser tabs.

switch_tab

Switches focus to a specific tab.

close_tab

Closes a specific tab.

open_new_tab

Opens a URL in a new tab.
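In the Anthropic Messages API, each tool is declared with a `name`, a `description`, and a JSON Schema `input_schema`. The sketch below declares three of the tools listed above in that format and validates a model-issued call before it touches the page. The parameter names (`uid`, `value`) and the `validateToolCall` helper are assumptions for illustration; Prophet's actual schemas are not published on this page.

```typescript
// Shape of a tool definition in the Messages API `tools` array.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Three of the 19 tools; parameter names are assumed, not Prophet's own.
const TOOL_DEFINITIONS: ToolDefinition[] = [
  {
    name: "take_snapshot",
    description: "Capture the accessibility tree of the current page, with element UIDs.",
    input_schema: { type: "object", properties: {} },
  },
  {
    name: "click_element_by_uid",
    description: "Click the element with the given UID from the latest snapshot.",
    input_schema: {
      type: "object",
      properties: { uid: { type: "string", description: "8-character element UID" } },
      required: ["uid"],
    },
  },
  {
    name: "fill_element_by_uid",
    description: "Type text into the input identified by the given UID.",
    input_schema: {
      type: "object",
      properties: {
        uid: { type: "string", description: "8-character element UID" },
        value: { type: "string", description: "Text to enter" },
      },
      required: ["uid", "value"],
    },
  },
];

// Returns null if the call is acceptable, or an error message to send back to
// Claude. Every required parameter in these three tools is a string.
function validateToolCall(name: string, input: Record<string, unknown>): string | null {
  const def = TOOL_DEFINITIONS.find((d) => d.name === name);
  if (!def) return `unknown tool: ${name}`;
  for (const key of def.input_schema.required ?? []) {
    if (typeof input[key] !== "string") return `missing or invalid "${key}" for ${name}`;
  }
  return null;
}
```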

Decision Making with Claude

Prophet uses Anthropic's Claude as its reasoning engine. Users can choose from three Claude 4.5 models:

Haiku 4.5

Fast & efficient for simple tasks

Sonnet 4.5

Balanced performance & capability

Opus 4.5

Most capable for complex tasks

Here's how Claude makes decisions:

1. Context Analysis
Claude receives your message, conversation history, and the current accessibility tree snapshot. It analyzes what you want to accomplish and what's visible on the page.
2. Tool Selection
Based on the task, Claude chooses which tools to use. For example, to fill a form it might: take_snapshot → search_snapshot for the form → fill_element_by_uid → click_element_by_uid to submit.
3. Execution & Feedback
Each tool returns a result (success or failure, new page content, etc.). Claude uses this feedback to decide on the next action or to determine whether the task is complete.
4. Error Recovery
If something fails (element not found, page changed unexpectedly), Claude can retry with different approaches, scroll to reveal content, or ask for clarification.
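Steps 3 and 4 map onto the Messages API's content blocks: Claude's reply contains `tool_use` blocks, and the next user message carries one `tool_result` block per call, with `is_error: true` when a tool fails so that Claude can see the failure and try another approach. A minimal TypeScript sketch, in which the `Executor` callback and `executeToolUses` are hypothetical names:

```typescript
// Content blocks from the Messages API that matter for tool use.
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}
interface TextBlock {
  type: "text";
  text: string;
}
type AssistantBlock = ToolUseBlock | TextBlock;

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical callback that runs one browser tool and returns its output.
type Executor = (name: string, input: Record<string, unknown>) => Promise<string>;

// Run every tool_use block in an assistant reply and build the matching
// tool_result blocks for the next user message.
async function executeToolUses(blocks: AssistantBlock[], execute: Executor): Promise<ToolResultBlock[]> {
  const results: ToolResultBlock[] = [];
  for (const block of blocks) {
    if (block.type !== "tool_use") continue; // text blocks are shown in the chat only
    try {
      const content = await execute(block.name, block.input);
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    } catch (err) {
      // Report the failure instead of aborting, so Claude can recover.
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: err instanceof Error ? err.message : String(err),
        is_error: true,
      });
    }
  }
  return results;
}
```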

Architecture Overview

Prophet uses a custom browser automation architecture built on the Anthropic API with tool use. The system has three main components:

1. Chrome Extension

Runs in your browser, manages the agent loop, and executes tools locally using the Chrome DevTools Protocol.

2. Backend API

Handles authentication, rate limiting, and billing. Streams Claude's responses to the extension.

3. Anthropic API

Receives the page context and returns the next actions (clicks, typing, navigation) as tool calls for the extension to execute in the browser.
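A sketch of the request body that would ultimately reach the Anthropic Messages API for one step of the loop. The model aliases (`claude-haiku-4-5` and so on), the 4096-token `max_tokens` budget, and placing the snapshot in the system prompt are assumptions; the format the extension uses to talk to Prophet's backend is not documented here.

```typescript
// Assumed mapping from the model names shown to users to Anthropic model aliases.
const MODEL_IDS = {
  "Haiku 4.5": "claude-haiku-4-5",
  "Sonnet 4.5": "claude-sonnet-4-5",
  "Opus 4.5": "claude-opus-4-5",
} as const;

type ModelChoice = keyof typeof MODEL_IDS;

interface ChatMessage {
  role: "user" | "assistant";
  content: string | unknown[];
}

// Build a Messages API request body: the user's chosen model, a system prompt
// carrying the current snapshot, the conversation so far, and the tools.
function buildMessagesRequest(
  model: ModelChoice,
  history: ChatMessage[],
  snapshotText: string,
  tools: object[],
) {
  return {
    model: MODEL_IDS[model],
    max_tokens: 4096, // assumed per-turn budget
    system: "You control the user's browser through tools. Current page snapshot:\n" + snapshotText,
    messages: history,
    tools,
    stream: true, // the backend streams Claude's response back to the extension
  };
}
```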

Why Accessibility Tree?

Unlike screenshot-based approaches (Computer Use, Claude in Chrome), Prophet uses the accessibility tree:

Fast - No image processing or vision models

Deterministic - UIDs target exact elements

Efficient - Fewer tokens than screenshots

Reliable - Same approach as Playwright MCP
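Part of the token efficiency comes from how compactly the tree can be written into the prompt. One hypothetical text format, with one indented line per element and its UID in brackets, can be produced like this; `SnapshotNode` and `serializeSnapshot` are illustrative names, not Prophet's actual snapshot format.

```typescript
interface SnapshotNode {
  uid: string;
  role: string;
  name: string;
  children?: SnapshotNode[];
}

// Render the tree as one line per element, indented by depth, e.g.
//   [a1b2c3d4] button "Sign in"
function serializeSnapshot(node: SnapshotNode, depth = 0): string {
  const line = `${"  ".repeat(depth)}[${node.uid}] ${node.role} ${JSON.stringify(node.name)}`;
  const childLines = (node.children ?? []).map((child) => serializeSnapshot(child, depth + 1));
  return [line, ...childLines].join("\n");
}
```

A checkout form with an email field and a pay button becomes three short lines of text, and the bracketed UID on each line is exactly what the agent passes back to tools like `click_element_by_uid`.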

Why Custom Agent Loop?

Prophet implements its own agent loop instead of using the Claude Agent SDK:

Browser Context - Tools run in your logged-in session

No Dependencies - No Claude Code CLI required

Full Control - Custom tool execution via CDP

Security - Tool execution isolated from backend

Why Client-Side Tool Execution?

Prophet executes tools inside your browser (client-side) rather than on a server. This is a critical design choice that enables browser automation.

The Requirement

Browser automation tools need access to the Chrome DevTools Protocol (CDP) to:

• Control the browser (click, type, scroll)

• Read page state (accessibility tree, element properties)

• Manage tabs and navigation

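Within your own running browser, CDP is reachable only from a Chrome extension (through the chrome.debugger API). A backend server cannot drive your logged-in session without running a separate browser instance.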
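In an extension, CDP access goes through the `chrome.debugger` API: attach to a tab with a protocol version, send commands, then detach. The sketch below wraps that sequence. The `DebuggerApi` interface redeclares only the Manifest V3 promise-based calls it uses, so the sketch compiles without `@types/chrome`, and `withTabSession` is a hypothetical helper rather than Prophet's code.

```typescript
// Subset of the chrome.debugger API (Manifest V3, promise form) used below.
interface Debuggee {
  tabId?: number;
}
interface DebuggerApi {
  attach(target: Debuggee, requiredVersion: string): Promise<void>;
  sendCommand(target: Debuggee, method: string, commandParams?: object): Promise<object | undefined>;
  detach(target: Debuggee): Promise<void>;
}

// Attach to one of the user's tabs, run CDP commands through `send`, and always
// detach afterwards. In the extension: withTabSession(chrome.debugger, tabId, ...).
async function withTabSession<T>(
  api: DebuggerApi,
  tabId: number,
  work: (send: (method: string, params?: object) => Promise<object | undefined>) => Promise<T>,
): Promise<T> {
  const target: Debuggee = { tabId };
  await api.attach(target, "1.3"); // CDP protocol version
  try {
    return await work((method, params) => api.sendCommand(target, method, params));
  } finally {
    await api.detach(target);
  }
}
```

While a debugger is attached, Chrome shows a banner telling the user that an extension is debugging the browser.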

The Benefits

Running tools in your browser means:

Your session, your control - Automation happens in your logged-in browser, not a separate instance

Security - Backend never sees what you're browsing

Privacy - Page content stays local to your machine

No dependencies - No separate browser instances needed

This architecture choice is what makes Prophet different from server-side tools like web scrapers or coding agents. For more details on when to use client-side vs server-side tool execution, see our Architecture Guide.

Prophet vs Claude in Chrome

Anthropic offers Claude in Chrome, their official browser extension. Here's how Prophet's approach differs:

| Feature | Claude in Chrome | Prophet |
| --- | --- | --- |
| How it "sees" pages | Screenshots (vision model) | Accessibility tree (structured data) |
| Speed | "Noticeably slower" (screenshot/analyze cycle) | Fast (direct element targeting) |
| Vision model | Required | Not needed |
| Element targeting | Coordinate-based (probabilistic) | UID-based (deterministic) |
| Token usage | High (images are expensive) | Low (structured text) |
| Infrastructure | Anthropic's servers | Your own backend (full control) |
| Billing | Claude subscription ($20-200/mo) | Pay-per-use credits |

Key insight: Prophet's accessibility tree approach is the same method used by Playwright MCP, which states: "Rather than relying on screenshots, it generates structured accessibility snapshots... making interactions more deterministic and efficient."

Learn More

Anthropic API - Tool Use

Official documentation on how Claude processes and executes tools.

platform.claude.com →

Playwright MCP

Microsoft's MCP server using the same accessibility tree approach.

github.com/microsoft/playwright-mcp →

Claude in Chrome

Anthropic's official browser extension using screenshot-based approach.

anthropic.com →

Chrome DevTools Protocol

The low-level protocol Prophet uses for browser automation.

chromedevtools.github.io →