Playwright Agent

A multi-agent browser automation framework built on AgentKit. It performs intelligent web navigation and task execution by dividing work among cooperating agents, organized into four core modules: task planning, navigation control, browser operation, and result verification.

Browser automation, Developer tools, #Intelligent Navigation, #Multi-Agent, #Automation, #Task Decomposition, TypeScript

Rating: 2 points

Downloads: 6.4K

Update time: 2025-04-28

What is AgentKit Browser Automation?

This is an advanced browser automation solution built on a multi-agent architecture. It decomposes tasks, navigates web pages, performs operations, and verifies results. The system mimics the steps a person would take in a browser and suits a wide range of web automation scenarios.

How to use AgentKit Browser Automation?

The system relies on four specialized agents working together: the planning agent decomposes the task, the navigation agent decides the next operation, the browser agent carries out that operation, and the verification agent confirms the result. The user only provides the target task, and the system runs the rest of the process automatically.
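The hand-off between the four agents can be pictured as a simple pipeline. The sketch below is only an illustration of that flow: every type and function name is hypothetical, none of it is AgentKit's API or the project's actual code, and the agents are replaced by deterministic stubs (in the real system each stage would be an asynchronous, LLM-backed call).

```typescript
// Hypothetical sketch: none of these names come from AgentKit or the project.
type Step = { id: number; description: string };
type Action = { stepId: number; command: string };
type ActionResult = { stepId: number; ok: boolean };

// Planning agent (stub): split the task into ordered steps, one per sentence.
function planTask(task: string): Step[] {
  return task
    .split(".")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((description, index) => ({ id: index + 1, description }));
}

// Navigation agent (stub): decide which browser action carries out a step.
function chooseAction(step: Step): Action {
  return { stepId: step.id, command: `perform: ${step.description}` };
}

// Browser agent (stub): pretend to execute the action; always succeeds here.
function executeAction(action: Action): ActionResult {
  return { stepId: action.stepId, ok: true };
}

// Verification agent (stub): the task passes only if every planned step succeeded.
function verifyResults(steps: Step[], results: ActionResult[]): boolean {
  return (
    steps.length > 0 &&
    results.length === steps.length &&
    results.every((result) => result.ok)
  );
}

// Orchestration: plan, then navigate and act per step, then verify.
function runTask(task: string): { steps: Step[]; success: boolean } {
  const steps = planTask(task);
  const results = steps.map((step) => executeAction(chooseAction(step)));
  return { steps, success: verifyResults(steps, results) };
}
```

Calling `runTask("Open the login page. Submit the form.")` yields two planned steps and a successful verification, because the stubbed browser agent never fails.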

Applicable scenarios

It suits scenarios that require browser automation, such as web data collection, form auto-filling, web test automation, and routine web operations. It is particularly well suited to complex, multi-step web tasks.

Main Features

Multi-Agent System

Four specialized agents work together, mimicking how a person would complete complex web operations

Intelligent Task Planning

Automatically decompose complex tasks into an executable sequence of steps

State Management

Track the browser state and operation results in real time to keep the process coherent

Error Handling

Built-in robust error handling and recovery mechanisms
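The source does not describe how recovery is implemented. One common recovery technique for flaky browser operations is retrying with exponential backoff, sketched below as an illustration only. `withRetry`, `backoffDelayMs`, and `RetryOptions` are hypothetical names, not part of the project or of AgentKit.

```typescript
// Hypothetical sketch of retry with exponential backoff; not the project's code.
type RetryOptions = {
  maxAttempts: number;
  baseDelayMs: number;
  // Injectable sleep so callers (and tests) can avoid real waiting.
  sleep?: (ms: number) => Promise<void>;
};

// Delay before the retry that follows the given zero-based attempt.
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < options.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, options.baseDelayMs));
      }
    }
  }
  // All attempts failed: surface the most recent error to the caller.
  throw lastError;
}
```

A caller might wrap a single browser operation, for example `withRetry(() => clickSubmit(), { maxAttempts: 3, baseDelayMs: 500 })`, where `clickSubmit` stands in for any asynchronous operation that may fail transiently.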

Result Verification

Automatically verify the task completion status and result accuracy

Advantages

Intelligently decompose complex tasks and reduce manual planning

Multi-agent collaboration improves the task success rate

Thorough error handling and state management

Highly scalable and supports custom operations

Limitations

Requires an OpenAI API key

Limited support for dynamic web pages

Initial configuration is relatively complex

How to Use

Environment Preparation

Install Node.js (v14 or higher) and npm/yarn

Clone the Repository

Get the project source code

Install Dependencies

Install all the dependency packages required by the project

Configure the Environment

Set necessary configurations such as the OpenAI API key

Start the Service

Start the three components, each in its own terminal

Usage Examples

Web Data Collection

Automatically log in to the website, navigate to the specified page, collect table data and save it
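For the "save it" part of this scenario, the collected rows need a storage format. The sketch below assumes the table has already been extracted as an array of string rows and serializes it to CSV. `toCsv` and `escapeCsvField` are hypothetical helpers for illustration, not functions provided by the project.

```typescript
// Hypothetical helpers: serialize already-extracted table rows to CSV text.

// Quote a field when it contains a comma, a double quote, or a line break,
// and double any embedded quotes (the usual CSV convention).
function escapeCsvField(value: string): string {
  const needsQuoting = /[",\r\n]/.test(value);
  const escaped = value.replace(/"/g, '""');
  return needsQuoting ? `"${escaped}"` : escaped;
}

// Join fields with commas and rows with newlines.
function toCsv(rows: string[][]): string {
  return rows
    .map((row) => row.map(escapeCsvField).join(","))
    .join("\n");
}
```

Writing the resulting string to disk is left to the caller, for example with Node's `fs` module.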

Form Batch Filling

Read an Excel file, automatically fill out web forms and submit them

Frequently Asked Questions

Do I need programming knowledge to use this system?

Which browsers does the system support?

How to handle captchas?

Related Resources

AgentKit Official Documentation

Official documentation and examples of the AgentKit framework

Playwright MCP

Official repository of Playwright MCP

Example Projects

Related example projects and implementations

🚀 AgentKit Browser Automation Framework

An advanced browser automation framework built on AgentKit, leveraging a multi-agent system for intelligent web navigation and task execution.

🚀 Quick Start

Dependencies

Node.js (v14 or higher)
npm or yarn
OpenAI API key (for GPT models)

Installation

Clone the repository:

git clone https://github.com/tmahesh/playwright-agent.git
cd playwright-agent

Install the dependencies:

npm install

Set up environment variables:

cp .env.sample .env
# Edit the .env file with your OpenAI API key and other configurations
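Since a missing key only surfaces once an agent first calls the model, it can help to validate the configuration at startup. The sketch below is an assumption-heavy illustration: the contents of `.env.sample` are not shown in the source, so the variable name `OPENAI_API_KEY` is the conventional guess, `PLAYWRIGHT_MCP_PORT` is purely hypothetical, and the default port 8931 is taken from the Playwright MCP command in this guide.

```typescript
// Hypothetical startup check; variable names are assumptions, not taken from .env.sample.
type AppConfig = { openAiApiKey: string; mcpPort: number };

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openAiApiKey = (env["OPENAI_API_KEY"] ?? "").trim();
  if (openAiApiKey === "") {
    throw new Error("OPENAI_API_KEY is not set; add it to the .env file");
  }

  // Fall back to the port used by the Playwright MCP command in this guide.
  const mcpPort = Number(env["PLAYWRIGHT_MCP_PORT"] ?? "8931");
  if (!Number.isInteger(mcpPort) || mcpPort < 1 || mcpPort > 65535) {
    throw new Error("PLAYWRIGHT_MCP_PORT must be an integer between 1 and 65535");
  }

  return { openAiApiKey, mcpPort };
}
```

In a Node.js entry point this would typically be called as `loadConfig(process.env)` after the `.env` file has been loaded.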

Run the following three commands, each in its own terminal: the Playwright MCP server, the application entry point (index.ts), and the Inngest dev server.

npx @playwright/mcp@latest --port 8931

npx tsx index.ts

npx inngest-cli@latest dev --no-discovery -u http://localhost:3000/api/inngest -v

✨ Features

Intelligent Task Planning: Break down complex tasks into manageable steps.
State Management: Track browser states and operation results.
Error Handling: Robust error handling and recovery mechanisms.
Event System: Comprehensive event logging and monitoring.
Flexible Action System: An extensible action registry to support custom behaviors.
Validation Framework: Built-in validation features to confirm task completion.
Memory Management: Maintain operation context and history.
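The feature list mentions an extensible action registry without showing its shape. As a rough idea of what such a registry could look like, the sketch below maps action names to handler functions. The class, its methods, and the `click` and `type` handlers are all hypothetical and simplified (the handlers return a description string instead of driving a browser); the project's real registry may differ.

```typescript
// Hypothetical action registry; not the project's actual implementation.
type ActionParams = Record<string, string>;
type ActionHandler = (params: ActionParams) => string;

class ActionRegistry {
  private handlers = new Map<string, ActionHandler>();

  // Add a custom action; duplicate names are rejected to avoid silent overrides.
  register(name: string, handler: ActionHandler): void {
    if (this.handlers.has(name)) {
      throw new Error(`Action "${name}" is already registered`);
    }
    this.handlers.set(name, handler);
  }

  // Look up and run an action by name.
  run(name: string, params: ActionParams): string {
    const handler = this.handlers.get(name);
    if (!handler) {
      throw new Error(`Unknown action "${name}"`);
    }
    return handler(params);
  }

  // Names of all registered actions, sorted alphabetically.
  list(): string[] {
    return Array.from(this.handlers.keys()).sort();
  }
}

const registry = new ActionRegistry();
registry.register("click", (params) => `clicked ${params["selector"]}`);
registry.register("type", (params) => `typed "${params["text"]}" into ${params["selector"]}`);
```

With this shape, supporting a new behavior means registering one more handler rather than changing the code that dispatches actions.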

📚 Documentation

Overview

This project implements a multi-agent system based on AgentKit for browser automation. Different agents work together to achieve the following goals:

Plan and decompose tasks
Navigate web pages
Perform browser operations
Validate results

Architecture (To be determined)

The system consists of four specialized agents:

Planning Agent
- Decompose tasks into executable steps
- Create detailed execution plans
- Determine task completion criteria
Navigation Agent
- Determine the next action
- Manage state transitions
- Handle action execution
- Provide detailed logs and feedback
Browser Agent
- Perform browser automation operations
- Interact with web elements
- Handle page navigation
- Manage browser states
Validation Agent
- Validate task completion
- Verify results
- Handle error situations
- Provide success/failure feedback
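Because the architecture is still marked as to be determined, the following is only one plausible reading of how control could pass between the four agents, written as a small state machine. The phase names, the `PhaseOutcome` fields, and the retry rule are assumptions made for illustration, not the project's documented behavior.

```typescript
// Hypothetical hand-off rules between agents; the real routing is not documented.
type Phase = "planning" | "navigation" | "browser" | "validation" | "done" | "failed";

type PhaseOutcome = {
  ok: boolean;            // did the current agent succeed?
  stepsRemaining: number; // planned steps not yet validated
  retriesLeft: number;    // how many more times a failed step may be retried
};

function nextPhase(current: Phase, outcome: PhaseOutcome): Phase {
  switch (current) {
    case "planning":
      // No usable plan means the task cannot proceed.
      return outcome.ok ? "navigation" : "failed";
    case "navigation":
      // The navigation agent picked an action for the browser agent.
      return outcome.ok ? "browser" : "failed";
    case "browser":
      // Always validate, even after a failed operation, so errors are classified.
      return "validation";
    case "validation":
      if (!outcome.ok) {
        return outcome.retriesLeft > 0 ? "navigation" : "failed";
      }
      return outcome.stepsRemaining > 0 ? "navigation" : "done";
    case "done":
    case "failed":
      // Terminal phases stay where they are.
      return current;
  }
}
```

Under these rules a validated step with work remaining loops back to the navigation agent, while a failed validation either retries through navigation or ends the run once retries are exhausted.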