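Summary: This document introduces IoA (Internet of Agents), a novel multi-agent framework built on the concept of the Internet. It aims to solve three major problems in existing frameworks: ecosystem isolation, single-device simulation, and rigid communication and coordination. IoA introduces a flexible multi-agent integration protocol, an instant-messaging-style architecture, and dynamic mechanisms for managing and controlling team formation and conversation flow. Extensive experiments on different types of tasks show that IoA significantly outperforms existing multi-agent system baselines.
Intended audience: Researchers and developers interested in multi-agent systems, especially those who want to improve agent collaboration.
Use cases and goals: Complex task environments that require multiple agents to cooperate, such as general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, with the goal of improving overall system performance through efficient collaboration.
Other notes: IoA supports agent collaboration in distributed environments and offers rich tools and a flexible design that make it easy to integrate third-party agents and adapt to different application scenarios. The framework's code has been publicly released to support further research and applications.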
Work in progress
INTERNET OF AGENTS: WEAVING A WEB OF HETEROGENEOUS AGENTS FOR COLLABORATIVE INTELLIGENCE
Weize Chen¹∗, Ziming You², Ran Li¹, Yitong Guan², Chen Qian¹, Chenyang Zhao¹, Cheng Yang³, Ruobing Xie⁴, Zhiyuan Liu¹B, Maosong Sun¹
¹Tsinghua University, ²Peking University, ³Beijing University of Posts and Telecommunications, ⁴Tencent
chenwz21@mails.tsinghua.edu.cn, rl759@nau.edu
{zimingyou, 2101210206}@stu.pku.edu.cn
liuzy@tsinghua.edu.cn
ABSTRACT
The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle to integrate diverse, capable third-party agents because they rely on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.
1 INTRODUCTION
The Internet has revolutionized the way people collaborate and share knowledge, connecting individuals with diverse skills and backgrounds from all around the world. This global network has enabled the creation of remarkable collaborative projects, such as Wikipedia¹ and the development of the Linux operating system², which would have been impossible for any single person to achieve. The Internet has greatly facilitated collaboration among people, making the impossible possible and pushing the boundaries of human achievement.
The success of the Internet in enabling human collaboration raises an intriguing question: can we create a similar platform to facilitate collaboration among autonomous agents? With the rapid advancements in LLMs (OpenAI, 2023; Reid et al., 2024), we now have autonomous agents capable of achieving near-human performance on a wide range of tasks. These LLM-based agents have demonstrated the ability to break down complex tasks into executable steps, leverage various tools, and learn from feedback and experience (Qin et al., 2023; Wang et al., 2023c; Shinn et al., 2023; Qian et al., 2023b). As the capabilities of these agents continue to grow, and with an increasing number of third-party agents with diverse skills consistently emerging (Chase, 2022; Team, 2023; Significant Gravitas, 2023; Open Interpreter, 2023), it is crucial to explore how we can effectively and efficiently orchestrate their collaboration, just as the Internet has done for humans.

∗ Equal Contribution. B Corresponding author.
¹ https://www.wikipedia.org/
² https://www.linux.org/

arXiv:2407.07061v1 [cs.CL] 9 Jul 2024
To address this challenge, we propose the concept of the Internet of Agents (IoA), a general framework for agent communication and collaboration inspired by the Internet. IoA aims to address three fundamental limitations of existing multi-agent frameworks (Chen et al., 2023; Wu et al., 2023; Hong et al., 2023; Qian et al., 2023a): (1) Ecosystem Isolation: Most frameworks only consider agents defined within their own ecosystems, potentially blocking the integration of various third-party agents and limiting the diversity of agent capabilities and the platform's generality; (2) Single-Device Simulation: Nearly all multi-agent frameworks simulate multi-agent systems on a single device, which differs significantly from real-world scenarios where agents could be distributed across multiple devices located in different places; (3) Rigid Communication and Coordination: The communication process, agent grouping, and state transitions are mostly hard-coded, whereas in real life, humans decide on teammates based on the task at hand and dynamically switch between discussion and task assignment or execution.
To overcome these limitations, we propose an agent integration protocol that enables different third-party agents running on different devices to be seamlessly integrated into the framework and to collaborate effectively. Additionally, we introduce an instant-messaging-app-like framework that facilitates agent discovery and dynamic teaming. By autonomously searching for potential agents capable of handling the tasks at hand, agents can dynamically decide to form different teams and communicate within various group chats. Inspired by Speech Act Theory (Searle, 1969) and its application in conventional multi-agent systems (Finin et al., 1994; Labrou et al., 1999), we abstract out several conversation states within each group chat and provide a flexible and general finite-state machine mechanism that allows agents to autonomously decide the state of the conversation, facilitating discussion and sub-task execution.
We demonstrate the effectiveness of IoA through extensive experiments and comparisons with state-of-the-art autonomous agents. By integrating AutoGPT (Significant Gravitas, 2023) and Open Interpreter (Open Interpreter, 2023), we show that IoA achieves a 66% to 76% win rate in open-domain task evaluations when compared with these agents individually. Furthermore, with only a few basic ReAct agents integrated, IoA outperforms previous works on the GAIA benchmark (Mialon et al., 2023). In the retrieval-augmented generation (RAG) question-answering domain, our framework substantially surpasses existing methods: a GPT-3.5-based implementation achieves performance close to or even exceeding that of GPT-4 and surpasses previous multi-agent frameworks.
The impressive performance of IoA across various domains highlights the potential of this paradigm for autonomous agents. As smaller LLMs continue to advance (Mesnard et al., 2024; Hu et al., 2024; Abdin et al., 2024), running agents on personal computers or even mobile devices is becoming increasingly feasible. This trend opens up new opportunities for deploying multi-agent systems in real-world scenarios, where agents can be distributed across multiple devices and collaborate to solve complex problems. We believe that by further exploring and refining the IoA paradigm, more sophisticated and adaptable multi-agent systems can be developed, ultimately pushing the boundaries of what autonomous agents can achieve in problem-solving and decision-making.
2 FRAMEWORK DESIGN AND KEY MECHANISMS OF IOA
In this section, we present a comprehensive overview of IoA, detailing its architecture and key
mechanisms. We will explore how these components work together to enable effective collaboration
among autonomous agents, facilitating dynamic team formation, structured communication, and
efficient task execution.
2.1 OVERVIEW OF IOA
[Figure 1: Illustration of the conceptual layered architecture of IoA. Client side: Interaction Layer (Team Formation Block, Communication Block), Data Layer (Agent Contact Block, Group Info Block, Task Management Block), Foundation Layer (Agent Integration Block, Data Infra Block, Network Infra Block). Server side: Interaction Layer (Agent Query, Group Setup, Message Routing), Data Layer (Agent Registry Block, Session Management Block), Foundation Layer (Data Infra Block, Network Infra Block, Security Block). The running example shows a client with the goal "Calculate sqrt(9!)" querying the server for agents, teaming up with AutoGPT in a group chat named "Math Masters", and exchanging protocol-compliant agent messages over WebSocket connections.]

IoA is designed as an instant-messaging-app-like platform that enables seamless communication and collaboration among diverse autonomous agents. Inspired by the concept of the Internet, IoA addresses three fundamental challenges in multi-agent systems (Chen et al., 2023; Wu et al., 2023; Qian et al., 2023a):
1. Distributed agent collaboration: Unlike traditional frameworks that simulate multi-agent systems on a single device, IoA supports agents distributed across multiple devices and locations. (Sections 2.2 and 2.3.1)
2. Dynamic and adaptive communication: IoA implements mechanisms for autonomous team formation and conversation flow control, allowing agents to adapt their collaboration strategies based on task requirements and ongoing progress. (Sections 2.3.2 to 2.3.4)
3. Integration of heterogeneous agents: IoA provides a flexible protocol for integrating various third-party agents, expanding the diversity of agent capabilities within the system. (Section 2.4)
At its core, IoA consists of two main components: the server and the client. The server acts as a central hub, managing agent registration, discovery, and message routing. It enables agents with varying capabilities to find each other and initiate communication. The client, on the other hand, serves as a wrapper for individual agents, providing them with the necessary communication functionalities and adapting them to the specified protocol. IoA employs a layered architecture (Bass et al., 1999) for both the server and client components, comprising three layers:
• Interaction Layer: Facilitates team formation and agent communication.
• Data Layer: Manages information related to agents, group chats, and tasks.
• Foundation Layer: Provides essential infrastructure for agent integration, data management,
and network communication.
These layers work together to facilitate agent collaboration over the network. In the following subsections, we describe IoA's architecture and design in detail.
2.2 ARCHITECTURE OF IOA
The layered architecture of IoA is designed to support scalable, flexible, and efficient multi-agent
collaboration. This architecture enables a clear separation of concerns and facilitates the integration
of diverse agents and functionalities (Fig. 1).
2.2.1 SERVER ARCHITECTURE
The server acts as the central hub of IoA, facilitating agent discovery, group formation, and message
routing. Its architecture consists of three layers:
Interaction Layer: At the top level, the Interaction Layer manages high-level interactions between
agents and the system. It encompasses the Agent Query Block for enabling agents to search for
other agents based on specific characteristics, the Group Setup Block for facilitating the creation
and management of group chats, and the Message Routing Block for ensuring efficient and accurate
routing of messages between agents and group chats.
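To make the routing role concrete, here is a minimal Python sketch, assuming an in-memory server in which each agent's inbox stands in for its WebSocket connection. The `sender` and `content` fields follow the protocol-compliant message shown in Figure 1; `group_chat_id`, `MessageRouter`, `setup_group`, and `route` are hypothetical names, not IoA's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMessage:
    sender: str         # as in Figure 1's protocol-compliant message
    content: str        # as in Figure 1's protocol-compliant message
    group_chat_id: int  # assumption: the target group chat travels with the message


@dataclass
class MessageRouter:
    # group_chat_id -> names of the group chat's members
    groups: dict[int, set[str]] = field(default_factory=dict)
    # agent name -> delivered messages (a stand-in for a live WebSocket connection)
    inboxes: dict[str, list[AgentMessage]] = field(default_factory=dict)

    def setup_group(self, group_chat_id: int, members: set[str]) -> None:
        """Group Setup: record a new group chat and ensure every member has an inbox."""
        self.groups[group_chat_id] = set(members)
        for name in members:
            self.inboxes.setdefault(name, [])

    def route(self, msg: AgentMessage) -> list[str]:
        """Message Routing: deliver msg to every other member of its group chat."""
        members = self.groups.get(msg.group_chat_id)
        if members is None or msg.sender not in members:
            raise ValueError("sender is not a member of this group chat")
        recipients = sorted(members - {msg.sender})
        for name in recipients:
            self.inboxes[name].append(msg)
        return recipients


router = MessageRouter()
router.setup_group(0, {"CalcAgent", "AutoGPT"})
print(router.route(AgentMessage("CalcAgent", "Sure! [...]", 0)))  # ['AutoGPT']
```

Routing by group membership, rather than broadcasting to every connected client, keeps each message confined to the team working on it; a real server would additionally authenticate the sender through its Security Block.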
Data Layer: Serving as the information backbone, the Data Layer handles the storage and management of critical system information. The Agent Registry Block maintains a comprehensive database of registered agents, including their capabilities and current status, similar to service discovery in distributed systems (Meshkova et al., 2008; Netflix). Meanwhile, the Session Management Block manages active connections and ensures continuous communication between the server and connected clients.
Foundation Layer: Underpinning the entire system, the Foundation Layer provides the essential infrastructure for the server's operations. It encompasses the Data Infrastructure Block for handling data persistence and retrieval, the Network Infrastructure Block for managing network communications, and the Security Block for implementing authentication, authorization, and other security measures to maintain system integrity.
2.2.2 CLIENT ARCHITECTURE
The client component of IoA serves as a wrapper for individual agents, providing them with the
necessary interfaces to communicate within the system. Its architecture mirrors that of the server
with three layers:
Interaction Layer: At the forefront of agent operations, the Interaction Layer manages the agent's interactions within the system. The Team Formation Block implements the logic for identifying suitable collaborators and forming teams for the task at hand, similar to coalition formation in conventional multi-agent research (Rahwan et al., 2009). Complementing this, the Communication Block manages the agent's participation in group chats and handles message processing.
Data Layer: Functioning as the agent’s memory, the Data Layer maintains local data relevant to the
agent’s operations. It includes the Agent Contact Block for storing information about other agents
the current agent has interacted with, the Group Info Block for maintaining details about ongoing
group chats and collaborations, and the Task Management Block for tracking the status and progress
of tasks assigned to the agent.
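A minimal sketch of the records these three blocks might hold, assuming plain Python dataclasses. The field names are taken from the examples in Figure 1 (`tools`, `desc`, `group_chat_id`, `goal`, `team_members`, `chat_records`, `sub_goal`, `task_status`, `assigner`, `assignee`, `is_trigger_set`); the class names, field types, and every status value other than "ongoing" are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    # Only "ongoing" appears in Figure 1; the other two values are assumptions.
    PENDING = "pending"
    ONGOING = "ongoing"
    COMPLETED = "completed"


@dataclass
class AgentContact:
    """Agent Contact Block entry: an agent this client has interacted with."""
    name: str
    tools: list[str]
    desc: str


@dataclass
class GroupInfo:
    """Group Info Block entry: one ongoing group chat."""
    group_chat_id: int
    goal: str
    team_members: list[str]
    chat_records: list[str] = field(default_factory=list)


@dataclass
class TaskRecord:
    """Task Management Block entry: one (sub-)task and its progress."""
    sub_goal: str
    assigner: str
    assignee: str
    task_status: TaskStatus = TaskStatus.ONGOING
    is_trigger_set: bool = False


# Hypothetical assigner/assignee values (Figure 1 leaves them as placeholders).
task = TaskRecord(sub_goal="Calculate 9!", assigner="AutoGPT", assignee="CalcAgent")
print(task.task_status.value)  # ongoing
```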
Foundation Layer: Forming the base of the client architecture, the Foundation Layer provides the
basic functionalities for the client’s operations. The Agent Integration Block defines the protocols
and interfaces for integrating third-party agents into the IoA ecosystem. Alongside this, the Data
Infrastructure Block handles local data storage and retrieval, while the Network Infrastructure Block
manages network communications with the server.
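One way to picture the Agent Integration Block is as an adapter interface that every wrapped agent must satisfy. The sketch below is hypothetical: `AgentAdapter`, `run`, and `CallableAgent` are illustrative names, and it assumes that any third-party agent can be reduced to a callable that takes a task string and returns a result string.

```python
from abc import ABC, abstractmethod
from typing import Callable


class AgentAdapter(ABC):
    """Hypothetical contract a client wrapper could require of any integrated agent."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name                # identifier used when registering with the server
        self.description = description  # capability description sent at registration

    @abstractmethod
    def run(self, task: str) -> str:
        """Execute a (sub-)task and return a textual result."""


class CallableAgent(AgentAdapter):
    """Wraps any third-party agent that is exposed as a str -> str callable."""

    def __init__(self, name: str, description: str, entry_point: Callable[[str], str]) -> None:
        super().__init__(name, description)
        self._entry_point = entry_point

    def run(self, task: str) -> str:
        # Forward the task unchanged; protocol details stay in the client wrapper.
        return self._entry_point(task)


echo = CallableAgent("EchoAgent", "Repeats the task back.", lambda task: f"done: {task}")
print(echo.run("Calculate 9!"))  # done: Calculate 9!
```

Wrapping a real agent such as AutoGPT or Open Interpreter would then mean supplying an `entry_point` that forwards the task to that tool; how each tool is actually invoked is outside this sketch.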
This layered architecture enables IoA to support a wide range of agent types and collaboration scenarios. By providing a clear separation of concerns and well-defined interfaces between layers, the architecture facilitates the integration of diverse agents and allows for future extensibility. Furthermore, this design supports the key mechanisms of IoA, such as autonomous team formation and conversation flow control, which we will explore in detail in the following subsections.
2.3 KEY MECHANISMS
The effectiveness of IoA relies on several key mechanisms that enable seamless collaboration among
diverse agents. These mechanisms work in concert to facilitate agent integration, team formation,
task allocation, and structured communication. We detail these critical components in this section.
2.3.1 AGENT REGISTRATION AND DISCOVERY
To enable collaboration among distributed agents with heterogeneous architectures, tools, and environments, we propose the agent registration and discovery mechanism. This mechanism forms the foundation for collaborative interactions within IoA, enabling the integration of diverse agents into the system and facilitating their discovery on the online server by other agents for potential collaboration through the network.
Agent Registration: When a new agent joins the IoA, its client wrapper undergoes a registration process with the server. During registration, the agent should provide a comprehensive description of its capabilities, skills, and areas of expertise. This description, denoted as $d_i$ for an agent $c_i$, is stored in the Agent Registry Block of the server's Data Layer. Formally, we represent the set of all registered agents as $C = \{c_1, c_2, \ldots, c_n\}$, where each $c_i$ is associated with its description $d_i$.
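The registration step can be sketched as an in-memory registry that maps each agent $c_i$ to its description $d_i$. This sketch assumes registration is rejected for an empty description or a duplicate name; `AgentRegistry`, `register`, and `all_agents` are hypothetical names, and a real server would persist entries through its Data Infrastructure Block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEntry:
    name: str         # identifies agent c_i
    description: str  # its capability description d_i


class AgentRegistry:
    """In-memory stand-in for the server's Agent Registry Block (hypothetical API)."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentEntry] = {}

    def register(self, name: str, description: str) -> AgentEntry:
        if not description.strip():
            raise ValueError("registration requires a non-empty capability description")
        if name in self._agents:
            raise ValueError(f"agent {name!r} is already registered")
        entry = AgentEntry(name, description)
        self._agents[name] = entry
        return entry

    def all_agents(self) -> list[AgentEntry]:
        """Return the registered set C = {c_1, ..., c_n} in registration order."""
        return list(self._agents.values())


registry = AgentRegistry()
registry.register("CalcAgent", "It can use a calculator.")
registry.register("AutoGPT", "It is capable yet expensive.")
print([a.name for a in registry.all_agents()])  # ['CalcAgent', 'AutoGPT']
```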
Agent Discovery: The agent discovery function leverages the information stored in the Agent Registry of the online server to enable agents to find suitable collaborators for specific tasks. When an agent needs to form a team or seek assistance, it can use the search_client tool provided by the server's Agent Query Block. This tool allows an agent to search for other agents based on desired characteristics or capabilities. Formally, the agent discovery process can be described as follows. Let $L_d = [l_1, l_2, \ldots, l_k]$ be a list of desired characteristics generated by an agent seeking collaborators. The search_client function can be represented as $\text{search\_client}: L_d \rightarrow \mathcal{P}(C)$, where $\mathcal{P}(C)$ denotes the power set of $C$. The function returns a subset of clients $C_d \subseteq C$ whose descriptions $d_j$ match the desired characteristics in $L_d$. The matching process between $L_d$ and $d_j$ can be implemented with various semantic matching techniques (Robertson & Zaragoza, 2009; Karpukhin et al., 2020). Semantic matching ensures that agents with relevant capabilities can be discovered even if their descriptions do not exactly match the search criteria.
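A toy version of search_client, assuming the registry is a plain mapping from agent name to description. Plain token overlap stands in here for the BM25 or dense-retrieval matching cited above, so unlike those techniques it misses paraphrases that share no words; the signature and the `top_k` parameter are assumptions. The example query reuses the search terms shown in Figure 1.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_client(desired: list[str], registry: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k agent names whose descriptions d_j share terms with L_d."""
    query: set[str] = set()
    for characteristic in desired:
        query |= tokenize(characteristic)
    scored = []
    for name, description in registry.items():
        overlap = len(query & tokenize(description))
        if overlap > 0:
            scored.append((-overlap, name))  # negate so larger overlap sorts first
    scored.sort()  # ties are broken alphabetically by agent name
    return [name for _, name in scored[:top_k]]


registry = {
    "CalcAgent": "It can use a calculator for arithmetic calculation.",
    "AutoGPT": "Browser and file system; capable yet expensive.",
    "WeatherAgent": "It obtains the weather from a weather API.",
}
print(search_client(["calculation", "code interpreter"], registry))  # ['CalcAgent']
```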
2.3.2 AUTONOMOUS NESTED TEAM FORMATION
The autonomous nested team formation mechanism enables dynamic and flexible combinations of appropriate agents. This mechanism allows agents to form teams adaptively based on task requirements and to create nested sub-teams for complex, multi-faceted tasks.
Team Formation Process: When a client $c_i \in C$ is assigned a task $t$, it initiates the team formation process. The client has access to two essential tools provided by the server: search_client and launch_group_chat. The LLM in the client is prompted to decide which tool to call based on the task and the current set of discovered clients. If more collaborators are needed, it calls search_client with appropriate characteristics. Once suitable collaborators are found, it calls launch_group_chat to initiate a new group chat $g \in G$, where $G$ is the space of all group chats.
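The team formation process can be sketched as follows, assuming the client already knows which skills the task needs. A fixed rule replaces the LLM's tool choice (search once per uncovered skill, then launch the group chat), the tool signatures are hypothetical and passed in as plain functions, and the initiator "Planner" is an illustrative name.

```python
from typing import Callable

SearchFn = Callable[[list[str]], list[str]]  # hypothetical search_client signature
LaunchFn = Callable[[str, list[str]], int]   # hypothetical launch_group_chat signature


def form_team(
    initiator: str,
    task: str,
    needed_skills: list[str],
    search_client: SearchFn,
    launch_group_chat: LaunchFn,
) -> int:
    """Search for a collaborator per needed skill, then launch a group chat.

    A fixed rule stands in for the LLM's choice between the two tools.
    """
    found: dict[str, str] = {}  # skill -> first matching collaborator
    for skill in needed_skills:
        hits = [agent for agent in search_client([skill]) if agent != initiator]
        if not hits:
            raise LookupError(f"no collaborator found for {skill!r}")
        found[skill] = hits[0]
    members = sorted({initiator, *found.values()})
    return launch_group_chat(task, members)


# Fake tools for illustration only.
catalog = {"calculation": ["CalcAgent"], "browsing": ["AutoGPT"]}
launched: list[tuple[str, list[str]]] = []


def fake_launch(goal: str, members: list[str]) -> int:
    launched.append((goal, members))
    return len(launched) - 1  # new group_chat_id


gid = form_team("Planner", "Calculate sqrt(9!)", ["calculation", "browsing"],
                lambda chars: catalog.get(chars[0], []), fake_launch)
print(gid, launched)  # 0 [('Calculate sqrt(9!)', ['AutoGPT', 'CalcAgent', 'Planner'])]
```

In IoA itself the LLM decides when enough collaborators have been found; the fixed loop here only makes the search-then-launch ordering explicit.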
Nested Team Structure: The nested team formation allows for a hierarchical structure of teams and sub-teams. Let $g_0 \in G$ be the initial group chat for task $t$. During the execution of $t$, if a client $c_i$ is assigned a sub-task $t_l$ (the task assignment mechanism will be introduced in Section 2.3.4) and identifies that $t_l$ requires additional expertise, $c_i$ is allowed to search for appropriate agents again and initiate a new sub-group chat $g_l \in G$. This process can continue recursively for the new sub-tasks assigned in $g_l$, forming a tree-like structure of group chats. Formally, we can define a function $h: G \rightarrow \mathcal{P}(G)$ that maps a group chat to its set of sub-group chats. The nested structure can be represented as $h(g_0) = \{g_1, g_2, \ldots, g_m\}$, $h(g_i) = \{g_{i1}, g_{i2}, \ldots, g_{in}\}$, and so on.
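The tree of group chats can be sketched as below, assuming each group chat stores its direct sub-group chats in `children` (so `children` plays the role of $h(g)$) and that only a current member may open a sub-group. `GroupChat`, `spawn_subgroup`, and `subtree` are hypothetical names, and the chat labels loosely follow the market-analysis example; `subtree` collects a group and all its descendants, which corresponds to the set $S(g)$ used in the communication-complexity analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GroupChat:
    name: str
    members: set[str]
    children: list[GroupChat] = field(default_factory=list)  # plays the role of h(g)

    def spawn_subgroup(self, name: str, initiator: str, collaborators: set[str]) -> GroupChat:
        """A member assigned a sub-task opens a nested sub-group chat."""
        if initiator not in self.members:
            raise ValueError(f"{initiator!r} is not a member of {self.name!r}")
        sub = GroupChat(name, {initiator, *collaborators})
        self.children.append(sub)
        return sub

    def subtree(self) -> list[GroupChat]:
        """S(g): this group chat plus all nested sub-group chats, depth-first."""
        groups = [self]
        for child in self.children:
            groups.extend(child.subtree())
        return groups


g0 = GroupChat("report", {"GoogleAgent", "ReportWritingAgent"})
g1 = g0.spawn_subgroup("market-research", "GoogleAgent", {"MarketAPIAgent"})
print([g.name for g in g0.subtree()])  # ['report', 'market-research']
```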
[Figure 2: An example of the nested team formation mechanism. The process is simplified for clarity. The top-level chat (overall goal: "Create a Comprehensive Market Analysis Report for iPhone 15"; agents: GoogleAgent, ReportWritingAgent) has three sub-tasks: Market Research and Data Collection (assignee: GoogleAgent; nested team formation: yes), Data Analysis and Visualization (assignee: ReportWritingAgent; nested team formation: yes), and Report Writing (assignee: ReportWritingAgent; nested team formation: no). The nested chat for Market Research and Data Collection (agents: GoogleAgent, MarketAPIAgent) has the sub-tasks Competitor Analysis (GoogleAgent) and Customer Analysis (MarketAPIAgent), neither of which forms a further nested team.]
Communication Complexity: The nested team formation mechanism helps reduce communication complexity in large agent teams. Assuming fully connected communication within each group, the number of communication channels (connected edges) in a single group with $|g|$ members is $c_{\text{full}} = \frac{|g|(|g|-1)}{2}$. However, by decomposing a task into sub-tasks and allocating them to sub-group chats, we can reduce the total number of communication channels. Let $S(g)$ denote the set of all sub-groups (including $g$ itself) formed for a task initially assigned to group $g$. The total number of communication channels can then be expressed as

$$c_{\text{nested}} = \sum_{g_i \in S(g)} \frac{|g_i|(|g_i|-1)}{2} \le c_{\text{full}},$$

where $c_{\text{full}}$ is evaluated for a single flat group containing every agent that appears in $S(g)$. The bound holds whenever no pair of agents belongs to more than one group, since each edge is then counted at most once.
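A worked example of the channel count, using a hypothetical configuration of 10 agents: one flat team of all 10, versus a root group of 4 in which two members each open a sub-group with 3 new agents (so each sub-group also has 4 members, and no pair of agents shares two groups).

```python
def channels(group_size: int) -> int:
    """Edges in a fully connected group: |g|(|g|-1)/2."""
    return group_size * (group_size - 1) // 2


def nested_channels(group_sizes: list[int]) -> int:
    """c_nested: per-group channel counts summed over every group in S(g)."""
    return sum(channels(n) for n in group_sizes)


c_full = channels(10)                  # one flat group holding all 10 agents
c_nested = nested_channels([4, 4, 4])  # root group plus two sub-groups
print(c_full, c_nested)  # 45 18
```

The saving grows with team size: the flat count is quadratic in the total number of agents, while the nested count is quadratic only in the size of each individual group.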
Fig. 2 illustrates an example of the nested team formation process. In this example, the initial group chat $g_0$ spawns three sub-group chats $g_1$, $g_2$, and $g_3$ for specific sub-tasks during the discussion. $g_1$ further creates two sub-group chats $g_{21}$ and $g_{22}$ for a more specialized sub-task.
2.3.3 AUTONOMOUS CONVERSATION FLOW CONTROL
Effective communication is crucial for successful collaboration among autonomous agents. Inspired
by Speech Act Theory (Austin, 1975; Searle, 1969) and its applications in multi-agent systems (Finin