AGENTIC RETRIEVAL-AUGMENTED GENERATION: A SURVEY ON AGENTIC RAG
Abul Ehtesham, The Davey Tree Expert Company, Kent, OH, USA, abul.ehtesham@davey.com
ABSTRACT
Large Language Models (LLMs) have revolutionized artificial intelligence (AI) by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management.
Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through clearly defined operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver greater flexibility, scalability, and context-awareness across diverse applications.
This survey provides a comprehensive exploration of Agentic RAG, beginning with its foundational principles and the evolution of RAG paradigms. It presents a detailed taxonomy of Agentic RAG architectures, highlights key applications in industries such as healthcare, finance, and education, and examines practical implementation strategies. Additionally, it addresses challenges in scaling these systems, ensuring ethical decision-making, and optimizing performance for real-world applications, while providing detailed insights into frameworks and tools for implementing Agentic RAG. The GitHub repository for this survey is available at: https://github.com/asinghcsu/AgenticRAG-Survey.
Keywords: Large Language Models (LLMs) · Artificial Intelligence (AI) · Natural Language Understanding · Retrieval-Augmented Generation (RAG) · Agentic RAG · Autonomous AI Agents · Reflection · Planning · Tool Use · Multi-Agent Collaboration · Agentic Patterns · Contextual Understanding · Dynamic Adaptability · Scalability · Real-Time Data Retrieval · Taxonomy of Agentic RAG · Healthcare Applications · Finance Applications · Educational Applications · Ethical AI Decision-Making · Performance Optimization · Multi-Step Reasoning
1 Introduction
Large Language Models (LLMs) [1, 2, 3], such as OpenAI’s GPT-4, Google’s PaLM, and Meta’s LLaMA, have significantly transformed artificial intelligence (AI) with their ability to generate human-like text and perform complex natural language processing tasks. These models have driven innovation across diverse domains, including conversational agents [4], automated content creation, and real-time translation. Recent advancements have extended their capabilities to multimodal tasks, such as text-to-image and text-to-video generation [5], enabling the creation and editing of videos and images from detailed prompts [6], which broadens the potential applications of generative AI.
Despite these advancements, LLMs face significant limitations due to their reliance on static pre-training data. This reliance often results in outdated information, hallucinated responses [7], and an inability to adapt to dynamic, real-world scenarios. These challenges emphasize the need for systems that can integrate real-time data and dynamically refine responses to maintain contextual relevance and accuracy.
Retrieval-Augmented Generation (RAG) [8, 9] emerged as a promising solution to these challenges. By combining the generative capabilities of LLMs with external retrieval mechanisms [10], RAG systems enhance the relevance and timeliness of responses. These systems retrieve real-time information from sources such as knowledge bases [11], APIs, or the web, effectively bridging the gap between static training data and the demands of dynamic applications. However, traditional RAG workflows remain limited by their linear and static design, which restricts their ability to perform complex multi-step reasoning, integrate deep contextual understanding, and iteratively refine responses.
The evolution of agents [12] has significantly enhanced the capabilities of AI systems. Modern agents, including LLM-powered and mobile agents [13], are intelligent entities capable of perceiving, reasoning, and autonomously executing tasks. These agents leverage agentic patterns, such as reflection [14], planning [15], tool use, and multi-agent collaboration [16], to enhance decision-making and adaptability.
Furthermore, these agents employ agentic workflow patterns [12, 13], such as prompt chaining, routing, parallelization, orchestrator-worker models, and evaluator-optimizer, to structure and optimize task execution. By integrating these patterns, Agentic RAG systems can efficiently manage dynamic workflows and address complex problem-solving scenarios. The convergence of RAG and agentic intelligence has given rise to Agentic Retrieval-Augmented Generation (Agentic RAG) [14], a paradigm that integrates agents into the RAG pipeline. Agentic RAG enables dynamic retrieval strategies, contextual understanding, and iterative refinement [15], allowing for adaptive and efficient information processing. Unlike traditional RAG, Agentic RAG employs autonomous agents to orchestrate retrieval, filter relevant information, and refine responses, excelling in scenarios requiring precision and adaptability. Figure 1 gives an overview of Agentic RAG.
This survey explores the foundational principles, taxonomy, and applications of Agentic RAG. It provides a comprehensive overview of RAG paradigms, such as Naïve RAG, Modular RAG, and Graph RAG [16], alongside their evolution into Agentic RAG systems. Key contributions include a detailed taxonomy of Agentic RAG frameworks, applications across domains such as healthcare [17, 18], finance, and education [19], and insights into implementation strategies, benchmarks, and ethical considerations.
The structure of this paper is as follows: Section 2 introduces RAG and its evolution, highlighting the limitations of traditional approaches. Section 3 elaborates on the principles of agentic intelligence and agentic patterns. Section 4 elaborates on agentic workflow patterns. Section 5 provides a taxonomy of Agentic RAG systems, including single-agent, multi-agent, and graph-based frameworks. Section 6 examines applications of Agentic RAG, while Section 7 discusses implementation tools and frameworks. Section 8 focuses on benchmarks and datasets, and Section 9 concludes with future directions for Agentic RAG systems.
2 Foundations of Retrieval-Augmented Generation
2.1 Overview of Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) represents a significant advancement in the field of artificial intelligence, combining the generative capabilities of Large Language Models (LLMs) with real-time data retrieval. While LLMs have demonstrated remarkable capabilities in natural language processing, their reliance on static pre-trained data often results in outdated or incomplete responses. RAG addresses this limitation by dynamically retrieving relevant information from external sources and incorporating it into the generative process, enabling contextually accurate and up-to-date outputs.
Figure 1: An Overview of Agentic RAG
2.2 Core Components of RAG
The architecture of RAG systems integrates three primary components (Figure 2):
2.3 Evolution of RAG Paradigms
The field of Retrieval-Augmented Generation (RAG) has evolved significantly to address the increasing complexity of real-world applications, where contextual accuracy, scalability, and multi-step reasoning are critical. What began as simple keyword-based retrieval has transitioned into sophisticated, modular, and adaptive systems capable of integrating diverse data sources and autonomous decision-making processes. This evolution underscores the growing need for RAG systems to handle complex queries efficiently and effectively.
Figure 2: Core Components of RAG
This section examines the progression of RAG paradigms, presenting the key stages of development (Naïve RAG, Advanced RAG, Modular RAG, Graph RAG, and Agentic RAG) alongside their defining characteristics, strengths, and limitations. By understanding the evolution of these paradigms, readers can appreciate the advancements made in retrieval and generative capabilities and their application in various domains.
2.3.1 Naïve RAG
Naïve RAG [20] represents the foundational implementation of retrieval-augmented generation. Figure 3 illustrates the simple retrieve-read workflow of Naïve RAG, focusing on keyword-based retrieval and static datasets. These systems rely on simple keyword-based retrieval techniques, such as TF-IDF and BM25, to fetch documents from static datasets. The retrieved documents are then used to augment the language model’s generative capabilities.
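The retrieve-read workflow described above can be sketched in a few lines. The following is a minimal illustration, assuming whitespace tokenization, the common Okapi BM25 defaults (`k1=1.5`, `b=0.75`), and a toy three-document corpus; the function names `bm25_scores` and `build_prompt` are hypothetical and are not taken from the survey or from any library.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude whitespace tokenizer; real systems also strip punctuation and stem.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def build_prompt(query, docs, top_k=2):
    """Retrieve the top_k documents and prepend them to the question."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    context = "\n".join(docs[i] for i in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "BM25 is a ranking function used by search engines.",
    "Transformers use attention to process sequences.",
    "TF-IDF weights terms by frequency and rarity.",
]
prompt = build_prompt("what is bm25 ranking", corpus, top_k=1)
```

Because matching is purely lexical, a document that answers the question with different vocabulary scores zero, which is one reason keyword retrieval tends to struggle on context-heavy queries.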
Figure 3: An Overview of Naïve RAG
Naïve RAG is characterized by its simplicity and ease of implementation, making it suitable for tasks involving fact-based queries with minimal contextual complexity. However, it suffers from several limitations:
Despite these limitations, Naïve RAG systems provided a critical proof-of-concept for integrating retrieval with generation, laying the foundation for more sophisticated paradigms.
2.3.2 Advanced RAG
Advanced RAG [20] systems address the limitations of Naïve RAG by incorporating semantic understanding and enhanced retrieval techniques. Figure 4 highlights the semantic enhancements in retrieval and the iterative, context-aware pipeline of Advanced RAG. These systems leverage dense retrieval models, such as Dense Passage Retrieval (DPR), and neural ranking algorithms to improve retrieval precision.
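As a rough illustration of the two-stage idea (dense retrieval followed by re-ranking), the sketch below ranks documents by cosine similarity over embeddings and then reorders the shortlist with a second scorer. The three-dimensional vectors and the re-ranker scores are invented toy values; in practice they would come from a trained encoder such as DPR and a neural ranking model, neither of which is reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dense_retrieve(query_vec, doc_vecs, top_k):
    """First stage: rank all documents by embedding similarity."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]

def rerank(candidates, score_fn):
    """Second stage: reorder the shortlist with a more expensive scorer."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy embeddings (assumption): a real system would compute these with an encoder.
doc_vecs = {
    "doc_heart": [0.9, 0.1, 0.0],
    "doc_cardio": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
shortlist = dense_retrieve(query_vec, doc_vecs, top_k=2)

# Hypothetical re-ranker outputs standing in for a neural ranking model.
rerank_scores = {"doc_heart": 0.4, "doc_cardio": 0.7, "doc_tax": 0.1}
final = rerank(shortlist, rerank_scores.get)
```

The split keeps the expensive scorer off the full corpus: only the shortlist from the cheap first stage is re-scored.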
Figure 4: Overview of Advanced RAG
Key features of Advanced RAG include:
These advancements make Advanced RAG suitable for applications requiring high precision and nuanced understanding, such as research synthesis and personalized recommendations. However, challenges such as computational overhead and limited scalability persist, particularly when dealing with large datasets or multi-step queries.
2.3.3 Modular RAG
Modular RAG [20] represents the latest evolution in RAG paradigms, emphasizing flexibility and customization. These systems decompose the retrieval and generation pipeline into independent, reusable components, enabling domain-specific optimization and task adaptability. Figure 5 demonstrates the modular architecture, showcasing hybrid retrieval strategies, composable pipelines, and external tool integration.
Key innovations in Modular RAG include:
For instance, a Modular RAG system designed for financial analytics might retrieve live stock prices via APIs, analyze historical trends using dense retrieval, and generate actionable investment insights through a tailored language model. This modularity and customization make Modular RAG ideal for complex, multi-domain tasks, offering both scalability and precision.
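The financial-analytics example above can be pictured as a pipeline of interchangeable stages. The sketch below is only an illustration of that composability: each stage is a stub returning fixed values (no real API, retriever, or language model is called), and the names `compose`, `fetch_price`, `retrieve_history`, and `summarize` are hypothetical.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def compose(stages: List[Stage]) -> Stage:
    """Chain independent stages; each reads and extends a shared state dict."""
    def run(state: Dict) -> Dict:
        for stage in stages:
            state = stage(state)
        return state
    return run

def fetch_price(state: Dict) -> Dict:
    # Stub for a live price API call (fixed value, assumption).
    return {**state, "price": 101.5}

def retrieve_history(state: Dict) -> Dict:
    # Stub for dense retrieval over historical data (fixed values, assumption).
    return {**state, "history": [98.0, 99.5, 100.0]}

def summarize(state: Dict) -> Dict:
    # Stub for the generation step; a real system would prompt a language model.
    avg = sum(state["history"]) / len(state["history"])
    trend = "above" if state["price"] > avg else "at or below"
    return {**state, "insight": f"{state['ticker']} trades {trend} its recent average."}

pipeline = compose([fetch_price, retrieve_history, summarize])
result = pipeline({"ticker": "ACME"})
```

Swapping `retrieve_history` for a different retriever, or inserting a validation stage, changes one list entry rather than the whole pipeline.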
Figure 5: Overview of Modular RAG
2.3.4 Graph RAG
Graph RAG [16] extends traditional Retrieval-Augmented Generation systems by integrating graph-based data structures, as illustrated in Figure 6. These systems leverage the relationships and hierarchies within graph data to enhance multi-hop reasoning and contextual enrichment. By incorporating graph-based retrieval, Graph RAG enables richer and more accurate generative outputs, particularly for tasks requiring relational understanding.
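To make multi-hop retrieval over a graph concrete, the sketch below walks a small adjacency-list knowledge graph breadth-first and collects the facts reachable within a fixed number of hops, which could then be serialized into the generation context. The graph contents and the function name `multi_hop` are illustrative assumptions, not part of any Graph RAG implementation cited here.

```python
from collections import deque

# Toy knowledge graph (assumption): node -> list of (relation, neighbor) edges.
graph = {
    "aspirin": [("treats", "headache"), ("interacts_with", "warfarin")],
    "warfarin": [("treats", "thrombosis")],
    "headache": [],
    "thrombosis": [],
}

def multi_hop(graph, start, max_hops):
    """Collect (subject, relation, object) facts reachable within max_hops."""
    facts, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in graph.get(node, []):
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts

facts = multi_hop(graph, "aspirin", max_hops=2)
# Serialize the facts as plain text for inclusion in a prompt.
context = "\n".join(f"{s} {r} {o}" for s, r, o in facts)
```

With `max_hops=1` only the two direct facts about aspirin are returned; the second hop adds the warfarin-thrombosis relation that a single lookup would miss.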
Graph RAG is characterized by its ability to:
However, Graph RAG has some limitations:
Graph RAG is well-suited for applications such as healthcare diagnostics, legal research, and other domains where reasoning over structured relationships is crucial.
2.3.5 Agentic RAG
Agentic RAG represents a paradigm shift by introducing autonomous agents capable of dynamic decision-making and workflow optimization. Unlike static systems, Agentic RAG employs iterative refinement and adaptive retrieval strategies to address complex, real-time, and multi-domain queries. This paradigm leverages the modularity of retrieval and generation processes while introducing agent-based autonomy.
Figure 6: Overview of Graph RAG
Key characteristics of Agentic RAG include:
• Autonomous Decision-Making: Agents independently evaluate and manage retrieval strategies based on query complexity.
• Iterative Refinement: Incorporates feedback loops to improve retrieval accuracy and response relevance.
• Workflow Optimization: Dynamically orchestrates tasks, enabling efficiency in real-time applications.
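The characteristics listed above can be sketched as a small control loop: the agent picks a strategy from the query, retrieves, checks whether the evidence is sufficient, and retrieves again if not. Everything below is a simplified assumption for illustration: word count stands in for a real complexity estimate, word overlap stands in for a retriever, and an evidence count stands in for an LLM-based sufficiency check.

```python
def choose_strategy(query):
    """Pick a retrieval strategy from a crude complexity signal (word count)."""
    return "multi_step" if len(query.split()) > 8 else "single_shot"

def retrieve(query, knowledge, exclude):
    """Return the first unseen passage sharing a word with the query, if any."""
    words = set(query.lower().split())
    for passage in knowledge:
        if passage not in exclude and words & set(passage.lower().split()):
            return passage
    return None

def is_sufficient(evidence, needed):
    # Stand-in for an agent's self-assessment of the gathered evidence.
    return len(evidence) >= needed

def agentic_answer(query, knowledge, max_rounds=3):
    """Retrieve repeatedly until the evidence is judged sufficient."""
    strategy = choose_strategy(query)
    needed = 2 if strategy == "multi_step" else 1
    evidence = []
    for _ in range(max_rounds):
        passage = retrieve(query, knowledge, exclude=evidence)
        if passage is None:
            break  # nothing new to add
        evidence.append(passage)
        if is_sufficient(evidence, needed):
            break
    return strategy, evidence

knowledge = [
    "solar subsidies grew in europe",
    "wind energy costs fell",
    "tax law basics",
]
simple = agentic_answer("solar energy", knowledge)
complex_ = agentic_answer(
    "what lessons from solar and wind energy policies in europe apply elsewhere",
    knowledge,
)
```

The `max_rounds` cap bounds the cost of the loop, which matters because each extra round adds retrieval and model latency.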
Despite its advancements, Agentic RAG faces some challenges:
Agentic RAG excels in domains like customer support, financial analytics, and adaptive learning platforms, where dynamic adaptability and contextual precision are paramount.
2.4 Challenges and Limitations of Traditional RAG Systems
Traditional Retrieval-Augmented Generation (RAG) systems have significantly expanded the capabilities of Large Language Models (LLMs) by integrating real-time data retrieval. However, these systems still face critical challenges that hinder their effectiveness in complex, real-world applications. The most notable limitations concern contextual integration, multi-step reasoning, and scalability and latency.
2.4.1 Contextual Integration
Even when RAG systems successfully retrieve relevant information, they often struggle to seamlessly incorporate it into generated responses. The static nature of retrieval pipelines and limited contextual awareness lead to fragmented, inconsistent, or overly generic outputs.
Example: A query such as, "What are the latest advancements in Alzheimer’s research and their implications for early-stage treatment?" might yield relevant research papers and medical guidelines. However, traditional RAG systems often fail to synthesize these findings into a coherent explanation that connects the new treatments to specific patient scenarios. Similarly, for a query like, "What are the best sustainable practices for small-scale agriculture in arid regions?", traditional systems might retrieve documents on general agricultural methods but overlook critical sustainability practices tailored to arid environments.
Table 1: Comparative Analysis of RAG Paradigms
Paradigm | Key Features | Strengths |
---|---|---|
Naïve RAG | · Keyword-based retrieval (e.g., TF-IDF, BM25) | · Simple and easy to implement · Suitable for fact-based queries |
Advanced RAG | · Dense retrieval models (e.g., DPR) · Neural ranking and re-ranking · Multi-hop retrieval | · High precision retrieval · Improved contextual relevance |
Modular RAG | · Hybrid retrieval (sparse and dense) · Tool and API integration · Composable, domain-specific pipelines | · High flexibility and customization · Suitable for diverse applications · Scalable |
Graph RAG | · Integration of graph-based structures · Multi-hop reasoning · Contextual enrichment via nodes | · Relational reasoning capabilities · Mitigates hallucinations · Ideal for structured data tasks |
Agentic RAG | · Autonomous agents · Dynamic decision-making · Iterative refinement and workflow optimization | · Adaptable to real-time changes · Scalable for multi-domain tasks · High accuracy |
2.4.2 Multi-Step Reasoning
Many real-world queries require iterative or multi-hop reasoning—retrieving and synthesizing information across multiple steps. Traditional RAG systems are often ill-equipped to refine retrieval based on intermediate insights or user feedback, resulting in incomplete or disjointed responses.
Example: A complex query like, "What lessons from renewable energy policies in Europe can be applied to developing nations, and what are the potential economic impacts?" demands the orchestration of multiple types of information, including policy data, contextualization for developing regions, and economic analysis. Traditional RAG systems typically fail to connect these disparate elements into a cohesive response.
2.4.3 Scalability and Latency Issues
As the volume of external data sources grows, querying and ranking large datasets becomes increasingly computationally intensive. This results in significant latency, which undermines the system’s ability to provide timely responses in real-time applications.
Example: In time-sensitive settings such as financial analytics or live customer support, delays caused by querying multiple databases or processing large document sets can hinder the system’s overall utility. For example, a delay in retrieving market trends during high-frequency trading could result in missed opportunities.
2.5 Agentic RAG: A Paradigm Shift
Traditional RAG systems, with their static workflows and limited adaptability, often struggle to handle dynamic, multi-step reasoning and complex real-world tasks. These limitations have spurred the integration of agentic intelligence, resulting in Agentic RAG. By incorporating autonomous agents capable of dynamic decision-making, iterative reasoning, and adaptive retrieval strategies, Agentic RAG builds on the modularity of earlier paradigms while overcoming their inherent constraints. This evolution enables more complex, multi-domain tasks to be addressed with enhanced precision and contextual understanding, positioning Agentic RAG as a cornerstone for next-generation AI applications. In particular, Agentic RAG systems reduce latency through optimized workflows and refine outputs iteratively, tackling the very challenges that have historically hindered traditional RAG’s scalability and effectiveness.
3 Core Principles and Background of Agentic Intelligence
Agentic Intelligence forms the foundation of Agentic Retrieval-Augmented Generation (RAG) systems, enabling them to transcend the static and reactive nature of traditional RAG. By integrating autonomous agents capable of dynamic decision-making, iterative reasoning, and collaborative workflows, Agentic RAG systems exhibit enhanced adaptability and precision. This section explores the core principles underpinning agentic intelligence.
Components of an AI Agent. In essence, an AI agent comprises (Figure 7):
Figure 7: An Overview of AI Agents
Agentic Patterns [25, 26] provide structured methodologies that guide the behavior of agents in Agentic Retrieval-Augmented Generation (RAG) systems. These patterns enable agents to dynamically adapt, plan, and collaborate, ensuring that the system can handle complex, real-world tasks with precision and scalability. Four key patterns underpin agentic workflows:
3.0.1 Reflection
Reflection is a foundational design pattern in agentic workflows, enabling agents to iteratively evaluate and refine their outputs. By incorporating self-feedback mechanisms, agents can identify and address errors, inconsistencies, and areas for improvement, enhancing performance across tasks like code generation, text production, and question answering (as shown in Figure 8). In practical use, Reflection involves prompting an agent to critique its outputs for correctness, style, and efficiency, then incorporating this feedback into subsequent iterations. External tools, such as unit tests or web searches, can further enhance this process by validating results and highlighting gaps.
In multi-agent systems, Reflection can involve distinct roles, such as one agent generating outputs while another critiques them, fostering collaborative improvement. For instance, in legal research, agents can iteratively refine responses by re-evaluating retrieved case law, ensuring accuracy and comprehensiveness. Reflection has demonstrated significant performance improvements in studies like Self-Refine [27], Reflexion [28], and CRITIC [23].
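A generate-critique-revise loop of this kind can be sketched as follows, using a unit test as the external critic mentioned above. The `generate` function is a stub that returns a deliberately buggy first draft and a corrected revision; in a real system it would be a model call, and executing model-written code would require sandboxing that this sketch omits.

```python
def generate(task, feedback=None):
    """Stand-in for an LLM call: the first draft is buggy, the revision is not."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"

def critique(code):
    """Run a unit test against the draft; return None if it passes."""
    namespace = {}
    exec(code, namespace)  # acceptable only because the drafts are fixed strings
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) should return 5"
    return None

def reflect(task, max_iters=3):
    """Alternate critique and revision until the draft passes or the budget ends."""
    draft = generate(task)
    for iteration in range(max_iters):
        feedback = critique(draft)
        if feedback is None:
            return draft, iteration
        draft = generate(task, feedback)
    return draft, max_iters

final_code, revisions = reflect("write add(a, b)")
```

The iteration cap is the important design choice: without it, a critic that never passes would loop indefinitely.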
Figure 8: An Overview of Agentic Self-Reflection
3.0.2 Planning
Planning [24] is a key design pattern in agentic workflows that enables agents to autonomously decompose complex tasks into smaller, manageable subtasks. This capability is essential for multi-hop reasoning and iterative problem-solving in dynamic and uncertain scenarios as shown in Figure 9a.
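One way to picture task decomposition is as a dependency graph of subtasks executed in a valid order. The sketch below hard-codes a plan that a planner model might produce (the subtask names are invented for illustration) and uses the standard-library `graphlib.TopologicalSorter`, available from Python 3.9, to order the steps.

```python
from graphlib import TopologicalSorter

def plan(task):
    """Stand-in for an LLM planner: maps each subtask to its prerequisites."""
    return {
        "collect_policy_data": set(),
        "collect_economic_data": set(),
        "compare_regions": {"collect_policy_data"},
        "write_answer": {"compare_regions", "collect_economic_data"},
    }

def execute(task):
    """Run subtasks so that every prerequisite finishes before its dependents."""
    order = list(TopologicalSorter(plan(task)).static_order())
    results = {}
    for step in order:
        # A real agent would call a tool or model here; we only record completion.
        results[step] = f"done:{step}"
    return order, results

order, results = execute("compare renewable energy policies")
```

Subtasks with no mutual dependency, such as the two collection steps here, could also be dispatched concurrently.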
3.0.3 Tool Use
Tool Use enables agents to extend their capabilities by interacting with external tools, APIs, or computational resources, as illustrated in Figure 9b. This pattern allows agents to gather information, perform computations, and manipulate data beyond their pre-trained knowledge. By dynamically integrating tools into workflows, agents can adapt to complex tasks and provide more accurate and contextually relevant outputs.
Modern agentic workflows incorporate tool use for a variety of applications, including information retrieval, computational reasoning, and interfacing with external systems. The implementation of this pattern has evolved significantly with advancements like GPT-4’s function calling capabilities and systems capable of managing access to numerous tools. These developments facilitate sophisticated workflows where agents autonomously select and execute the most relevant tools for a given task.
While tool use significantly enhances agentic workflows, challenges remain in optimizing the selection of tools, particularly in contexts with a large number of available options. Techniques inspired by retrieval-augmented generation (RAG), such as heuristic-based selection, have been proposed to address this issue.
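A minimal sketch of heuristic tool selection is shown below: each tool carries a keyword description, and the agent picks the tool whose description overlaps the request most. The registry, the two tools, and the overlap heuristic are illustrative assumptions; they do not reproduce GPT-4 function calling or any specific framework's API.

```python
import math

# Hypothetical tool registry: keyword description plus a callable.
TOOLS = {
    "calculator": {
        "description": "arithmetic math sqrt square root number",
        "run": lambda arg: math.sqrt(float(arg)),
    },
    "word_count": {
        "description": "count words text length",
        "run": lambda arg: len(arg.split()),
    },
}

def select_tool(request):
    """Heuristic selection: choose the tool whose description overlaps the request most."""
    words = set(request.lower().split())
    return max(
        TOOLS,
        key=lambda name: len(words & set(TOOLS[name]["description"].split())),
    )

def use_tool(request, argument):
    """Select a tool for the request and run it on the argument."""
    name = select_tool(request)
    return name, TOOLS[name]["run"](argument)

root = use_tool("what is the square root", "16")
count = use_tool("count the words in this text", "a b c")
```

Keyword overlap breaks down as the registry grows and descriptions start to share vocabulary, which is where embedding-based retrieval over tool descriptions becomes attractive.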
3.0.4 Multi-Agent
Multi-agent collaboration [29] is a key design pattern in agentic workflows that enables task specialization and parallel processing. Agents communicate and share intermediate results, ensuring the overall workflow remains efficient and coherent. By distributing subtasks among specialized agents, this pattern improves the scalability and adaptability of complex workflows. Multi-agent systems allow developers to decompose intricate tasks into smaller, manageable subtasks assigned to different agents. This approach not only enhances task performance but also provides a robust framework for managing complex interactions. Each agent operates with its own memory and workflow, which can include the use of tools, reflection, or planning, enabling dynamic and collaborative problem-solving (see Figure 10).
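The division of labor described above can be sketched with two specialized agents that keep private memories and exchange intermediate results through a shared dictionary. The `Agent` class and `run_team` coordinator are hypothetical, the skills are plain functions standing in for model calls, and the agents run sequentially here for simplicity rather than in parallel.

```python
class Agent:
    """A specialized worker with its own private memory."""

    def __init__(self, name, skill):
        self.name = name
        self.skill = skill   # function implementing this agent's specialty
        self.memory = []     # private log of (subtask, result) pairs

    def handle(self, subtask, shared):
        result = self.skill(subtask, shared)
        self.memory.append((subtask, result))
        return result

def run_team(agents, subtasks):
    """Dispatch each subtask to the named agent and publish its result."""
    shared = {}
    for role, subtask in subtasks:
        shared[role] = agents[role].handle(subtask, shared)
    return shared

agents = {
    "researcher": Agent("researcher", lambda task, shared: f"notes on {task}"),
    "writer": Agent("writer", lambda task, shared: f"{task} using {shared['researcher']}"),
}
shared = run_team(agents, [("researcher", "solar policy"), ("writer", "draft summary")])
```

The writer depends on the researcher's output, so the order of subtasks matters; frameworks that support this pattern typically make such dependencies explicit.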
Figure 9: Overview of Agentic Planning and Tool Use
While multi-agent collaboration offers significant potential, it is a less predictable design pattern compared to more mature patterns like Reflection and Tool Use. Nevertheless, emerging frameworks such as AutoGen, CrewAI, and LangGraph are providing new avenues for implementing effective multi-agent solutions.
Figure 10: An Overview of Multi-Agent
These design patterns form the foundation for the success of Agentic RAG systems. By structuring workflows—from simple, sequential steps to more adaptive, collaborative processes—these patterns enable systems to dynamically adapt their retrieval and generative strategies to the diverse and ever-changing demands of real-world environments. Leveraging these patterns, agents are capable of handling iterative, context-aware tasks that significantly exceed the capabilities of traditional RAG systems.
4 Agentic Workflow Patterns: Adaptive Strategies for Dynamic Collaboration
4.1 Prompt Chaining: Enhancing Accuracy Through Sequential Processing
Prompt chaining [12, 13] decomposes a complex task into multiple steps, where each step builds upon the previous one. This structured approach improves accuracy by simplifying each subtask before moving forward. However, it may increase latency due to sequential processing.
提示链 (Prompt chaining) [12, 13] 将一个复杂任务分解为多个步骤,其中每个步骤都建立在前一个步骤的基础上。这种结构化方法通过简化每个子任务来提高准确性,但由于顺序处理,可能会增加延迟。
Figure 11: Illustration of Prompt Chaining Workflow
图 11: 提示链工作流程示意图
When to Use: This workflow is most effective when a task can be broken down into fixed subtasks, each contributing to the final output. It is particularly useful in scenarios where step-by-step reasoning enhances accuracy.
使用时机:当任务可以分解为固定的子任务,每个子任务都对最终输出有贡献时,此工作流程最为有效。在逐步推理能提高准确性的场景中尤为有用。
Example Applications:
示例应用:
• Generating marketing content in one language and then translating it into another while preserving nuances.
• Structuring document creation by first generating an outline, verifying its completeness, and then developing the full text.
• 以一种语言生成营销内容,然后将其翻译成另一种语言,同时保留细微差别。 • 通过首先生成大纲、验证其完整性,然后开发全文来结构化文档创建。
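The outline-then-draft example above can be sketched as a short chain with a programmatic check between steps. This is a minimal sketch under stated assumptions: `fake_llm`, `run_chain`, and `outline_gate` are hypothetical names introduced here for illustration, and `fake_llm` stands in for a real model call.

```python
from typing import Callable, List

# Hypothetical stand-in for a model call; a real system would call an LLM here.
def fake_llm(prompt: str) -> str:
    return f"[output for: {prompt}]"

def run_chain(task: str, steps: List[str],
              llm: Callable[[str], str] = fake_llm) -> str:
    """Run each step in order, feeding the previous output into the next prompt."""
    result = task
    for instruction in steps:
        result = llm(f"{instruction}\n\nInput:\n{result}")
    return result

def outline_gate(outline: str, required: List[str]) -> bool:
    """Check between steps: does the outline mention every required topic?"""
    return all(topic.lower() in outline.lower() for topic in required)
```

Each step adds one sequential model call, which is the latency cost the section mentions. A gate such as `outline_gate` lets the chain stop early when an intermediate result is incomplete.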
4.2 Routing: Directing Inputs to Specialized Processes
4.2 路由:将输入定向到专用处理流程
Routing [12, 13] involves classifying an input and directing it to an appropriate specialized prompt or process. This method ensures distinct queries or tasks are handled separately, improving efficiency and response quality.
路由 (Routing) [12, 13] 涉及对输入进行分类并将其引导至适当的专用提示或处理过程。该方法确保不同的查询或任务被分开处理,从而提高效率和响应质量。
Figure 12: Illustration of Routing Workflow
图 12: 路由工作流程示意图
When to Use: Ideal for scenarios where different types of input require distinct handling strategies, ensuring optimized performance for each category.
使用场景:适用于需要根据不同输入类型采取不同处理策略的场景,以确保每种类别的最佳性能。
Example Applications:
示例应用:
• Directing customer service queries into categories such as technical support, refund requests, or general inquiries.
• Assigning simple queries to smaller models for cost efficiency, while complex requests go to advanced models.
• 将客户服务查询分类为技术支持、退款请求或一般咨询。• 将简单查询分配给较小的模型以节省成本,而复杂请求则交给高级模型处理。
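The customer service example above can be sketched as a classifier followed by a dispatch table. This is a minimal sketch, not the survey's implementation: the keyword rules in `classify` and the handlers in `HANDLERS` are assumptions made for illustration, and a real router would more likely use an LLM or a trained classifier.

```python
from typing import Callable, Dict

def classify(query: str) -> str:
    """Toy keyword classifier (assumption); stands in for an LLM-based router."""
    q = query.lower()
    if "refund" in q or "money back" in q:
        return "refund"
    if "error" in q or "crash" in q or "not working" in q:
        return "technical"
    return "general"

# Hypothetical specialized handlers, one per category.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "refund": lambda q: f"[refund flow] {q}",
    "technical": lambda q: f"[technical support flow] {q}",
    "general": lambda q: f"[general inquiry flow] {q}",
}

def route(query: str) -> str:
    """Classify the input, then hand it to the matching specialized process."""
    return HANDLERS[classify(query)](query)
```

Keeping classification separate from handling means a category's prompt or model can change without touching the router.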
4.3 Parallelization: Speeding Up Processing Through Concurrent Execution
4.3 并行化:通过并发执行加速处理
Parallelization [12, 13] divides a task into independent processes that run simultaneously, reducing latency and improving throughput. It can be categorized into sectioning (independent subtasks) and voting (multiple outputs for accuracy).
并行化 [12, 13] 将任务划分为同时运行的独立进程,从而减少延迟并提高吞吐量。它可以分为分段(独立的子任务)和投票(多个输出以提高准确性)。
Figure 13: Illustration of Parallelization Workflow
图 13: 并行化工作流程示意图
When to Use: Useful when tasks can be executed independently to enhance speed or when multiple outputs improve confidence.
使用场景:适用于任务可以独立执行以提高速度,或多个输出可以提升置信度的情况。
Example Applications:
示例应用:
• Sectioning: Splitting tasks like content moderation, where one model screens input while another generates a response.
• Voting: Using multiple models to cross-check code for vulnerabilities or analyze content moderation decisions.
• 分段处理:将任务拆分,例如内容审核,一个模型筛选输入,另一个生成响应。
• 投票机制:使用多个模型交叉检查代码漏洞或分析内容审核决策。
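Both variants above can be sketched with a thread pool from the Python standard library. This is a minimal sketch under stated assumptions: `run_sections` and `majority_vote` are hypothetical helper names, and the callables passed in stand for independent model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def run_sections(subtasks: List[Callable[[], str]]) -> List[str]:
    """Sectioning: run independent subtasks concurrently; results keep input order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in subtasks]
        return [f.result() for f in futures]

def majority_vote(text: str, reviewers: List[Callable[[str], str]]) -> str:
    """Voting: ask several reviewers the same question, return the commonest verdict."""
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda review: review(text), reviewers))
    return Counter(verdicts).most_common(1)[0][0]
```

Threads suit this case because model calls are typically I/O-bound. An odd number of reviewers avoids ties in `majority_vote`, which this sketch does not otherwise resolve.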
4.4 Orchestrator-Workers: Dynamic Task Delegation
4.4 协调器-工作者:动态任务委派
This workflow [12, 13] features a central orchestrator model that dynamically breaks tasks into subtasks, assigns them to specialized worker models, and compiles the results. Unlike parallelization, it adapts to varying input complexity.
该工作流 [12, 13] 的核心是一个中央协调器模型,它动态地将任务分解为子任务,分配给专门的工人模型,并汇总结果。与并行化不同,它能适应不同的输入复杂度。
Figure 14: Illustration of Orchestrator-Workers Workflow
图 14: Orchestrator-Workers 工作流程示意图
When to Use: Best suited for tasks requiring dynamic decomposition and real-time adaptation, where subtasks are not predefined.
适用场景:最适合需要动态分解和实时适应的任务,其中子任务未预定义。
Example Applications:
示例应用:
• Automatically modifying multiple files in a codebase based on the nature of requested changes.
• 根据请求更改的性质自动修改代码库中的多个文件。
• Conducting real-time research by gathering and synthesizing relevant information from multiple sources.
• 通过从多个来源收集和综合相关信息进行实时研究。
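The difference from parallelization is that the subtask list is decided at run time. The sketch below illustrates this under stated assumptions: `plan` is a toy rule that splits the task text, standing in for an orchestrator model, and the entries in `WORKERS` are hypothetical stand-ins for specialized worker models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical workers keyed by skill; each would wrap a specialized model.
WORKERS: Dict[str, Callable[[str], str]] = {
    "search": lambda t: f"search notes on {t}",
    "summarize": lambda t: f"summary of {t}",
}

def plan(task: str) -> List[Tuple[str, str]]:
    """Toy planner (assumption): an orchestrator LLM would choose these subtasks."""
    topics = [part.strip() for part in task.split(" and ") if part.strip()]
    subtasks = [("search", topic) for topic in topics]
    if len(topics) > 1:
        # Only multi-topic tasks need a final synthesis step.
        subtasks.append(("summarize", task))
    return subtasks

def orchestrate(task: str) -> str:
    """Delegate each planned subtask to its worker and compile the results."""
    results = [WORKERS[skill](payload) for skill, payload in plan(task)]
    return "\n".join(results)
```

Here `plan("solar")` yields one subtask while `plan("solar and wind")` yields three, so the amount of work follows the input rather than a fixed template.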
4.5 Evaluator-Optimizer: Refining Output Through Iteration
4.5 评估器-优化器:通过迭代优化输出
The evaluator-optimizer [12, 13] workflow iteratively improves content by generating an initial output and refining it based on feedback from an evaluation model.
评估器-优化器 [12, 13] 工作流程通过生成初始输出并根据评估模型的反馈进行迭代改进内容。
Figure 15: Illustration of Evaluator-Optimizer Workflow
图 15: 评估-优化工作流程示意图
When to Use: Effective when iterative refinement significantly enhances response quality, especially when clear evaluation criteria exist.
何时使用:当迭代优化显著提升响应质量时,尤其是在存在明确评估标准的情况下。
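The loop described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: `refine` is a hypothetical helper, and `length_check` and `drop_last_word` are toy stand-ins for the evaluation model and the generating model.

```python
from typing import Callable, Tuple

def refine(draft: str,
           evaluate: Callable[[str], Tuple[bool, str]],
           optimize: Callable[[str, str], str],
           max_rounds: int = 3) -> str:
    """Loop until the evaluator accepts the draft or the round budget runs out."""
    for _ in range(max_rounds):
        accepted, feedback = evaluate(draft)
        if accepted:
            break
        draft = optimize(draft, feedback)
    return draft

# Toy evaluator and optimizer (assumptions); both would be model calls in practice.
def length_check(text: str) -> Tuple[bool, str]:
    return (len(text.split()) <= 5, "shorten to five words or fewer")

def drop_last_word(text: str, feedback: str) -> str:
    return " ".join(text.split()[:-1])
```

The `max_rounds` budget bounds cost when the evaluator never accepts. The explicit acceptance test in `evaluate` corresponds to the clear evaluation criteria the section calls for.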
Example Applications:
示例应用:
4.6 Taxonomy of Agentic RAG Systems
4.6 AI智能体RAG系统分类
Agentic Retrieval-Augmented Generation (RAG) systems can be categorized into distinct architectural frameworks based on their complexity and design principles. These include single-agent architectures, multi-agent systems, and hierarchical agentic architectures. Each framework is tailored to address specific challenges and optimize performance for diverse applications. This section provides a detailed taxonomy of these architectures, highlighting their characteristics, strengths, and limitations.
基于代理的检索增强生成 (RAG) 系统可以根据其复杂性和设计原则分为不同的架构框架。这些框架包括单智能体架构、多智能体系统和分层代理架构。每个框架都是为了应对特定挑战并优化不同应用的性能而设计的。本节详细介绍了这些架构的分类,重点介绍了它们的特征、优势和局限性。
4.6.1 Single-Agent Agentic RAG: Router
4.6.1 单智能体 AI智能体 RAG: Router
A Single-Agent Agentic RAG [30] serves as a centralized decision-making system where a single agent manages the retrieval, routing, and integration of information (as shown in Figure 16). This architecture simplifies the system by consolidating these tasks into one unified agent, making it particularly effective for setups with a limited number of tools or data sources.
单智能体代理 RAG:[30] 作为一个集中决策系统,其中单个智能体管理信息的检索、路由和整合(如图 16 所示)。这种架构通过将这些任务整合到一个统一的智能体中,简化了系统,特别适用于工具或数据源有限的设置。
Workflow
工作流
- Data Integration and LLM Synthesis: Once the relevant data is retrieved from the chosen sources, it is passed to a Large Language Model (LLM). The LLM synthesizes the gathered information, integrating insights from multiple sources into a coherent and contextually relevant response.
数据集成与大语言模型合成:从选定来源检索到相关数据后,将其传递给大语言模型 (LLM)。LLM 对收集到的信息进行合成,将来自多个来源的见解整合成一个连贯且上下文相关的响应。
- Output Generation: Finally, the system delivers a comprehensive, user-facing answer that addresses the original query. This response is presented in an actionable, concise format and may optionally include references or citations to the sources used.
输出生成:最后,系统生成一个全面的、面向用户的答案,以解决原始查询。该响应以可操作、简洁的格式呈现,并可选择包含对所使用来源的引用或引文。
Figure 16: An Overview of Single Agentic RAG
图 16: 单智能体 RAG 概述
Key Features and Advantages.
主要特性与优势
• Centralized Simplicity: A single agent handles all retrieval and routing tasks, making the architecture straightforward to design, implement, and maintain.
• 集中化简洁性:单个智能体处理所有检索和路由任务,使架构设计、实现和维护变得简单明了。
• Efficiency & Resource Optimization: With fewer agents and simpler coordination, the system demands fewer computational resources and can handle queries more quickly.
• 效率与资源优化:更少的智能体和更简单的协调意味着系统需要更少的计算资源,并且能够更快地处理查询。
• Dynamic Routing: The agent evaluates each query in real-time, selecting the most appropriate knowledge source (e.g., structured DB, semantic search, web search).
• 动态路由 (Dynamic Routing):AI智能体实时评估每个查询,选择最合适的知识源(例如,结构化数据库、语义搜索、网络搜索)。
• Versatility Across Tools: Supports a variety of data sources and external APIs, enabling both structured and unstructured workflows.
跨工具多用途性:支持多种数据源和外部 API,能够处理结构化和非结构化工作流。
• Ideal for Simpler Systems: Suited for applications with well-defined tasks or limited integration requirements (e.g., document retrieval, SQL-based workflows).
• 适用于简单系统:适合具有明确任务或有限集成需求的应用(例如,文档检索、基于SQL的工作流)。
Prompt: Can you tell me the delivery status of my order?
提示:您能告诉我我的订单配送状态吗?
System Process (Single-Agent Workflow):
系统进程(单智能体工作流):
1. Query Submission and Evaluation:
1. 查询提交与评估
• The user submits the query, which is received by the coordinating agent.
• The coordinating agent analyzes the query and determines the most appropriate sources of information.
• 用户提交查询,协调AI智能体接收该查询。
• 协调AI智能体分析查询并确定最合适的信息来源。
2. Knowledge Source Selection:
- 知识源选择:
3. Data Integration and LLM Synthesis:
- 数据集成与大语言模型合成:
• The relevant data is passed to the LLM, which synthesizes the information into a coherent response.
• 相关数据被传递给大语言模型,由其将信息合成为连贯的响应。
4. Output Generation:
4. 输出生成:
• The system generates an actionable and concise response, providing live tracking updates and potential alternatives.
• 系统生成一个可操作且简洁的响应,提供实时跟踪更新和潜在的替代方案。
Response:
响应:
Integrated Response: “Your package is currently in transit and expected to arrive tomorrow evening. The live tracking from UPS indicates it is at the regional distribution center.”
集成响应:“您的包裹目前正在运输中,预计明晚到达。UPS的实时跟踪显示它已到达区域配送中心。”
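The single-agent workflow in this use case can be sketched as one function that selects sources, retrieves, and synthesizes. This is a minimal sketch under stated assumptions: the source names in `SOURCES`, their return values, and the keyword rule in `select_sources` are all hypothetical, and `join_context` stands in for the LLM synthesis step.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge sources; names and return values are illustrative only.
SOURCES: Dict[str, Callable[[str], str]] = {
    "order_db": lambda q: "status=in transit; eta=tomorrow evening",
    "carrier_api": lambda q: "location=regional distribution center",
    "web_search": lambda q: "no relevant public results",
}

def select_sources(query: str) -> List[str]:
    """Toy selection rule (assumption); the agent would make this choice with an LLM."""
    q = query.lower()
    if "order" in q or "delivery" in q:
        return ["order_db", "carrier_api"]
    return ["web_search"]

def answer(query: str, synthesize: Callable[[str, List[str]], str]) -> str:
    """One agent does everything: pick sources, retrieve, then synthesize."""
    context = [SOURCES[name](query) for name in select_sources(query)]
    return synthesize(query, context)

def join_context(query: str, context: List[str]) -> str:
    """Stand-in for LLM synthesis: simply concatenates the retrieved facts."""
    return " | ".join(context)
```

Because selection, retrieval, and synthesis live in one place, the design stays simple, but every new source adds a branch to `select_sources`. That growth is one reason the multi-agent variant splits these responsibilities.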
4.7 Multi-Agent Agentic RAG Systems
4.7 多智能体 AI智能体 RAG 系统:
Multi-Agent RAG [30] represents a modular and scalable evolution of single-agent architectures, designed to handle complex workflows and diverse query types by leveraging multiple specialized agents (as shown in Figure 17). Instead of relying on a single agent to manage all tasks—reasoning, retrieval, and response generation—this system distributes responsibilities across multiple agents, each optimized for a specific role or data source.
多智能体RAG [30] 代表了单智能体架构的模块化和可扩展演进,旨在通过利用多个专门化的智能体(如图 17 所示)来处理复杂的工作流和多样化的查询类型。该系统不再依赖单一智能体来管理所有任务(如推理、检索和响应生成),而是将职责分配给多个智能体,每个智能体都针对特定角色或数据源进行了优化。
Workflow
工作流
- Query Submission: The process begins with a user query, which is received by a coordinator agent or master retrieval agent. This agent acts as the central orchestrator, delegating the query to specialized retrieval agents based on the query’s requirements.
查询提交:该过程从用户查询开始,协调代理或主检索代理接收查询。该代理作为中央协调者,根据查询需求将查询委派给专门的检索代理。
- Specialized Retrieval Agents: The query is distributed among multiple retrieval agents, each focusing on a specific type of data source or task. Examples include:
专用检索智能体:查询被分配到多个检索智能体中,每个智能体专注于特定类型的数据源或任务。例如:
- Tool Access and Data Retrieval: Each agent routes the query to the appropriate tools or data sources within its domain, such as:
工具访问与数据检索:每个AI智能体将查询路由到其领域内的适当工具或数据源,例如:
• Vector Search: For semantic relevance.
• Text-to-SQL: For structured data.
• Web Search: For real-time public information.
• APIs: For accessing external services or proprietary systems.
• 向量搜索 (Vector Search): 用于语义相关性。
• 文本到 SQL (Text-to-SQL): 用于结构化数据。
• 网络搜索 (Web Search): 用于实时公共信息。
• API (APIs): 用于访问外部服务或专有系统
The retrieval process is executed in parallel, allowing for efficient processing of diverse query types.
检索过程并行执行,支持高效处理多种查询类型。
Figure 17: An Overview of Multi-Agent Agentic RAG Systems
图 17: 多智能体代理 RAG 系统概览
- Data Integration and LLM Synthesis: Once retrieval is complete, the data from all agents is passed to a Large Language Model (LLM). The LLM synthesizes the retrieved information into a coherent and contextually relevant response, integrating insights from multiple sources seamlessly.
数据整合与大语言模型合成:检索完成后,来自所有智能体的数据将传递给大语言模型。大语言模型将检索到的信息合成为一个连贯且上下文相关的响应,无缝整合来自多个来源的见解。
- Output Generation: The system generates a comprehensive response, which is delivered back to the user in an actionable and concise format.
输出生成:系统生成一个全面的响应,并以可操作且简洁的格式返回给用户。
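The workflow steps above can be sketched as a coordinator that fans a query out to retrieval agents in parallel and then merges their results. This is a minimal sketch, not the survey's implementation: the agents in `AGENTS` are hypothetical stand-ins named after the tool types listed above, and `synthesize` replaces the LLM synthesis step with plain string joining.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical retrieval agents, one per source type listed in the workflow.
AGENTS: Dict[str, Callable[[str], str]] = {
    "vector_search": lambda q: f"semantic passages about {q}",
    "text_to_sql": lambda q: f"table rows matching {q}",
    "web_search": lambda q: f"recent web pages on {q}",
}

def coordinate(query: str, agent_names: List[str]) -> Dict[str, str]:
    """Coordinator: send the query to the chosen agents in parallel, collect by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(AGENTS[name], query) for name in agent_names}
        return {name: future.result() for name, future in futures.items()}

def synthesize(results: Dict[str, str]) -> str:
    """Stand-in for LLM synthesis: merges evidence into one labeled context string."""
    return "\n".join(f"{name}: {text}" for name, text in sorted(results.items()))
```

Labeling each result with its agent name keeps provenance available to the synthesis step, which helps when the final response cites its sources.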
Key Features and Advantages.
关键特性和优势
Challenges
挑战
Prompt: What are the economic and environmental impacts of renewable energy adoption in Europe?
提示:欧洲采用可再生能源的经济和环境影响是什么?
System Process (Multi-Agent Workflow):
系统进程(多智能体工作流):
Response:
响应:
Integrated Response: “Adopting renewable energy in Europe has led to a 20% reduction in greenhouse gas emissions over the past decade, according to EU policy reports. Economically, renewable energy investments have generated approximately 1.2 million jobs, with significant growth in solar and wind sectors. Recent academic studies also highlight potential trade-offs in grid stability and energy storage costs.”
综合响应:"根据欧盟政策报告,欧洲采用可再生能源在过去十年中使温室气体排放减少了 20%。从经济角度来看,可再生能源投资创造了约 120 万个就业岗位,其中太阳能和风能领域增长显著。最近的学术研究还强调了电网稳定性和储能成本方面的潜在权衡。"
4.8 Hierarchical Agentic RAG Systems
4.8 分层AI智能体RAG系统
Hierarchical Agentic RAG [14] systems employ a structured, multi-tiered approach to information retrieval and processing, enhancing both efficiency and strategic decision-making, as shown in Figure 18. Agents are organized in a hierarchy, with higher-level agents overseeing and directing lower-level agents. This structure enables multi-level decision-making, ensuring that queries are handled by the most appropriate resources.
分层AI智能体RAG: [14] 系统采用结构化、多层次的信息检索和处理方法,提高了效率和战略决策能力,如图 18 所示。AI智能体按层次组织,高层AI智能体监督和指导低层AI智能体。这种结构实现了多层次决策,确保查询由最合适的资源处理。
Figure 18: An illustration of Hierarchical Agentic RAG
图 18: 分层代理 RAG 示意图
Workflow
工作流程
Key Features and Advantages.
关键特性与优势
Challenges
挑战
• Coordination Complexity: Maintaining robust inter-agent communication across multiple levels can increase orchestration overhead.
• Resource Allocation: Efficiently distributing tasks among tiers to avoid bottlenecks is non-trivial.
• 协调复杂性:在多个层级间维持强大的智能体间通信会增加编排开销。
• 资源分配:高效地在各层级间分配任务以避免瓶颈并非易事。
Use Case: Financial Analysis System
用例:金融分析系统
Prompt: What are the best investment options given the current market trends in renewable energy?
提示:在当前可再生能源市场趋势下,最佳投资选择是什么?
System Process (Hierarchical Agentic Workflow):
系统流程(分层智能体工作流):
Response:
响应:
Integrated Response: “Based on current market data, renewable energy stocks have shown a 15% growth over the past quarter, driven by supportive government policies and heightened investor interest. Analysts suggest that wind and solar sectors, in particular, may experience continued momentum, while emerging technologies like green hydrogen present moderate risk but potentially high returns.”
综合回应:
“根据当前市场数据,受政府支持政策和投资者兴趣增加的推动,可再生能源股票在过去一个季度增长了15%。分析师指出,尤其是风能和太阳能行业可能会继续保持增长势头,而绿色氢等新兴技术虽然存在一定风险,但可能带来高回报。”
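The hierarchy described in this section, where higher-level agents direct lower-level ones, can be sketched as a small tree. This is a minimal sketch under stated assumptions: the `Agent` class, the agent names, and the returned strings are hypothetical. As a simplification, each higher-level agent here delegates to all of its children, whereas a real system would choose which subordinates to consult.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Agent:
    name: str
    retrieve: Optional[Callable[[str], str]] = None  # set on leaf agents only
    children: List["Agent"] = field(default_factory=list)

    def handle(self, query: str) -> List[str]:
        """Leaf agents retrieve; higher-level agents delegate and merge results."""
        if self.retrieve is not None:
            return [f"{self.name}: {self.retrieve(query)}"]
        findings: List[str] = []
        for child in self.children:
            findings.extend(child.handle(query))
        return findings

# Hypothetical two-level hierarchy for the financial analysis use case.
market = Agent("market", children=[
    Agent("prices", retrieve=lambda q: "price data"),
    Agent("news", retrieve=lambda q: "news headlines"),
])
policy = Agent("policy", retrieve=lambda q: "policy documents")
top = Agent("top", children=[market, policy])
```

Calling `top.handle(...)` walks the tree depth-first and returns leaf findings in order. Each added tier adds one more layer of delegation, which is the coordination overhead listed under Challenges.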
4.9 Agentic Corrective RAG
4.9 自主修正 RAG
Corrective RAG [31] [32] introduces mechanisms to self-correct retrieval results, enhancing document utilization and improving response generation quality, as demonstrated in Figure 19. By embedding intelligent agents into the workflow, Corrective RAG ensures iterative refinement of context documents and responses, minimizing errors and maximizing relevance.
Corrective RAG:引入自我纠正检索结果的机制,提高文档利用率和响应生成质量,如图 19 所示。通过将 AI智能体嵌入工作流,Corrective RAG [31] [32] 确保上下文文档和响应的迭代优化,最大限度地减少错误并提高相关性。
Key Idea of Corrective RAG: The core principle of Corrective RAG lies in its ability to evaluate retrieved documents dynamically, perform corrective actions, and refine queries to enhance the quality of generated responses. Corrective RAG adjusts its approach as follows:
Corrective RAG 的核心思想:Corrective RAG 的核心原则在于其能够动态评估检索到的文档,执行纠正操作,并优化查询以提高生成响应的质量。Corrective RAG 的调整方式如下:
Figure 19: Overview of Agentic Corrective RAG
图 19: Agentic Corrective RAG 概述
• Dynamic Retrieval from External Sources: When context is insufficient, the External Knowledge Retrieval Agent performs web searches or accesses alternative data sources to supplement the retrieved documents.
• 动态检索外部资源:当上下文信息不足时,外部知识检索智能体 (External Knowledge Retrieval Agent) 会执行网络搜索或访问其他数据源来补充检索到的文档。
• Response Synthesis: All validated and refined i