[论文翻译]GRAPHAGENT: 图语言助手智能体


原文地址:https://arxiv.org/pdf/2412.17029


GRAPHAGENT: AGENTIC GRAPH LANGUAGE ASSISTANT

GRAPHAGENT: 图语言助手智能体

ABSTRACT

摘要

Real-world data is represented in both structured (e.g., graph connections) and unstructured (e.g., textual, visual information) formats, encompassing complex relationships that include explicit links (such as social connections and user behaviors) and implicit interdependencies among semantic entities, often illustrated through knowledge graphs. In this work, we propose GraphAgent, an automated agent pipeline that addresses both explicit graph dependencies and implicit graph-enhanced semantic inter-dependencies, aligning with practical data scenarios for predictive tasks (e.g., node classification) and generative tasks (e.g., text generation). GraphAgent comprises three key components: (i) a Graph Generator Agent that builds knowledge graphs to reflect complex semantic dependencies; (ii) a Task Planning Agent that interprets diverse user queries and formulates corresponding tasks through agentic self-planning; and (iii) a Task Execution Agent that efficiently executes planned tasks while automating tool matching and invocation in response to user queries. These agents collaborate seamlessly, integrating language models with graph language models to uncover intricate relational information and data semantic dependencies. Through extensive experiments on various graph-related predictive and text generative tasks on diverse datasets, we demonstrate the effectiveness of our GraphAgent across various settings. We have made our proposed GraphAgent open-source at: https://github.com/HKUDS/GraphAgent.

现实世界中的数据以结构化(如图形连接)和非结构化(如文本、视觉信息)格式表示,涵盖了包括显式链接(如社交连接和用户行为)和语义实体之间的隐式相互依赖关系的复杂关系,通常通过知识图谱进行说明。在本研究中,我们提出了GraphAgent,一个自动化智能体管道,旨在解决显式图形依赖和隐式图形增强的语义相互依赖关系,与实际数据场景中的预测任务(如节点分类)和生成任务(如文本生成)保持一致。GraphAgent由三个关键组件组成:(i) 一个图形生成智能体,用于构建知识图谱以反映复杂的语义依赖关系;(ii) 一个任务规划智能体,用于解释多样化的用户查询并通过智能体自我规划制定相应的任务;(iii) 一个任务执行智能体,用于高效执行计划任务,同时自动匹配和调用工具以响应用户查询。这些智能体无缝协作,将语言模型与图形语言模型集成,以揭示复杂的关系信息和数据语义依赖关系。通过对各种数据集上的图形相关预测和文本生成任务进行广泛实验,我们证明了GraphAgent在各种设置中的有效性。我们已将提出的GraphAgent开源,地址为:https://github.com/HKUDS/GraphAgent

1 INTRODUCTION

1 引言

Real-world information exists in a complex ecosystem of interconnected data types. Structured data, particularly graph-based connections, captures explicit relationships such as social networks and user interaction patterns (Fey et al., 2023). Complementing this, unstructured data - including text and visual content - reveals implicit semantic relationships between entities (Zhong & Mottin, 2023). The integration of these diverse data formats has become crucial for modern applications, as it enables more comprehensive and nuanced analysis of complex real-world scenarios (Lu et al., 2024).

现实世界的信息存在于一个由互连数据类型组成的复杂生态系统中。结构化数据,特别是基于图的连接,捕捉了诸如社交网络和用户交互模式等明确关系 (Fey et al., 2023)。与此相辅相成的是,非结构化数据——包括文本和视觉内容——揭示了实体之间的隐含语义关系 (Zhong & Mottin, 2023)。这些多样化数据格式的集成对于现代应用变得至关重要,因为它能够对复杂的现实场景进行更全面和细致的分析 (Lu et al., 2024)。

Graph serves as an effective means of representing relational information across various domains. In academic networks, papers are interconnected through explicit citations, with each paper represented as a node in a graph and edges indicating these citations Chen et al. (2023); Wang et al. (2022). This structure enables researchers to trace the influence of one paper on another, showcasing the evolution of ideas. Additionally, the papers’ content provides unstructured data for analyzing themes, methodologies, and findings. By integrating structured citation data with unstructured text, researchers can identify trends and derive valuable insights, leading to applications such as knowledge summaries and scientific question-answering, which can be framed as Graph-enhanced Text Generative Tasks.

图是表示跨领域关系信息的有效手段。在学术网络中,论文通过显式引用相互连接,每篇论文表示为图中的一个节点,边表示这些引用 (Chen et al., 2023; Wang et al., 2022)。这种结构使研究人员能够追踪一篇论文对另一篇论文的影响,展示思想的演变。此外,论文的内容为分析主题、方法和发现提供了非结构化数据。通过将结构化引用数据与非结构化文本相结合,研究人员可以识别趋势并得出有价值的见解,从而应用于知识总结和科学问答等任务,这些任务可以视为图增强文本生成任务。

In e-commerce scenarios, customer interactions form structured behavior data that can be analyzed in conjunction with unstructured data sources, such as product reviews and descriptions Shuai et al. (2022); Li et al. (2023). This integrated approach enables businesses to gain deeper insights into consumer behavior patterns and improve recommendation accuracy. Specifically, by integrating user behavior graphs with rich textual information, these user-item interaction forecasting challenges can be effectively approached as Graph-related Predictive Tasks.

在电子商务场景中,客户互动形成了结构化行为数据,这些数据可以与产品评论和描述等非结构化数据源结合分析 (Shuai et al., 2022; Li et al., 2023)。这种集成方法使企业能够更深入地了解消费者行为模式,并提高推荐准确性。具体而言,通过将用户行为图与丰富的文本信息相结合,这些用户-物品互动预测挑战可以有效地作为图相关预测任务来处理。

Existing graph learning methods have become essential frameworks for analyzing and learning from graph data (Hamilton, 2020). These methods focus on learning embeddings for nodes and edges, mapping structural information into a latent representation space (Yang et al., 2020). Among these, Graph Neural Networks (GNNs) stand out as state-of-the-art (SOTA) approaches (Dai et al., 2022; Liu et al., 2022). GNNs employ a message-passing mechanism that allows nodes to exchange information with their neighbors, effectively capturing the graph’s structural characteristics and enhancing representation learning. However, they primarily focus on explicit graph connections, often neglecting the complex semantic dependencies associated with linked textual data. Additionally, GNNs generally have limited generalization capabilities for real-world graph mining tasks (Xia & Huang, 2024; Mao et al., 2024). They often require training task-specific models, which complicates automation and reduces effectiveness in zero-shot scenarios. In practical applications, the ability to process both structured and unstructured data, particularly with unseen new data, is crucial.

现有的图学习方法已成为分析和学习图数据的重要框架 (Hamilton, 2020)。这些方法专注于学习节点和边的嵌入,将结构信息映射到潜在表示空间中 (Yang et al., 2020)。其中,图神经网络 (Graph Neural Networks, GNNs) 作为最先进 (SOTA) 的方法脱颖而出 (Dai et al., 2022; Liu et al., 2022)。GNNs 采用消息传递机制,允许节点与其邻居交换信息,有效捕捉图的结构特征并增强表示学习。然而,它们主要关注显式的图连接,往往忽略了与链接文本数据相关的复杂语义依赖关系。此外,GNNs 在现实世界的图挖掘任务中通常具有有限的泛化能力 (Xia & Huang, 2024; Mao et al., 2024)。它们通常需要训练特定任务的模型,这使自动化变得复杂,并降低了在零样本场景中的有效性。在实际应用中,处理结构化和非结构化数据的能力,特别是处理未见过的数据,至关重要。

Inspired by the recent success of large language models (LLMs), researchers are striving to enhance the generalization capabilities of graph learning models by enabling LLMs to comprehend graph structural information. Notable examples include GraphGPT (Tang et al., 2024a) and LLaGA (Chen et al., 2024a), which convert graph-structured data into tokens suitable for LLM input. However, these approaches are primarily designed for conventional graph learning tasks, such as node classification and link prediction. This narrow focus limits their broader application in effectively handling both structured and unstructured data in a more flexible

受大语言模型 (LLMs) 近期成功的启发,研究人员正致力于通过让 LLMs 理解图结构信息来增强图学习模型的泛化能力。典型的例子包括 GraphGPT (Tang et al., 2024a) 和 LLaGA (Chen et al., 2024a),它们将图结构数据转换为适合 LLM 输入的 Token。然而,这些方法主要针对传统的图学习任务,如节点分类和链接预测。这种局限性限制了它们在更灵活地处理结构化和非结构化数据方面的广泛应用。


Figure 1: GraphAgent processes both structured and unstructured data, adapting seamlessly to various downstream tasks across diverse scenarios.

图 1: GraphAgent 处理结构化和非结构化数据,无缝适应各种场景下的下游任务。

and efficient manner. In light of these limitations, an important question arises: How can we empower individuals without any background in graph theory or machine learning to analyze their graph data using natural language and obtain the desired predictions and insights?

鉴于这些限制,一个重要的问题出现了:我们如何能够让没有任何图论或机器学习背景的个人使用自然语言分析他们的图数据,并获得所需的预测和见解?

The Presented Work. In this paper, we aim to establish a fully automated analysis framework capable of handling a wide variety of data types, including both structured and unstructured data. Our framework, GraphAgent, is designed to address diverse user needs, encompassing both graph-related predictive and generative tasks. Built on an agentic architecture, GraphAgent allows users to interact with it using natural language. This intuitive and comprehensive approach thoroughly empowers all individuals to obtain predictions and insights from graph-structured data, tailored to their specific requirements, without requiring specialized knowledge in graph learning.

本文工作。在本文中,我们旨在建立一个全自动的分析框架,能够处理多种数据类型,包括结构化和非结构化数据。我们的框架 GraphAgent 旨在满足用户多样化的需求,涵盖与图相关的预测和生成任务。基于智能体架构构建的 GraphAgent 允许用户使用自然语言与其交互。这种直观且全面的方法使所有个体能够根据其特定需求,从图结构数据中获得预测和洞察,而无需具备图学习的专业知识。

To achieve our objective, several key challenges must be addressed: i) Constructing Potential Semantic Relationships: How to derive latent semantic connections from complex data. ii) Automating Query Understanding and Task Formulation: How to automatically interpret user query prompts, formulate them into specific tasks (e.g., predictive or generative tasks), and effectively plan those tasks. iii) Efficient Task Execution: How to accurately and effectively implement the formulated tasks and return correct results. To tackle these challenges, our proposed model is designed with an advanced framework comprising three main components: a Graph Generator Agent that constructs Semantic Knowledge Graphs (SKGs) from user text, a Task Planning Agent that interprets queries and formulates tasks, and a Graph Action Agent that automates the task execution.

为实现我们的目标,必须解决几个关键挑战:i) 构建潜在语义关系:如何从复杂数据中推导出潜在的语义连接。ii) 自动化查询理解与任务制定:如何自动解释用户查询提示,将其制定为具体任务(例如预测性或生成性任务),并有效规划这些任务。iii) 高效任务执行:如何准确有效地执行制定的任务并返回正确结果。为应对这些挑战,我们提出的模型设计了一个先进的框架,包含三个主要组件:一个从用户文本中构建语义知识图谱(SKG)的图生成器智能体(Graph Generator Agent),一个解释查询并制定任务的任务规划智能体(Task Planning Agent),以及一个自动化任务执行的图操作智能体(Graph Action Agent)。

To summarize, this work presents the following contributions:

总结来说,本工作提出了以下贡献:

• Complex Practical Data Integration. Our framework provides robust handling of real-world scenarios by seamlessly merging structured and unstructured data with graph-based entity relationships. This unified approach enables dual capabilities - supporting both predictive analytics and text generation tasks. By allowing natural language interactions, users can directly query and analyze complex data structures, streamlining information extraction and improving accessibility.

• Multi-Agent Workflow. This work introduces GraphAgent, an advanced automated graph language assistant that enhances the integration of structured and unstructured data analysis. It autonomously constructs semantic knowledge graphs (SKGs) from text, formulates predictive and generative tasks from user queries, and efficiently executes these tasks. This seamless collaboration enables GraphAgent to uncover complex relational information and semantic dependencies, significantly improving usability and accessibility in graph analysis.

• 复杂的实际数据集成。我们的框架通过将结构化和非结构化数据与基于图的实体关系无缝融合,提供了对现实场景的稳健处理。这种统一方法实现了双重能力——既支持预测分析,也支持文本生成任务。通过允许自然语言交互,用户可以直接查询和分析复杂的数据结构,简化信息提取并提高可访问性。
• 多智能体工作流。本工作引入了 GraphAgent,这是一种先进的自动化图语言助手,增强了结构化和非结构化数据分析的集成。它能够从文本中自主构建语义知识图 (SKG),根据用户查询制定预测和生成任务,并高效执行这些任务。这种无缝协作使 GraphAgent 能够揭示复杂的关系信息和语义依赖,显著提高了图分析的可用性和可访问性。


Figure 2: The overall framework of the proposed GraphAgent.

图 2: 提出的 GraphAgent 的整体框架。

• Experimental Evaluation. We validated our model on both structured and unstructured data, showing strong performance across graph predictive tasks and new graph-related text generative tasks. Additionally, we conducted ablation experiments to assess the effectiveness of key modules. It is important to note that our entire agent framework employs relatively small open-source large language models (e.g., LLaMA-8B), yet our model still exhibits significant advantages compared to current state-of-the-art closed-source models (e.g., GPT-4, Gemini) for generation tasks.

• 实验评估。我们在结构化和非结构化数据上验证了我们的模型,展示了在图预测任务和新图相关文本生成任务中的强大性能。此外,我们还进行了消融实验以评估关键模块的有效性。值得注意的是,我们的整个智能体框架采用了相对较小的开源大语言模型(例如 LLaMA-8B),但我们的模型在生成任务中仍然表现出与当前最先进的闭源模型(例如 GPT-4、Gemini)相比的显著优势。

2 METHODOLOGY

2 方法论

2.1 PRELIMINARIES

2.1 预备知识

Graph-empowered Agents. Our GraphAgent proposes an automated agentic pipeline that addresses graph predictive and text generation tasks. It can be formulated as $\mathcal{Y}=f(\mathcal{O};\mathrm{LLM})$, where the agentic function $f(\cdot)$ incorporates an Observation $\mathcal{O}$ that includes structured data (e.g., explicit graph connections) and/or unstructured data (e.g., textual information). The agent then produces an Action $\mathcal{Y}$, which can involve predictions (e.g., node classifications) or text generation tasks (e.g., summarizing text with implicit entity interdependencies). The workflow of GraphAgent leverages the capabilities of LLMs to enhance its effectiveness in both predictive and generative tasks.

图赋能智能体。我们的 GraphAgent 提出了一种自动化的智能体管道,用于解决图预测和文本生成任务。它可以表示为 $\mathcal{Y}=f(\mathcal{O};\mathrm{LLM})$,其中智能体函数 $f(\cdot)$ 包含一个观察 $\mathcal{O}$,该观察既包括结构化数据(例如显式图连接),也包括非结构化数据(例如文本信息)。然后,智能体生成一个动作 $\mathcal{Y}$,该动作可能涉及预测(例如节点分类)或文本生成任务(例如总结具有隐式实体相互依赖关系的文本)。GraphAgent 的工作流程利用了大语言模型的能力,以增强其在预测和生成任务中的有效性。

Graph-Structured Data. In our GraphAgent, both structured and unstructured data are represented as graphs, differing only in the explicitness or implicitness of the entity-wise relationships. To accommodate the diversity of graph data, we utilize heterogeneous graphs to represent the input data. Specifically, a heterogeneous graph is denoted as $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})$ , where $\mathcal{V}$ is the set of all entities, and $\mathcal{E}$ is the set of all edges connecting pairs of entities. The sets $\mathcal{N}$ and $\mathcal{R}$ represent the types of nodes and edges, respectively. For each edge, a meta-type attribute can be retrieved in the form $(n_{h},r_{i},n_{t})$ , denoting the meta-types of the head node $n_{h}$ , relation $r_{i}$ , and tail node $n_{t}$ , respectively.

图结构数据。在我们的 GraphAgent 中,结构化和非结构化数据都被表示为图,唯一的区别在于实体间关系的显式或隐式。为了适应图数据的多样性,我们使用异构图来表示输入数据。具体来说,异构图表示为 $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})$,其中 $\mathcal{V}$ 是所有实体的集合,$\mathcal{E}$ 是连接实体对的所有边的集合。集合 $\mathcal{N}$ 和 $\mathcal{R}$ 分别表示节点和边的类型。对于每条边,可以以 $(n_{h},r_{i},n_{t})$ 的形式检索元类型属性,分别表示头节点 $n_{h}$、关系 $r_{i}$ 和尾节点 $n_{t}$ 的元类型。
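As a concrete illustration of the heterogeneous graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})$ defined above, the following minimal Python sketch stores typed nodes and relation-labeled edges and recovers the meta-type triple $(n_{h},r_{i},n_{t})$ of an edge. The class and method names are illustrative, not taken from the GraphAgent codebase.

```python
# Minimal sketch of a heterogeneous graph G = (V, E, N, R).
class HeteroGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> node_type (a member of N)
        self.edges = []   # list of (head_id, relation, tail_id)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, head, relation, tail):
        self.edges.append((head, relation, tail))

    def meta_type(self, edge):
        # Retrieve the meta-type triple (n_h, r_i, n_t) of an edge.
        head, relation, tail = edge
        return (self.nodes[head], relation, self.nodes[tail])

g = HeteroGraph()
g.add_node("p1", "paper")
g.add_node("a1", "author")
g.add_edge("a1", "writes", "p1")
print(g.meta_type(g.edges[0]))  # -> ('author', 'writes', 'paper')
```

Here $\mathcal{N}$ and $\mathcal{R}$ are implicit as the distinct values appearing in `nodes` and in the edge relations, respectively.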

2.2 GRAPH GENERATION AGENT

2.2 图生成智能体

To uncover the rich contextual information within unstructured data, GraphAgent designs a Graph Generation Agent that automatically constructs meaningful Semantic Knowledge Graphs (SKGs) from any type of textual input. For example, for a paper abstract that includes the sentence, “Contrastively trained text-image models have the remarkable ability to perform zero-shot classification”, the model can extract relevant entity nodes such as “text-image models” and “zero-shot classification”.

为了揭示非结构化数据中丰富的上下文信息,GraphAgent 设计了一个图生成智能体,能够从任何类型的文本输入中自动构建有意义的语义知识图 (Semantic Knowledge Graphs, SKGs)。例如,对于包含句子“对比训练的文本-图像模型具有执行零样本分类的显著能力”的论文摘要,模型可以提取出相关的实体节点,如“文本-图像模型”和“零样本分类”。

Iterative Two-Phase Graph Generation Workflow. To capture complex implicit entity-wise dependencies, our graph generation agent operates through an automated two-phase workflow: (1) Scaffold Knowledge Entity Extraction and (2) Knowledge Description Augmentation. The first phase is dedicated to identifying key knowledge entities or concepts referred to as scaffold knowledge nodes from the provided text, regardless of its format. Specifically, this phase can be formulated as:

迭代式两阶段图生成工作流。为了捕捉复杂的隐式实体间依赖关系,我们的图生成智能体通过一个自动化的两阶段工作流程进行操作:(1) 骨架知识实体提取和 (2) 知识描述增强。第一阶段致力于从提供的文本中识别出关键的知识实体或概念,这些被称为骨架知识节点,无论文本的格式如何。具体来说,这一阶段可以表述为:

\begin{array}{r}{\mathcal{V}_{\mathrm{scaffold}}^{k=0}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_sk}},\mathbf{g}_{s}),}\end{array}

where $\mathbf{g}_{s}$ represents the input unstructured text data, while $\mathbf{x}_{\mathrm{sys\_sk}}$ denotes the system prompt for extracting scaffold knowledge nodes. We adopt an iterative approach to graph generation to capture both high-level and fine-grained semantic dependencies among multi-grained entities. For example, in an academic paper, high-level entities might include "Machine Learning," while fine-grained entities could be "Self-Supervised Learning" and "Graph Neural Network". Specifically, $\mathcal{V}_{\mathrm{scaffold}}^{k=0}$ refers to the generated vertices during the initial iteration ($k=0$).

其中 $\mathbf{g}_{s}$ 表示输入的非结构化文本数据,而 $\mathbf{x}_{\mathrm{sys\_sk}}$ 表示用于提取骨架知识节点的系统提示。我们采用迭代的方法来生成图,以捕捉多粒度实体之间的高层次和细粒度语义依赖关系。例如,在学术论文中,高层次实体可能包括“机器学习”,而细粒度实体可能是“自监督学习”和“图神经网络”。具体来说,$\mathcal{V}_{\mathrm{scaffold}}^{k=0}$ 指的是初始迭代($k=0$)期间生成的顶点。

The second phase of knowledge augmentation centers on enhancing and enriching the textual descriptions of the generated entity nodes to ensure accurate, comprehensive, and contextually appropriate language modeling. This critical step ensures that each entity is represented with sufficient detail and semantic clarity. Formally, we define this phase as follows:

知识增强的第二阶段侧重于增强和丰富生成的实体节点的文本描述,以确保准确、全面且上下文适当的语言建模。这一关键步骤确保每个实体都能以足够的细节和语义清晰度进行表示。正式地,我们将这一阶段定义如下:

\mathcal{C}_{\mathrm{scaffold}}^{k=0}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_ka}},\mathbf{g}_{s},\mathcal{V}_{\mathrm{scaffold}}^{k=0}).

where $\mathcal{C}_{\mathrm{scaffold}}^{k=0}$ denotes the node-specific descriptions, while $\mathbf{x}_{\mathrm{sys\_ka}}$ denotes the system prompt for knowledge augmentation. To iteratively execute this two-phase workflow, GraphAgent uses the textual augmentation output from the previous round as the implicit graph input for the next round:

其中 $\mathcal{C}_{\mathrm{scaffold}}^{k=0}$ 表示节点特定的描述,而 $\mathbf{x}_{\mathrm{sys\_ka}}$ 表示用于知识增强的系统提示。为了迭代执行这个两阶段工作流,GraphAgent 使用上一轮的文本增强输出作为下一轮的隐式图输入:

\begin{array}{r l}&{\mathcal{V}_{\mathrm{scaffold}}^{k=j}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_sk}},\mathcal{C}_{\mathrm{scaffold}}^{k=j-1})}\\ &{\mathcal{C}_{\mathrm{scaffold}}^{k=j}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_ka}},\mathcal{C}_{\mathrm{scaffold}}^{k=j-1},\mathcal{V}_{\mathrm{scaffold}}^{k=j}).}\end{array}

After all iterations, the generated nodes and descriptions are aggregated into the final sets: $\mathcal{V}_{\mathrm{skg}}=\bigcup_{k}\mathcal{V}_{\mathrm{scaffold}}^{k}$ and $\mathcal{C}_{\mathrm{skg}}=\bigcup_{k}\mathcal{C}_{\mathrm{scaffold}}^{k}$. The relationships among these nodes, denoted as $\mathcal{E}_{\mathrm{skg}}$, are established based on their derivation: if a new node is generated from the textual description of a node in the previous iteration, we connect these two nodes in the semantic knowledge graph. The system prompts used for graph generation are detailed in Table 6, which is presented in the Appendix.

经过所有迭代后,生成的节点和描述被汇总为最终集合:$\mathcal{V}_{\mathrm{skg}}=\bigcup_{k}\mathcal{V}_{\mathrm{scaffold}}^{k}$ 和 $\mathcal{C}_{\mathrm{skg}}=\bigcup_{k}\mathcal{C}_{\mathrm{scaffold}}^{k}$。这些节点之间的关系,表示为 $\mathcal{E}_{\mathrm{skg}}$,是基于它们的派生建立的:如果一个新节点是从前一次迭代中某个节点的文本描述生成的,我们就在语义知识图中连接这两个节点。用于图生成的系统提示详见附录中的表 6。
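The iterative two-phase workflow above can be sketched as the following loop. The `stub_llm` function stands in for the actual LLM calls with prompts $\mathbf{x}_{\mathrm{sys\_sk}}$ and $\mathbf{x}_{\mathrm{sys\_ka}}$; all function names, the stub's canned outputs, and the per-description extraction granularity are simplifying assumptions for illustration only.

```python
def extract_scaffold_nodes(llm, sys_prompt, text):
    # Phase 1: V_scaffold^k = LLM(x_sys_sk, text)
    return llm(sys_prompt, text)

def augment_descriptions(llm, sys_prompt, text, nodes):
    # Phase 2: C_scaffold^k = LLM(x_sys_ka, text, V_scaffold^k)
    return {n: llm(sys_prompt, (text, n)) for n in nodes}

def generate_skg(llm, text, iterations=2):
    # k = 0: scaffold nodes from the raw text g_s, then their descriptions
    nodes = set(extract_scaffold_nodes(llm, "x_sys_sk", text))
    descs = augment_descriptions(llm, "x_sys_ka", text, nodes)
    edges = set()
    frontier = dict(descs)
    for _ in range(1, iterations):
        next_frontier = {}
        for parent, desc in frontier.items():
            # new nodes derived from the previous round's descriptions
            for child in extract_scaffold_nodes(llm, "x_sys_sk", desc):
                if child not in nodes:
                    nodes.add(child)
                    edges.add((parent, child))  # derived-from link in E_skg
                    next_frontier[child] = llm("x_sys_ka", (desc, child))
        descs.update(next_frontier)
        frontier = next_frontier
    return nodes, edges, descs

def stub_llm(prompt, payload):
    # Canned behavior standing in for a real LLM.
    if prompt == "x_sys_sk":
        return {"abstract": ["Machine Learning"],
                "desc:Machine Learning": ["Graph Neural Network"]}.get(payload, [])
    _text, node = payload
    return f"desc:{node}"

nodes, edges, descs = generate_skg(stub_llm, "abstract")
```

With the stub, the final SKG contains the two nodes linked by one derived-from edge, mirroring how "Graph Neural Network" would be spawned from the description of "Machine Learning".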

2.3 TASK PLANNING AGENT

2.3 任务规划智能体 (Task Planning Agent)

With both structured and unstructured data represented as graphs, GraphAgent employs a task planning agent to automatically interpret user queries and transform the graph data into a unified embedding structure. This facilitates easier utilization by the subsequent predictive and generative modules. Input-output examples of the task planning agent are provided in Table 3 in the Appendix.

将结构化和非结构化数据表示为图后,GraphAgent 使用任务规划智能体自动解释用户查询,并将图数据转换为统一的嵌入结构。这有助于后续的预测和生成模块更轻松地利用这些数据。任务规划智能体的输入输出示例见附录中的表 3。

2.3.1 Intent Identification and Task Formulation

2.3.1 意图识别与任务制定

The task planning agent is initially tasked with formulating meaningful predictive or generative tasks based on the user query prompt. Given a user query prompt $\mathbf{x}_{\mathrm{usr\_p}}$ and a predefined system prompt for task parsing $\mathbf{x}_{\mathrm{sys\_tp}}$ , the task planning agent formulates the intended task as follows:

任务规划智能体最初的任务是基于用户查询提示制定有意义的预测性或生成性任务。给定用户查询提示 $\mathbf{x}_{\mathrm{usr\_p}}$ 和预定义的任务解析系统提示 $\mathbf{x}_{\mathrm{sys\_tp}}$ ,任务规划智能体按如下方式制定预期任务:

\begin{array}{r}{\mathbf{g}_{s},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{t}_{\mathrm{usr}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_tp}},\mathbf{x}_{\mathrm{usr\_p}}),}\end{array}

This intent identification and task formulation procedure generates three fundamental types of task attributes within our agent architecture, which is specifically defined as follows:

此意图识别和任务制定程序在我们的智能体架构中生成了三种基本类型的任务属性,具体定义如下:

• Source graph $\mathbf{g}_{s}$ represented by formatted files, textual graph descriptions, or plain documents.

• 源图 $\mathbf{g}_{s}$ 由格式化文件、文本图描述或纯文本文档表示。

• Task type $\mathbf{t}_{\mathrm{usr}}$ is inferred from the query prompt and can be one of "predictive predefined", "predictive wild", or "open generation". This task type symbol is used to automatically select system prompt templates during training or inference for different tasks.

• User annotation $\mathbf{x}_{\mathrm{usr\_ann}}$ includes additional task information, such as task descriptions, label candidates for predictive tasks, and generation requirements for generative tasks.

• 任务类型 $\mathbf{t}_{\mathrm{usr}}$ 是从查询提示中推断出来的,可以是“预测预定义”、“预测开放”或“开放生成”中的一种。此任务类型符号用于在训练或推理期间自动选择不同任务的系统提示模板。

• 用户注释 $\mathbf{x}_{\mathrm{usr\_ann}}$ 包括额外的任务信息,例如任务描述、预测任务的标签候选以及生成任务的生成要求。
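To make the three task attributes concrete, here is a toy stand-in for the intent identification step. A real task planning agent prompts an LLM with $\mathbf{x}_{\mathrm{sys\_tp}}$; the keyword rules and all names below are purely illustrative assumptions, not the paper's logic.

```python
# Toy stand-in for intent identification: maps a user query to the
# three task attributes (g_s, x_usr_ann, t_usr).
TASK_TYPES = ("predictive_predefined", "predictive_wild", "open_generation")

def plan_task(query, uploaded_files):
    if uploaded_files:  # an explicit graph was supplied as formatted files
        source_graph, t_usr = uploaded_files, "predictive_predefined"
    elif any(w in query.lower() for w in ("classify", "predict", "which category")):
        source_graph, t_usr = query, "predictive_wild"
    else:
        source_graph, t_usr = query, "open_generation"
    annotation = query  # x_usr_ann: extra task info (labels, requirements, ...)
    return source_graph, annotation, t_usr

g_s, ann, t_usr = plan_task("Which category is correct for node [305]?",
                            ["node_list.txt", "edge_list.txt"])
```

The returned `t_usr` symbol is what would select the system prompt template downstream.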

To construct grounded graph tokens that can be understood by the subsequent action agent, the task planning agent follows two stages: i) Graph-Token Grounding—converting graphs with nodes and edges into grounded Python objects; ii) Graph Tokenization—generating tokens from the input that preserve complex interdependencies among graph-structured entities.

为了构建后续动作智能体能够理解的基于图的Token,任务规划智能体遵循两个阶段:i) 图-Token 基础化 (Graph-Token Grounding) —— 将带有节点和边的图转换为基于Python语言的对象;ii) 图Token化 (Graph Tokenization) —— 从输入中生成Token,保留图结构实体之间的复杂相互依赖关系。

2.3.2 Graph-Token Grounding

2.3.2 图-Token 基础化 (Graph-Token Grounding)

Our framework reads graph nodes and edges and converts them into grounded Python objects using a graph-building and wrapping tool. Notably, our model can handle diverse graph inputs, regardless of whether an explicit graph with predefined nodes and edges is present. For simplicity, we will demonstrate a scenario where the user uploads a predefined graph. For example, the query prompt might be: "...I want to know which category is correct for the node with ID [305]..." with uploaded graph files such as ["node_list.txt", "edge_list.txt"]. To build a grounded graph object in Python, we utilize the graph-building and wrapping tool $\mathrm{GBW\_Tool}(\cdot)$ with PyG (Fey & Lenssen, 2019) to add nodes and construct edges. Since user-uploaded graphs can have arbitrary node and edge types, we standardize the graphs as heterogeneous graphs, where $s_{i}$ and $r_{i}$ represent the types of each node and edge, respectively. Formally, a heterogeneous graph is constructed as:

我们的框架读取图节点和边,并使用图构建和封装工具将它们转换为具体的 Python 对象。值得注意的是,我们的模型可以处理各种图输入,无论是否存在具有预定义节点和边的显式图。为了简化,我们将演示用户上传预定义图的场景。例如,查询提示可能是:“...我想知道 ID 为 [305] 的节点属于哪个类别...” 并上传图文件,如 ["node_list.txt", "edge_list.txt"]。为了在 Python 中构建具体的图对象,我们利用图构建和封装工具 $\mathrm{GBW\_Tool}(\cdot)$ 和 PyG (Fey & Lenssen, 2019) 来添加节点并构建边。由于用户上传的图可以具有任意节点和边类型,我们将图标准化为异构图,其中 $s_{i}$ 和 $r_{i}$ 分别表示每个节点和边的类型。形式上,异构图构建如下:

\begin{array}{r l}&{\mathcal{G}^{\mathrm{exp}}=\mathrm{GBW\_Tool}(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})}\\ &{\mathcal{G}^{\mathrm{skg}}=\mathrm{GBW\_Tool}(\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}})}\end{array}

where $\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R}$ represent the nodes, edges, node types, and edge types of the explicit graph, respectively. They are obtained by parsing the graph input $\mathbf{g}_{s}$. Similarly, $\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}}$ denote the corresponding graph components generated by the aforementioned Graph Generation Agent. This graph grounding module enables our model to convert graph data from various representations and forms into unified Python objects, facilitating their subsequent utilization.

其中,$\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R}$ 分别表示显式图的节点、边、节点类型和边类型。它们是通过解析图输入 $\mathbf{g}_{s}$ 获得的。类似地,$\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}}$ 表示由上述图生成智能体生成的相应图组件。该图基础模块使我们的模型能够将各种表示和形式的图数据转换为统一的 Python 对象,便于后续使用。
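A simplified, dict-based stand-in for the parsing step of $\mathrm{GBW\_Tool}(\cdot)$ might look as follows. The tab-separated file formats are assumed for illustration only, and the actual tool wraps the result in a PyG heterogeneous graph object rather than plain Python containers.

```python
# Hypothetical sketch: parse "node_list" / "edge_list" text into the
# (V, E, N, R) components of a heterogeneous graph.
def gbw_tool(node_lines, edge_lines):
    V, E = {}, []
    for line in node_lines:          # assumed format: "node_id<TAB>node_type"
        node_id, node_type = line.split("\t")
        V[node_id] = node_type
    for line in edge_lines:          # assumed format: "head<TAB>relation<TAB>tail"
        head, rel, tail = line.split("\t")
        E.append((head, rel, tail))
    N = set(V.values())              # node types
    R = {rel for _, rel, _ in E}     # edge (relation) types
    return V, E, N, R

V, E, N, R = gbw_tool(["305\tpaper", "12\tauthor"], ["12\twrites\t305"])
```

In the real pipeline these components would then populate a `HeteroData`-style object so that downstream GNN tooling can consume them.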

2.3.3 Graph Tokenization

2.3.3 图 Token 化

The Task Planning Agent converts discrete nodes and edges into embedded representations suitable for action agents based on graph LLMs. This tokenization process consists of two stages: first, encoding the graph into embeddings, and second, retrieving the nodes and their neighbors to create input graph tokens. For the embedding process, we employ a pre-trained text encoder $f_{\mathrm{text\_enc}}$ and a pre-trained GNN $f_{\mathrm{gnn}}$. Graph tokens are generated by initially encoding the textual features $\mathbf{c}$ of the graph nodes and their meta types using the text encoder, followed by modeling geometric features.

任务规划智能体将离散的节点和边转换为适合基于图大语言模型的动作智能体的嵌入表示。这个 Token 化过程包括两个阶段:首先,将图编码为嵌入表示;其次,检索节点及其邻居以创建输入图 Token。在嵌入过程中,我们使用预训练的文本编码器 $f_{\mathrm{text\_enc}}$ 和预训练的图神经网络 $f_{\mathrm{gnn}}$。图 Token 的生成首先通过文本编码器对图节点的文本特征 $\mathbf{c}$ 及其元类型进行编码,然后对几何特征进行建模。

\begin{array}{r l}&{\mathbf{e}_{i}^{\mathrm{text}}=f_{\mathrm{text\_encoder}}(\mathbf{c}_{i});\;\mathbf{e}_{s_{i}|r_{i}}^{\mathrm{text}}=f_{\mathrm{text\_encoder}}(\mathbf{c}_{s_{i}|r_{i}})}\\ &{\mathbf{e}_{i}^{\mathrm{gnn}}=f_{\mathrm{gnn}}(\mathbf{e}_{i}^{\mathrm{text}},\mathbf{e}_{s_{i}}^{\mathrm{text}},\mathbf{e}_{r_{i}}^{\mathrm{text}},\mathcal{V},\mathcal{E}).}\end{array}

For each central node $i$ in our heterogeneous graph, we systematically apply a graph sampling tool to create the subgraph input for the subsequent action agent, which can be formulated as follows:

对于我们异构图中的每个中心节点 $i$,我们系统地应用图采样工具来创建后续动作智能体的子图输入,其公式如下:

[\mathbf{e}_{N_{i}}^{\mathrm{gnn}}]=\mathrm{Sampling\_Tool}(\mathcal{G},\mathbf{E}^{\mathrm{gnn}},i)
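The tokenization pipeline (text encoding, GNN aggregation, neighbor sampling) can be sketched with toy stand-ins for the pre-trained encoder and GNN. The bag-of-characters encoder and one-layer mean aggregation below are illustrative assumptions showing the data flow only, not the models used in the paper.

```python
# Toy stand-ins for f_text_enc, f_gnn, and Sampling_Tool.
def text_encoder(text, dim=4):
    # Deterministic bag-of-characters embedding (stand-in for f_text_enc).
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec

def gnn_layer(node_texts, edges):
    # One message-passing step: mean of self + out-neighbor text embeddings.
    e_text = {n: text_encoder(t) for n, t in node_texts.items()}
    e_gnn = {}
    for n, own in e_text.items():
        neigh = [e_text[t] for h, t in edges if h == n]
        stack = [own] + neigh
        e_gnn[n] = [sum(col) / len(stack) for col in zip(*stack)]
    return e_gnn

def sampling_tool(edges, embeddings, center):
    # Collect graph tokens for a center node and its one-hop neighborhood.
    neighbors = [t for h, t in edges if h == center]
    return [embeddings[center]] + [embeddings[n] for n in neighbors]

texts = {"p1": "graph agents", "p2": "llm planning"}
edges = [("p1", "p2")]
emb = gnn_layer(texts, edges)
subgraph_tokens = sampling_tool(edges, emb, "p1")
```

The resulting `subgraph_tokens` list corresponds to $[\mathbf{e}_{N_{i}}^{\mathrm{gnn}}]$: the embedded tokens fed to the subsequent action agent.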

2.4 GRAPH ACTION AGENT

2.4 图动作智能体

To enhance the capabilities of graph encoding and prediction/generation, we incorporate a trainable Graph Action Agent into our GraphAgent framework, based on the Graph LLM architecture (Tang et al., 2024b; Chen et al., 2024a). This Graph Action Agent is specifically trained to optimize performance for both predictive and text generation tasks involving graph data.

为了增强图编码和预测/生成的能力,我们在GraphAgent框架中引入了一个可训练的图操作智能体(Graph Action Agent),该框架基于图大语言模型(Graph LLM)架构(Tang et al., 2024b; Chen et al., 2024a)。这个图操作智能体经过专门训练,以优化涉及图数据的预测和文本生成任务的性能。

2.4.1 Cross-Task Graph Agent

2.4.1 跨任务图智能体

The graph action agent is capable of handling two categories of diverse tasks, as shown below. The details on the system prompt builder and examples of system prompts are shown in Table 6.

图动作智能体能够处理以下两类不同的任务。系统提示构建器的详细信息及系统提示示例见表 6。

• Predictive Graph-Language Tasks. These tasks focus on generating predictions based on user prompts, utilizing both structured and unstructured data. Examples include node classification and link prediction for explicit graph data, as well as document classification based on extracted implicit semantic knowledge graphs (SKGs), such as categorizing news articles. When using implicit SKGs to complement explicit graphs, the graph generator agent uses the observed explicit nodes as initial scaffold nodes to build the SKG. Specifically, for these tasks, our model constructs a system prompt that effectively guides the LLM toward task-specific objectives:

• 预测性图语言任务。这些任务侧重于基于用户提示生成预测,利用结构化和非结构化数据。示例包括显式图数据的节点分类和链接预测,以及基于提取的隐式语义知识图 (SKG) 的文档分类,例如对新闻文章进行分类。当使用隐式 SKG 来补充显式图时,图生成器智能体使用观察到的显式节点作为初始支架节点来构建 SKG。具体来说,对于这些任务,我们的模型构建了一个系统提示,有效地引导大语言模型实现任务特定目标:

\mathbf{x}_{\mathrm{sys\_pred\_i}}=f_{\mathrm{sys}}(\mathbf{t}_{\mathrm{usr}},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{g}_{s}),

where the prompt builder function $f_{\mathrm{sys}}$ creates an appropriate system prompt based on the task type and user annotations, incorporating $\mathbf{g}_{s}$ for node or graph textual information. The predictive graph-language tasks are then defined as follows:

其中提示构建函数 $f_{\mathrm{sys}}$ 根据任务类型和用户注释创建适当的系统提示,并结合 $\mathbf{g}_{s}$ 用于节点或图文本信息。预测性图语言任务定义如下:

\mathbf{y}_{\mathrm{pred}},\mathbf{y}_{\mathrm{reasoning}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_pred\_i}},\{\mathcal{G}^{\mathrm{exp}}|\mathcal{G}^{\mathrm{skg}}\}),

where $\{\mathcal{G}^{\mathrm{exp}}|\mathcal{G}^{\mathrm{skg}}\}$ indicates that the agent can utilize either $\mathcal{G}^{\mathrm{exp}}$ or $\mathcal{G}^{\mathrm{skg}}$, or both. In this context, the LLM generates accurate predictions and reasoning in response to the user’s query prompt.

其中 $\{\mathcal{G}^{\mathrm{exp}}|\mathcal{G}^{\mathrm{skg}}\}$ 表示智能体可以使用 $\mathcal{G}^{\mathrm{exp}}$、$\mathcal{G}^{\mathrm{skg}}$ 或两者。在这种情况下,大语言模型会根据用户的查询提示生成准确的预测和推理。

• Generative Graph-Language Tasks. The discovered SKGs can serve as robust and comprehensive references for generative language tasks, such as text generation and summarization. These open-ended tasks are typically prompted in a direct text format that implicitly contains knowledge, without the need for predefined graphs. For example, to summarize a news article, an SKG $\mathcal{G}^{\mathrm{skg}}$ is automatically constructed from the article’s content, which includes rich entities and connections that aid in the summarization task. Additionally, a system prompt is automatically generated to enhance the content generation quality using the graph-structured information, as follows:

• 生成式图语言任务。发现的SKG可以作为生成式语言任务(如文本生成和摘要)的稳健且全面的参考。这些开放式任务通常以直接文本格式提示,隐式包含知识,无需预定义的图。例如,为了总结一篇新闻文章,SKG $\mathcal{G^{\mathrm{skg}}}$ 会自动从文章内容中构建,其中包含丰富的实体和连接,有助于摘要任务。此外,系统提示会自动生成,以利用图结构信息提高内容生成质量,如下所示:

\begin{array}{r l}&{\mathbf{x}_{\mathrm{sys\_gen\_i}}=f_{\mathrm{sys}}(\mathbf{t}_{\mathrm{usr}},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{g}_{s})}\\ &{\mathbf{y}_{\mathrm{gen}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_gen\_i}},\mathcal{G}^{\mathrm{skg}}),}\end{array}

where $\mathbf{y}_{\mathrm{gen}}$ represents the generated textual output, with input parameters consistent with those used in predictive tasks. In this context, the LLM focuses on producing coherent and contextually accurate content based on both text and graph inputs.

其中 $\mathbf{y}_{\mathrm{gen}}$ 表示生成的文本输出,输入参数与预测任务中使用的参数一致。在这种情况下,大语言模型专注于基于文本和图输入生成连贯且上下文准确的内容。
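A minimal sketch of the prompt builder $f_{\mathrm{sys}}$ shared by the predictive and generative branches is shown below. The template wording here is invented for illustration; the paper's actual templates are listed in its appendix (Table 6).

```python
# Illustrative prompt builder f_sys(t_usr, x_usr_ann, g_s).
TEMPLATES = {
    "predictive_predefined": "Classify the target node. Candidates: {ann}",
    "predictive_wild": "Answer the predictive question: {ann}",
    "open_generation": "Generate text fulfilling: {ann}",
}

def f_sys(t_usr, x_usr_ann, g_s):
    # Select the task-specific template, then attach textual graph context.
    prompt = TEMPLATES[t_usr].format(ann=x_usr_ann)
    return f"{prompt}\nGraph context: {g_s}"

prompt = f_sys("predictive_predefined", "label candidates: A, B", "node [305] text")
```

The returned string corresponds to $\mathbf{x}_{\mathrm{sys\_pred\_i}}$ or $\mathbf{x}_{\mathrm{sys\_gen\_i}}$ depending on the task type selected by the planning agent.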

2.4.2 Graph-Instruction Alignment

2.4.2 图-指令对齐

To teach our agent to comprehend graph-structured data, we implement graph-instruction alignment in the initial fine-tuning stage. Inspired by the work of Tang et al. (2024b), we utilize the efficient, effective, and easily scalable task of graph-instruction matching as our alignment target. Specifically, we present a set of graph token-instruction pairs:

为了教导我们的AI智能体理解图结构数据,我们在初始的微调阶段实施了图-指令对齐。受Tang等人 (2024b) 工作的启发,我们采用了高效、有效且易于扩展的图-指令匹配任务作为对齐目标。具体来说,我们提供了一组图Token-指令对:

\mathcal{D}^{g}=[(\mathbf{e}_{0},s_{0}),(\mathbf{e}_{1},s_{1}),...];\;\mathcal{D}^{c}=[(\mathbf{c}_{0},\mathbf{c}_{s_{0}}),(\mathbf{c}_{1},\mathbf{c}_{s_{1}}),...],

where $(\mathbf{e}_{i},s_{i})$ denotes the i-th graph token with meta type $s_{i}$ , and $(\mathbf{c}_{i},\mathbf{c}_{s_{i}})$ denotes the text description of the i-th graph token and its meta type, correspondingly. We devise two general tasks to achieve fine-grained and comprehensive alignment between the graph tokens and the textual instructions:

其中 $(\mathbf{e}_{i},s_{i})$ 表示第 i 个图 Token 及其元类型 $s_{i}$,而 $(\mathbf{c}_{i},\mathbf{c}_{s_{i}})$ 分别表示第 i 个图 Token 及其元类型的文本描述。我们设计了两个通用任务,以实现图 Token 与文本指令之间的细粒度和全面对齐:

• Intra-type alignment. This alignment task aims to strengthen the capability of LLMs to interpret graph embedding tokens of certain meta-types by promoting their alignment with the relevant texts. This is conducted by training LLMs to output the correct sequence of texts given a sequence of graph tokens. Specifically, we construct a dataset $\mathcal{D}^{\mathrm{intra}}$ where each entry consists of a sequence of graph tokens and a sequence of texts, separately: $d_{i}^{\mathrm{intra}}=([(\mathbf{e}_{j},s_{i}),\ldots],[(\mathbf{c}_{k},\mathbf{c}_{s_{i}}),\ldots])$. Then, we train the alignment with a next-token-prediction Cross-Entropy objective as follows:

• 类型内对齐。该对齐任务旨在通过促进大语言模型与相关文本的对齐,增强其解释特定元类型的图嵌入 Token 的能力。具体来说,我们通过训练大语言模型在给定图 Token 序列的情况下输出正确的文本序列来实现这一点。我们构建了一个数据集 $\mathcal{D}^{\mathrm{intra}}$,其中每个条目由两个序列组成,分别是图 Token 和文本序列:$d_{i}^{\mathrm{intra}}=([(\mathbf{e}_{j},s_{i}),...],[(\mathbf{c}_{k},\mathbf{c}_{s_{i}}),...])$。然后,我们使用下一个 Token 预测的交叉熵目标进行对齐训练,如下所示:

\mathrm{argmin}_{\Theta}\,\mathrm{CE\text{-}Loss}\big(d_{i}^{\mathrm{intra}}[0]\,\big|\,\mathrm{LLM}(d_{i}^{\mathrm{intra}}[1])\big),

where $\Theta$ denotes the learnable parameters of the large language model $\mathrm{LLM}(\cdot)$, and indices $[0]$ and $[1]$ indicate the text sequence and the graph token sequence, respectively.

其中 $\Theta$ 表示大语言模型 $\mathrm{LLM}(\cdot)$ 的可学习参数。索引 [0] 和 [1] 分别表示文本序列和图 Token 序列。
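上述类型内对齐目标可以用如下最小草图来说明:将图 Token 前缀与文本 Token 拼接后,仅在文本位置上计算下一个 Token 预测的交叉熵。以下代码仅为示意(非论文官方实现,logits 的形状与位置约定均为假设):

```python
import numpy as np

def intra_type_alignment_loss(logits, text_ids, num_graph_tokens):
    """Next-token cross-entropy averaged over the text positions only,
    i.e. the positions that follow the graph-token prefix."""
    # logits: (seq_len, vocab); sequence layout = [graph tokens] + [text tokens]
    log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    loss = 0.0
    for t, tok in enumerate(text_ids):
        pos = num_graph_tokens + t - 1  # position whose next token is `tok`
        loss -= log_probs[pos, tok]
    return loss / len(text_ids)
```

在均匀 logits 下,该损失应等于 $\ln(|\mathcal{V}|)$($|\mathcal{V}|$ 为词表大小),可作为实现的快速自检。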

• Inter-type alignment. As introducing multiple meta-types in the alignment task can further empower the LLM's comprehension of complex heterogeneous relations, we devise another alignment training objective using inter-type graph tokens. Technically, the dataset $\mathcal{D}^{\mathrm{inter}}$ is constructed by sampling entries whose first sequence consists of graph tokens of different meta-types: $d_{i}^{\mathrm{inter}}=([(\mathbf{e}_{m},s_{m}),(\mathbf{e}_{n},s_{n}),\ldots],[(\mathbf{c}_{n},\mathbf{c}_{s_{n}}),(\mathbf{c}_{q},\mathbf{c}_{s_{q}}),\ldots])$. Then, the LLM is trained to predict the text sequence and the meta-type sequence of the provided graph tokens:

• 跨类型对齐。由于在对齐任务中引入多种元类型可以进一步增强大语言模型对复杂异构关系的理解,我们设计了另一种使用跨类型图 Token 的对齐训练目标。技术上,数据集 $\mathcal{D}^{\mathrm{inter}}$ 通过采样包含不同元类型的图 Token 的条目来构建:$d_{i}^{\mathrm{inter}}=([(\mathbf{e}_{m},s_{m}),(\mathbf{e}_{n},s_{n}),\ldots],[(\mathbf{c}_{n},\mathbf{c}_{s_{n}}),(\mathbf{c}_{q},\mathbf{c}_{s_{q}}),\ldots])$。然后,训练大语言模型以预测提供的图 Token 的文本序列和元类型序列:

\mathrm{argmin}_{\Theta}\,\mathrm{CE\text{-}Loss}\big(d_{i}^{\mathrm{inter}}[0]\,\big|\,\mathrm{LLM}(d_{i}^{\mathrm{inter}}[1])\big).
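$\mathcal{D}^{\mathrm{inter}}$ 条目的构造过程可以按如下方式示意:从不同元类型中各采样一个图 Token,并配上对应的文本与元类型描述。以下数据结构与字段均为说明性假设:

```python
import random

def build_inter_type_entry(pool, k=3, rng=random):
    """Sample k graph tokens of *different* meta-types and pair them with
    their text + meta-type descriptions, mimicking one D^inter entry.
    `pool` maps meta_type -> list of (embedding_id, text, type_text)."""
    types = rng.sample(sorted(pool), k=min(k, len(pool)))
    graph_seq, text_seq = [], []
    for s in types:
        emb_id, text, type_text = rng.choice(pool[s])
        graph_seq.append((emb_id, s))          # (e_i, s_i)
        text_seq.append((text, type_text))     # (c_i, c_{s_i})
    return graph_seq, text_seq
```

训练时,graph_seq 作为条件输入,text_seq(文本与元类型描述)作为预测目标。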

2.4.3 Agent Task Finetuning

2.4.3 AI智能体任务微调

To enhance GraphAgent’s performance on different agent tasks, we propose to finetune the action agent with diverse graph-language instructions covering different agent tasks. Recall that with the task planning agent we have the user requested task $\mathbf{t}\in\mathcal{T}$ from the query prompt. For each $\mathbf{t}$ in the instruction dataset, we pair it with a special systematic prompt to distinguish between various tasks during training. The systematic prompt contains brief description of the task being handled. Formally, the agent task finetuning dataset is constructed as:

为了提升 GraphAgent 在不同智能体任务上的表现,我们提出通过涵盖不同智能体任务的多样化图-语言指令来微调动作智能体。回顾一下,通过任务规划智能体,我们从查询提示中获取了用户请求的任务 $\mathbf{t}\in\mathcal{T}$。对于指令数据集中的每个 $\mathbf{t}$,我们将其与一个特殊的系统提示配对,以在训练过程中区分不同的任务。系统提示包含对所处理任务的简要描述。形式上,智能体任务微调数据集的构建如下:

\mathcal{D}^{\mathrm{multi}}=\big\{\big(\{(\mathbf{x}_{\mathrm{pred}},\mathbf{x}_{\mathrm{reasoning}})\,|\,\mathbf{x}_{\mathrm{gen}}\},\,\{\mathcal{G}^{\mathrm{exp}}\,|\,\mathcal{G}^{\mathrm{skg}}\},\,\mathbf{t}_{i},\,\mathbf{a}_{i}\big)\big\},

Table 1: Dataset details for training and evaluation. "NC" is short for node classification.

表 1: 训练和评估数据集的详细信息。"NC" 是节点分类的缩写。

| | IMDB | ACM | Arxiv-Papers | ICLR-PeerReviews | RelatedWorkGeneration | GovReportSummarization |
| --- | --- | --- | --- | --- | --- | --- |
| 任务类型 | 预测 | 预测 | 预测 | 预测 | 生成 | 生成 |
| 子任务 | NC | NC | 论文分类 | 论文评审预测 | 文本生成 | 文本摘要 |
| 预定义图? | ✓ | ✓ | × | × | × | × |
| 训练样本数 | 2,400 | - | 5,175 | 3,141 | 4,155 | - |
| 评估样本数 | - | 1,000 | 500 | 500 | 500 | 304 |
| Token 数 | 10M | 0.8M | 30M | 45M | 93M | 2M |
| 预定义图节点数 | 11,616 | 10,942 | - | - | - | - |
| SKG 来源 | 人物实体 | 论文 | 论文 | 论文, 评审 | 多篇论文 | 文档 |
| SKG 节点数 | 57,120 | 20,388 | 153,555 | 161,592 | 875,921 | 15,621 |

For each instruction-output pair, the graph provided can be an explicit graph, an automatically discovered SKG, or both. For predictive tasks, the output includes both a prediction and its reasoning, while for generative tasks, the output is the gold-standard objective.

对于每个指令-输出对,提供的图可以是一个显式图、自动发现的SKG,或两者兼有。对于预测任务,输出包括预测及其推理,而对于生成任务,输出则是黄金标准目标。
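一条多任务微调样本的组织方式可示意如下(字段名为说明性假设,并非论文的确切数据格式):

```python
def make_multitask_entry(task, system_prompt, graphs, instruction, answer,
                         reasoning=None):
    """One D^multi record: the instruction, an explicit graph and/or an
    automatically discovered SKG, the task tag t_i, and the answer a_i.
    Predictive tasks additionally carry a reasoning trace."""
    entry = {
        "task": task,                  # t_i from the task planning agent
        "system": system_prompt,       # brief description of the task
        "graphs": graphs,              # e.g. {"explicit": ..., "skg": ...}
        "instruction": instruction,
        "answer": answer,              # prediction or gold generation target
    }
    if reasoning is not None:          # only present for predictive tasks
        entry["reasoning"] = reasoning
    return entry
```

生成任务的条目不包含 reasoning 字段,其 answer 即为黄金标准文本。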

Further, to facilitate a smooth learning curve for multi-task training of the graph language model, we take inspiration from curriculum learning techniques (Xu et al., 2020; Bengio et al., 2009) and sort our training tasks into different difficulty levels. We start training with easier tasks to build the model's foundational graph-language understanding. As training progresses, we gradually introduce more complex tasks to refine the model's capabilities. The details are presented in Table 8.

此外,为了促进图语言模型在多任务学习中的平滑学习曲线,我们借鉴了课程学习技术 (Xu et al., 2020; Bengio et al., 2009) 的灵感,将训练任务按难度级别排序。我们从较简单的任务开始训练,以建立模型的基础图语言理解能力。随着训练的进行,我们逐渐引入更复杂的任务,以提升模型的能力。具体细节如表 8 所示。
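按难度排序的课程式训练调度可以用几行代码示意(任务名与难度等级为举例假设,实际划分见表 8):

```python
def curriculum_order(tasks):
    """Sort training tasks easiest-first; the stable sort preserves the
    given order within each difficulty level."""
    return sorted(tasks, key=lambda t: t["difficulty"])

# Hypothetical stages: alignment first, then predictive, then agent tasks.
stages = curriculum_order([
    {"name": "agent-finetune", "difficulty": 2},
    {"name": "graph-instruction-matching", "difficulty": 0},
    {"name": "predictive-tasks", "difficulty": 1},
])
```

训练循环依次遍历 stages,即可实现"由易到难"的课程式调度。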

3 EVALUATION

3 评估

In this section, the effectiveness of our proposed GraphAgent framework is assessed through a detailed evaluation framework centered around several key Research Questions (RQs):

在本节中,我们通过围绕几个关键研究问题 (RQs) 的详细评估框架来评估我们提出的 GraphAgent 框架的有效性:

• RQ1: How effectively does our GraphAgent capture both graph relational information and the textual semantic inter-dependencies necessary for graph-related predictive tasks? • RQ2: How effective is GraphAgent at performing predictive tasks by capturing the complex but implicitly textual semantic inter-dependencies preserved within the textual data? • RQ3: How does our GraphAgent perform in graph-enhanced text generation tasks with implicit dependency understanding when compared to state-of-the-art large language models (LLMs)? • RQ4: What effects do the key components of our GraphAgent framework have on its overall performance, as demonstrated by the ablation studies?

• RQ1: 我们的 GraphAgent 在捕捉图关系信息和图相关预测任务所需的文本语义相互依赖性方面有多有效?
• RQ2: GraphAgent 在通过捕捉文本数据中保留的复杂但隐含的文本语义相互依赖性来执行预测任务时有多有效?
• RQ3: 与最先进的大语言模型 (LLMs) 相比,我们的 GraphAgent 在具有隐含依赖性理解的图增强文本生成任务中表现如何?
• RQ4: 通过消融研究,我们的 GraphAgent 框架的关键组件对其整体性能有何影响?

3.1 EXPERIMENTAL SETTINGS

3.1 实验设置

3.1.1 Implementation Details

3.1.1 实现细节

In our GraphAgent framework, the task planning agent and the graph generation agent are both powered by GPT3.5-Turbo. We enhance their performance in tackling user queries, planning tasks, and discovering semantic knowledge graphs (SKGs) by incorporating few-shot examples into the system prompts of the large language model (LLM). For graph grounding, we effectively utilize PyG to transform structural information into graph objects. In line with established practices, we employ Sentence-BERT (all-mpnet-base-v2) for text-attributed graph embedding, ensuring a robust semantic representation. For the graph action agent, we build it using Llama3-8b Llama Team (2024) as the foundational language model. To connect the textual semantic representation space with the graph-structural representation space Liu et al. (2024); Tang et al. (2024a), we incorporate a learnable adaptation linear layer. Additionally, we implement a heterogeneous graph model Tang et al. (2024b) that has been pre-trained using data from text-graph node pairs. The nodes are encoded with embeddings from the pre-trained model, projected through the learnable adaptation layer, and ultimately processed by the LLM along with relevant language tokens. This integrated approach facilitates seamless interaction between language understanding and graph-based reasoning.

在我们的 GraphAgent 框架中,任务规划智能体和图生成智能体均由 GPT3.5-Turbo 驱动。我们通过在大语言模型 (LLM) 的系统提示中加入少样本示例,提升了它们在处理用户查询、规划任务和发现语义知识图谱 (SKGs) 方面的性能。对于图接地 (graph grounding),我们有效地利用 PyG 将结构信息转换为图对象。按照既定实践,我们使用 Sentence-BERT (all-mpnet-base-v2) 进行文本属性图嵌入,以确保强大的语义表示。对于图动作智能体,我们基于 Llama3-8b Llama Team (2024) 构建其基础语言模型。为了连接文本语义表示空间和图结构表示空间 Liu et al. (2024); Tang et al. (2024a),我们引入了一个可学习的自适应线性层。此外,我们还实现了一个异构图模型 Tang et al. (2024b),该模型已使用文本-图节点对数据进行预训练。节点通过预训练模型的嵌入进行编码,并通过可学习的自适应层进行投影,最终由 LLM 与相关语言 Token 一起处理。这种集成方法促进了语言理解和基于图的推理之间的无缝交互。
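其中"可学习的自适应线性层"的作用可以用如下草图说明:把预训练图模型输出的节点嵌入投影到 LLM 的词元嵌入空间。维度取 all-mpnet-base-v2 的 768 与 Llama3-8b 的隐藏维度 4096;实际训练中该投影权重是可学习的,此处仅作示意:

```python
import numpy as np

GRAPH_DIM, LLM_DIM = 768, 4096   # Sentence-BERT dim -> Llama3-8b hidden size

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(GRAPH_DIM, LLM_DIM))  # learnable in practice
b = np.zeros(LLM_DIM)

def project_graph_tokens(node_embs):
    """Map pre-trained graph-node embeddings into the LLM's token-embedding
    space, so they can be interleaved with ordinary text-token embeddings."""
    return node_embs @ W + b

graph_tokens = project_graph_tokens(rng.normal(size=(5, GRAPH_DIM)))
```

投影后的 graph_tokens 与文本 Token 嵌入拼接后一并送入 LLM。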

3.1.2 Datasets

3.1.2 数据集

To ensure usability across a diverse range of graph agent tasks, we utilize various datasets for evaluating the performance of our GraphAgent. A summary of these datasets is provided in Table 1.

为了确保在各种图智能体任务中的可用性,我们利用多个数据集来评估 GraphAgent 的性能。这些数据集的总结如表 1 所示。

• Graph-Related Predictive Tasks. For tasks that involve explicit graph relational information, we utilize two benchmark datasets: IMDB Fu et al. (2020) and ACM Wang et al. (2019a). In contrast, for predictive tasks that do not depend on explicit graph structures, we have curated two additional datasets: Arxiv-Papers He et al. (2023) and ICLR-Peer Reviews. The Arxiv-Papers dataset comprises published papers from Arxiv in 2023, from which we randomly sampled a subset. This dataset is created by analyzing the titles and abstracts of these papers to classify whether they are likely to be accepted. The ICLR-Peer Reviews dataset features pairs of papers and their corresponding reviews from ICLR 2024, specifically focusing on borderline cases that pose challenges in determining acceptance. This dataset is used for both training and testing purposes.

• 图相关的预测任务。对于涉及显式图关系信息的任务,我们使用了两个基准数据集:IMDB (Fu et al., 2020) 和 ACM (Wang et al., 2019a)。相比之下,对于不依赖于显式图结构的预测任务,我们整理了两个额外的数据集:Arxiv-Papers (He et al., 2023) 和 ICLR-Peer Reviews。Arxiv-Papers 数据集包含 2023 年 Arxiv 上发表的论文,我们从中随机抽取了一个子集。该数据集通过分析这些论文的标题和摘要来分类它们是否可能被接受。ICLR-Peer Reviews 数据集则包含 ICLR 2024 的论文及其对应的评审,特别关注那些在决定是否接受时具有挑战性的边缘案例。该数据集用于训练和测试目的。

• Graph-Enhanced Text Generation. To demonstrate the text generation capabilities of the model, we evaluate its performance in generating related work for research papers and summarizing lengthy documents using graph-enhanced semantic dependencies. First, we collected datasets from the ACL and EMNLP conferences, covering the years 2020 to 2023, including both the "main" and "findings" tracks. We extracted the related work sections from these papers and organized them into approximately 5,000 topic-content pairs. For generating related work, GraphAgent takes a list of paper titles and their corresponding abstracts, input that can be provided by users. Using this information, scaffold knowledge graphs are created and subsequently processed by the Graph Action Agent, which comprehends the data to produce comprehensive related work for the specified papers. Second, we utilize the GovReport dataset to evaluate GraphAgent as a language assistant for document summarization. This dataset comprises detailed reports from government research agencies, including the Congressional Research Service and the U.S. Government Accountability Office. It necessitates the summarization of longer documents, maintaining richer context and semantic interdependencies, unlike other summarization datasets.

• 图增强文本生成。为了展示模型的文本生成能力,我们评估了其在生成研究论文相关工作部分和总结长篇文档方面的表现,利用图增强的语义依赖关系。首先,我们从 ACL 和 EMNLP 会议中收集了 2020 年至 2023 年的数据集,包括“主”轨道和“发现”轨道。我们从这些论文中提取了相关工作部分,并将其组织成大约 5,000 个主题-内容对。对于生成相关工作,GraphAgent 接收用户提供的论文标题及其对应摘要列表。利用这些信息,创建支架知识图,随后由图动作代理处理,理解数据以生成指定论文的全面相关工作。其次,我们使用 GovReport 数据集 2 来评估 GraphAgent 作为文档摘要的语言助手。该数据集包含来自政府研究机构的详细报告,包括国会研究服务处和美国政府问责办公室。与其他摘要数据集不同,它需要对较长文档进行摘要,同时保持更丰富的上下文和语义依赖关系。

3.1.3 Baseline Methods

3.1.3 基线方法

We incorporate a diverse range of baseline models from various research domains to ensure a comprehensive comparison. Specifically, we examine methods for graph-related predictive tasks, including homogeneous GNNs, heterogeneous models, and graph LLMs. Additionally, we utilize and compare state-of-the-art large language models—both open-source and closed-source—alongside retrieval-augmented generation (RAG) systems for enhanced text generation.

我们整合了来自不同研究领域的多种基线模型,以确保全面的比较。具体来说,我们研究了与图相关的预测任务的方法,包括同质图神经网络 (GNN)、异质模型和图大语言模型。此外,我们还使用并比较了最先进的大语言模型——包括开源和闭源模型——以及检索增强生成 (RAG) 系统,以增强文本生成能力。

• Graph-Related Predictive Tasks. We consider baseline methods from three key areas: i) Homogeneous GNNs, which include SAGE Hamilton et al. (2017) and GAT Velickovic et al. (2018) as representative models; ii) Heterogeneous Graph Models, featuring the specialists such as HAN Wang et al. (2019b), HGT Hu et al. (2020), and HetGNN Zhang et al. (2019); and iii) Graph LLMs, for which we adopt HiGPT Tang et al. (2024b), a state-of-the-art heterogeneous graph language model that is particularly well-suited for managing complex heterogeneous structures.

• 图相关预测任务。我们考虑了三个关键领域的基线方法:i) 同质图神经网络 (Homogeneous GNNs),包括 SAGE (Hamilton et al., 2017) 和 GAT (Velickovic et al., 2018) 作为代表性模型;ii) 异质图模型 (Heterogeneous Graph Models),包括 HAN (Wang et al., 2019b)、HGT (Hu et al., 2020) 和 HetGNN (Zhang et al., 2019) 等专家模型;iii) 图大语言模型 (Graph LLMs),我们采用了 HiGPT (Tang et al., 2024b),这是一种最先进的异质图语言模型,特别适合处理复杂的异质结构。

• Graph-Enhanced Text Generation. We utilize a variety of state-of-the-art large language models (LLMs), categorized as follows: i) Open-Source LLMs include the Llama 3 series Llama Team (2024), Mistral-NeMo, and Qwen2-72b Yang et al. (2024); ii) Closed-Source Commercial LLMs consist of Deepseek-Chat-V2, GPT4o-mini, and Gemini-1.5-Flash, using their API services for empirical results; iii) LLM-empowered RAG Systems. We also compare GraphAgent with GraphRAG, which enhances LLMs through graph-based retrieval-augmented generation.

• 图增强文本生成。我们利用多种先进的大语言模型 (LLM),分类如下:i) 开源 LLM 包括 Llama 3 系列 Llama Team (2024)、Mistral-NeMo 和 Qwen2-72b Yang et al. (2024);ii) 闭源商业 LLM 包括 Deepseek-Chat-V2、GPT4o-mini 和 Gemini-1.5-Flash,使用其 API 服务获取实证结果;iii) LLM 赋能的 RAG 系统。我们还将 GraphAgent 与 GraphRAG 进行比较,后者通过基于图的检索增强生成来增强 LLM。

3.1.4 Evaluation Protocols

3.1.4 评估协议

We implement comprehensive and consistent training strategies across all models. We apply full fine-tuning for our model and all baseline models requiring supervised fine-tuning. For model selection, we utilize validation sets with early stopping for predictive tasks, while monitoring the rate of training-loss decrease for alignment training and generative tasks. To ensure a fair comparison, we maintain a consistent feature encoder (all-mpnet-base-v2) across all models, including GNNs and Graph LLMs. We use identical prompt templates across all LLM-based models, with Graph LLMs receiving additional graph tokens for embedding injection and basic meta-type descriptions (detailed in Table 6). The number of iterative steps is set to 2 for discovering two-hop knowledge graphs per query prompt.

我们在所有模型中实施全面且一致的训练策略。对于我们的模型以及所有需要监督微调的基线模型,我们应用全量微调。在模型选择方面,我们利用验证集进行预测任务的早停,同时监控训练损失下降率以进行对齐训练和生成任务。为了确保公平比较,我们在所有模型(包括 GNN 和 Graph LLM)中保持一致的特征编码器(all-mpnet-base-v2)。我们在所有基于 LLM 的模型中使用相同的提示模板,GraphLLM 接收额外的图 token 用于嵌入注入和基本元类型描述(详见表 6)。每次查询提示发现两跳知识图谱的迭代步骤设置为 2。

Table 2: Zero-shot learning performance evaluation: We assess our model’s transfer capabilities by training on IMDB dataset with few-shot learning, then evaluating node classification performance on ACM dataset under zero-shot conditions, utilizing both graph structural and textual information.

表 2: 零样本学习性能评估:我们通过在 IMDB 数据集上进行少样本学习来训练模型,然后在零样本条件下利用图结构和文本信息评估 ACM 数据集上的节点分类性能,以此来评估模型的迁移能力。

| 指标 | 训练集 | SAGE | GAT | HAN | HGT | HetGNN | HiGPT | GraphAgent | 提升 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro-F1 (%) | IMDB-1 | 32.93±4.18 | 35.67±0.53 | 34.07±1.11 | 32.40±0.14 | 37.43±4.34 | 45.40±0.89 | 51.21±1.32 | 12.8% |
| Micro-F1 (%) | IMDB-40 | 31.73±0.05 | 23.93±1.44 | 26.97±1.94 | 35.60±0.99 | 31.80±0.16 | 50.50±0.77 | 74.98±1.24 | 48.5% |
| Macro-F1 (%) | IMDB-1 | 26.47±2.69 | 29.08±1.31 | 22.50±4.16 | 16.31±0.05 | 31.39±4.68 | 41.77±1.24 | 46.82±1.43 | 12.1% |
| Macro-F1 (%) | IMDB-40 | 31.17±0.17 | 21.41±0.71 | 23.13±1.32 | 27.49±1.22 | 31.44±0.17 | 45.85±0.89 | 74.98±1.12 | 63.5% |
| AUC (%) | IMDB-1 | 49.34±2.47 | 53.18±2.95 | - | - | - | 59.69±0.82 | 64.10±1.25 | 7.4% |
| AUC (%) | IMDB-40 | 48.67±0.13 | 43.20±1.08 | 45.45±1.46 | 51.48±0.43 | 48.72±0.06 | 63.60±0.51 | 80.90±1.01 | 27.2% |

For evaluation, we adopt different metrics based on task types. In graph-related predictive tasks with ground truth, we use Micro-F1 (Mi-F1), Macro-F1 (Ma-F1), and AUC metrics. For graph-enhanced generative tasks that are open-ended, we primarily rely on the PPL score using state-of-the-art models (Llama3-70b, Qwen2-72b) to measure fluency, rather than reference-based similarity metrics which can be misleading due to their limitations in text generation evaluation. Additionally, we incorporate the LLM-as-judge approach for better approximation of human judgment. This comprehensive evaluation framework ensures robust and meaningful comparison across different model architectures while addressing the limitations of conventional evaluation metrics for generative tasks.

为了评估,我们根据任务类型采用不同的指标。在有真实标签的图相关预测任务中,我们使用 Micro-F1 (Mi-F1)、Macro-F1 (Ma-F1) 和 AUC 指标。对于开放式的图增强生成任务,我们主要依赖使用最先进模型(Llama3-70b、Qwen2-72b)的 PPL 分数来衡量流畅性,而不是基于参考的相似性指标,因为这些指标在文本生成评估中存在局限性,可能会产生误导。此外,我们还引入了 LLM-as-judge 方法,以更好地近似人类判断。这种全面的评估框架确保了在不同模型架构之间进行稳健且有意义的比较,同时解决了生成任务中传统评估指标的局限性。
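困惑度 (PPL) 本身的计算方式很简单:对生成文本在评分模型下的平均对数似然取负再取指数。下面是一个与具体模型无关的最小示意(每个 token 的对数概率假设由 Llama3-70b 等评分模型给出):

```python
import math

def perplexity(token_logprobs):
    """PPL = exp(-mean log-likelihood) over the generated tokens; a scorer
    LLM (e.g. Llama3-70b) supplies the per-token log-probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

若每个 token 的概率均为 0.5,则 PPL 恰为 2,可用作数值自检。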

3.2 GRAPH PREDICTION TASK WITH EXPLICIT AND IMPLICIT GRAPH CONTEXTS (RQ1)

3.2 显式和隐式图上下文下的图预测任务 (RQ1)

We investigate GraphAgent’s performance on graph-related prediction tasks, specifically node classification with explicit graph structures. Our approach enhances existing methods by automatically incorporating a semantic knowledge graph from node text, utilizing both the semantic KG and explicit graph connections as dual sources for graph token input. Following recent works Tang et al. (2024a;b); Chen et al. (2024a), we employ a fully zero-shot evaluation framework to better assess real-world applicability. Our experimental setup involves training models on the IMDB dataset under few-shot settings (1 shot and 40 shots), then evaluating performance on 1,000 previously unseen nodes from the ACM dataset. For our method and other LLM-enhanced approaches, we incorporate Chain-of-Thought Wei et al. (2022) for inference augmentation.

我们研究了 GraphAgent 在图相关预测任务中的表现,特别是具有显式图结构的节点分类任务。我们的方法通过自动从节点文本中融入语义知识图谱 (Semantic Knowledge Graph),利用语义知识图谱和显式图连接作为图 Token 输入的双重来源,增强了现有方法。根据最近的研究 Tang et al. (2024a;b); Chen et al. (2024a),我们采用完全零样本评估框架,以更好地评估实际应用中的适用性。我们的实验设置包括在 IMDB 数据集上以少样本设置(1 样本和 40 样本)训练模型,然后在 ACM 数据集的 1,000 个未见过的节点上评估性能。对于我们的方法和其他大语言模型增强方法,我们融入了 Chain-of-Thought Wei et al. (2022) 进行推理增强。
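评估所用的 Micro-F1 与 Macro-F1 可以用纯 Python 简单复现(单标签多分类场景下 Micro-F1 即准确率;以下实现仅作示意,实际实验可直接使用 scikit-learn):

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred, labels):
    """Micro-F1 equals accuracy in single-label multi-class settings;
    Macro-F1 averages per-class F1, exposing minority-class behaviour."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    def f1(c):
        denom = 2 * tp[c] + fp[c] + fn[c]
        return 2 * tp[c] / denom if denom else 0.0
    micro = sum(tp.values()) / len(y_true)
    macro = sum(f1(c) for c in labels) / len(labels)
    return micro, macro
```

两类指标对比可以揭示模型在少数类上的表现差异。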

The results summarized in Table 2 demonstrate that our agent-based approach GraphAgent significantly advances the state-of-the-art in predictive graph tasks. Specifically, GraphAgent achieves an average improvement of over 28% across all metrics compared to the previous state-of-the-art graph language model, HiGPT. These substantial improvements stem from the synergistic integration of several key components: a graph generation agent, an automated task planning agent, and dual fine-tuning mechanisms (graph-text alignment and agent task fine-tuning). Together, these components enable GraphAgent to excel at constructing rich semantic knowledge graphs, capturing comprehensive inter-dependencies, and understanding complex relationships in both structured and unstructured graph contexts. This architecture translates into superior performance across downstream tasks.

表 2 中总结的结果表明,我们基于 AI智能体的方法 GraphAgent 在预测图任务中显著推进了现有技术水平。具体而言,与之前最先进的图语言模型 HiGPT 相比,GraphAgent 在所有指标上平均提升了超过 28%。这些显著的改进源于几个关键组件的协同整合:图生成 AI智能体、自动化任务规划 AI智能体以及双重微调机制(图-文本对齐和 AI智能体任务微调)。这些组件共同使 GraphAgent 能够出色地构建丰富的语义知识图谱,捕捉全面的相互依赖性,并理解结构化和非结构化图上下文中的复杂关系。这种架构转化为下游任务的卓越性能。

3.3 GRAPH PREDICTION WITH IMPLICIT SEMANTIC INTERDEPENDENCIES (RQ2)

3.3 具有隐式语义依赖关系的图预测 (RQ2)

Figure 3: Performance comparison with state-ofthe-art LLMs on complex graph prediction tasks involving implicit semantic relationships. Results marked with * indicate statistical significance $({\mathsf{p}}{<}0.01)$ compared to the second-best performer.

图 3: 在涉及隐式语义关系的复杂图预测任务中,与最先进的大语言模型的性能比较。标记为 * 的结果表示与第二佳表现者相比具有统计显著性 $({\mathsf{p}}{<}0.01)$。

| 方法 | 模型大小 | Arxiv-Papers Mi-F1 | Arxiv-Papers Ma-F1 | Arxiv-Papers AUC | ICLR-PeerReviews Mi-F1 | ICLR-PeerReviews Ma-F1 | ICLR-PeerReviews AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **开源大语言模型** | | | | | | | |
| Llama3-8b | 8B | 0.514 | 0.289 | 0.527 | 0.402 | 0.394 | 0.502 |
| Mistral-Nemo | 12B | 0.510 | 0.292 | 0.615 | 0.272 | 0.246 | 0.380 |
| Llama3-70b | 70B | 0.630 | 0.330 | 0.635 | 0.434 | 0.421 | 0.551 |
| Qwen2-72b | 72B | 0.632 | 0.472 | 0.700 | 0.344 | 0.277 | 0.509 |
| **基于API的商业大语言模型** | | | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 0.746 | 0.580 | 0.757 | 0.362 | 0.312 | 0.516 |
| GPT4o-mini | - | 0.592 | 0.343 | 0.634 | 0.692* | 0.592 | 0.591 |
| Gemini-1.5-Flash | - | 0.748 | 0.504 | 0.714 | 0.684 | 0.487 | 0.533 |
| **微调的大语言模型** | | | | | | | |
| Llama3-8b Finetuned | 8B | 0.794 | 0.593 | 0.736 | 0.620 | 0.554 | 0.553 |
| **GraphRAG实现** | | | | | | | |
| Llama3-8b + GraphRAG | 8B | 0.516 | 0.288 | 0.601 | 0.430 | 0.427 | 0.517 |
| Llama3-70b + GraphRAG | 70B | 0.603 | 0.324 | 0.623 | 0.308 | 0.296 | 0.401 |
| GraphAgent-TaskExpert | 8B | 0.820 | 0.620 | 0.768 | 0.686 | 0.620* | 0.615* |
| GraphAgent-General | 8B | 0.840* | 0.621* | 0.769* | 0.667 | 0.604 | 0.607 |
| GraphAgent-Zero-Shot | 8B | 0.739 | 0.512 | 0.701 | 0.538 | 0.531 | 0.563 |

We evaluate GraphAgent’s effectiveness on predictive tasks that require understanding complex semantic interdependencies, comparing against state-of-the-art LLMs. For these tasks, GraphAgent constructs semantic knowledge graphs (SKGs) by extracting implicit relational patterns through its dual-agent system of task planning and graph generation. The resulting SKG nodes act as semantic anchors, enriching the input representation through embedded and tokenized forms. Our empirical evaluation on Arxiv-Papers and ICLR-Peer Reviews datasets (Table 3) demonstrates GraphAgent’s capabilities across three configurations: task-specific (GraphAgent-Task Expert), comprehensive (GraphAgent-General), and zero-shot generalization (GraphAgent-Zero-Shot). Unlike conventional GNNs and GraphLLMs that require explicit graph structures, GraphAgent competes directly with leading LLMs of various scales, including fine-tuned and GraphRAG-augmented variants. The experimental results reveal three distinct advantages of our approach:

我们在需要理解复杂语义依赖关系的预测任务上评估了 GraphAgent 的有效性,并与最先进的大语言模型进行了比较。在这些任务中,GraphAgent 通过其任务规划和图生成的双智能体系统提取隐式关系模式,构建语义知识图谱 (SKG)。生成的 SKG 节点作为语义锚点,通过嵌入和 Token 化的形式丰富了输入表示。我们在 Arxiv-Papers 和 ICLR-Peer Reviews 数据集上的实证评估(表 3)展示了 GraphAgent 在三种配置下的能力:任务特定(GraphAgent-Task Expert)、全面(GraphAgent-General)和零样本泛化(GraphAgent-Zero-Shot)。与需要显式图结构的传统 GNN 和 GraphLLM 不同,GraphAgent 直接与各种规模的领先大语言模型竞争,包括微调和 GraphRAG 增强的变体。实验结果揭示了我们方法的三个显著优势:

• Superior Performance with Smaller Model Size. Despite having only 8B parameters, GraphAgent consistently outperforms larger LLMs, including Llama3-70b and Qwen2-72b, achieving a 31.9% improvement across all metrics on both datasets. By explicitly capturing complex interdependencies via semantic graph structures while maintaining contextual awareness across different semantic levels, GraphAgent effectively integrates both local and global information patterns. This architectural approach enables robust handling of intricate reasoning tasks, where both detailed semantic relationships and broader contextual coherence are crucial for accurate predictions.

• 小模型尺寸下的卓越性能。尽管仅有 8B 参数,GraphAgent 始终优于更大的大语言模型,包括 Llama3-70b 和 Qwen2-72b,在两个数据集的所有指标上实现了 31.9% 的提升。通过语义图结构显式捕捉复杂的相互依赖关系,同时保持跨不同语义层次的上下文感知,GraphAgent 有效地整合了局部和全局信息模式。这种架构方法使其能够稳健地处理复杂的推理任务,在这些任务中,详细的语义关系和更广泛的上下文连贯性对于准确预测至关重要。

• Robust Generalization Through Multi-task and Zero-shot Learning. GraphAgent exhibits exceptional adaptability and robust performance across different learning scenarios. The multitask variant, GraphAgent-General, demonstrates superior performance compared to task-specific models on Arxiv-Papers, showcasing enhanced comprehension and reasoning capabilities over text-graph pairs through self-constructed SKGs. While there is a modest performance trade-off on ICLR-Peer Reviews, the multi-task model maintains competitive results comparable to specialized versions. Notably, GraphAgent shows impressive zero-shot generalization: even with domain transfer challenges, our 8B model achieves performance parity with state-of-the-art LLMs like Deepseek-Chat-V2 and Gemini-1.5-Flash. These findings demonstrate how our approach of integrating semantic knowledge graphs and specialized tuning techniques can significantly enhance model capabilities through structured knowledge representation.

• 通过多任务和零样本学习实现鲁棒泛化。GraphAgent 在不同的学习场景中表现出卓越的适应性和鲁棒性能。多任务变体 GraphAgent-General 在 Arxiv-Papers 上展示了优于特定任务模型的性能,通过自构建的 SKG(语义知识图谱)增强了对文本-图对的理解和推理能力。尽管在 ICLR-Peer Reviews 上存在轻微的性能折衷,但多任务模型仍保持了与专用版本相当的竞争力。值得注意的是,GraphAgent 展示了令人印象深刻的零样本泛化能力:即使在面临领域转移挑战时,我们的 8B 模型也能与 Deepseek-Chat-V2 和 Gemini-1.5-Flash 等最先进的大语言模型达到性能持平。这些发现表明,通过整合语义知识图谱和专用调优技术,我们的方法能够通过结构化知识表示显著增强模型能力。

• Superior Performance over Vanilla SFT and GraphRAG. Comparative experiments demonstrate GraphAgent's significant advantages over both vanilla supervised fine-tuning (SFT) LLMs and GraphRAG implementations. This performance gain can be attributed to two key factors: First, compared to vanilla SFT LLMs, GraphAgent effectively leverages the LLM's knowledge base through our semantic KG integration paradigm, leading to enhanced performance. Second, while GraphRAG uses the same knowledge references, GraphAgent's graph embedding token approach provides a more efficient and consolidated knowledge representation. This not only reduces input token overhead but also helps mitigate LLM hallucination through structured knowledge encoding, ultimately resulting in more reliable and robust performance.

• 相较于普通 SFT 和 GraphRAG 的卓越性能。对比实验表明,GraphAgent 相较于普通的监督微调 (SFT) 大语言模型和 GraphRAG 实现具有显著优势。这一性能提升可归因于两个关键因素:首先,与普通的监督微调 SFT 大语言模型相比,GraphAgent 通过我们的语义知识图谱 (KG) 集成范式有效利用了大语言模型的知识库,从而提升了性能。其次,尽管 GraphRAG 使用了相同的知识参考,但 GraphAgent 的图嵌入 Token 方法提供了更高效和整合的知识表示。这不仅减少了输入 Token 的开销,还通过结构化知识编码帮助减轻了大语言模型的幻觉问题,最终实现了更可靠和稳健的性能。

3.4 GRAPH-ENHANCED TEXT GENERATION (RQ3)

3.4 图增强文本生成 (RQ3)

We evaluate GraphAgent’s performance on graph-enhanced text generation tasks using both perplexity (PPL) metrics and LLM-based assessment. Results for our evaluated text generation tasks are presented in Table 4 and Figure 6, while zero-shot generalization results on GovReport data are shown in Table 5.

我们使用困惑度 (PPL) 指标和基于大语言模型的评估方法,评估了 GraphAgent 在图增强文本生成任务中的表现。评估的文本生成任务结果如表 4 和图 6 所示,而在 GovReport 数据上的零样本泛化结果如表 5 所示。

• Enhanced Generation Quality via Lower Perplexity. Table 4 demonstrates GraphAgent's superior performance, with lower perplexity scores than the baselines as validated by both Llama3-70b and Qwen2-72b. The generated content exhibits enhanced fluency and clarity compared to larger LLMs. We observe that both SFT and GraphRAG variants show performance degradation, indicating that neither simple input-output fine-tuning nor direct knowledge injection through prompts can effectively

• 通过降低困惑度提升生成质量。表 4 展示了 GraphAgent 在困惑度得分上相较于基线模型的优越性能,这一点已通过 Llama3-70b 和 Qwen2-72b 的验证。与更大的大语言模型相比,生成的内容表现出更高的流畅性和清晰度。我们观察到,无论是 SFT 还是 GraphRAG 变体,其性能均有所下降,这表明简单的输入输出微调或通过提示直接注入知识都无法有效提升生成质量。

Figure 4: Performances on ACL-EMNLP related works content generation. Light grey denotes that the score is computed with the same-family model.

图 4: ACL-EMNLP 相关工作内容生成的性能。浅灰色表示分数是用同系列模型计算的。

| 方法 | 模型大小 | PPL-Llama3-70b 均值 | PPL-Llama3-70b 最大值 | PPL-Qwen2-72b 均值 | PPL-Qwen2-72b 最大值 |
| --- | --- | --- | --- | --- | --- |
| **开源大语言模型** | | | | | |
| Llama3-8b | 8B | 7.016 | 13.061 | 7.491 | 12.787 |
| Mistral-Nemo | 12B | 7.367 | 15.967 | 6.872 | 12.065 |
| Llama3-70b | 70B | 6.168 | 14.436 | 5.877 | 12.897 |
| Qwen2-72b | 72B | 6.043 | 11.675 | 5.325 | 11.302 |
| **基于API的商业大语言模型** | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 5.632 | 13.483 | 5.144 | 10.337 |
| GPT4o-mini | - | 7.277 | 15.480 | 6.818 | 13.267 |
| Gemini-1.5-Flash | - | 5.188 | 10.399 | 5.377 | 10.779 |
| **微调的大语言模型** | | | | | |
| Llama3-8b 微调 | 8B | 7.682 | 19.452 | 7.629 | 18.757 |
| **GraphRAG 实现** | | | | | |
| Llama3-8b + GraphRAG | 8B | 7.098 | 18.092 | 6.539 | 14.722 |
| Llama3-70b + GraphRAG | 70B | 6.590 | 14.827 | 6.135 | 14.163 |
| GraphAgent-TaskExpert | 8B | 3.805 | 10.316 | 4.069 | 11.685 |
| GraphAgent-General | 8B | 3.618* | 8.000* | 3.867* | 8.775* |

capture the complex reasoning patterns required for understanding intricate contextual relationships. In contrast, our approach leverages automatically constructed semantic knowledge graphs to substantially enhance the model’s reasoning and comprehension capabilities.

捕捉理解复杂上下文关系所需的复杂推理模式。相比之下,我们的方法利用自动构建的语义知识图谱,显著增强了模型的推理和理解能力。

• Superior Generation Quality via LLM-based Evaluation. To rigorously validate our model’s alignment with human preferences for the generated content, we employed the LLM-as-judge methodology Zheng et al. (2024), which demonstrates stronger correlation with human judgment compared to traditional metrics like BLEU Papineni et al. (2002) and ROUGE Lin (2004). Using GPT-4 as the judge (evaluation prompts detailed in Table 4), we compared GraphAgent against several strong baselines: Llama3-8b, Llama3-8b fine-tuned, Mistral Nemo, and Llama3-70b.

• 通过基于大语言模型的评估实现卓越生成质量。为了严格验证我们的模型与人类对生成内容偏好的对齐程度,我们采用了 Zheng 等人 (2024) 提出的 LLM-as-judge 方法,该方法相比传统的 BLEU Papineni 等人 (2002) 和 ROUGE Lin (2004) 等指标,与人类判断的相关性更强。我们使用 GPT-4 作为评判者(评估提示详见表 4),将 GraphAgent 与几个强大的基线模型进行了比较:Llama3-8b、Llama3-8b 微调版、Mistral Nemo 和 Llama3-70b。

Evaluation on 200 samples from the text generation test set (Figure 6) demonstrates GraphAgent's superior performance: achieving a 114% quality improvement over Llama3-8b and 45% over Llama3-70b. GraphAgent generates higher quality content in 67% of cases compared to same-sized models and outperforms leading open-source models in 58% of instances, despite having only 8B parameters and requiring minimal additional input overhead. These results validate our GraphAgent's effectiveness in leveraging semantic knowledge graphs for enhanced text generation capabilities.

在文本生成测试集的 200 个样本上的评估(图 6)展示了 GraphAgent 的卓越性能:相比 Llama3-8b 实现了 114% 的质量提升,相比 Llama3-70b 提升了 45%。GraphAgent 在 67% 的情况下生成的内容质量优于同规模模型,并且在 58% 的实例中表现优于领先的开源模型,尽管它仅有 8B 参数且需要极少的额外输入开销。这些结果验证了 GraphAgent 在利用语义知识图谱增强文本生成能力方面的有效性。
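LLM-as-judge 的成对比较通常需要交换 A/B 顺序以消除位置偏差。下面是一个示意性的评审提示构造(模板措辞为假设,并非论文使用的原始提示):

```python
JUDGE_TEMPLATE = """You are an impartial judge. Compare two related-work
sections written for the same set of papers.

[Papers]
{papers}

[Response A]
{a}

[Response B]
{b}

Answer with exactly one of: "A", "B", or "tie"."""

def build_judge_prompt(papers, response_a, response_b, swap=False):
    """Render a pairwise judging prompt; `swap` exchanges the A/B order so
    that wins can be averaged over both orders to control position bias."""
    a, b = (response_b, response_a) if swap else (response_a, response_b)
    return JUDGE_TEMPLATE.format(papers=papers, a=a, b=b)
```

对每个样本以 swap=False 与 swap=True 各评一次并取平均,可显著降低评审模型的位置偏好影响。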

• Cross-domain Performance on Document Summarization. The effectiveness of GraphAgent extends beyond academic writing to document summarization tasks, as demonstrated in our graph-enhanced text generation evaluation on GovReport data (Table 7 shown in the Appendix). Notably, without any task-specific optimization, GraphAgent exhibits strong structural reasoning abilities by generating well-organized summaries (highlighted in green ). This successful transfer of capabilities across domains underscores the model's robust generalization potential.

• 跨领域文档摘要性能。GraphAgent 的有效性不仅限于学术写作,还扩展到文档摘要任务,正如我们在 GovReport 数据上的图增强文本生成评估中所展示的那样(附录中的表 7)。值得注意的是,在没有进行任何任务特定优化的情况下,GraphAgent 通过生成结构良好的摘要(以绿色突出显示)展示了强大的结构推理能力。这种跨领域能力的成功转移突显了模型的强大泛化潜力。

Figure 5: GovReport summarization performance. Evaluation scores are presented with same-family model comparisons highlighted in light grey.

图 5: GovReport 摘要性能。评估分数以浅灰色突出显示同系列模型的比较。

| 方法 | 模型大小 | PPL-Llama3-70b 均值 | PPL-Llama3-70b 最大值 | PPL-Qwen2-72b 均值 | PPL-Qwen2-72b 最大值 |
| --- | --- | --- | --- | --- | --- |
| Llama3-8b | 8B | 9.476 | 25.355 | 7.564 | 17.443 |
| Mistral-Nemo | 12B | 9.333 | 28.537 | 7.194 | 19.347 |
| Llama3-70b | 70B | 6.473 | 14.724 | 5.629 | 11.813 |
| Qwen2-72b | 72B | 7.134 | 16.075 | 5.494 | 11.294 |
| Deepseek-Chat-V2 | 236B→21B | 8.246 | 21.176 | 7.311 | 18.092 |
| GPT4o-mini | - | 10.332 | 23.300 | 6.576 | 10.213 |
| Gemini-1.5-Flash | - | 7.374 | 18.408 | 6.133 | 9.237 |
| GraphAgent-General | 8B | 6.736 | 20.362 | 5.936 | 27.196 |

Experimental results from Table 5, shown in the Appendix, demonstrate GraphAgent's competitive performance in zero-shot generative tasks with graphs. The model achieves significantly lower perplexity (PPL) scores compared to same-sized counterparts like Llama3-8b and even the larger Mistral-NeMo. Moreover, GraphAgent matches the fluency levels of leading closed-source and open-source LLMs in generating GovReport summaries. These findings suggest that our approach of automatically extracting and leveraging semantic knowledge graphs from input content, combined with diverse multi-task graph-based training, enables robust zero-shot performance.

表 5 中的实验结果(见附录)展示了 GraphAgent 在图生成任务中的零样本竞争性能。与相同规模的模型(如 Llama3-8b)甚至更大的 Mistral-Nemo 相比,该模型实现了显著更低的困惑度 (PPL) 分数。此外,GraphAgent 在生成 GovReport 摘要时,与领先的闭源和开源大语言模型的流畅度相当。这些发现表明,我们通过从输入内容中自动提取并利用语义知识图谱,结合多样化的多任务图训练,能够实现强大的零样本性能。

3.5 QUALITATIVE ANALYSIS OF GRAPH-ENHANCED TEXT GENERATION TASKS

3.5 图增强文本生成任务的定性分析

We evaluated GraphAgent against Llama3-8b and Llama3-70b on two distinct graph-enhanced text generation tasks, with results presented in Tables 8 and 7 (Appendix). The experiments demonstrate GraphAgent's significant performance advantages over Llama3-8b while achieving comparable results to the much larger Llama3-70b. Notably, in academic writing tasks (Table 8), GraphAgent effectively leverages knowledge graphs to capture citation relationships and research development paths, producing well-organized summaries (highlighted in green ). In contrast, Llama3-8b exhibits notable limitations in both instruction following and citation formatting accuracy (highlighted in

我们在两个不同的图增强文本生成任务上对 GraphAgent 与 Llama3-8b 和 Llama3-70b 进行了评估,结果如表 8 和表 7(附录)所示。实验表明,GraphAgent 相比 Llama3-8b 具有显著的性能优势,同时在与更大的 Llama3-70b 相比时取得了相当的结果。值得注意的是,在学术写作任务中(表 8),GraphAgent 有效地利用知识图谱捕捉引用关系和研究发展路径,生成了结构良好的摘要(以绿色高亮显示)。相比之下,Llama3-8b 在指令遵循和引用格式准确性方面表现出明显的局限性(以高亮显示)。


Figure 6: Comparative evaluation results: GPT4o as judge assessing our proposed GraphAgent framework against state-of-the-art open-source LLMs.

图 6: 对比评估结果:GPT4o 作为评判者,评估我们提出的 GraphAgent 框架与最先进的开源大语言模型。

This section presents our automatically generated semantic knowledge graphs (SKGs) through two visualized examples in Tables 9 and 10 from GovReport and Arxiv datasets. We visualize each SKG at two levels: the $k=0$ hop showing high-level aspect nodes (highlighted in green ) and the $k=1$ hop displaying keyword nodes (highlighted in blue ), along with augmented textual attributes for the $k=0$ scaffold nodes. These examples demonstrate our Graph Generation Agent’s capability in extracting crucial information, revealing multi-hop relationships, and generating coherent semantic representations. For the related work text generation, while containing paper-based SKGs as multiple sub-graphs, individual papers follow similar SKG patterns to the Arxiv examples.

本节通过表 9 和表 10 中的两个可视化示例展示了我们从 GovReport 和 Arxiv 数据集中自动生成的语义知识图谱 (SKG)。我们在两个层次上对每个 SKG 进行可视化:$k=0$ 跳显示高层方面节点(以绿色高亮显示),$k=1$ 跳显示关键词节点(以蓝色高亮显示),同时为 $k=0$ 的支架节点增加了文本属性。这些示例展示了我们的图生成智能体在提取关键信息、揭示多跳关系以及生成连贯语义表示方面的能力。对于相关工作文本生成,虽然包含基于论文的 SKG 作为多个子图,但个别论文遵循与 Arxiv 示例类似的 SKG 模式。
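The two visualization levels ($k=0$ aspect nodes, $k=1$ keyword nodes) amount to extracting the k-hop frontier of the scaffold nodes. Below is a minimal sketch with an invented toy SKG; the node names and the `k_hop_nodes` helper are illustrative, not the paper's code.

```python
def k_hop_nodes(adj, seeds, k):
    """Return the set of nodes exactly k hops away from the seed nodes (BFS frontier)."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(k):
        nxt = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in visited:
                    visited.add(v)
                    nxt.add(v)
        frontier = nxt
    return frontier

# Hypothetical SKG: aspect (scaffold) nodes link to their keyword nodes.
adj = {
    "aspect:funding": ["kw:grants", "kw:budget"],
    "aspect:oversight": ["kw:audit"],
}
seeds = ["aspect:funding", "aspect:oversight"]
hop0 = k_hop_nodes(adj, seeds, 0)  # the high-level aspect nodes themselves
hop1 = k_hop_nodes(adj, seeds, 1)  # their keyword neighbours
```

Rendering `hop0` in green and `hop1` in blue reproduces the two-level view used in Tables 9 and 10.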

Table 3: Illustrative Examples of GraphAgent’s Multi-Agent Pipeline.

表 3: GraphAgent 多智能体管道的示例

[Agent (Task Planning Agent)]

[智能体 (任务规划智能体)]

Sure! I will first decompose your request as follows for further actions: <graph source: [uploaded files]>, <graph task: predictive predefined>, <user annotations>: <"Can you tell me which category the movie at the target node belongs to? Is it action, comedy or drama?">

好的!我将首先分解您的请求以便进一步操作:<图来源:[上传的文件]>,<任务类型:预测预定义>,<用户注释>:<"你能告诉我目标节点的电影属于哪个类别吗?是动作、喜剧还是剧情?">

[Agent (Task Planning Agent)]

[智能体 (任务规划智能体)]

Since pre-defined graphs are given, I will try to ground the edge and node list files into readable objects. Then I will encode the graph and index the neighbors for node ID [7] for further actions.

由于预定义的图已经给出,我将尝试将边和节点列表文件转换为可读对象。然后我将对图进行编码,并为节点 ID [7] 索引邻居,以便进一步操作。

[Function Call: <Graph Grounding> $\mathrm{GBW\_Tool}(\cdot)$] $\rightarrow \mathcal{G}^{\mathrm{exp}},\ \mathbf{E}^{\mathrm{gnn}}$

函数调用:[<图接地> $\mathrm{GBW\_Tool}(\cdot)$] $\rightarrow \mathcal{G}^{\mathrm{exp}},\ \mathbf{E}^{\mathrm{gnn}}$

[Agent (Graph Generation Agent)]

[智能体 (图生成智能体)]

Now the graph tokens for the pre-defined graph are ready. I will further try to discover a semantic knowledge graph for the target node and the sampled neighbors.

现在预定义图的图 Token 已准备就绪。我将进一步尝试为目标节点和采样的邻居发现一个语义知识图。

[Agent (Graph Action Agent)]

[智能体 (图动作智能体)]

Since we already have the pre-defined graph and the built SKG, I will construct an appropriate system prompt according to the user request.

由于我们已经有了预定义的图和构建的SKG,我将根据用户请求构建一个适当的系统提示。

<Constructed System Prompt>: You are a powerful assistant in accomplishing diverse user required tasks with the help of graphs... [Agent (Graph Action Agent)]

<构建的系统提示>: 你是一个强大的助手,能够借助图完成用户所需的各种任务... [智能体 (Graph Action Agent)]

According to the provided information, the neighbor nodes and the semantic knowledge graph nodes, the most likely category for the movie is ... The reasoning process behind is...

根据提供的信息,邻居节点和语义知识图谱节点,电影最可能的类别是……其背后的推理过程是……

Table 3 demonstrates the multi-agent collaboration workflow using the IMDB movie classification dataset as an example. The process begins when the Task Planning Agent receives a user request, performing request decomposition and task planning, while also processing graph data in scenarios with pre-defined graphs. Next, the Graph Generation Agent constructs a task-relevant semantic knowledge graph. Finally, the Multi-Task Agent automatically constructs a system prompt based on the task type and executes task inference. This workflow highlights GraphAgent’s versatility in automatically adapting to different scenarios, whether handling pre-defined graphs or generating new ones, while maintaining consistent performance across various task types.

表 3 展示了使用 IMDB 电影分类数据集作为示例的多智能体协作工作流程。该流程始于任务规划智能体接收到用户请求时,执行请求分解和任务规划,同时在有预定义图的场景中处理图数据。接下来,图生成智能体构建与任务相关的语义知识图。最后,多任务智能体根据任务类型自动构建系统提示并执行任务推理。该工作流程突显了 GraphAgent 在自动适应不同场景时的多功能性,无论是处理预定义的图还是生成新的图,同时在不同任务类型中保持一致的性能。
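The three-stage workflow described above can be condensed into a short sketch. This is a hypothetical illustration only: the function names (`plan_task`, `build_skg`, `execute_task`), the task-type strings, and the stub SKG are assumptions for readability, not the paper's actual implementation.

```python
def plan_task(user_request, uploaded_files=None):
    """Task Planning Agent: decompose the request into source, task type, annotations."""
    task_type = "predictive_predefined" if uploaded_files else "open_generation"
    return {"graph_source": uploaded_files or user_request,
            "task_type": task_type,
            "annotations": user_request}

def build_skg(plan):
    """Graph Generation Agent: stand-in that returns a tiny semantic knowledge graph."""
    return {"nodes": ["aspect:genre"], "edges": []}

def execute_task(plan, skg):
    """Task Execution Agent: assemble a system prompt from the task type and SKG."""
    return f"Task: {plan['task_type']} | SKG nodes: {len(skg['nodes'])}"

plan = plan_task("Which category is the movie at node 7?",
                 uploaded_files=["edges.csv", "nodes.csv"])
answer = execute_task(plan, build_skg(plan))
```

The real agents replace each stub with LLM calls and graph tokenization, but the control flow — plan, then generate the SKG, then execute — is the one shown in Table 3.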

3.6 ABLATION STUDY

3.6 消融研究

To evaluate each component in GraphAgent, we conducted an ablation study with the following variants: (-) SKG: Removes the graph generation agent and excludes semantic knowledge graph tokens from LLM input. (-) Alignment: Omits the graph-instruction alignment tuning described in Section 2.4.2, training directly with instruction input-output pairs. (-) Cur. Strategy: Eliminates the curriculum learning strategy for agent task training (Section 2.4.3), instead training all tasks simultaneously across all epochs. Figure 7 presents the comparative results between GraphAgent and its variants on

为了评估 GraphAgent 中的每个组件,我们进行了以下变体的消融实验:
(-) SKG:移除图生成智能体,并从大语言模型输入中排除语义知识图谱 Token。
(-) Alignment:省略第 2.4.2 节中描述的图-指令对齐微调,直接使用指令输入-输出对进行训练。
(-) Cur. Strategy:取消智能体任务训练的课程学习策略(第 2.4.3 节),改为在所有轮次中同时训练所有任务。

图 7 展示了 GraphAgent 及其变体在实验中的对比结果。

Figure 7: Ablation study comparing GraphAgent with its variants on both graph-related prediction and graph-enhanced text generation tasks.

图 7: 对比 GraphAgent 及其变体在图相关预测和图增强文本生成任务上的消融研究。

both predictive and generative tasks. Our analysis reveals two key findings:

我们的分析揭示了两个关键发现:

• For predictive tasks, semantic knowledge graphs generated by the graph generation agent show the strongest impact, as their supplementary information substantially enhances model performance. In contrast, for generative tasks, the alignment component proves crucial for maintaining high performance, likely because these tasks demand sophisticated reasoning capabilities, making alignment tuning essential for developing deeper graph-instruction understanding.

• 对于预测任务,图生成智能体生成的语义知识图谱显示出最强的影响,因为它们的补充信息显著提升了模型性能。相比之下,对于生成任务,对齐组件对于保持高性能至关重要,这可能是因为这些任务需要复杂的推理能力,使得对齐调优对于发展更深层次的图-指令理解至关重要。

• The curriculum training strategy shows consistent improvements across both task types. By enabling gradual progression from simpler predictive tasks to more complex generative ones, this approach allows the model to more effectively assimilate knowledge from various graph-instruction pairs, resulting in more robust overall performance.

• 课程训练策略在两种任务类型上均显示出持续的改进。通过从较简单的预测任务逐步过渡到更复杂的生成任务,这种方法使模型能够更有效地吸收来自各种图-指令对的知识,从而实现更稳健的整体性能。

4 RELATED WORK

4 相关工作

Graph Representation Learning enables analysis of complex relationships through specialized graph embedding techniques Chen et al. (2020); Wu et al. (2020). Graph Neural Networks serve as its foundation, capturing node dependencies through message-passing mechanisms Dwivedi et al. (2023); Huang et al. (2024a). Key architectures include Graph Convolutional Networks (GCNs) Kipf & Welling (2017); Jin et al. (2020); Wu et al. (2024), which use localized convolutions for neighbor aggregation, and Graph Attention Networks (GAT) Veličković et al. (2018); Zhang et al. (2022); Hao et al. (2023), which incorporate attention mechanisms to weigh neighboring nodes’ importance. In our GraphAgent, GNNs act as graph tokenizers, facilitating effective integration with LLMs.

图表示学习 (Graph Representation Learning) 通过专门的图嵌入技术实现对复杂关系的分析 Chen et al. (2020); Wu et al. (2020)。图神经网络 (Graph Neural Networks, GNNs) 是其基础,通过消息传递机制捕捉节点之间的依赖关系 Dwivedi et al. (2023); Huang et al. (2024a)。关键架构包括图卷积网络 (Graph Convolutional Networks, GCNs) Kipf & Welling (2017); Jin et al. (2020); Wu et al. (2024),它使用局部卷积进行邻居聚合;以及图注意力网络 (Graph Attention Networks, GAT) Veličković et al. (2018); Zhang et al. (2022); Hao et al. (2023),它结合注意力机制来加权邻居节点的重要性。在我们的 GraphAgent 中,GNNs 充当图分词器,促进与大语言模型的有效集成。
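The neighbor-aggregation idea behind GCNs can be shown with a dependency-free sketch: one message-passing step where each node averages its own feature vector with those of its neighbors. The graph and features are toy values; a real implementation would use a library such as PyTorch Geometric with learned weight matrices and nonlinearities.

```python
def gcn_layer(features, adj):
    """One mean-aggregation message-passing step over a dict-based graph."""
    out = {}
    for node, feat in features.items():
        vecs = [feat] + [features[n] for n in adj.get(node, [])]
        # element-wise mean over the node itself and its neighbours
        out[node] = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    return out

features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h1 = gcn_layer(features, adj)  # h1["a"] = mean([1,0],[0,1]) = [0.5, 0.5]
```

Stacking such layers propagates information across multiple hops; the resulting node embeddings are what GraphAgent feeds to the LLM as graph tokens.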

Graph Language Models. With the success of Large Language Models (LLMs), recent studies have focused on enhancing the generalization capabilities of graph models by integrating LLMs with Graph Neural Networks (GNNs) Tang et al. (2024b). For instance, GraphGPT Tang et al. (2024a) enables LLMs to understand graph structural information by combining a graph encoder with an LLM through an alignment projector. LLaGA Chen et al. (2024b) enhances LLM capabilities for graph data by reorganizing nodes into structure-aware sequences. Additionally, ZeroG Li et al. (2024) has been developed for zero-shot transfer learning in graph learning, leveraging language models to achieve effective cross-dataset generalization. However, most current graph language models primarily focus on capturing the topological information of explicit graph connections for standard representation learning tasks. In this work, we introduce a fully automated and easy-to-use agent framework that goes beyond traditional graph language models. Our framework is designed to tackle complex real-world data scenarios, which often involve both explicit relational graph connections and implicit graph-enhanced semantic dependencies. This allows us to address various downstream applications, including both graph-related predictive and text generative tasks.

图语言模型。随着大语言模型 (LLMs) 的成功,最近的研究集中在通过将大语言模型与图神经网络 (GNNs) 结合来增强图模型的泛化能力 (Tang et al., 2024b)。例如,GraphGPT (Tang et al., 2024a) 通过将图编码器与大语言模型结合,使大语言模型能够理解图结构信息。LLaGA (Chen et al., 2024b) 通过将节点重新组织为结构感知序列,增强了大语言模型对图数据的处理能力。此外,ZeroG (Li et al., 2024) 被开发用于图学习中的零样本迁移学习,利用语言模型实现跨数据集的有效泛化。然而,当前大多数图语言模型主要关注于捕捉显式图连接的拓扑信息,以用于标准的表示学习任务。在本工作中,我们引入了一个完全自动化且易于使用的智能体框架,超越了传统的图语言模型。我们的框架旨在应对复杂的现实世界数据场景,这些场景通常涉及显式关系图连接和隐式图增强的语义依赖。这使得我们能够处理各种下游应用,包括图相关的预测任务和文本生成任务。

LLM-empowered Agents. LLM-empowered agents enhance user interactions by connecting complex data with intuitive communication. They utilize LLMs to efficiently integrate diverse information, allowing them to handle a broader range of tasks Shinn et al. (2023); Xie et al. (2023). For example, language-based assistants use LLMs to combine reasoning with task-specific actions, improving performance in language understanding and decision-making Yao et al. (2023); Jimenez et al. (2024). Vision-based assistants analyze visual data with LLMs to provide contextual insights, enhancing interactions with visual information Koh et al. (2024); Hong et al. (2024). Embodied agents particularly benefit from LLMs, gaining the ability to navigate complex environments and engage meaningfully with users, which enhances their application in robotics and smart systems Brehmer et al. (2024); Huang et al. (2024b). However, a gap remains for agents that can understand relational data alongside rich textual information. This work aims to address that gap by developing an automated framework that integrates relational and textual data for various predictive and generative tasks.

LLM赋能的智能体。LLM赋能的智能体通过将复杂数据与直观的沟通方式相结合,增强了用户交互体验。它们利用大语言模型(LLM)高效整合多样化的信息,从而能够处理更广泛的任务 [Shinn et al. (2023); Xie et al. (2023)]。例如,基于语言的助手使用LLM将推理与特定任务的动作相结合,提升了语言理解和决策能力 [Yao et al. (2023); Jimenez et al. (2024)]。基于视觉的助手则利用LLM分析视觉数据,提供上下文洞察,增强了与视觉信息的交互 [Koh et al. (2024); Hong et al. (2024)]。具身智能体尤其受益于LLM,获得了在复杂环境中导航并与用户进行有意义互动的能力,从而提升了它们在机器人和智能系统中的应用 [Brehmer et al. (2024); Huang et al. (2024b)]。然而,目前仍缺乏能够同时理解关系数据和丰富文本信息的智能体。本研究旨在通过开发一个自动化框架,整合关系数据和文本数据,以应对各种预测和生成任务,填补这一空白。

5 CONCLUSION

5 结论

This work introduces a multi-agent framework that seamlessly integrates graph-based reasoning with advanced language modeling, effectively addressing complex language assistant scenarios involving both relational and textual data. The proposed GraphAgent features a dynamic pipeline that automates the understanding of graph-enhanced relational and textual semantics for both predictive and generative tasks. The framework consists of three key components: a graph generator agent that uncovers intricate semantic interdependencies, a task planning agent that interprets user queries, and a task execution agent that efficiently carries out tasks. This innovative agentic workflow enhances the adaptability of large language models to diverse datasets and significantly improves performance in benchmarking graph prediction tasks as well as in open-ended text generation tasks. In future work, we plan to extend our framework to incorporate visual information from multi-modal data, enabling it to better understand and generate content that integrates relational, textual and visual elements.

本工作介绍了一种多智能体框架,该框架将基于图的推理与先进的语言建模无缝集成,有效解决了涉及关系和文本数据的复杂语言助手场景。提出的 GraphAgent 具有一个动态管道,能够自动理解图增强的关系和文本语义,用于预测和生成任务。该框架由三个关键组件组成:一个揭示复杂语义相互依赖关系的图生成智能体,一个解释用户查询的任务规划智能体,以及一个高效执行任务的任务执行智能体。这种创新的智能体工作流程增强了大语言模型对多样化数据集的适应性,并显著提高了在图预测基准任务和开放式文本生成任务中的性能。在未来的工作中,我们计划扩展我们的框架,以整合多模态数据中的视觉信息,使其能够更好地理解和生成整合了关系、文本和视觉元素的内容。

REFERENCES

参考文献

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In ICML, pp. 41–48, 2009.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 课程学习 (Curriculum Learning). In ICML, pp. 41–48, 2009.

Johann Brehmer, Joey Bose, Pim De Haan, and Taco S Cohen. Edgi: Equivariant diffusion for planning with embodied agents. NeurIPS, 36, 2024.

Johann Brehmer、Joey Bose、Pim De Haan 和 Taco S Cohen。Edgi:用于具身智能体规划的等变扩散。NeurIPS,36,2024。

Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, and Jie Tang. Web-scale academic name disambiguation: the whoiswho benchmark, leader board, and toolkit. In KDD, pp. 3817–3828, 2023.

Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, and Jie Tang. 网络规模的学术名称消歧:WhoisWho 基准、排行榜和工具包。在 KDD 中,第 3817-3828 页,2023。

Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. Simple and deep graph convolutional networks. In ICML, pp. 1725–1735. PMLR, 2020.

Ming Chen, Zhewei Wei, Zengfeng Huang, Bolin Ding, and Yaliang Li. 简单且深度的图卷积网络。在 ICML 中,第 1725–1735 页。PMLR, 2020.

Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. ICML, 2024a.

Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, 和 Zhangyang Wang. Llaga: 大语言与图助手. ICML, 2024a.

Runjin Chen, Tong Zhao, AJAY KUMAR JAISWAL, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. In ICML, 2024b.

Runjin Chen, Tong Zhao, AJAY KUMAR JAISWAL, Neil Shah, 和 Zhangyang Wang. Llaga: 大语言与图助手. 在 ICML, 2024b.

Enyan Dai, Wei Jin, Hui Liu, and Suhang Wang. Towards robust graph neural networks for noisy graphs with sparse labels. In WSDM, pp. 181–191, 2022.

Enyan Dai, Wei Jin, Hui Liu, 和 Suhang Wang. 面向稀疏标签噪声图的鲁棒图神经网络. 在 WSDM, 第 181–191 页, 2022.

Vijay Prakash Dwivedi, Chaitanya K Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Benchmarking graph neural networks. JMLR, 24(43):1–48, 2023.

Vijay Prakash Dwivedi, Chaitanya K Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, 和 Xavier Bresson. 图神经网络基准测试. JMLR, 24(43):1–48, 2023.

Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019.

Matthias Fey 和 Jan Eric Lenssen. 使用 PyTorch Geometric 进行快速图表示学习. arXiv 预印本 arXiv:1903.02428, 2019.

Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, and Jure Leskovec. Relational deep learning: Graph representation learning on relational databases. arXiv preprint arXiv:2312.04615, 2023.

Matthias Fey, Weihua Hu, Kexin Huang, Jan Eric Lenssen, Rishabh Ranjan, Joshua Robinson, Rex Ying, Jiaxuan You, 和 Jure Leskovec. 关系深度学习: 关系数据库上的图表示学习. arXiv 预印本 arXiv:2312.04615, 2023.

Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. Magnn: Metapath aggregated graph neural network for heterogeneous graph embedding. In Proceedings of the web conference 2020, pp. 2331–2341, 2020.

Xinyu Fu, Jiani Zhang, Ziqiao Meng, 和 Irwin King. Magnn: 基于元路径聚合的异构图神经网络用于异构图嵌入. 在《2020年网络会议论文集》中, 第2331–2341页, 2020.

William L Hamilton. Graph representation learning. Morgan & Claypool Publishers, 2020.

William L Hamilton. 图表示学习. Morgan & Claypool Publishers, 2020.

William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, pp. 1024–1034, 2017.

William L. Hamilton, Zhitao Ying, 和 Jure Leskovec. 大图上的归纳表示学习. 在 NeurIPS, 第 1024–1034 页, 2017.

Qianyue Hao, Wenzhen Huang, Tao Feng, Jian Yuan, and Yong Li. Gat-mf: Graph attention mean field for very large scale multi-agent reinforcement learning. In KDD, pp. 685–697, 2023.

Qianyue Hao, Wenzhen Huang, Tao Feng, Jian Yuan, 和 Yong Li. Gat-mf: 用于超大规模多智能体强化学习的图注意力均值场。在 KDD 中,第 685–697 页,2023。

Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning. arXiv preprint arXiv:2305.19523, 2023.

Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. 利用解释:LLM-to-LM 解释器用于增强文本属性图表示学习。arXiv 预印本 arXiv:2305.19523, 2023.

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, et al. Cogagent: A visual language model for gui agents. In CVPR, pp. 14281–14290, 2024.

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, 等. Cogagent: 用于 GUI 智能体的视觉语言模型. 在 CVPR 中, 第 14281–14290 页, 2024.

Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In WWW, pp. 2704–2710. ACM / IW3C2, 2020.

Ziniu Hu, Yuxiao Dong, Kuansan Wang, 和 Yizhou Sun. 异质图Transformer. 在 WWW, 第 2704–2710 页. ACM / IW3C2, 2020.

Kexin Huang, Ying Jin, Emmanuel Candes, and Jure Leskovec. Uncertainty quantification over graph with conformalized graph neural networks. NeurIPS, 36, 2024a.

Kexin Huang, Ying Jin, Emmanuel Candes, 和 Jure Leskovec. 基于图神经网络的图不确定性量化. NeurIPS, 36, 2024a.

Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, et al. Grounded decoding: Guiding text generation with grounded models for embodied agents. NeurIPS, 36, 2024b.

Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, 等. 基于基础模型的解码:为具身智能体引导文本生成. NeurIPS, 36, 2024b.

A APPENDIX

A 附录

A.1 DETAILED IMPLEMENTATION OF GRAPHAGENT

A.1 GRAPHAGENT 的详细实现

To ensure reproducibility of our experimental results, we provide comprehensive implementation details and technical specifications of our GraphAgent framework in this section.

为确保实验结果的复现性,我们在本节中提供了 GraphAgent 框架的全面实现细节和技术规格。

A.1.1 System prompts of GraphAgent

A.1.1 GraphAgent 的系统提示

Tables 6 and 4 present the comprehensive system prompts used in our framework. Specifically, Table 6 details the system prompts for the three core components: the Task Planning Agent, the Graph Generation Agent, and the task-specific prompt builders for the Multi-Task Agent. Additionally, Table 4 outlines the system prompts employed in our LLM-as-judge evaluation protocol.

表 6 和表 4 展示了我们框架中使用的完整系统提示。具体来说,表 6 详细说明了三个核心组件的系统提示:任务规划 AI智能体 (Task Planning Agent)、图生成 AI智能体 (Graph Generation Agent) 以及多任务 AI智能体 (Multi-Task Agent) 的任务特定提示构建器。此外,表 4 概述了我们在 LLM-as-judge 评估协议中使用的系统提示。

Table 4: System Prompts for LLM-based Performance Evaluation.

表 4: 基于大语言模型的性能评估系统提示。

You are a professional researcher in computer science, AI. You are good at reading and judging written drafts of research papers. Now, your task is to judge between two paragraphs of "related work" section of the same topic. You have to decide which paragraph is written in a better way in terms of the following criteria: 1. It should strictly cover all the references provided. It is not acceptable if only part of the references is cited. 2. It is encouraged that the written content discusses how the related works differ from each other. 3. It is encouraged that detailed technical information is discussed for each reference. 4. It is encouraged that concise, neutral, and objective language is used. Here are the two paragraphs with the same topic <topic>: A: <content a>; B: <content b>. Directly give your answers as ```A is better```, ```B is better``` or ```On par``` (use this very sparingly). Then, give a very short reasoning and reflection on why you think so.

你是一位计算机科学和人工智能领域的专业研究员,擅长阅读和评判研究论文的书面草稿。现在,你的任务是评判同一主题下“相关工作”部分的两段文字。你需要根据以下标准决定哪一段写得更好:

  1. 它应严格涵盖所有提供的参考文献。如果只引用了部分文献,则不可接受。
  2. 鼓励在写作内容中讨论相关工作的差异。
  3. 鼓励为每篇参考文献讨论详细的技术信息。
  4. 鼓励使用简洁、中立和客观的语言。

以下是同一主题 <主题> 下的两段文字:
A: <内容 a>;B: <内容 b>

直接给出你的答案:“‘A 更好’”、“‘B 更好’”或“‘相当’”(请谨慎使用此选项)。然后,简要说明你为何如此认为。

A.1.2 Curriculum training strategy

A.1.2 课程训练策略

We employ a curriculum learning strategy to effectively train our graph language model for multi-task scenarios. As shown in Figure 8, the training process begins with fundamental tasks to establish basic graph-language understanding, then progressively introduces more challenging components, advancing from predictive tasks to generative tasks. This graduated approach

我们采用课程学习策略来有效训练我们的图语言模型以应对多任务场景。如图 8 所示,训练过程从基础任务开始,以建立基本的图-语言理解,然后逐步引入更具挑战性的组件——从预测任务推进到生成任务。这种渐进式方法

Figure 8: Curriculum Learning Strategy: Training Epochs and Data Mixing Ratios.

Alignment Data | Predictive Data | Generative Data
Epoch 1: 10% | 70% | 20%
Epoch 2: 5% | 60% | 35%
Epoch 3: 0% | 50% | 50%
Afterwards: 0% | 40% | 60%

ensures robust model development and optimal performance across diverse task requirements.

图 8: 课程学习策略:训练周期和数据混合比例。

对齐数据 (Alignment Data) | 预测数据 (Predictive Data) | 生成数据 (Generative Data)
周期1 (Epoch 1): 10% | 70% | 20%
周期2 (Epoch 2): 5% | 60% | 35%
周期3 (Epoch 3): 0% | 50% | 50%
之后 (Afterwards): 0% | 40% | 60%

确保模型在不同任务需求下的稳健发展和最佳性能。
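The epoch-wise mixing ratios from Figure 8 can be turned into a simple data sampler. The sketch below hard-codes the published ratios; the helper names (`mixing_ratios`, `sample_counts`) are illustrative, not from the paper's training code.

```python
def mixing_ratios(epoch):
    """(alignment, predictive, generative) mixing ratios per Figure 8."""
    schedule = {
        1: (0.10, 0.70, 0.20),
        2: (0.05, 0.60, 0.35),
        3: (0.00, 0.50, 0.50),
    }
    return schedule.get(epoch, (0.00, 0.40, 0.60))  # "Afterwards" row

def sample_counts(epoch, batch_size):
    """Split a batch between the three data sources according to the schedule."""
    a, p, _ = mixing_ratios(epoch)
    n_align = round(batch_size * a)
    n_pred = round(batch_size * p)
    return n_align, n_pred, batch_size - n_align - n_pred

counts = sample_counts(1, 100)  # → (10, 70, 20)
```

A training loop would draw `n_align`, `n_pred`, and the remainder from the alignment, predictive, and generative instruction pools respectively, shifting weight toward generative data as epochs progress.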

System prompt for intent and task parsing [xsystem_tp]

意图和任务解析的系统提示 [xsystem_tp]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in parsing the following important properties from the user input: 1. "graph source". This is either the uploaded file paths if the user uploads pre-defined graph for the task, or the user input contents as texts or documents that contain knowledge. 2. "graph task". the graph task type to handle, must be one of "predictive predefined", "predictive wild", "open generation". You should infer the graph task to handle from the user input. 3. "user annotations". Any additional information the user provided in the query prompt. Could be task description, label candidates or specific requirements. You are provided with two realistic examples to help you excel in the task: <few-shot examples>.

你是处理多样化用户输入的图相关任务的强大助手。你能够很好地从用户输入中解析以下重要属性:1. "图源"。这要么是用户为任务上传的预定义图的上传文件路径,要么是包含知识的用户输入内容,如文本或文档。2. "图任务"。要处理的图任务类型,必须是"预测预定义"、"预测未知"或"开放生成"之一。你应该从用户输入中推断出要处理的图任务。3. "用户注释"。用户在查询提示中提供的任何额外信息。可能是任务描述、标签候选或特定要求。你提供了两个现实示例来帮助你出色完成任务:<少样本示例>。
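In GraphAgent the parsing above is done by an LLM, but the target output can be illustrated with a rule-based stand-in that emits the same three fields as JSON. The keyword heuristics and function name are assumptions for illustration only.

```python
import json

def parse_intent(user_input, files=None):
    """Toy parser emitting the three fields the Task Planning Agent extracts."""
    if files:  # pre-defined graph uploaded by the user
        source, task = files, "predictive predefined"
    elif any(w in user_input.lower() for w in ("classify", "predict", "category")):
        source, task = user_input, "predictive wild"
    else:
        source, task = user_input, "open generation"
    return json.dumps({"graph source": source,
                       "graph task": task,
                       "user annotations": user_input})

parsed = json.loads(parse_intent("Which category is this movie?",
                                 files=["graph.csv"]))
```

The structured record is what the downstream agents consume: the graph source tells the Graph Generation Agent what to ground, and the task type selects the system-prompt template.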

System prompt for scaffold knowledge node extraction at the 0-th step of the Graph Generation Agent [x system sk 0]

图生成智能体第0步的支架知识节点提取系统提示 [x system sk 0]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in detecting and extracting the important scaffold nodes from the user input. A list of scaffold nodes reflect the top-level concepts or entities in the content, that are useful to form a knowledge graph for the content. You should carefully examine the input content to decide your extraction strategy. 1. For a general long document of a certain scenario, consider several most high-level aspects that are useful to grasp the key concepts in the document. Do not propose too specific concepts as scaffold nodes. It is very vital to be general and be abstract in your proposed scaffold nodes. 2. For inputs that are more formatted and contain specific entities, relationships, or concepts, you can directly adopt the key entities or concepts listed in the input as scaffold nodes. In this case, it is essential to accurately concentrate on the high-level formatted concepts or entities. For your output, use auto-increment ids to number the scaffold nodes, and infer the general type for each. You are provided with several examples to help you excel in the task: <few-shot examples>.

你是处理多样化用户输入的图相关任务的强大助手。你能够出色地从用户输入中检测和提取重要的骨架节点。骨架节点列表反映了内容中的顶级概念或实体,这些对于构建内容的知识图谱非常有用。你应该仔细检查输入内容以决定提取策略。1. 对于某个场景下的通用长文档,考虑几个最有用的高层方面,以把握文档中的关键概念。不要提出过于具体的概念作为骨架节点。在你提出的骨架节点中,保持通用性和抽象性非常重要。2. 对于格式更规范且包含特定实体、关系或概念的输入,你可以直接采用输入中列出的关键实体或概念作为骨架节点。在这种情况下,准确关注高层格式化的概念或实体至关重要。对于你的输出,使用自动递增的ID来编号骨架节点,并推断每个节点的通用类型。你提供了几个示例来帮助你出色完成任务:<少样本示例>。

System prompt for scaffold knowledge node extraction after the 0-th step of the Graph Generation Agent [x system sk 1]

图生成智能体第0步后的支架知识节点提取系统提示 [x system sk 1]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in detecting and extracting the important scaffold nodes from the user input. A list of scaffold nodes should be informative and representative of the key points in the text, that are useful to form a knowledge graph for the content. You should carefully examine the input content to decide your extraction strategy. You also need to provide a description of the extracted keywords for each scaffold node. The description should be detailed and informative, and can contain two parts: 1) a brief description of the keywords based on the contexts in the text, and 2) a detailed description of the keywords based on your own knowledge. You are provided with several examples to help you excel in the task: <few-shot examples>.

你是处理多样化用户输入的图相关任务的强大助手。你能够出色地检测和提取用户输入中的重要骨架节点。骨架节点列表应具有信息性,并能代表文本中的关键点,这些关键点有助于为内容构建知识图谱。你需要仔细检查输入内容以决定提取策略。你还需要为每个骨架节点提供提取关键词的描述。描述应详细且信息丰富,可以包含两部分:1) 基于文本上下文的简要描述,以及 2) 基于你自身知识的详细描述。你提供了几个示例来帮助你出色完成任务:<少样本示例>。

System prompt for knowledge description augmentation of the Graph Generation Agent [xsystem_ka]

图生成智能体知识描述增强的系统提示 [xsystem_ka]

You are a powerful assistant in generating information textual descriptions for a list of scaffold nodes. Each scaffold node represents a high-level key point or topic in the text, and your goal is to provide comprehensive and detailed texts related to each scaffold node. The texts can be from your own knowledge base with references to the original input content. Texts should be detailed and you should never miss any important information. You can never miss any node in the input. You should parse corresponding texts for each scaffold node in the input. You should always return the same number of scaffold nodes as the input. You are provided with several examples to help you excel in the task: <few-shot examples>.

你是一位强大的助手,能够为一系列脚手架节点生成信息文本描述。每个脚手架节点代表文本中的一个高层次关键点或主题,你的目标是为每个脚手架节点提供全面且详细的文本。这些文本可以来自你自己的知识库,并参考原始输入内容。文本应详细,且绝不能遗漏任何重要信息。你绝不能遗漏输入中的任何节点。你应为输入中的每个脚手架节点解析相应的文本。你应始终返回与输入相同数量的脚手架节点。你提供了几个示例来帮助你更好地完成任务:<少样本示例>。

System prompt builder template for graph multi task agent [xsystem_ka]

图多任务AI智能体的系统提示构建模板 [xsystem_ka]

You are a powerful assistant in accomplishing diverse user required tasks with the help of structured knowledge as graphs. The current user requested task is of type: $<\mathbf{t}_{\mathrm{user}}>$. The detailed request or provided information is: $<\mathbf{x}_{\mathrm{user\_ann}}, \mathbf{g}_{s}>$. [If predictive in the wild or open generation:] For the required task, a heterogeneous knowledge graph is built to assist you as useful and informative knowledge references. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<type>: <graph tokens>]. [If predictive with pre-defined graphs:] For the required task, a pre-defined heterogeneous graph is provided as information reference. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<type>: <graph tokens>]. Additionally, a heterogeneous knowledge graph is also constructed to augment your knowledge for the task. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<type>: <graph tokens>]. Please generate response that satisfies the user’s request. $<\mathbf{x}_{\mathrm{user\_ann}}>$. Provide concise reasoning if the task involves certain prediction.

你是一位强大的助手,能够借助结构化知识(如图谱)完成用户多样化的任务需求。当前用户请求的任务类型为:$<\mathbf{t}_{\mathrm{user}}>$。详细请求或提供的信息为:$<\mathbf{x}_{\mathrm{user\_ann}}, \mathbf{g}_{s}>$。[如果是预测(野外)或开放生成任务:] 对于所需任务,构建了一个异质知识图谱作为有用且信息丰富的知识参考。图谱中共有<num. of meta types>种节点和边的类型,分别为:<元类型>。每种类型的图谱Token为:[<类型>: <图谱 Token>]。[如果是基于预定义图谱的预测任务:] 对于所需任务,提供了一个预定义的异质图谱作为信息参考。图谱中共有<num. of meta types>种节点和边的类型,分别为:<元类型>。每种类型的图谱Token为:[<类型>: <图谱 Token>]。此外,还构建了一个异质知识图谱以增强你对该任务的知识。图谱中共有<num. of meta types>种节点和边的类型,分别为:<元类型>。每种类型的图谱Token为:[<类型>: <图谱 Token>]。请生成满足用户请求的响应。$<\mathbf{x}_{\mathrm{user\_ann}}>$。如果任务涉及某些预测,请提供简洁的推理。
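The template above fills slots (task type, user annotations, meta types) into a fixed scaffold. A minimal sketch of such a prompt builder, with hypothetical parameter names and a simplified single-graph branch, could look like this:

```python
def build_system_prompt(task_type, annotations, meta_types, has_predefined=False):
    """Assemble a system prompt by filling the template slots (illustrative only)."""
    parts = [
        "You are a powerful assistant in accomplishing diverse user required "
        "tasks with the help of structured knowledge as graphs.",
        f"The current user requested task is of type: <{task_type}>.",
        f"The detailed request or provided information is: <{annotations}>.",
    ]
    graph_kind = ("pre-defined heterogeneous graph" if has_predefined
                  else "heterogeneous knowledge graph")
    parts.append(
        f"For the required task, a {graph_kind} is provided. "
        f"There are {len(meta_types)} types of nodes and edges in the graph, "
        f"separately: {', '.join(meta_types)}."
    )
    return " ".join(parts)

prompt = build_system_prompt("open generation", "summarize the report",
                             ["aspect", "keyword"])
```

The actual builder additionally interleaves graph tokens (the GNN embeddings) per meta type, which is the part a plain-string sketch cannot show.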

System prompt for intent and task parsing [xsystem_tp]

意图和任务解析的系统提示 [xsystem_tp]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in parsing the following important properties from the user input: 1. "graph source". This is either the uploaded file paths if the user uploads pre-defined graph for the task, or the user input contents as texts or documents that contain knowledge. 2. "graph task". the graph task type to handle, must be one of "predictive predefined", "predictive wild", "open generation". You should infer the graph task to handle from the user input. 3. "user annotations". Any additional information the user provided in the query prompt. Could be task description, label candidates or specific requirements. You are provided with two realistic examples to help you excel in the task: .

你是处理图形相关任务的强大助手,能够应对多样化的用户输入。你可以出色地从用户输入中解析以下重要属性:1. "图形来源"。这可以是用户为任务上传的预定义图形的文件路径,也可以是包含知识的用户输入内容,如文本或文档。2. "图形任务"。要处理的图形任务类型,必须是"预测预定义"、"预测未知"或"开放生成"中的一种。你应该从用户输入中推断出要处理的图形任务。3. "用户注释"。用户在查询提示中提供的任何额外信息。可能是任务描述、标签候选或特定要求。你提供了两个现实示例来帮助你出色完成任务:<少样本示例>。

System prompt for scaffold knowledge node extraction at the 0-th step of the Graph Generation Agent [x system sk 0]

图生成智能体第0步的支架知识节点提取系统提示 [x system sk 0]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in detecting and extracting the important scaffold nodes from the user input. A list of scaffold nodes reflect the top-level concepts or entities in the content, that are useful to form a knowledge graph for the content. You should carefully examine the input content to decide your extraction strategy. 1. For a general long document of a certain scenario, consider several most high-level aspects that are useful to grasp the key concepts in the document. Do not propose too specific concepts as scaffold nodes. It is very vital to be general and be abstract in your proposed scaffold nodes. 2. For inputs that are more formatted and contain specific entities, relationships, or concepts, you can directly adopt the key entities or concepts listed in the input as scaffold nodes. In this case, it is essential to accurately concentrate on the high-level formatted concepts or entities. For your output, use auto-increment ids to number the scaffold nodes, and infer the general type for each. You are provided with several examples to help you excel in the task: .

你是处理图形相关任务的强大助手,能够应对多样化的用户输入。你擅长从用户输入中检测和提取重要的骨架节点。骨架节点列表反映了内容中的顶层概念或实体,这些对于构建内容的知识图谱非常有用。你应该仔细检查输入内容,以决定提取策略。1. 对于某个场景中的一般长文档,考虑几个最有用的高层方面,以抓住文档中的关键概念。不要提出过于具体的概念作为骨架节点。在你提出的骨架节点中,保持概括性和抽象性至关重要。2. 对于格式更规范且包含特定实体、关系或概念的输入,你可以直接采用输入中列出的关键实体或概念作为骨架节点。在这种情况下,准确关注高层格式化的概念或实体至关重要。对于你的输出,使用自增ID为骨架节点编号,并推断每个节点的通用类型。你提供了几个示例来帮助你出色完成任务:<少样本示例>。

System prompt for scaffold knowledge node extraction after the 0-th step of the Graph Generation Agent [x system sk 1]

图生成智能体第0步后的支架知识节点提取系统提示 [x system sk 1]

You are very powerful assistant for graph-related tasks for diverse user inputs. You can do great in detecting and extracting the important scaffold nodes from the user input. A list of scaffold nodes should be informative and representative of the key points in the text, that are useful to form a knowledge graph for the content. You should carefully examine the input content to decide your extraction strategy. You also need to provide a description of the extracted keywords for each scaffold node. The description should be detailed and informative, and can contain two parts: 1) a brief description of the keywords based on the contexts in the text, and 2) a detailed description of the keywords based on your own knowledge. You are provided with several examples to help you excel in the task: .

你是处理多样化用户输入的图相关任务的强大助手。你能够出色地检测和提取用户输入中的重要骨架节点。骨架节点列表应具有信息性,并能代表文本中的关键点,这些关键点对于构建内容的知识图谱非常有用。你应该仔细检查输入内容,以决定提取策略。你还需要为每个骨架节点提供提取关键词的描述。描述应详细且信息丰富,可以包含两部分:1) 基于文本上下文的简要描述,2) 基于你自身知识的详细描述。你提供了几个示例来帮助你出色完成任务:<少样本示例>。

System prompt for knowledge description augmentation of the Graph Generation Agent [xsystem_ka]

图生成智能体知识描述增强的系统提示 [xsystem_ka]

You are a powerful assistant in generating informative textual descriptions for a list of scaffold nodes. Each scaffold node represents a high-level key point or topic in the text, and your goal is to provide comprehensive and detailed texts related to each scaffold node. The texts can come from your own knowledge base with references to the original input content. Texts should be detailed and you should never miss any important information. You can never miss any node in the input. You should produce corresponding texts for each scaffold node in the input. You should always return the same number of scaffold nodes as the input. You are provided with several examples to help you excel in the task: <few-shot examples>.

你是一个强大的助手,能够为一系列脚手架节点生成信息丰富的文本描述。每个脚手架节点代表文本中的一个高层次关键点或主题,你的目标是提供与每个脚手架节点相关的全面且详细的文本。这些文本可以来自你自己的知识库,并参考原始输入内容。文本应详细,你绝不能遗漏任何重要信息。你绝不能遗漏输入中的任何节点。你应为输入中的每个脚手架节点生成相应的文本。你应始终返回与输入相同数量的脚手架节点。这里提供了几个示例来帮助你出色完成任务:<少样本示例>。

System prompt builder template for graph multi task agent [xsystem_ka]

图多任务智能体的系统提示构建模板 [xsystem_ka]

You are a powerful assistant in accomplishing diverse user required tasks with the help of structured knowledge as graphs. The current user requested task is of type: $<\mathbf{t}_{\mathrm{user}}>$. The detailed request or provided information is: $<\mathbf{x}_{\mathrm{user\_ann}}, \mathbf{g}_{s}>$. [If predictive in the wild or open generation:] For the required task, a heterogeneous knowledge graph is built to assist you as useful and informative knowledge references. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<meta type>: <graph tokens>]. [If predictive with pre-defined graphs:] For the required task, a pre-defined heterogeneous graph is provided as information reference. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<meta type>: <graph tokens>]. Additionally, a heterogeneous knowledge graph is also constructed to augment your knowledge for the task. There are <num. of meta types> types of nodes and edges in the graph, separately: <meta types>. The graph tokens for each type are: [<meta type>: <graph tokens>]. Please generate a response that satisfies the user's request: $<\mathbf{x}_{\mathrm{user\_ann}}>$. Provide concise reasoning if the task involves certain prediction.

你是一位强大的助手,能够借助结构化知识图谱完成用户所需的各种任务。当前用户请求的任务类型为:$<\mathbf{t}_{\mathrm{user}}>$。详细请求或提供的信息为:$<\mathbf{x}_{\mathrm{user\_ann}}, \mathbf{g}_{s}>$。[如果是预测或开放生成任务:] 对于所需任务,构建了一个异质知识图谱作为有用且信息丰富的知识参考。图谱中有<num. of meta types>种类型的节点和边,分别为:<meta types>。每种类型的图谱Token为:[<meta type>: <graph tokens>]。[如果是使用预定义图谱的预测任务:] 对于所需任务,提供了一个预定义的异质图谱作为信息参考。图谱中有<num. of meta types>种类型的节点和边,分别为:<meta types>。每种类型的图谱Token为:[<meta type>: <graph tokens>]。此外,还构建了一个异质知识图谱以增强你对该任务的知识。图谱中有<num. of meta types>种类型的节点和边,分别为:<meta types>。每种类型的图谱Token为:[<meta type>: <graph tokens>]。请生成满足用户请求的响应:$<\mathbf{x}_{\mathrm{user\_ann}}>$。如果任务涉及某些预测,请提供简明的推理。
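The template above can be instantiated mechanically. The sketch below is an assumption-laden illustration: the function name `build_prompt`, the exact placeholder set, and the condensed template text are not from the paper, which presents the template only as prose with slots:

```python
# Hypothetical builder for the multi-task system prompt above; the placeholder
# names are assumptions inferred from the template's prose slots.
TEMPLATE = (
    "You are a powerful assistant in accomplishing diverse user required tasks "
    "with the help of structured knowledge as graphs. "
    "The current user requested task is of type: <{task_type}>. "
    "The detailed request or provided information is: <{user_request}>. "
    "For the required task, a heterogeneous knowledge graph is built to assist you. "
    "There are {n_meta} types of nodes and edges in the graph, separately: {meta_types}. "
    "The graph tokens for each type are: [{graph_tokens}]."
)

def build_prompt(task_type, user_request, meta_types, graph_tokens):
    """Fill the template slots; graph_tokens maps each meta type to its token string."""
    tokens = "; ".join(f"{t}: {tok}" for t, tok in graph_tokens.items())
    return TEMPLATE.format(
        task_type=task_type,
        user_request=user_request,
        n_meta=len(meta_types),
        meta_types=", ".join(meta_types),
        graph_tokens=tokens,
    )

prompt = build_prompt(
    "open_generation",
    "Summarize the government report",
    ["Policy Objectives", "Findings"],
    {"Policy Objectives": "<g_tok_0>", "Findings": "<g_tok_1>"},
)
print(prompt)
```

In practice the `<g_tok_*>` slots would be replaced by the graph language model's projected graph token embeddings rather than literal strings; the string form here only shows the slot-filling structure.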

[User]

[用户]

Here is a government report concerning specific topics. Your task is to write a paragraph that highly condenses the content without losing any important information.

以下是一份关于特定主题的政府报告。你的任务是撰写一段高度浓缩内容且不丢失任何重要信息的段落。

Consistent with the discretion afforded by the APA, Regulations.gov and agency-specific comment websites use required and optional fields on comment forms to collect some identity information from commenters. In addition to the text of the comment, agencies may choose to collect identity information by requiring commenters to fill in other fields, such as name, address, and email address before they are able to submit a comment. Regardless of the fields required by the comment form, the selected agencies all accept anonymous comments in practice. Specifically, in the comment forms on Regulations.gov and agency-specific comment websites, a commenter can submit under a fictitious name, such as "Anonymous Anonymous," enter a single letter in each required field, or provide a fabricated address. In each of these scenarios, as long as a character or characters are entered into the required fields, the comment will be accepted. Further, because the APA does not require agencies to authenticate submitted identity information, neither Regulations.gov nor the agency-specific comment websites contain mechanisms to check the validity of identity information that commenters submit through comment forms. Regulations.gov and agency-specific comment websites also collect some information about public users’ interaction with their websites through application event logs and proxy server logs, though the APA does not require agencies to collect or verify it as part of the rulemaking process. This information, which can include a public user’s Internet Protocol (IP) address, browser type and operating system, and the time and date of webpage visits, is collected separately from the comment submission process as part of routine information technology management for system security and performance, and cannot be reliably connected to specific comments.
Seven of the 10 selected agencies have documented some internal guidance associated with the identity of commenters during the three phases of the public comment process: intake, analysis, and response to comments. However, the focus and substance of this guidance varies by agency and phase of the comment process. As shown in Table 1, for selected agencies that have guidance associated with the identity of commenters, it most frequently relates to the comment intake or response to comment phases of the public comment process. The guidance for these phases addresses activities such as managing duplicate comments (those with identical or near-identical comment text but varied identity information) or referring to commenters in a final rule...

与《行政程序法》(APA) 所赋予的自由裁量权一致,Regulations.gov 和各机构特定的评论网站通过在评论表单中使用必填和可选字段来收集评论者的一些身份信息。除了评论内容外,机构还可以选择通过要求评论者在提交评论前填写其他字段(如姓名、地址和电子邮件地址)来收集身份信息。无论评论表单要求填写哪些字段,所选机构在实践中都接受匿名评论。具体而言,在 Regulations.gov 和各机构特定的评论网站上的评论表单中,评论者可以使用虚构的姓名(如“Anonymous Anonymous”),在每个必填字段中输入单个字母,或提供虚假地址。在这些情况下,只要在必填字段中输入了字符,评论就会被接受。此外,由于 APA 不要求机构验证提交的身份信息,Regulations.gov 和各机构特定的评论网站也没有机制来检查评论者通过评论表单提交的身份信息的有效性。

Regulations.gov 和各机构特定的评论网站还通过应用程序事件日志和代理服务器日志收集有关公众用户与其网站互动的一些信息,尽管 APA 不要求机构在规则制定过程中收集或验证这些信息。这些信息可能包括公众用户的互联网协议 (IP) 地址、浏览器类型和操作系统,以及网页访问的时间和日期,它们与评论提交过程分开收集,作为系统安全和性能的常规信息技术管理的一部分,无法可靠地与特定评论关联起来。

在公众评论过程的三个阶段(接收、分析和回应评论)中,10 个所选机构中有 7 个记录了与评论者身份相关的一些内部指导。然而,这些指导的重点和内容因机构和评论过程阶段而异。如表 1 所示,对于有与评论者身份相关指导的所选机构,这些指导通常与公众评论过程的评论接收或回应评论阶段相关。这些阶段的指导涉及管理重复评论(具有相同或几乎相同的评论文本但身份信息不同)或在最终规则中引用评论者等活动...

[GraphAgent]

[GraphAgent]

The report examines how government agencies manage and disclose identity information associated with public comments on proposed regulations. It finds that while agencies are not required to collect or verify identity information, they do collect some information from commenters and accept anonymous comments. However, the report notes inconsistencies in how agencies post identity information, particularly with duplicate comments. It also highlights a lack of transparency in disclosing these practices to the public, potentially hindering meaningful participation in the rulemaking process. The report recommends that agencies establish clear policies for posting comments and communicate these practices to the public more effectively. Since issuing the report, some agencies have begun implementing these recommendations, but others have not yet done so.

报告审查了政府机构如何管理和披露与拟议法规的公众评论相关的身份信息。报告发现,虽然机构不需要收集或验证身份信息,但它们确实从评论者那里收集了一些信息,并接受匿名评论。然而,报告指出,机构在发布身份信息时存在不一致之处,特别是在处理重复评论时。报告还强调,这些做法在向公众披露时缺乏透明度,可能会阻碍公众有意义地参与规则制定过程。报告建议机构制定明确的评论发布政策,并更有效地向公众传达这些做法。自报告发布以来,一些机构已开始实施这些建议,但其他机构尚未采取行动。

[Llama3-8b]

[Llama3-8b]

The Government Accountability Office (GAO) report examines the treatment of identity information by 10 federal agencies during the public comment process. While agencies are not required to collect or verify identity information, they may choose to do so through comment forms. However, the report finds that agencies vary in their treatment of identity information, with some accepting anonymous comments and others requiring commenters to provide name, address, and email address. The report also notes that agencies do not consistently post duplicate comments, which can lead to inconsistent presentation of identity information to the public. Additionally, agencies do not provide clear information about their posting practices, which can limit public users’ ability to explore and use the data. The GAO recommends that agencies establish policies for posting comments and provide clear communication to the public about their posting practices, particularly with regard to identity information and duplicate comments. Since the report’s issuance, some agencies have taken steps to address these recommendations.

政府问责办公室 (GAO) 的报告审查了 10 个联邦机构在公众评论过程中对身份信息的处理方式。虽然机构不需要收集或验证身份信息,但它们可以通过评论表单选择这样做。然而,报告发现,各机构对身份信息的处理方式各不相同,有些机构接受匿名评论,而其他机构则要求评论者提供姓名、地址和电子邮件地址。报告还指出,各机构并不一致地发布重复评论,这可能导致向公众展示的身份信息不一致。此外,各机构没有提供关于其发布做法的明确信息,这可能会限制公众用户探索和使用数据的能力。GAO 建议各机构制定发布评论的政策,并向公众明确传达其发布做法,特别是在身份信息和重复评论方面。自报告发布以来,一些机构已采取措施解决这些建议。

[Llama3-70b]

[Llama3-70b]

The government’s public comment process, facilitated by Regulations.gov and agency-specific websites, collects identity information from commenters, but allows anonymous comments and does not verify submitted information. While agencies may require certain fields to be filled, commenters can submit fictitious information or minimal data, and comments will still be accepted. The treatment of identity information varies across agencies, with some posting all comments and others posting only a sample of duplicate comments, which can lead to inconsistent presentation of identity information to the public. Furthermore, agencies do not effectively communicate their posting practices and limitations to the public, potentially leading to inaccurate conclusions about the comments submitted. To address this, recommendations have been made to establish policies for posting comments and clearly communicate them to the public, with some agencies having taken action to implement these changes.

政府通过Regulations.gov和各机构特定网站进行的公众意见征集过程,会收集评论者的身份信息,但允许匿名评论且不验证提交的信息。虽然机构可能要求填写某些字段,但评论者可以提交虚构信息或最少数据,评论仍会被接受。身份信息的处理方式因机构而异,有些机构会发布所有评论,而有些机构只发布重复评论的样本,这可能导致公众看到的身份信息呈现不一致。此外,机构没有有效地向公众传达其发布实践和限制,可能导致对提交评论的误解。为解决这一问题,已提出建议,要求制定发布评论的政策并明确向公众传达,一些机构已采取行动实施这些改变。

[User]

[用户]

I am writing a "related works" section for my research paper in the NLP domain. The topic to cover is "Dense Passage Retrieval". I would like to discuss the following papers in the section: Title: Representation Degeneration Problem in Training Natural Language Generation Models. Abstract:

我正在为我的自然语言处理(NLP)领域的研究论文撰写“相关工作”部分。该部分将涵盖“密集段落检索”主题。我计划在该部分讨论以下论文:

标题:训练自然语言生成模型中的表示退化问题

Further, use @CITE[id]@ to cite a paper in the paragraph. For example, @CITE[1]@ will be replaced by the citation of the paper with id 1. At the end of the generated content, list all the references in the format of ID: Title.

此外,使用 @CITE[id]@ 在段落中引用论文。例如,@CITE[1]@ 将被替换为 ID 为 1 的论文引用。在生成内容的末尾,以 ID: 标题的格式列出所有参考文献。
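The @CITE[id]@ convention above lends itself to simple post-processing once the generation is complete. The sketch below is illustrative, not from the paper: the reference dictionary, the numeric `[id]` bibliography style, and the `resolve_citations` helper are assumptions:

```python
import re

# Hypothetical reference store keyed by the ids used in the draft.
references = {
    1: "Representation Degeneration Problem in Training Natural Language Generation Models",
    2: "Deeper Text Understanding for IR with Contextual Neural Language Modeling",
}

def resolve_citations(text, refs):
    """Replace each @CITE[id]@ placeholder with a numeric citation marker,
    failing loudly if the model cites an id that has no known reference."""
    def repl(match):
        cid = int(match.group(1))
        assert cid in refs, f"unknown citation id {cid}"
        return f"[{cid}]"
    return re.sub(r"@CITE\[(\d+)\]@", repl, text)

draft = "Dense retrieval builds on BERT @CITE[2]@, which can degenerate @CITE[1]@."
resolved = resolve_citations(draft, references)
print(resolved)  # Dense retrieval builds on BERT [2], which can degenerate [1].
```

Keeping the placeholder grammar regular (a fixed sigil plus a bracketed integer) is what makes this single-regex resolution reliable; free-form citation strings would require fuzzy matching instead.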

[GraphAgent]

[GraphAgent]

Dense passage retrieval (DPR) has recently attracted a lot of attention in the IR community, which aims to retrieve relevant passages from a large corpus with the help of a query. DPR is based on the neural network, which can learn query and passage representations from the corpus and query separately. The retrieval model is usually trained by contrastive loss, which is to pull the query representation close to the passage representation when they are relevant, and push them apart when they are irrelevant. In the early stage, DPR is mainly based on the BERT model @CITE[4]@, which is pre-trained by masked language modeling. However, BERT is prone to degenerate into a linear mapping to the word embeddings @CITE[5]@, which leads to a performance drop. To alleviate this problem, several methods are proposed to train the DPR model. For example, @CITE[3]@ propose the use of the negative sampling strategy, which is to sample a negative passage from the corpus to calculate the contrastive loss. @CITE[1]@ propose the use of the nearest neighbor negative sampling strategy, which is to sample a negative passage from the corpus with the nearest distance to the query. @CITE[2]@ propose the use of the contrastive loss with the corpus-level information, which is to pull the passage representation close to the corpus-level representation when they are relevant.

密集段落检索 (Dense Passage Retrieval, DPR) 最近在信息检索 (IR) 社区中引起了广泛关注,其目标是通过查询从大规模语料库中检索相关段落。DPR 基于神经网络,能够分别从语料库和查询中学习查询和段落的表示。检索模型通常通过对比损失进行训练,即在相关时将查询表示拉近段落表示,在不相关时将它们推远。在早期阶段,DPR 主要基于 BERT 模型 @CITE[4]@,该模型通过掩码语言建模进行预训练。然而,BERT 容易退化为词嵌入的线性映射 @CITE[5]@,从而导致性能下降。为了缓解这一问题,提出了几种训练 DPR 模型的方法。例如,@CITE[3]@ 提出了使用负采样策略,即从语料库中采样一个负样本来计算对比损失。@CITE[1]@ 提出了使用最近邻负采样策略,即从语料库中采样与查询距离最近的负样本。@CITE[2]@ 提出了使用包含语料库级别信息的对比损失,即在相关时将段落表示拉近语料库级别的表示。

References: ID: 1. Title: Representation Degeneration Problem in Training Natural Language Generation Models

参考文献: ID: 1. 标题: 训练自然语言生成模型中的表示退化问题

ID: 2. Title: Deeper Text Understanding for IR with Contextual Neural Language Modeling
ID: 3. Title: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
ID: 4. Title: Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
ID: 5. Title: PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval

ID: 2. 标题: 利用上下文神经语言建模进行更深层次的文本理解以改进信息检索
ID: 3. 标题: 基于近似最近邻负对比学习的密集文本检索
ID: 4. 标题: 无监督语料库感知语言模型预训练用于密集段落检索
ID: 5. 标题: PAIR: 利用段落中心相似关系改进密集段落检索

[Llama3-70b]

[Llama3-70b]

Dense passage retrieval has gained significant attention in recent years, leveraging the power of neural networks to learn complex language patterns and query-document relations @CITE[2]@. This approach has many intriguing advantages over sparse retrieval, such as better capturing of language structures and improved performance on queries written in natural languages @CITE[2]@. However, the effectiveness of dense retrieval often requires combination with sparse retrieval, and the training mechanisms used in dense retrieval can be a bottleneck @CITE[3]@. To address this, various techniques have been proposed, including Approximate Nearest Neighbor Negative Contrastive Learning @CITE[3]@, unsupervised corpus-aware language model pre-training @CITE[4]@, and leveraging passage-centric similarity relations @CITE[5]@. These advancements have significantly improved the performance of dense passage retrieval models, making them a promising approach for various natural language processing tasks.

近年来,密集段落检索 (Dense Passage Retrieval) 获得了广泛关注,它利用神经网络的力量来学习复杂的语言模式和查询-文档关系 [2]。与稀疏检索相比,这种方法具有许多引人注目的优势,例如更好地捕捉语言结构,并在自然语言编写的查询上表现更好 [2]。然而,密集检索的有效性通常需要与稀疏检索结合使用,并且密集检索中使用的训练机制可能成为瓶颈 [3]。为了解决这个问题,已经提出了各种技术,包括近似最近邻负对比学习 [3]、无监督语料库感知语言模型预训练 [4],以及利用以段落为中心的相似性关系 [5]。这些进展显著提高了密集段落检索模型的性能,使其成为各种自然语言处理任务的有前景的方法。

[Llama3-8b Fine-tuned]

[Llama3-8b 微调]

Recent advancements in PLMs such as BERT (Devlin et al., 2018), RoBERTa @CITE[1]@, and GPT (@CITE) have enabled numerous unsupervised techniques to derive dense representations of queries and passages for retrieval. These approaches primarily use a Bi-Encoder structure to embed text in a low-dimensional space and learn similarity relations via contrastive learning, contrasting traditional sparse retrieval methods like BM25 or DeepCT @CITE[2]@. DPR pioneered an unsupervised dense passage retrieval framework, affirming the feasibility of using dense representations for retrieval independently. This efficient and operational approach was further refined by subsequent studies (@CITE[4]@; @CITE[5]@) that focused on high-quality negative sample mining, additional passage relation analysis, and extra training. The essence of these methods is to represent texts in a target space where queries are closer to relevant and distant from irrelevant passages.
References:
ID: 1. Title: Representation Degeneration Problem in Training Natural Language Generation Models
ID: 2. Title: Deeper Text Understanding for IR with Contextual Neural Language Modeling
ID: 3. Title: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
ID: 4. Title: Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
ID: 5. Title: PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval

近年来,诸如 BERT (Devlin et al., 2018)、RoBERTa @CITE[1]@ 和 GPT (@CITE) 等预训练语言模型 (PLMs) 的进展,使得许多无监督技术能够为检索生成查询和段落的密集表示。这些方法主要使用双编码器 (Bi-Encoder) 结构将文本嵌入到低维空间中,并通过对比学习来学习相似性关系,这与传统的稀疏检索方法(如 BM25 或 DeepCT @CITE[2]@)形成对比。DPR 开创了一种无监督的密集段落检索框架,证实了独立使用密集表示进行检索的可行性。后续研究 (@CITE[4]@; @CITE[5]@) 进一步优化了这一高效且可操作的方法,重点关注高质量负样本挖掘、额外的段落关系分析和额外的训练。这些方法的本质是在目标空间中表示文本,使得查询更接近相关段落,而远离不相关段落。

参考文献:
ID: 1. 标题:训练自然语言生成模型中的表示退化问题
ID: 2. 标题:使用上下文神经语言模型进行更深入的文本理解以改进信息检索
ID: 3. 标题:密集文本检索的近似最近邻负对比学习
ID: 4. 标题:无监督语料库感知语言模型预训练用于密集段落检索
ID: 5. 标题:PAIR:利用段落中心相似性关系改进密集段落检索

Table 9: Visualization examples for automatically discovered semantic knowledge graphs for a GovReport sample.

表 9: GovReport 样本中自动发现的语义知识图谱的可视化示例。

[Original Text with Knowledge]

[原始文本与知识]

In our June 2019 report, we found that, while abuse deficiencies cited in nursing homes were relatively rare from 2013 through 2017, they became more frequent during that time, with the largest increase in severe cases. Specifically, abuse deficiencies comprised less than 1 percent of the total deficiencies in each of the years we examined, which is likely conservative. Abuse in nursing homes is often underreported by residents, family, staff, and the state survey agency, according to CMS officials and stakeholders we interviewed. However, abuse deficiencies more than doubled–from 430 in 2013 to 875 in 2017–over the 5-year period. (See appendix II.) In addition, abuse deficiencies cited in 2017 were more likely to be categorized at the highest levels of severity–deficiencies causing actual harm to residents or putting residents in immediate jeopardy–than they were in 2013. In light of the increased number and severity of abuse deficiencies, it is imperative that CMS have strong nursing home oversight in place to protect residents from abuse; however, we found oversight gaps that may limit the agency’s ability to do so. Specifically, we found that CMS: (1) cannot readily access data on the type of abuse or type of perpetrator, (2) has not provided guidance on what information nursing homes should include in facility-reported incidents, and (3) has numerous gaps in its referral process that can result in delayed and missed referrals to law enforcement. In our June 2019 report, we found that CMS’s data do not allow for the type of abuse or perpetrator to be readily identified by the agency. Specifically, CMS does not require the state survey agencies to record abuse and perpetrator type and, when this information is recorded, it cannot be easily analyzed by CMS.
Therefore, we reviewed a representative sample of 400 CMS narrative descriptions–written by state surveyors–associated with abuse deficiencies cited in 2016 and 2017 to identify the most common types of abuse and perpetrators. From this review, we found that physical abuse (46 percent) and mental/verbal abuse (44 percent) occurred most often in nursing homes, followed by sexual abuse (18 percent). Furthermore, staff, which includes those working in any part of the nursing home, were more often the perpetrators (58 percent) of abuse in deficiency narratives, followed by resident perpetrators (30 percent) and other types of perpetrators (2 percent). (See appendix III for examples from our abuse deficiency narrative review.)...

在我们2019年6月的报告中,我们发现,尽管从2013年到2017年,养老院中引用的虐待缺陷相对较少,但在此期间它们变得更加频繁,尤其是严重案例的增加最为显著。具体而言,虐待缺陷在我们检查的每一年中占总缺陷的比例不到1%,这可能是保守的估计。根据我们采访的CMS官员和利益相关者,养老院中的虐待行为往往被居民、家属、工作人员和州调查机构低估。然而,虐待缺陷在5年期间增加了一倍多——从2013年的430起增加到2017年的875起。(见附录II。)此外,2017年引用的虐待缺陷更有可能被归类为最高严重程度——导致居民实际伤害或使居民处于直接危险中的缺陷——比2013年更常见。鉴于虐待缺陷数量和严重程度的增加,CMS必须加强养老院的监督,以保护居民免受虐待;然而,我们发现监督存在差距,可能限制该机构的能力。具体而言,我们发现CMS:(1)无法轻松访问虐待类型或施害者类型的数据,(2)未提供关于养老院应在设施报告的事件中包含哪些信息的指导,(3)其转介流程存在许多差距,可能导致向执法机构的转介延迟和遗漏。在我们2019年6月的报告中,我们发现CMS的数据不允许该机构轻松识别虐待类型或施害者。具体而言,CMS不要求州调查机构记录虐待和施害者类型,即使记录了这些信息,CMS也无法轻松分析。因此,我们审查了2016年和2017年引用的虐待缺陷相关的400份CMS叙述描述的样本,以确定最常见的虐待类型和施害者。从这次审查中,我们发现身体虐待(46%)和心理/言语虐待(44%)在养老院中最常见,其次是性虐待(18%)。此外,工作人员(包括在养老院任何部分工作的人员)在缺陷叙述中更常是施害者(58%),其次是居民施害者(30%)和其他类型的施害者(2%)。(见附录III中的虐待缺陷叙述审查示例。)...

[Scaffold Node $k=0$] (top-level aspects or concepts)

[Scaffold Node $k=0$] (顶层方面或概念)

Node meta type: Policy Objectives And Goals

节点元类型:策略目标与目的

Text attribute: The policy objectives and goals of this report are to ensure that CMS has strong nursing home oversight in place to protect residents from abuse. The report aims to identify and address the gaps in CMS’s ability to monitor and respond to abuse deficiencies effectively.

文本属性:本报告的政策目标和宗旨是确保CMS(医疗保险和医疗补助服务中心)具备强有力的养老院监管机制,以保护居民免受虐待。报告旨在识别并解决CMS在有效监控和应对虐待缺陷方面的能力差距。

Node meta type: Stakeholder Impact and Implications

节点元类型:利益相关者影响及意义

Text attribute: Stakeholders impacted by this report include nursing home residents, their families, staff, state survey agencies, and CMS officials. The implications are that without proper oversight and reporting mechanisms, abuse in nursing homes may continue to be underreported and inadequately addressed.

文本属性:受本报告影响的利益相关者包括养老院居民、他们的家人、工作人员、州调查机构和CMS官员。其影响在于,如果没有适当的监督和报告机制,养老院中的虐待行为可能会继续被低估且得不到充分解决。

Node meta type: Methodology and Evidence

meta type: 方法论与证据

Text attribute: The methodology involved reviewing a representative sample of 400 CMS narrative descriptions associated with abuse deficiencies cited in 2016 and 2017. This review aimed to identify the most common types of abuse and perpetrators. Additionally, interviews with CMS officials and stakeholders provided insights into the underreporting and handling of abuse incidents.

文本属性:该方法涉及审查2016年和2017年引用的与虐待缺陷相关的400份CMS叙述性描述的代表性样本。此次审查旨在确定最常见的虐待类型和施虐者。此外,与CMS官员和利益相关者的访谈提供了关于虐待事件漏报和处理情况的见解。

Node meta type: Findings and Recommendations

节点元类型:发现与建议

Text attribute: Key findings include the underreporting of abuse in nursing homes, the doubling of abuse deficiencies from 2013 to 2017, and the increased severity of these deficiencies. Recommendations include requiring state survey agencies to report abuse and perpetrator type, providing guidance on facility-reported incidents, and improving the referral process to law enforcement.

文本属性:主要发现包括养老院虐待事件的报告不足、2013年至2017年虐待缺陷翻倍以及这些缺陷的严重性增加。建议包括要求州调查机构报告虐待和施虐者类型、提供关于设施报告事件的指导,并改进向执法机构的转介流程。

Node meta type: Implementation and Evaluation

节点元类型:实现与评估

Text attribute: As of November 2019, the Department of Health and Human Services (HHS) had not implemented the recommendations made in the report. The implementation and evaluation of these recommendations are crucial to improving CMS’s oversight and protecting nursing home residents from abuse.

文本属性:截至2019年11月,卫生与公众服务部 (HHS) 尚未实施报告中提出的建议。这些建议的实施和评估对于改善CMS的监督和保护养老院居民免受虐待至关重要。

[Scaffold Node $k=1$] (Fine-grained concepts as keywords. Text attributes are omitted.)

[Scaffold Node $k=1$] (细粒度概念作为关键词。文本属性省略。)
养老院中的虐待缺陷 医疗保险和医疗补助服务中心 (CMS)
监督漏洞 改进建议 养老院监督 居民保护
虐待缺陷 CMS 监控养老院居民的能力 养老院中的虐待
州调查机构 CMS 官员 CMS 叙述性描述 虐待缺陷
虐待事件少报 CMS 官员和利益相关者 虐待少报
虐待缺陷 州调查机构 向执法机构转介的流程
卫生与公众服务部 医疗保险和医疗补助服务中心 22 名养老院居民 虐待

[Original Text with Knowledge]

[原始文本与知识]

Title: A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models. Abstract: Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?". We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data, and we propose a novel prompt scoring method that corrects for the biases. Using our proposed scoring method to create a weighted average prompt ensemble, our method outperforms equal average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, all while being fully automatic, optimization-free, and not requiring access to labeled validation data.

标题: 一种简单的零样本提示加权技术,用于改进文本-图像模型中的提示集成

摘要: 对比训练的文本-图像模型具有执行零样本分类的显著能力,即将以前未见过的图像分类到模型从未明确训练过的类别中。然而,这些零样本分类器需要通过提示工程来实现高精度。提示工程通常需要为单个下游任务手工制作一组提示。在这项工作中,我们的目标是自动化这一提示工程,并通过提示集成来提高零样本精度。具体来说,我们提出一个问题:“给定一个大的提示池,我们能否自动评分这些提示,并集成那些最适合特定下游数据集的提示,而不需要访问带标签的验证数据?”。我们证明了这是可能的。在此过程中,我们识别了在一种简单的提示评分方法中的几个问题,其中由于预训练和测试数据中的偏差,评分可能过于自信,我们提出了一种新的提示评分方法来纠正这些偏差。使用我们提出的评分方法创建加权平均提示集成,我们的方法在ImageNet、其4个变体和11个细粒度分类基准上优于等权平均集成和手工制作的提示,同时完全自动化、无需优化,并且不需要访问带标签的验证数据。

[Scaffold Node $k=0$] (top-level aspects or concepts)

[Scaffold Node $k=0$] (顶层方面或概念)

Node meta type: Research Background

节点元类型:研究背景

Text attribute: Contrastively trained text-image models possess the ability to perform zero-shot classification, which involves categorizing unseen images into untrained categories. However, achieving high accuracy in zero-shot classification often requires meticulous prompt engineering, typically involving hand-crafted prompts tailored for specific downstream tasks.

文本属性:对比训练(Contrastive Training)的文本-图像模型具备零样本分类的能力,即能够将未见过的图像分类到未经训练的类别中。然而,要在零样本分类中实现高准确率,通常需要细致的提示词工程(Prompt Engineering),这通常涉及为特定下游任务定制的手工提示词。

Node meta type: Research Question

节点元类型:研究问题

Text attribute: The research question addressed in this work is whether it is possible to automatically score and ensemble the most suitable prompts from a large pool for a particular downstream dataset, without relying on labeled validation data. This question stems from the need to automate and improve the accuracy of zero-shot classification through better prompt engineering.

文本属性:本工作研究的问题是,是否有可能在不依赖标注验证数据的情况下,自动评分并集成从大量提示池中选择的最适合特定下游数据集的提示。这一问题源于通过更好的提示工程来自动化和提高零样本分类准确性的需求。

Node meta type: Methodology

节点元类型:方法论

Text attribute: The methodology involves identifying and addressing pathologies in a naive prompt scoring method, which can be overly confident due to biases in pre-training and test data. The authors propose a novel prompt scoring method that corrects for these biases, enabling the creation of a weighted average prompt ensemble that is fully automatic and optimization-free.

文本属性:该方法涉及识别并解决一种朴素提示评分方法中的病态问题,这种方法可能由于预训练和测试数据中的偏差而过于自信。作者提出了一种新的提示评分方法,能够纠正这些偏差,从而创建一种完全自动且无需优化的加权平均提示集成。

Node meta type: Key Results

节点元类型:关键结果

Text attribute: The key results demonstrate that the proposed prompt weighting technique outperforms both equal average ensemble and hand-crafted prompts on ImageNet, four of its variants, and 11 fine-grained classification benchmarks. The method achieves this while remaining fully automatic, not requiring optimization, and without access to labeled validation data.

文本属性:关键结果表明,所提出的提示加权技术在 ImageNet、其四个变体和 11 个细粒度分类基准上均优于等权平均集成和手工制作的提示。该方法在保持完全自动化的同时实现了这一点,不需要优化,也不需要访问带标签的验证数据。

[Scaffold Node $k=1$] (Fine-grained concepts as keywords. Text attributes are omitted.)

[Scaffold Node $k=1$] (细粒度概念作为关键词。文本属性省略。)

零样本提示加权自动化 提示工程 零样本分类准确率
零样本分类 细致的提示工程 自动评分
集成提示 零样本分类 提示工程 朴素提示评分方法
新颖提示评分方法 加权平均 提示加权技术
全自动 提示集成
