[论文翻译]GRAPHAGENT: 图语言助手智能体


原文地址:https://arxiv.org/pdf/2412.17029


GRAPHAGENT: AGENTIC GRAPH LANGUAGE ASSISTANT

GRAPHAGENT: 图语言助手智能体

ABSTRACT

摘要

Real-world data is represented in both structured (e.g., graph connections) and unstructured (e.g., textual, visual information) formats, encompassing complex relationships that include explicit links (such as social connections and user behaviors) and implicit interdependencies among semantic entities, often illustrated through knowledge graphs. In this work, we propose GraphAgent, an automated agent pipeline that addresses both explicit graph dependencies and implicit graph-enhanced semantic inter-dependencies, aligning with practical data scenarios for predictive tasks (e.g., node classification) and generative tasks (e.g., text generation). GraphAgent comprises three key components: (i) a Graph Generator Agent that builds knowledge graphs to reflect complex semantic dependencies; (ii) a Task Planning Agent that interprets diverse user queries and formulates corresponding tasks through agentic self-planning; and (iii) a Task Execution Agent that efficiently executes planned tasks while automating tool matching and invocation in response to user queries. These agents collaborate seamlessly, integrating language models with graph language models to uncover intricate relational information and data semantic dependencies. Through extensive experiments on various graph-related predictive and text generative tasks on diverse datasets, we demonstrate the effectiveness of our GraphAgent across various settings. We have made our proposed GraphAgent open-source at: https://github.com/HKUDS/GraphAgent.

现实世界中的数据以结构化(如图形连接)和非结构化(如文本、视觉信息)格式表示,涵盖了包括显式链接(如社交连接和用户行为)和语义实体之间的隐式相互依赖关系的复杂关系,通常通过知识图谱进行说明。在本研究中,我们提出了GraphAgent,一个自动化智能体管道,旨在解决显式图形依赖和隐式图形增强的语义相互依赖关系,与实际数据场景中的预测任务(如节点分类)和生成任务(如文本生成)保持一致。GraphAgent由三个关键组件组成:(i) 一个图形生成智能体,用于构建知识图谱以反映复杂的语义依赖关系;(ii) 一个任务规划智能体,用于解释多样化的用户查询并通过智能体自我规划制定相应的任务;(iii) 一个任务执行智能体,用于高效执行计划任务,同时自动匹配和调用工具以响应用户查询。这些智能体无缝协作,将语言模型与图形语言模型集成,以揭示复杂的关系信息和数据语义依赖关系。通过对各种数据集上的图形相关预测和文本生成任务进行广泛实验,我们证明了GraphAgent在各种设置中的有效性。我们已将提出的GraphAgent开源,地址为:https://github.com/HKUDS/GraphAgent

1 INTRODUCTION

1 引言

Real-world information exists in a complex ecosystem of interconnected data types. Structured data, particularly graph-based connections, captures explicit relationships such as social networks and user interaction patterns (Fey et al., 2023). Complementing this, unstructured data - including text and visual content - reveals implicit semantic relationships between entities (Zhong & Mottin, 2023). The integration of these diverse data formats has become crucial for modern applications, as it enables more comprehensive and nuanced analysis of complex real-world scenarios (Lu et al., 2024).

现实世界的信息存在于一个由互连数据类型组成的复杂生态系统中。结构化数据,特别是基于图的连接,捕捉了诸如社交网络和用户交互模式等明确关系 (Fey et al., 2023)。与此相辅相成的是,非结构化数据——包括文本和视觉内容——揭示了实体之间的隐含语义关系 (Zhong & Mottin, 2023)。这些多样化数据格式的集成对于现代应用变得至关重要,因为它能够对复杂的现实场景进行更全面和细致的分析 (Lu et al., 2024)。

Graph serves as an effective means of representing relational information across various domains. In academic networks, papers are interconnected through explicit citations, with each paper represented as a node in a graph and edges indicating these citations Chen et al. (2023); Wang et al. (2022). This structure enables researchers to trace the influence of one paper on another, showcasing the evolution of ideas. Additionally, the papers’ content provides unstructured data for analyzing themes, methodologies, and findings. By integrating structured citation data with unstructured text, researchers can identify trends and derive valuable insights, leading to applications such as knowledge summaries and scientific question-answering, which can be framed as Graph-enhanced Text Generative Tasks.

图是表示跨领域关系信息的有效手段。在学术网络中,论文通过显式引用相互连接,每篇论文表示为图中的一个节点,边表示这些引用 (Chen et al., 2023; Wang et al., 2022)。这种结构使研究人员能够追踪一篇论文对另一篇论文的影响,展示思想的演变。此外,论文的内容为分析主题、方法和发现提供了非结构化数据。通过将结构化引用数据与非结构化文本相结合,研究人员可以识别趋势并得出有价值的见解,从而应用于知识总结和科学问答等任务,这些任务可以视为图增强文本生成任务。

In e-commerce scenarios, customer interactions form structured behavior data that can be analyzed in conjunction with unstructured data sources, such as product reviews and descriptions Shuai et al. (2022); Li et al. (2023). This integrated approach enables businesses to gain deeper insights into consumer behavior patterns and improve recommendation accuracy. Specifically, by integrating user behavior graphs with rich textual information, these user-item interaction forecasting challenges can be effectively approached as Graph-related Predictive Tasks.

在电子商务场景中,客户互动形成了结构化行为数据,这些数据可以与产品评论和描述等非结构化数据源结合分析 (Shuai et al., 2022; Li et al., 2023)。这种集成方法使企业能够更深入地了解消费者行为模式,并提高推荐准确性。具体而言,通过将用户行为图与丰富的文本信息相结合,这些用户-物品互动预测挑战可以有效地作为图相关预测任务来处理。

Existing graph learning methods have become essential frameworks for analyzing and learning from graph data (Hamilton, 2020). These methods focus on learning embeddings for nodes and edges, mapping structural information into a latent representation space (Yang et al., 2020). Among these, Graph Neural Networks (GNNs) stand out as state-of-the-art (SOTA) approaches (Dai et al., 2022; Liu et al., 2022). GNNs employ a message-passing mechanism that allows nodes to exchange information with their neighbors, effectively capturing the graph’s structural characteristics and enhancing representation learning. However, they primarily focus on explicit graph connections, often neglecting the complex semantic dependencies associated with linked textual data. Additionally, GNNs generally have limited generalization capabilities for real-world graph mining tasks (Xia & Huang, 2024; Mao et al., 2024). They often require training task-specific models, which complicates automation and reduces effectiveness in zero-shot scenarios. In practical applications, the ability to process both structured and unstructured data, particularly with unseen new data, is crucial.

现有的图学习方法已成为分析和学习图数据的重要框架 (Hamilton, 2020)。这些方法专注于学习节点和边的嵌入,将结构信息映射到潜在表示空间中 (Yang et al., 2020)。其中,图神经网络 (Graph Neural Networks, GNNs) 作为最先进 (SOTA) 的方法脱颖而出 (Dai et al., 2022; Liu et al., 2022)。GNNs 采用消息传递机制,允许节点与其邻居交换信息,有效捕捉图的结构特征并增强表示学习。然而,它们主要关注显式的图连接,往往忽略了与链接文本数据相关的复杂语义依赖关系。此外,GNNs 在现实世界的图挖掘任务中通常具有有限的泛化能力 (Xia & Huang, 2024; Mao et al., 2024)。它们通常需要训练特定任务的模型,这使自动化变得复杂,并降低了在零样本场景中的有效性。在实际应用中,处理结构化和非结构化数据的能力,特别是处理未见过的数据,至关重要。

Inspired by the recent success of large language models (LLMs), researchers are striving to enhance the generalization capabilities of graph learning models by enabling LLMs to comprehend graph structural information. Notable examples include GraphGPT (Tang et al., 2024a) and LLaGA (Chen et al., 2024a), which convert graph-structured data into tokens suitable for LLM input. However, these approaches are primarily designed for conventional graph learning tasks, such as node classification and link prediction. This narrow focus limits their broader application in effectively handling both structured and unstructured data in a more flexible

受大语言模型 (LLMs) 近期成功的启发,研究人员正致力于通过让 LLMs 理解图结构信息来增强图学习模型的泛化能力。典型的例子包括 GraphGPT (Tang et al., 2024a) 和 LLaGA (Chen et al., 2024a),它们将图结构数据转换为适合 LLM 输入的 Token。然而,这些方法主要针对传统的图学习任务,如节点分类和链接预测。这种狭窄的关注点限制了它们以更灵活、高效的方式处理结构化和非结构化数据的更广泛应用。


Figure 1: GraphAgent processes both structured and unstructured data, adapting seamlessly to various downstream tasks across diverse scenarios.

图 1: GraphAgent 处理结构化和非结构化数据,无缝适应各种场景下的下游任务。

and efficient manner. In light of these limitations, an important question arises: How can we empower individuals without any background in graph theory or machine learning to analyze their graph data using natural language and obtain the desired predictions and insights?

鉴于这些限制,一个重要的问题出现了:我们如何能够让没有任何图论或机器学习背景的个人使用自然语言分析他们的图数据,并获得所需的预测和见解?

The Presented Work. In this paper, we aim to establish a fully automated analysis framework capable of handling a wide variety of data types, including both structured and unstructured data. Our framework, GraphAgent, is designed to address diverse user needs, encompassing both graph-related predictive and generative tasks. Built on an agentic architecture, GraphAgent allows users to interact with it using natural language. This intuitive and comprehensive approach thoroughly empowers all individuals to obtain predictions and insights from graph-structured data, tailored to their specific requirements, without requiring specialized knowledge in graph learning.

本文工作。在本文中,我们旨在建立一个全自动的分析框架,能够处理多种数据类型,包括结构化和非结构化数据。我们的框架 GraphAgent 旨在满足用户多样化的需求,涵盖与图相关的预测和生成任务。基于智能体架构构建的 GraphAgent 允许用户使用自然语言与其交互。这种直观且全面的方法使所有个体能够根据其特定需求,从图结构数据中获得预测和洞察,而无需具备图学习的专业知识。

To achieve our objective, several key challenges must be addressed: i) Constructing Potential Semantic Relationships: How to derive latent semantic connections from complex data. ii) Automating Query Understanding and Task Formulation: How to automatically interpret user query prompts, formulate them into specific tasks (e.g., predictive or generative tasks), and effectively plan those tasks. iii) Efficient Task Execution: How to accurately and effectively implement the formulated tasks and return correct results. To tackle these challenges, our proposed model is designed with an advanced framework comprising three main components: a Graph Generator Agent that constructs Semantic Knowledge Graphs (SKGs) from user text, a Task Planning Agent that interprets queries and formulates tasks, and a Graph Action Agent that automates the task execution.

为实现我们的目标,必须解决几个关键挑战:i) 构建潜在语义关系:如何从复杂数据中推导出潜在的语义连接。ii) 自动化查询理解与任务制定:如何自动解释用户查询提示,将其制定为具体任务(例如预测性或生成性任务),并有效规划这些任务。iii) 高效任务执行:如何准确有效地执行制定的任务并返回正确结果。为应对这些挑战,我们提出的模型设计了一个先进的框架,包含三个主要组件:一个从用户文本中构建语义知识图谱(SKG)的图生成器智能体(Graph Generator Agent),一个解释查询并制定任务的任务规划智能体(Task Planning Agent),以及一个自动化任务执行的图操作智能体(Graph Action Agent)。

To summarize, this work presents the following contributions:

总结来说,本工作提出了以下贡献:

• Complex Practical Data Integration. Our framework provides robust handling of real-world scenarios by seamlessly merging structured and unstructured data with graph-based entity relationships. This unified approach enables dual capabilities - supporting both predictive analytics and text generation tasks. By allowing natural language interactions, users can directly query and analyze complex data structures, streamlining information extraction and improving accessibility.
• Multi-Agent Workflow. This work introduces GraphAgent, an advanced automated graph language assistant that enhances the integration of structured and unstructured data analysis. It autonomously constructs semantic knowledge graphs (SKGs) from text, formulates predictive and generative tasks from user queries, and efficiently executes these tasks. This seamless collaboration enables GraphAgent to uncover complex relational information and semantic dependencies, significantly improving usability and accessibility in graph analysis.

• 复杂的实际数据集成。我们的框架通过将结构化和非结构化数据与基于图的实体关系无缝融合,提供了对现实场景的稳健处理。这种统一方法实现了双重能力——既支持预测分析,也支持文本生成任务。通过允许自然语言交互,用户可以直接查询和分析复杂的数据结构,简化信息提取并提高可访问性。
• 多智能体工作流。本工作引入了 GraphAgent,这是一种先进的自动化图语言助手,增强了结构化和非结构化数据分析的集成。它能够从文本中自主构建语义知识图 (SKG),根据用户查询制定预测和生成任务,并高效执行这些任务。这种无缝协作使 GraphAgent 能够揭示复杂的关系信息和语义依赖,显著提高了图分析的可用性和可访问性。


Figure 2: The overall framework of the proposed GraphAgent.

图 2: 提出的 GraphAgent 的整体框架。

• Experimental Evaluation. We validated our model on both structured and unstructured data, showing strong performance across graph predictive tasks and new graph-related text generative tasks. Additionally, we conducted ablation experiments to assess the effectiveness of key modules. It is important to note that our entire agent framework employs relatively small open-source large language models (e.g., LLaMA-8B), yet our model still exhibits significant advantages compared to current state-of-the-art closed-source models (e.g., GPT-4, Gemini) for generation tasks.

• 实验评估。我们在结构化和非结构化数据上验证了我们的模型,展示了在图预测任务和新图相关文本生成任务中的强大性能。此外,我们还进行了消融实验以评估关键模块的有效性。值得注意的是,我们的整个智能体框架采用了相对较小的开源大语言模型(例如 LLaMA-8B),但我们的模型在生成任务中仍然表现出与当前最先进的闭源模型(例如 GPT-4、Gemini)相比的显著优势。

2 METHODOLOGY

2 方法论

2.1 PRELIMINARIES

2.1 预备知识

Graph-empowered Agents. Our GraphAgent proposes an automated agentic pipeline that addresses graph predictive and text generation tasks. It can be formulated as $\mathcal{Y}=f(\mathcal{O};\mathrm{LLM})$, where the agentic function $f(\cdot)$ incorporates an Observation $\mathcal{O}$ that includes structured data (e.g., explicit graph connections), unstructured data (e.g., textual information), or both. The agent then produces an Action $\mathcal{Y}$, which can involve predictions (e.g., node classifications) or text generation tasks (e.g., summarizing text with implicit entity interdependencies). The workflow of GraphAgent leverages the capabilities of LLMs to enhance its effectiveness in both predictive and generative tasks.

图赋能智能体。我们的 GraphAgent 提出了一种自动化的智能体管道,用于解决图预测和文本生成任务。它可以表示为 $\mathcal{Y}=f(\mathcal{O};\mathrm{LLM})$,其中智能体函数 $f(\cdot)$ 包含一个观察 $\mathcal{O}$,该观察可以包含结构化数据(例如显式图连接)、非结构化数据(例如文本信息),或两者兼有。然后,智能体生成一个动作 $\mathcal{Y}$,该动作可能涉及预测(例如节点分类)或文本生成任务(例如总结具有隐式实体相互依赖关系的文本)。GraphAgent 的工作流程利用了大语言模型的能力,以增强其在预测和生成任务中的有效性。

Graph-Structured Data. In our GraphAgent, both structured and unstructured data are represented as graphs, differing only in the explicitness or implicitness of the entity-wise relationships. To accommodate the diversity of graph data, we utilize heterogeneous graphs to represent the input data. Specifically, a heterogeneous graph is denoted as $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})$, where $\mathcal{V}$ is the set of all entities, and $\mathcal{E}$ is the set of all edges connecting pairs of entities. The sets $\mathcal{N}$ and $\mathcal{R}$ represent the types of nodes and edges, respectively. For each edge, a meta-type attribute can be retrieved in the form $(n_{h},r_{i},n_{t})$, denoting the meta-types of the head node $n_{h}$, relation $r_{i}$, and tail node $n_{t}$, respectively.

图结构数据。在我们的 GraphAgent 中,结构化和非结构化数据都被表示为图,唯一的区别在于实体间关系的显式或隐式。为了适应图数据的多样性,我们使用异构图来表示输入数据。具体来说,异构图表示为 $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R})$,其中 $\mathcal{V}$ 是所有实体的集合,$\mathcal{E}$ 是连接实体对的所有边的集合。集合 $\mathcal{N}$ 和 $\mathcal{R}$ 分别表示节点和边的类型。对于每条边,可以以 $(n_{h},r_{i},n_{t})$ 的形式检索元类型属性,分别表示头节点 $n_{h}$、关系 $r_{i}$ 和尾节点 $n_{t}$ 的元类型。
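As a concrete illustration of this notation, the sketch below holds such a heterogeneous graph in a PyG `HeteroData` object, which is also the library the framework later uses for graph grounding (Section 2.3.2). The node and edge type names and the random features are illustrative only.

```python
# Minimal sketch: a heterogeneous graph G = (V, E, N, R) in PyG.
# Node/edge type names ("paper", "author", "writes") are illustrative.
import torch
from torch_geometric.data import HeteroData

g = HeteroData()

# Node sets V, grouped by meta-type N: two node types with 3 and 2 nodes.
g["paper"].x = torch.randn(3, 768)    # placeholder text-derived features
g["author"].x = torch.randn(2, 768)

# Edge set E, grouped by meta-type triples (n_h, r_i, n_t) drawn from R.
g["author", "writes", "paper"].edge_index = torch.tensor([[0, 1, 1],
                                                          [0, 1, 2]])

print(g.node_types)   # ['paper', 'author']
print(g.edge_types)   # [('author', 'writes', 'paper')]
```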

2.2 GRAPH GENERATION AGENT

2.2 图生成智能体

To uncover the rich contextual information within unstructured data, GraphAgent designs a Graph Generation Agent that automatically constructs meaningful Semantic Knowledge Graphs (SKGs) from any type of textual input. For example, for a paper abstract that includes the sentence, “Contrastively trained text-image models have the remarkable ability to perform zero-shot classification”, the model can extract relevant entity nodes such as “text-image models” and “zero-shot classification”.

为了揭示非结构化数据中丰富的上下文信息,GraphAgent 设计了一个图生成智能体,能够从任何类型的文本输入中自动构建有意义的语义知识图 (Semantic Knowledge Graphs, SKGs)。例如,对于包含句子“对比训练的文本-图像模型具有执行零样本分类的显著能力”的论文摘要,模型可以提取出相关的实体节点,如“文本-图像模型”和“零样本分类”。

Iterative Two-Phase Graph Generation Workflow. To capture complex implicit entity-wise dependencies, our graph generation agent operates through an automated two-phase workflow: (1) Scaffold Knowledge Entity Extraction and (2) Knowledge Description Augmentation. The first phase is dedicated to identifying key knowledge entities or concepts, referred to as scaffold knowledge nodes, from the provided text, regardless of its format. Specifically, this phase can be formulated as:

迭代式两阶段图生成工作流。为了捕捉复杂的隐式实体间依赖关系,我们的图生成智能体通过一个自动化的两阶段工作流程进行操作:(1) 骨架知识实体提取和 (2) 知识描述增强。第一阶段致力于从提供的文本中识别出关键的知识实体或概念,这些被称为骨架知识节点,无论文本的格式如何。具体来说,这一阶段可以表述为:

$$\mathcal{V}_{\mathrm{scaffold}}^{k=0}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_sk}},\mathbf{g}_{s}),$$

where $\mathbf{g}_{s}$ represents the input unstructured text data, while $\mathbf{x}_{\mathrm{sys\_sk}}$ denotes the system prompt for extracting scaffold knowledge nodes. We adopt an iterative approach to graph generation to capture both high-level and fine-grained semantic dependencies among multi-grained entities. For example, in an academic paper, high-level entities might include "Machine Learning," while fine-grained entities could be "Self-Supervised Learning" and "Graph Neural Network". Specifically, $\mathcal{V}_{\mathrm{scaffold}}^{k=0}$ refers to the generated vertices during the initial iteration ($k=0$).

其中 $\mathbf{g}_{s}$ 表示输入的非结构化文本数据,而 $\mathbf{x}_{\mathrm{sys\_sk}}$ 表示用于提取骨架知识节点的系统提示。我们采用迭代的方法来生成图,以捕捉多粒度实体之间的高层次和细粒度语义依赖关系。例如,在学术论文中,高层次实体可能包括“机器学习”,而细粒度实体可能是“自监督学习”和“图神经网络”。具体来说,$\mathcal{V}_{\mathrm{scaffold}}^{k=0}$ 指的是初始迭代($k=0$)期间生成的顶点。

The second phase of knowledge augmentation centers on enhancing and enriching the textual descriptions of the generated entity nodes to ensure accurate, comprehensive, and contextually appropriate language modeling. This critical step ensures that each entity is represented with sufficient detail and semantic clarity. Formally, we define this phase as follows:

知识增强的第二阶段侧重于增强和丰富生成的实体节点的文本描述,以确保准确、全面且上下文适当的语言建模。这一关键步骤确保每个实体都能以足够的细节和语义清晰度进行表示。正式地,我们将这一阶段定义如下:

$$\mathcal{C}_{\mathrm{scaffold}}^{k=0}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_ka}},\mathbf{g}_{s},\mathcal{V}_{\mathrm{scaffold}}^{k=0}).$$

where $\mathcal{C}_{\mathrm{scaffold}}^{k=0}$ denotes the node-specific descriptions, while $\mathbf{x}_{\mathrm{sys\_ka}}$ denotes the system prompt for knowledge augmentation. To iteratively execute this two-phase workflow, GraphAgent uses the textual augmentation output from the previous round as the implicit graph input for the next round:

其中 $\mathcal{C}_{\mathrm{scaffold}}^{k=0}$ 表示节点特定的描述,而 $\mathbf{x}_{\mathrm{sys\_ka}}$ 表示用于知识增强的系统提示。为了迭代执行这个两阶段工作流,GraphAgent 使用上一轮的文本增强输出作为下一轮的隐式图输入:

$$\mathcal{V}_{\mathrm{scaffold}}^{k=j}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_sk}},\mathcal{C}_{\mathrm{scaffold}}^{k=j-1}),\qquad
\mathcal{C}_{\mathrm{scaffold}}^{k=j}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_ka}},\mathcal{C}_{\mathrm{scaffold}}^{k=j-1},\mathcal{V}_{\mathrm{scaffold}}^{k=j-1}).$$

The final node and description sets of the SKG are obtained by taking the union over all iterations: $\mathcal{V}_{\mathrm{skg}}=\bigcup_{k}\mathcal{V}_{\mathrm{scaffold}}^{k}$ and $\mathcal{C}_{\mathrm{skg}}=\bigcup_{k}\mathcal{C}_{\mathrm{scaffold}}^{k}$. The relationships among these nodes, denoted as $\mathcal{E}_{\mathrm{skg}}$, are established based on their derivation: if a new node is generated from the textual description of a node in the previous iteration, we connect these two nodes in the semantic knowledge graph. The system prompts used for graph generation are detailed in Table 6, which is presented in the Appendix.

SKG 的最终节点集合与描述集合通过对各轮迭代取并集得到:$\mathcal{V}_{\mathrm{skg}}=\bigcup_{k}\mathcal{V}_{\mathrm{scaffold}}^{k}$ 和 $\mathcal{C}_{\mathrm{skg}}=\bigcup_{k}\mathcal{C}_{\mathrm{scaffold}}^{k}$。这些节点之间的关系,表示为 $\mathcal{E}_{\mathrm{skg}}$,是基于它们的派生建立的:如果一个新节点是从前一次迭代中某个节点的文本描述生成的,我们就在语义知识图中连接这两个节点。用于图生成的系统提示详见附录中的表 6。
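The following sketch outlines how the iterative two-phase workflow could be driven in code, assuming a generic `call_llm(system_prompt, content)` helper that returns extracted entities (phase 1) or an augmented description (phase 2). The helper and the prompt identifiers `x_sys_sk` / `x_sys_ka` are placeholders, not the paper's actual interface.

```python
def build_skg(text, call_llm, k_max=2):
    """Sketch: returns (nodes, descriptions, edges) of the semantic knowledge graph."""
    # Iteration 0: extract scaffold entities directly from the raw text g_s.
    nodes = call_llm("x_sys_sk", text)
    descs = {v: call_llm("x_sys_ka", f"{text}\n{v}") for v in nodes}
    edges = []
    frontier = list(nodes)
    for _ in range(1, k_max):
        next_frontier = []
        for parent in frontier:
            # Phase 1 on the parent's augmented description from the previous round.
            children = call_llm("x_sys_sk", descs[parent])
            for child in children:
                if child not in descs:
                    # Phase 2: augment the newly discovered entity's description.
                    descs[child] = call_llm("x_sys_ka", f"{descs[parent]}\n{child}")
                    nodes.append(child)
                    next_frontier.append(child)
                # Connect the child to the node whose description generated it.
                edges.append((parent, child))
        frontier = next_frontier
    return nodes, descs, edges
```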

2.3 TASK PLANNING AGENT

2.3 任务规划智能体 (Task Planning Agent)

With both structured and unstructured data represented as graphs, GraphAgent employs a task planning agent to automatically interpret user queries and transform the graph data into a unified embedding structure. This facilitates easier utilization by the subsequent predictive and generative modules. Input-output examples of the task planning agent are provided in Table 3 in the Appendix.

将结构化和非结构化数据表示为图后,GraphAgent 使用任务规划智能体自动解释用户查询,并将图数据转换为统一的嵌入结构。这有助于后续的预测和生成模块更轻松地利用这些数据。任务规划智能体的输入输出示例见附录中的表 3。

2.3.1 Intent Identification and Task Formulation

2.3.1 意图识别与任务制定

The task planning agent is initially tasked with formulating meaningful predictive or generative tasks based on the user query prompt. Given a user query prompt $\mathbf{x}_{\mathrm{usr\_p}}$ and a predefined system prompt for task parsing $\mathbf{x}_{\mathrm{sys\_tp}}$, the task planning agent formulates the intended task as follows:

任务规划智能体最初的任务是基于用户查询提示制定有意义的预测性或生成性任务。给定用户查询提示 $\mathbf{x}_{\mathrm{usr\_p}}$ 和预定义的任务解析系统提示 $\mathbf{x}_{\mathrm{sys\_tp}}$,任务规划智能体按如下方式制定预期任务:

$$\mathbf{g}_{s},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{t}_{\mathrm{usr}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_tp}},\mathbf{x}_{\mathrm{usr\_p}}),$$

This intent identification and task formulation procedure generates three fundamental types of task attributes within our agent architecture, which is specifically defined as follows:

此意图识别和任务制定程序在我们的智能体架构中生成了三种基本类型的任务属性,具体定义如下:

• Source graph $\mathbf{g}_{s}$ represented by formatted files, textual graph descriptions, or plain documents.

• 源图 $\mathbf{g}_{s}$ 由格式化文件、文本图描述或纯文本文档表示。

• Task type $\mathbf{t}_{\mathrm{usr}}$ is inferred from the query prompt and can be one of "predictive predefined", "predictive wild", or "open generation". This task type symbol is used to automatically select system prompt templates during training or inference for different tasks.
• User annotation $\mathbf{x}_{\mathrm{usr\_ann}}$ includes additional task information, such as task descriptions, label candidates for predictive tasks, and generation requirements for generative tasks.

• 任务类型 $\mathbf{t}_{\mathrm{usr}}$ 是从查询提示中推断出来的,可以是“预测预定义”、“预测开放”或“开放生成”中的一种。此任务类型符号用于在训练或推理期间自动选择不同任务的系统提示模板。
• 用户注释 $\mathbf{x}_{\mathrm{usr\_ann}}$ 包括额外的任务信息,例如任务描述、预测任务的标签候选以及生成任务的生成要求。
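A minimal sketch of how the planner's parsed output could be consumed downstream is shown below; the JSON field names and the underscore-separated task-type strings are assumptions for illustration, since the paper only specifies the three attributes $\mathbf{g}_{s}$, $\mathbf{t}_{\mathrm{usr}}$, and $\mathbf{x}_{\mathrm{usr\_ann}}$.

```python
# Sketch: extracting the three task attributes from the task planning agent's
# output. Field names and the example annotation are hypothetical.
import json

VALID_TASK_TYPES = {"predictive_predefined", "predictive_wild", "open_generation"}

def parse_task_plan(llm_output: str):
    plan = json.loads(llm_output)
    g_s = plan["source_graph"]              # files, graph descriptions, or raw text
    t_usr = plan["task_type"]               # selects the system-prompt template
    x_usr_ann = plan.get("annotation", "")  # labels / generation requirements
    assert t_usr in VALID_TASK_TYPES, f"unexpected task type: {t_usr}"
    return g_s, t_usr, x_usr_ann

g_s, t_usr, x_usr_ann = parse_task_plan(
    '{"source_graph": ["node_list.txt", "edge_list.txt"],'
    ' "task_type": "predictive_predefined",'
    ' "annotation": "classify node 305; labels: [DB, ML, NLP]"}'
)
```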

To construct grounded graph tokens that can be understood by the subsequent action agent, the task planning agent follows two stages: i) Graph-Token Grounding: converting graphs with nodes and edges into grounded Python objects; ii) Graph Tokenization: generating tokens from the input that preserve complex interdependencies among graph-structured entities.

为了构建后续动作智能体能够理解的基于图的Token,任务规划智能体遵循两个阶段:i) 图-Token 基础化 (Graph-Token Grounding) —— 将带有节点和边的图转换为基于Python语言的对象;ii) 图Token化 (Graph Tokenization) —— 从输入中生成Token,保留图结构实体之间的复杂相互依赖关系。

2.3.2 Graph-Token Grounding

2.3.2 图-Token 基础化 (Graph-Token Grounding)

Our framework reads graph nodes and edges and converts them into grounded Python objects using a graph-building and wrapping tool. Notably, our model can handle diverse graph inputs, regardless of whether an explicit graph with predefined nodes and edges is present. For simplicity, we will demonstrate a scenario where the user uploads a predefined graph. For example, the query prompt might be: "...I want to know which category is correct for the node with ID [305]..." with uploaded graph files such as ["node_list.txt", "edge_list.txt"]. To build a grounded graph object in Python, we utilize the graph-building and wrapping tool $\mathrm{GBW\_Tool}(\cdot)$ with PyG (Fey & Lenssen, 2019) to add nodes and construct edges. Since user-uploaded graphs can have arbitrary node and edge types, we standardize the graphs as heterogeneous graphs, where $s_{i}$ and $r_{i}$ represent the types of each node and edge, respectively. Formally, a heterogeneous graph is constructed as:

我们的框架读取图节点和边,并使用图构建和封装工具将它们转换为具体的 Python 对象。值得注意的是,我们的模型可以处理各种图输入,无论是否存在具有预定义节点和边的显式图。为了简化,我们将演示用户上传预定义图的场景。例如,查询提示可能是:“...我想知道 ID 为 [305] 的节点属于哪个类别...” 并上传图文件,如 ["node_list.txt", "edge_list.txt"]。为了在 Python 中构建具体的图对象,我们利用图构建和封装工具 $\mathrm{GBW\_Tool}(\cdot)$ 和 PyG (Fey & Lenssen, 2019) 来添加节点并构建边。由于用户上传的图可以具有任意节点和边类型,我们将图标准化为异构图,其中 $s_{i}$ 和 $r_{i}$ 分别表示每个节点和边的类型。形式上,异构图构建如下:

$$\mathcal{G}^{\mathrm{exp}}=\mathrm{GBW\_Tool}(\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R}),\qquad
\mathcal{G}^{\mathrm{skg}}=\mathrm{GBW\_Tool}(\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}}),$$

where $\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R}$ represent the nodes, edges, node types, and edge types of the explicit graph, respectively. They are obtained by parsing the graph input $\mathbf{g}_{s}$. Similarly, $\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}}$ denote the corresponding graph components generated by the aforementioned Graph Generation Agent. This graph grounding module enables our model to convert graph data from various representations and forms into unified Python objects, facilitating their subsequent utilization.

其中,$\mathcal{V},\mathcal{E},\mathcal{N},\mathcal{R}$ 分别表示显式图的节点、边、节点类型和边类型。它们是通过解析图输入 $\mathbf{g}_{s}$ 获得的。类似地,$\mathcal{V}_{\mathrm{skg}},\mathcal{E}_{\mathrm{skg}},\mathcal{N}_{\mathrm{skg}},\mathcal{R}_{\mathrm{skg}}$ 表示由上述图生成智能体生成的相应图组件。该图基础模块使我们的模型能够将各种表示和形式的图数据转换为统一的 Python 对象,便于后续使用。
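A possible GBW_Tool-style grounding step is sketched below: it parses uploaded node and edge lists into a PyG `HeteroData` object keyed by meta-type triples $(n_h, r_i, n_t)$. The tab-separated file layout is an assumption; the paper does not specify the upload format.

```python
# Sketch of a GBW_Tool-style grounding step. Assumed file layouts:
#   node_list.txt: node_id \t node_type \t text
#   edge_list.txt: head_id \t relation  \t tail_id
from collections import defaultdict
import torch
from torch_geometric.data import HeteroData

def gbw_tool(node_file: str, edge_file: str) -> HeteroData:
    graph = HeteroData()
    local_id, texts = {}, defaultdict(list)

    with open(node_file) as f:
        for line in f:
            nid, ntype, text = line.rstrip("\n").split("\t", 2)
            local_id[nid] = (ntype, len(texts[ntype]))   # per-type local index
            texts[ntype].append(text)
    for ntype, type_texts in texts.items():
        graph[ntype].text = type_texts                   # raw text kept for later encoding

    edges = defaultdict(lambda: [[], []])
    with open(edge_file) as f:
        for line in f:
            h, rel, t = line.rstrip("\n").split("\t")
            (h_type, h_idx), (t_type, t_idx) = local_id[h], local_id[t]
            src, dst = edges[(h_type, rel, t_type)]
            src.append(h_idx); dst.append(t_idx)
    for (h_type, rel, t_type), (src, dst) in edges.items():
        graph[h_type, rel, t_type].edge_index = torch.tensor([src, dst])
    return graph
```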

2.3.3 Graph Tokenization

2.3.3 图 Token 化

The Task Planning Agent converts discrete nodes and edges into embedded representations suitable for action agents based on graph LLMs. This tokenization process consists of two stages: first, encoding the graph into embeddings, and second, retrieving the nodes and their neighbors to create input graph tokens. For the embedding process, we employ a pre-trained text encoder $f_{\mathrm{text\_enc}}$ and a pre-trained GNN $f_{\mathrm{gnn}}$. Graph tokens are generated by initially encoding the textual features $\mathbf{c}$ of the graph nodes and their meta types using the text encoder, followed by modeling geometric features.

任务规划智能体将离散的节点和边转换为适合基于图大语言模型的动作智能体的嵌入表示。这个 Token 化过程包括两个阶段:首先,将图编码为嵌入表示;其次,检索节点及其邻居以创建输入图 Token。在嵌入过程中,我们使用预训练的文本编码器 $f_{\mathrm{text\_enc}}$ 和预训练的图神经网络 $f_{\mathrm{gnn}}$。图 Token 的生成首先通过文本编码器对图节点的文本特征 $\mathbf{c}$ 及其元类型进行编码,然后对几何特征进行建模。

$$\mathbf{e}_{i}^{\mathrm{text}}=f_{\mathrm{text\_enc}}(\mathbf{c}_{i});\quad
\mathbf{e}_{s_{i}|r_{i}}^{\mathrm{text}}=f_{\mathrm{text\_enc}}(\mathbf{c}_{s_{i}|r_{i}});\quad
\mathbf{e}_{i}^{\mathrm{gnn}}=f_{\mathrm{gnn}}(\mathbf{e}_{i}^{\mathrm{text}},\mathbf{e}_{s_{i}}^{\mathrm{text}},\mathbf{e}_{r_{i}}^{\mathrm{text}},\mathcal{V},\mathcal{E}).$$

For each central node $i$ in our heterogeneous graph, we systematically apply a graph sampling tool to create the subgraph input for the subsequent action agent, which can be formulated as follows:

对于我们异构图中的每个中心节点 $i$,我们系统地应用图采样工具来创建后续动作智能体的子图输入,其公式如下:

$$[\mathbf{e}_{N_{i}}^{\mathrm{gnn}}]=\mathrm{Sampling\_Tool}(\mathcal{G},\mathbf{E}^{\mathrm{gnn}},i),$$
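The sketch below mirrors the two tokenization stages with off-the-shelf components: Sentence-BERT (all-mpnet-base-v2, as in Section 3.1.1) for $f_{\mathrm{text\_enc}}$, a single `SAGEConv` layer standing in for the pre-trained heterogeneous GNN $f_{\mathrm{gnn}}$, and PyG's `k_hop_subgraph` as a stand-in for the Sampling_Tool. The GNN choice and the additive fusion of node and type embeddings are assumptions.

```python
import torch
from sentence_transformers import SentenceTransformer
from torch_geometric.nn import SAGEConv
from torch_geometric.utils import k_hop_subgraph

text_encoder = SentenceTransformer("all-mpnet-base-v2")        # f_text_enc

def graph_tokens(node_texts, type_texts, edge_index, center, hops=2):
    # (1) Encode node descriptions c_i and one meta-type description per node.
    e_text = torch.tensor(text_encoder.encode(node_texts))     # e_i^text
    e_type = torch.tensor(text_encoder.encode(type_texts))     # e_{s_i}^text
    # (2) Refine with a GNN over the edges (placeholder for the pre-trained
    #     heterogeneous graph model used in the paper).
    gnn = SAGEConv(e_text.size(-1), e_text.size(-1))
    e_gnn = gnn(e_text + e_type, edge_index)                   # e_i^gnn
    # Sampling_Tool: keep the center node's k-hop neighbourhood as graph tokens.
    nodes, _, _, _ = k_hop_subgraph(center, hops, edge_index)
    return e_gnn[nodes]                                        # [e_{N_i}^gnn]
```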

2.4 GRAPH ACTION AGENT

2.4 图动作智能体

To enhance the capabilities of graph encoding and prediction/generation, we incorporate a trainable Graph Action Agent into our GraphAgent framework, based on the Graph LLM architecture (Tang et al., 2024b; Chen et al., 2024a). This Graph Action Agent is specifically trained to optimize performance for both predictive and text generation tasks involving graph data.

为了增强图编码和预测/生成的能力,我们在GraphAgent框架中引入了一个可训练的图操作智能体(Graph Action Agent),该框架基于图大语言模型(Graph LLM)架构(Tang et al., 2024b; Chen et al., 2024a)。这个图操作智能体经过专门训练,以优化涉及图数据的预测和文本生成任务的性能。

2.4.1 Cross-Task Graph Agent

2.4.1 跨任务图智能体

The graph action agent is capable of handling two categories of diverse tasks, as shown below. The details on the system prompt builder and examples of system prompts are shown in Table 6.

图动作智能体能够处理以下两类不同的任务。系统提示构建器的详细信息及系统提示示例见表 6。

• Predictive Graph-Language Tasks. These tasks focus on generating predictions based on user prompts, utilizing both structured and unstructured data. Examples include node classification and link prediction for explicit graph data, as well as document classification based on extracted implicit semantic knowledge graphs (SKGs), such as categorizing news articles. When using implicit SKGs to complement explicit graphs, the graph generator agent uses the observed explicit nodes as initial scaffold nodes to build the SKG. Specifically, for these tasks, our model constructs a system prompt that effectively guides the LLM toward task-specific objectives:

• 预测性图语言任务。这些任务侧重于基于用户提示生成预测,利用结构化和非结构化数据。示例包括显式图数据的节点分类和链接预测,以及基于提取的隐式语义知识图 (SKG) 的文档分类,例如对新闻文章进行分类。当使用隐式 SKG 来补充显式图时,图生成器智能体使用观察到的显式节点作为初始支架节点来构建 SKG。具体来说,对于这些任务,我们的模型构建了一个系统提示,有效地引导大语言模型实现任务特定目标:

$$\mathbf{x}_{\mathrm{sys\_pred\_i}}=f_{\mathrm{sys}}(\mathbf{t}_{\mathrm{usr}},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{g}_{s}),$$

where the prompt builder function $f_{\mathrm{sys}}$ creates an appropriate system prompt based on the task type and user annotations, incorporating $\mathbf{g}_{s}$ for node or graph textual information. The predictive graph-language tasks are then defined as follows:

其中提示构建函数 $f_{\mathrm{sys}}$ 根据任务类型和用户注释创建适当的系统提示,并结合 $\mathbf{g}_{s}$ 用于节点或图文本信息。预测性图语言任务定义如下:

$$\mathbf{y}_{\mathrm{pred}},\mathbf{y}_{\mathrm{reasoning}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_pred\_i}},\{\mathcal{G}^{\mathrm{exp}}|\mathcal{G}^{\mathrm{skg}}\}),$$

where $\{\mathcal{G}^{\mathrm{exp}}|\mathcal{G}^{\mathrm{skg}}\}$ indicates that the agent can utilize either $\mathcal{G}^{\mathrm{exp}}$, $\mathcal{G}^{\mathrm{skg}}$, or both. In this context, the LLM generates accurate predictions and reasoning in response to the user’s query prompt.

其中 ${\mathcal G^{\mathrm{exp}}|\mathcal G^{\mathrm{skg}}}$ 表示智能体可以使用 $\mathcal{G}^{\mathrm{exp}}$、$\mathcal{G}^{\mathrm{skg}}$ 或两者。在这种情况下,大语言模型会根据用户的查询提示生成准确的预测和推理。

• Generative Graph-Language Tasks. The discovered SKGs can serve as robust and comprehensive references for generative language tasks, such as text generation and summarization. These open-ended tasks are typically prompted in a direct text format that implicitly contains knowledge, without the need for predefined graphs. For example, to summarize a news article, an SKG $\mathcal{G}^{\mathrm{skg}}$ is automatically constructed from the article’s content, which includes rich entities and connections that aid in the summarization task. Additionally, a system prompt is automatically generated to enhance the content generation quality using the graph-structured information, as follows:

• 生成式图语言任务。发现的SKG可以作为生成式语言任务(如文本生成和摘要)的稳健且全面的参考。这些开放式任务通常以直接文本格式提示,隐式包含知识,无需预定义的图。例如,为了总结一篇新闻文章,SKG $\mathcal{G^{\mathrm{skg}}}$ 会自动从文章内容中构建,其中包含丰富的实体和连接,有助于摘要任务。此外,系统提示会自动生成,以利用图结构信息提高内容生成质量,如下所示:

$$\mathbf{x}_{\mathrm{sys\_gen\_i}}=f_{\mathrm{sys}}(\mathbf{t}_{\mathrm{usr}},\mathbf{x}_{\mathrm{usr\_ann}},\mathbf{g}_{s}),\qquad
\mathbf{y}_{\mathrm{gen}}=\mathrm{LLM}(\mathbf{x}_{\mathrm{sys\_gen\_i}},\mathcal{G}^{\mathrm{skg}}),$$

where $\mathbf{y}_{\mathrm{gen}}$ represents the generated textual output, with input parameters consistent with those used in predictive tasks. In this context, the LLM focuses on producing coherent and contextually accurate content based on both text and graph inputs.

其中 $\mathbf{y}_{\mathrm{gen}}$ 表示生成的文本输出,输入参数与预测任务中使用的参数一致。在这种情况下,大语言模型专注于基于文本和图输入生成连贯且上下文准确的内容。
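A compact sketch of an $f_{\mathrm{sys}}$-style prompt builder covering both task families is given below; the template wording is illustrative only, as the paper's actual system prompts are listed in its Table 6.

```python
# Sketch of an f_sys-style prompt builder: selects a template by the task type
# t_usr and fills in the user annotation x_usr_ann and graph/node text g_s.
# Template wording is illustrative, not the paper's actual prompts.
TEMPLATES = {
    "predictive_predefined": (
        "You are given graph tokens for a target node and its neighbours.\n"
        "Task: {annotation}\nNode/graph text: {graph_text}\n"
        "Answer with a label and a short reasoning."
    ),
    "predictive_wild": (
        "You are given graph tokens for a semantic knowledge graph built from text.\n"
        "Task: {annotation}\nSource text: {graph_text}\n"
        "Answer with a label and a short reasoning."
    ),
    "open_generation": (
        "You are given graph tokens summarising entities and their relations.\n"
        "Generation requirement: {annotation}\nSource text: {graph_text}"
    ),
}

def f_sys(t_usr: str, x_usr_ann: str, g_s: str) -> str:
    return TEMPLATES[t_usr].format(annotation=x_usr_ann, graph_text=g_s)
```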

2.4.2 Graph-Instruction Alignment

2.4.2 图-指令对齐

To teach our agent to comprehend graph-structured data, we implement graph-instruction alignment in the initial fine-tuning stage. Inspired by the work of Tang et al. (2024b), we utilize the efficient, effective, and easily scalable task of graph-instruction matching as our alignment target. Specifically, we present a set of graph token-instruction pairs:

为了教导我们的AI智能体理解图结构数据,我们在初始的微调阶段实施了图-指令对齐。受Tang等人 (2024b) 工作的启发,我们采用了高效、有效且易于扩展的图-指令匹配任务作为对齐目标。具体来说,我们提供了一组图Token-指令对:

$$\mathcal{D}^{g}=[(\mathbf{e}_{0},s_{0}),(\mathbf{e}_{1},s_{1}),\ldots];\quad
\mathcal{D}^{c}=[(\mathbf{c}_{0},\mathbf{c}_{s_{0}}),(\mathbf{c}_{1},\mathbf{c}_{s_{1}}),\ldots],$$

where $(\mathbf{e}_{i},s_{i})$ denotes the i-th graph token with meta type $s_{i}$, and $(\mathbf{c}_{i},\mathbf{c}_{s_{i}})$ denotes the text description of the i-th graph token and its meta type, correspondingly. We devise two general tasks to achieve fine-grained and comprehensive alignment between the graph tokens and the textual instructions:

其中 $(\mathbf{e}_{i},s_{i})$ 表示第 i 个图 Token 及其元类型 $s_{i}$,而 $(\mathbf{c}_{i},\mathbf{c}_{s_{i}})$ 分别表示第 i 个图 Token 及其元类型的文本描述。我们设计了两个通用任务,以实现图 Token 与文本指令之间的细粒度和全面对齐:

• Intra-type alignment. This alignment task aims to strengthen the capability of LLMs to interpret graph embedding tokens of certain meta-types by promoting their alignment with the relevant texts. This is conducted by training LLMs to output the correct sequence of texts given a sequence of graph tokens. Specifically, we construct a dataset $\mathcal{D}^{\mathrm{intra}}$ with each entry consisting of two sequences, of graph tokens and of texts, separately: $d_{i}^{\mathrm{intra}}=([(\mathbf{e}_{j},s_{i}),\ldots],[(\mathbf{c}_{k},\mathbf{c}_{s_{i}}),\ldots])$. Then, we train the alignment with a next-token-prediction Cross-Entropy objective as follows:

• 类型内对齐。该对齐任务旨在通过促进大语言模型与相关文本的对齐,增强其解释特定元类型的图嵌入 Token 的能力。具体来说,我们通过训练大语言模型在给定图 Token 序列的情况下输出正确的文本序列来实现这一点。我们构建了一个数据集 $\mathcal{D}^{\mathrm{intra}}$,其中每个条目由两个序列组成,分别是图 Token 和文本序列:$d_{i}^{\mathrm{intra}}=([(\mathbf{e}_{j},s_{i}),\ldots],[(\mathbf{c}_{k},\mathbf{c}_{s_{i}}),\ldots])$。然后,我们使用下一个 Token 预测的交叉熵目标进行对齐训练,如下所示:

$$\operatorname*{argmin}_{\Theta}\;\mathrm{CE\_Loss}\big(d_{i}^{\mathrm{intra}}[0]\,\big|\,\mathrm{LLM}(d_{i}^{\mathrm{intra}}[1])\big),$$

where $\Theta$ denotes the learnable parameters of the large language model $\mathrm{LLM}(\cdot)$ . And indices [0] and [1] indicate the text sequence and the graph token sequence, respectively.

其中 $\Theta$ 表示大语言模型 $\mathrm{LLM}(\cdot)$ 的可学习参数。索引 [0] 和 [1] 分别表示文本序列和图 Token 序列。

• Inter-type alignment. As introducing multiple meta-types in the alignment task can further empower the LLM’s comprehension of complex heterogeneous relations, we devise another alignment training objective using inter-type graph tokens. Technically, the dataset $\mathcal{D}^{\mathrm{inter}}$ is constructed by sampling entries that consist of graph tokens of different meta-types in the first sequence: $d_{i}^{\mathrm{inter}}=([(\mathbf{e}_{m},s_{m}),(\mathbf{e}_{n},s_{n}),\ldots],[(\mathbf{c}_{n},\mathbf{c}_{s_{n}}),(\mathbf{c}_{q},\mathbf{c}_{s_{q}}),\ldots])$. Then, the LLM is trained to predict the text sequence and the meta-type sequence of the provided graph tokens:

• 跨类型对齐。由于在对齐任务中引入多种元类型可以进一步增强大语言模型对复杂异构关系的理解,我们设计了另一种使用跨类型图 Token 的对齐训练目标。技术上,数据集 $\mathcal{D}^{\mathrm{inter}}$ 通过采样包含不同元类型的图 Token 的条目来构建:$d_{i}^{\mathrm{inter}}=([(\mathbf{e}_{m},s_{m}),(\mathbf{e}_{n},s_{n}),\ldots],[(\mathbf{c}_{n},\mathbf{c}_{s_{n}}),(\mathbf{c}_{q},\mathbf{c}_{s_{q}}),\ldots])$。然后,训练大语言模型以预测提供的图 Token 的文本序列和元类型序列:

$$\operatorname*{argmin}_{\Theta}\;\mathrm{CE\_Loss}\big(d_{i}^{\mathrm{inter}}[0]\,\big|\,\mathrm{LLM}(d_{i}^{\mathrm{inter}}[1])\big).$$
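Both alignment objectives reduce to next-token cross-entropy where graph tokens are fed as input embeddings and only the paired text (and meta-type) sequence is supervised. The sketch below shows one way to compute this loss with a HuggingFace-style causal LM; the embedding-injection interface is an assumption consistent with common graph-LLM practice, not the paper's exact code.

```python
# Sketch of the alignment objective: graph tokens d[1] are injected as input
# embeddings (already projected to the LLM hidden size) and the LLM is trained
# with next-token cross-entropy to emit the paired text sequence d[0].
import torch

def alignment_loss(llm, tokenizer, graph_token_embeds, target_text, device="cpu"):
    # Tokenize the target text sequence.
    labels = tokenizer(target_text, return_tensors="pt").input_ids.to(device)
    text_embeds = llm.get_input_embeddings()(labels)

    # Prepend the graph token embeddings to the text embeddings.
    inputs_embeds = torch.cat([graph_token_embeds.unsqueeze(0), text_embeds], dim=1)

    # Supervise only the text positions; graph positions are masked with -100.
    ignore = torch.full((1, graph_token_embeds.size(0)), -100, device=device)
    out = llm(inputs_embeds=inputs_embeds, labels=torch.cat([ignore, labels], dim=1))
    return out.loss   # CE_Loss(d[0] | LLM(d[1])), minimized over the parameters Θ
```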

2.4.3 Agent Task Finetuning

2.4.3 AI智能体任务微调

To enhance GraphAgent’s performance on different agent tasks, we propose to finetune the action agent with diverse graph-language instructions covering different agent tasks. Recall that with the task planning agent we have the user-requested task $\mathbf{t}\in\mathcal{T}$ from the query prompt. For each $\mathbf{t}$ in the instruction dataset, we pair it with a special systematic prompt to distinguish between various tasks during training. The systematic prompt contains a brief description of the task being handled. Formally, the agent task finetuning dataset is constructed as:

为了提升 GraphAgent 在不同智能体任务上的表现,我们提出通过涵盖不同智能体任务的多样化图-语言指令来微调动作智能体。回顾一下,通过任务规划智能体,我们从查询提示中获取了用户请求的任务 $\mathbf{t}\in\mathcal{T}$。对于指令数据集中的每个 $\mathbf{t}$,我们将其与一个特殊的系统提示配对,以在训练过程中区分不同的任务。系统提示包含对所处理任务的简要描述。形式上,智能体任务微调数据集的构建如下:

$$\mathcal{D}^{\mathrm{multi}}=\big\{\big(\{(\mathbf{x}_{\mathrm{pred}},\mathbf{x}_{\mathrm{reasoning}})\,|\,\mathbf{x}_{\mathrm{gen}}\},\{\mathcal{G}^{\mathrm{exp}}\,|\,\mathcal{G}^{\mathrm{skg}}\},\mathbf{t}_{i},\mathbf{a}_{i}\big)\big\},$$

Table 1: Dataset details for training and evaluation. "NC" is short for node classification.

表 1: 训练和评估数据集的详细信息。"NC" 是节点分类的缩写。

| | IMDB | ACM | Arxiv-Papers | ICLR-PeerReviews | RelatedWorkGeneration | GovReportSummarization |
| --- | --- | --- | --- | --- | --- | --- |
| 任务类型 | 预测 | 预测 | 预测 | 预测 | 生成 | 生成 |
| 子任务 | NC | NC | 论文分类 | 论文评审预测 | 文本生成 | 文本摘要 |
| 预定义图? | ✓ | ✓ | × | × | × | × |
| 训练样本数 | 2,400 | 5,175 | 3,141 | 4,155 | – | – |
| 评估样本数 | 1,000 | 500 | 500 | 500 | 304 | – |
| Token 数 | 10M | 0.8M | 30M | 45M | 93M | 2M |
| 预定义图节点数 | 11,616 | 10,942 | – | – | – | – |
| SKG 来源 | 人物实体 | 论文 | 论文 | 论文, 评审 | 多篇论文 | 文档 |
| SKG 节点数 | 57,120 | 20,388 | 153,555 | 161,592 | 875,921 | 15,621 |

For each instruction-output pair, the graph provided can be an explicit graph, an automatically discovered SKG, or both. For predictive tasks, the output includes both a prediction and its reasoning, while for generative tasks, the output is the gold-standard objective.

对于每个指令-输出对,提供的图可以是一个显式图、自动发现的SKG,或两者兼有。对于预测任务,输出包括预测及其推理,而对于生成任务,输出则是黄金标准目标。

Further, to facilitate a smooth learning curve for multi-tasking the graph language model, we take inspiration from curriculum learning techniques (Xu et al., 2020; Bengio et al., 2009) and sort our training tasks into different difficulty levels. We start training with easier tasks to build the model’s foundational graph-language understanding. As training progresses, we gradually introduce more complex tasks to refine the model’s capabilities. The details are demonstrated in Table 8.

此外,为了促进图语言模型在多任务学习中的平滑学习曲线,我们借鉴了课程学习技术 (Xu et al., 2020; Bengio et al., 2009) 的灵感,将训练任务按难度级别排序。我们从较简单的任务开始训练,以建立模型的基础图语言理解能力。随着训练的进行,我们逐渐引入更复杂的任务,以提升模型的能力。具体细节如表 8 所示。
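A minimal sketch of the curriculum-style ordering is shown below: each sample in $\mathcal{D}^{\mathrm{multi}}$ carries its task symbol, and samples are scheduled from easier to harder tasks. The difficulty ranks are illustrative; the actual schedule is given in the paper's Table 8.

```python
# Sketch of curriculum-style ordering of the multi-task instruction data.
# Difficulty ranks are illustrative only.
DIFFICULTY = {
    "graph_instruction_matching": 0,   # alignment warm-up
    "predictive_predefined": 1,
    "predictive_wild": 2,
    "open_generation": 3,
}

def curriculum_order(d_multi):
    """d_multi: list of dicts with keys 'instruction', 'graph', 'task', 'answer'."""
    return sorted(d_multi, key=lambda sample: DIFFICULTY.get(sample["task"], 99))
```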

3 EVALUATION

3 评估

In this section, the effectiveness of our proposed GraphAgent framework is assessed through a detailed evaluation framework centered around several key Research Questions (RQs):

在本节中,我们通过围绕几个关键研究问题 (RQs) 的详细评估框架来评估我们提出的 GraphAgent 框架的有效性:

• RQ1: How effectively does our GraphAgent capture both graph relational information and the textual semantic inter-dependencies necessary for graph-related predictive tasks?
• RQ2: How effective is GraphAgent at performing predictive tasks by capturing the complex but implicit textual semantic inter-dependencies preserved within the textual data?
• RQ3: How does our GraphAgent perform in graph-enhanced text generation tasks with implicit dependency understanding when compared to state-of-the-art large language models (LLMs)?
• RQ4: What effects do the key components of our GraphAgent framework have on its overall performance, as demonstrated by the ablation studies?

• RQ1: 我们的 GraphAgent 在捕捉图关系信息和图相关预测任务所需的文本语义相互依赖性方面有多有效?
• RQ2: GraphAgent 在通过捕捉文本数据中保留的复杂但隐含的文本语义相互依赖性来执行预测任务时有多有效?
• RQ3: 与最先进的大语言模型 (LLMs) 相比,我们的 GraphAgent 在具有隐含依赖性理解的图增强文本生成任务中表现如何?
• RQ4: 通过消融研究,我们的 GraphAgent 框架的关键组件对其整体性能有何影响?

3.1 EXPERIMENTAL SETTINGS

3.1 实验设置

3.1.1 Implementation Details

3.1.1 实现细节

In our GraphAgent framework, the task planning agent and the graph generation agent are both powered by GPT3.5-Turbo. We enhance their performance in tackling user queries, planning tasks, and discovering semantic knowledge graphs (SKGs) by incorporating few-shot examples into the system prompts of the large language model (LLM). For graph grounding, we effectively utilize PyG to transform structural information into graph objects. In line with established practices, we employ Sentence-BERT (all-mpnet-base-v2) for text-attributed graph embedding, ensuring a robust semantic representation. For the graph action agent, we build it using Llama3-8b Llama Team (2024) as the foundational language model. To connect the textual semantic representation space with the graph-structural representation space Liu et al. (2024); Tang et al. (2024a), we incorporate a learnable adaptation linear layer. Additionally, we implement a heterogeneous graph model Tang et al. (2024b) that has been pre-trained using data from text-graph node pairs. The nodes are encoded with embeddings from the pre-trained model, projected through the learnable adaptation layer, and ultimately processed by the LLM along with relevant language tokens. This integrated approach facilitates seamless interaction between language understanding and graph-based reasoning.

在我们的 GraphAgent 框架中,任务规划智能体和图生成智能体均由 GPT3.5-Turbo 驱动。我们通过在大语言模型 (LLM) 的系统提示中加入少样本示例,提升了它们在处理用户查询、规划任务和发现语义知识图谱 (SKGs) 方面的性能。对于图接地 (graph grounding),我们有效地利用 PyG 将结构信息转换为图对象。按照既定实践,我们使用 Sentence-BERT (all-mpnet-base-v2) 进行文本属性图嵌入,以确保强大的语义表示。对于图动作智能体,我们基于 Llama3-8b Llama Team (2024) 构建其基础语言模型。为了连接文本语义表示空间和图结构表示空间 Liu et al. (2024); Tang et al. (2024a),我们引入了一个可学习的自适应线性层。此外,我们还实现了一个异构图模型 Tang et al. (2024b),该模型已使用文本-图节点对数据进行预训练。节点通过预训练模型的嵌入进行编码,并通过可学习的自适应层进行投影,最终由 LLM 与相关语言 Token 一起处理。这种集成方法促进了语言理解和基于图的推理之间的无缝交互。
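The sketch below shows the kind of learnable adaptation layer described here: a single linear projection that maps pre-trained graph-node embeddings into the LLM's token-embedding space so that graph tokens can be interleaved with language tokens. The 768/4096 dimensions (Sentence-BERT features, Llama3-8b hidden size) are illustrative.

```python
# Sketch of the learnable adaptation layer bridging the graph-structural and
# textual-semantic representation spaces. Dimensions are illustrative.
import torch
import torch.nn as nn

class GraphTokenAdapter(nn.Module):
    def __init__(self, gnn_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(gnn_dim, llm_dim)    # the newly trained weights

    def forward(self, graph_node_embeds: torch.Tensor) -> torch.Tensor:
        # [num_graph_tokens, gnn_dim] -> [num_graph_tokens, llm_dim]
        return self.proj(graph_node_embeds)

adapter = GraphTokenAdapter()
graph_tokens = adapter(torch.randn(16, 768))   # ready to interleave with text embeddings
```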

3.1.2 Datasets

3.1.2 数据集

To ensure usability across a diverse range of graph agent tasks, we utilize various datasets for evaluating the performance of our GraphAgent. A summary of these datasets is provided in Table 1.

为了确保在各种图智能体任务中的可用性,我们利用多个数据集来评估 GraphAgent 的性能。这些数据集的总结如表 1 所示。

• Graph-Related Predictive Tasks. For tasks that involve explicit graph relational information, we utilize two benchmark datasets: IMDB Fu et al. (2020) and ACM Wang et al. (2019a). In contrast, for predictive tasks that do not depend on explicit graph structures, we have curated two additional datasets: Arxiv-Papers He et al. (2023) and ICLR-Peer Reviews1. The Arxiv-Papers dataset comprises published papers from Arxiv in 2023, from which we randomly sampled a subset. This dataset is created by analyzing the titles and abstracts of these papers to classify whether they are likely to be accepted. The ICLR-Peer Reviews dataset features pairs of papers and their corresponding reviews from ICLR 2024, specifically focusing on borderline cases that pose challenges in determining acceptance. This dataset is used for both training and testing purposes.

• 图相关的预测任务。对于涉及显式图关系信息的任务,我们使用了两个基准数据集:IMDB (Fu et al., 2020) 和 ACM (Wang et al., 2019a)。相比之下,对于不依赖于显式图结构的预测任务,我们整理了两个额外的数据集:Arxiv-Papers (He et al., 2023) 和 ICLR-Peer Reviews1。Arxiv-Papers 数据集包含 2023 年 Arxiv 上发表的论文,我们从中随机抽取了一个子集。该数据集通过分析这些论文的标题和摘要来分类它们是否可能被接受。ICLR-Peer Reviews 数据集则包含 ICLR 2024 的论文及其对应的评审,特别关注那些在决定是否接受时具有挑战性的边缘案例。该数据集用于训练和测试目的。

• Graph-Enhanced Text Generation. To demonstrate the text generation capabilities of the model, we evaluate its performance in generating related work for research papers and summarizing lengthy documents using graph-enhanced semantic dependencies. First, we collected datasets from the ACL and EMNLP conferences, covering the years 2020 to 2023, including both the "main" and "findings" tracks. We extracted the related work sections from these papers and organized them into approximately 5,000 topic-content pairs. For generating related work, GraphAgent takes a list of paper titles and their corresponding abstracts, which can be provided by users. Using this information, scaffold knowledge graphs are created and subsequently processed by the Graph Action Agent, which comprehends the data to produce comprehensive related work for the specified papers. Second, we utilize the GovReport dataset 2 to evaluate GraphAgent as a language assistant for document summarization. This dataset comprises detailed reports from government research agencies, including the Congressional Research Service and the U.S. Government Accountability Office. It necessitates the summarization of longer documents, maintaining richer context and semantic interdependencies, unlike other summarization datasets.

• 图增强文本生成。为了展示模型的文本生成能力,我们评估了其在生成研究论文相关工作部分和总结长篇文档方面的表现,利用图增强的语义依赖关系。首先,我们从 ACL 和 EMNLP 会议中收集了 2020 年至 2023 年的数据集,包括“主”轨道和“发现”轨道。我们从这些论文中提取了相关工作部分,并将其组织成大约 5,000 个主题-内容对。对于生成相关工作,GraphAgent 接收用户提供的论文标题及其对应摘要列表。利用这些信息,创建支架知识图,随后由图动作代理处理,理解数据以生成指定论文的全面相关工作。其次,我们使用 GovReport 数据集 2 来评估 GraphAgent 作为文档摘要的语言助手。该数据集包含来自政府研究机构的详细报告,包括国会研究服务处和美国政府问责办公室。与其他摘要数据集不同,它需要对较长文档进行摘要,同时保持更丰富的上下文和语义依赖关系。

3.1.3 Baseline Methods

3.1.3 基线方法

We incorporate a diverse range of baseline models from various research domains to ensure a comprehensive comparison. Specifically, we examine methods for graph-related predictive tasks, including homogeneous GNNs, heterogeneous models, and graph LLMs. Additionally, we utilize and compare state-of-the-art large language models—both open-source and closed-source—alongside retrieval-augmented generation (RAG) systems for enhanced text generation.

我们整合了来自不同研究领域的多种基线模型,以确保全面的比较。具体来说,我们研究了与图相关的预测任务的方法,包括同质图神经网络 (GNN)、异质模型和图大语言模型。此外,我们还使用并比较了最先进的大语言模型——包括开源和闭源模型——以及检索增强生成 (RAG) 系统,以增强文本生成能力。

• Graph-Related Predictive Tasks. We consider baseline methods from three key areas: i) Homogeneous GNNs, which include SAGE Hamilton et al. (2017) and GAT Velickovic et al. (2018) as representative models; ii) Heterogeneous Graph Models, featuring the specialists such as HAN Wang et al. (2019b), HGT Hu et al. (2020), and HetGNN Zhang et al. (2019); and iii) Graph LLMs, for which we adopt HiGPT Tang et al. (2024b), a state-of-the-art heterogeneous graph language model that is particularly well-suited for managing complex heterogeneous structures.

• 图相关预测任务。我们考虑了三个关键领域的基线方法:i) 同质图神经网络 (Homogeneous GNNs),包括 SAGE (Hamilton et al., 2017) 和 GAT (Velickovic et al., 2018) 作为代表性模型;ii) 异质图模型 (Heterogeneous Graph Models),包括 HAN (Wang et al., 2019b)、HGT (Hu et al., 2020) 和 HetGNN (Zhang et al., 2019) 等专家模型;iii) 图大语言模型 (Graph LLMs),我们采用了 HiGPT (Tang et al., 2024b),这是一种最先进的异质图语言模型,特别适合处理复杂的异质结构。

• Graph-Enhanced Text Generation. We utilize a variety of state-of-the-art large language models (LLMs), categorized as follows: i) Open-Source LLMs include the Llama 3 series Llama Team (2024), Mistral $\mathrm{NeMo}^{3}$ , and Qwen2-72b Yang et al. (2024); ii) Closed-Source Commercial LLMs consist of Deepseek-Chat-V2, GPT4o-mini, and Gemini-1.5-Flash, using their API services for empirical results; iii) LLM-empowered RAG Systems. We also compare GraphAgent with GraphRAG4, which enhances LLMs through graph-based retrieval-augmented generation.

• 图增强文本生成。我们利用多种先进的大语言模型 (LLM),分类如下:i) 开源 LLM 包括 Llama 3 系列 Llama Team (2024)、Mistral $\mathrm{NeMo}^{3}$ 和 Qwen2-72b Yang et al. (2024);ii) 闭源商业 LLM 包括 Deepseek-Chat-V2、GPT4o-mini 和 Gemini-1.5-Flash,使用其 API 服务获取实证结果;iii) LLM 赋能的 RAG 系统。我们还将 GraphAgent 与 GraphRAG4 进行比较,后者通过基于图的检索增强生成来增强 LLM。

3.1.4 Evaluation Protocols

3.1.4 评估协议

We implement comprehensive and consistent training strategies across all models. We apply full fine-tuning for our model and all baseline models requiring supervised fine-tuning. For model selection, we utilize validation sets with early stopping for predictive tasks, while monitoring the rate of decrease in training loss for alignment training and generative tasks. To ensure fair comparison, we maintain a consistent feature encoder (all-mpnet-base-v2) across all models, including GNNs and Graph LLMs. We use identical prompt templates across all LLM-based models, with GraphLLMs receiving additional graph tokens for embedding injection and basic meta-type descriptions (detailed in Table 6). The iterative steps are set to 2 for discovering two-hop knowledge graphs per query prompt.

我们在所有模型中实施全面且一致的训练策略。对于我们的模型以及所有需要监督微调的基线模型,我们应用全量微调。在模型选择方面,我们对预测任务使用验证集并进行早停,同时对对齐训练和生成任务监控训练损失的下降速率。为了确保公平比较,我们在所有模型(包括 GNN 和 Graph LLM)中保持一致的特征编码器(all-mpnet-base-v2)。我们在所有基于 LLM 的模型中使用相同的提示模板,GraphLLM 额外接收用于嵌入注入的图 Token 以及基本元类型描述(详见表 6)。迭代步骤设置为 2,即为每个查询提示发现两跳知识图谱。

Table 2: Zero-shot learning performance evaluation: We assess our model’s transfer capabilities by training on IMDB dataset with few-shot learning, then evaluating node classification performance on ACM dataset under zero-shot conditions, utilizing both graph structural and textual information.

表 2: 零样本学习性能评估:我们通过在 IMDB 数据集上进行少样本学习来训练模型,然后在零样本条件下利用图结构和文本信息评估 ACM 数据集上的节点分类性能,以此来评估模型的迁移能力。

| 指标 | 训练集 | SAGE | GAT | HAN | HGT | HetGNN | HiGPT | GraphAgent | 提升 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Micro-F1 (%) | IMDB-1 | 32.93±4.18 | 35.67±0.53 | 34.07±1.11 | 32.40±0.14 | 37.43±4.34 | 45.40±0.89 | 51.21±1.32 | 12.8% |
| Micro-F1 (%) | IMDB-40 | 31.73±0.05 | 23.93±1.44 | 26.97±1.94 | 35.60±0.99 | 31.80±0.16 | 50.50±0.77 | 74.98±1.24 | 48.5% |
| Macro-F1 (%) | IMDB-1 | 26.47±2.69 | 29.08±1.31 | 22.50±4.16 | 16.31±0.05 | 31.39±4.68 | 41.77±1.24 | 46.82±1.43 | 12.1% |
| Macro-F1 (%) | IMDB-40 | 31.17±0.17 | 21.41±0.71 | 23.13±1.32 | 27.49±1.22 | 31.44±0.17 | 45.85±0.89 | 74.98±1.12 | 63.5% |
| AUC (%) | IMDB-1 | 49.34±2.47 | 53.18±2.95 | – | – | – | 59.69±0.82 | 64.10±1.25 | 7.4% |
| AUC (%) | IMDB-40 | 48.67±0.13 | 43.20±1.08 | 45.45±1.46 | 51.48±0.43 | 48.72±0.06 | 63.60±0.51 | 80.90±1.01 | 27.2% |

For evaluation, we adopt different metrics based on task types. In graph-related predictive tasks with ground truth, we use Micro-F1 (Mi-F1), Macro-F1 (Ma-F1), and AUC metrics. For graph-enhanced generative tasks that are open-ended, we primarily rely on the PPL score using state-of-the-art models (Llama3-70b, Qwen2-72b) to measure fluency, rather than reference-based similarity metrics which can be misleading due to their limitations in text generation evaluation. Additionally, we incorporate the LLM-as-judge approach for better approximation of human judgment. This comprehensive evaluation framework ensures robust and meaningful comparison across different model architectures while addressing the limitations of conventional evaluation metrics for generative tasks.

为了评估,我们根据任务类型采用不同的指标。在有真实标签的图相关预测任务中,我们使用 Micro-F1 (Mi-F1)、Macro-F1 (Ma-F1) 和 AUC 指标。对于开放式的图增强生成任务,我们主要依赖使用最先进模型(Llama3-70b、Qwen2-72b)的 PPL 分数来衡量流畅性,而不是基于参考的相似性指标,因为这些指标在文本生成评估中存在局限性,可能会产生误导。此外,我们还引入了 LLM-as-judge 方法,以更好地近似人类判断。这种全面的评估框架确保了在不同模型架构之间进行稳健且有意义的比较,同时解决了生成任务中传统评估指标的局限性。
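For the PPL-based fluency score, the sketch below computes the perplexity of a generated text under an external judge model via mean next-token cross-entropy; the HuggingFace interface and the checkpoint name are assumptions (the paper reports scores from Llama3-70b and Qwen2-72b).

```python
# Sketch of the fluency metric: perplexity of generated text under a judge LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(text: str, judge_name: str = "meta-llama/Meta-Llama-3-70B") -> float:
    tok = AutoTokenizer.from_pretrained(judge_name)
    model = AutoModelForCausalLM.from_pretrained(judge_name, torch_dtype=torch.bfloat16)
    ids = tok(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss      # mean next-token cross-entropy
    return float(torch.exp(loss))           # PPL = exp(CE)
```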

3.2 GRAPH PREDICTION TASK WITH EXPLICIT AND IMPLICIT GRAPH CONTEXTS (RQ1)

3.2 显式和隐式图上下文下的图预测任务 (RQ1)

We investigate GraphAgent’s performance on graph-related prediction tasks, specifically node classification with explicit graph structures. Our approach enhances existing methods by automatically incorporating a semantic knowledge graph from node text, utilizing both the semantic KG and explicit graph connections as dual sources for graph token input. Following recent works Tang et al. (2024a;b); Chen et al. (2024a), we employ a fully zero-shot evaluation framework to better assess real-world applicability. Our experimental setup involves training models on the IMDB dataset under few-shot settings (1 shot and 40 shots), then evaluating performance on 1,000 previously unseen nodes from the ACM dataset. For our method and other LLM-enhanced approaches, we incorporate Chain-of-Thought Wei et al. (2022) for inference augmentation.

我们研究了 GraphAgent 在图相关预测任务中的表现,特别是具有显式图结构的节点分类任务。我们的方法通过自动从节点文本中融入语义知识图谱 (Semantic Knowledge Graph),利用语义知识图谱和显式图连接作为图 Token 输入的双重来源,增强了现有方法。根据最近的研究 Tang et al. (2024a;b); Chen et al. (2024a),我们采用完全零样本评估框架,以更好地评估实际应用中的适用性。我们的实验设置包括在 IMDB 数据集上以少样本设置(1 样本和 40 样本)训练模型,然后在 ACM 数据集的 1,000 个未见过的节点上评估性能。对于我们的方法和其他大语言模型增强方法,我们融入了 Chain-of-Thought Wei et al. (2022) 进行推理增强。

The results summarized in Table 2 demonstrate that our agent-based approach GraphAgent significantly advances the state-of-the-art in predictive graph tasks. Specifically, GraphAgent achieves an average improvement of over 28% across all metrics compared to the previous state-of-the-art graph language model, HiGPT. These substantial improvements stem from the synergistic integration of several key components: a graph generation agent, an automated task planning agent, and dual fine-tuning mechanisms (graph-text alignment and agent task fine-tuning). Together, these components enable GraphAgent to excel at constructing rich semantic knowledge graphs, capturing comprehensive inter-dependencies, and understanding complex relationships in both structured and unstructured graph contexts. This architecture translates into superior performance across downstream tasks.

表 2 中总结的结果表明,我们基于 AI智能体的方法 GraphAgent 在预测图任务中显著推进了现有技术水平。具体而言,与之前最先进的图语言模型 HiGPT 相比,GraphAgent 在所有指标上平均提升了超过 28%。这些显著的改进源于几个关键组件的协同整合:图生成 AI智能体、自动化任务规划 AI智能体以及双重微调机制(图-文本对齐和 AI智能体任务微调)。这些组件共同使 GraphAgent 能够出色地构建丰富的语义知识图谱,捕捉全面的相互依赖性,并理解结构化和非结构化图上下文中的复杂关系。这种架构转化为下游任务的卓越性能。

3.3 GRAPH PREDICTION WITH IMPLICIT SEMANTIC INTERDEPENDENCIES (RQ2)

3.3 具有隐式语义依赖关系的图预测 (RQ2)

Table 3: Performance comparison with state-of-the-art LLMs on complex graph prediction tasks involving implicit semantic relationships. Results marked with * indicate statistical significance (p < 0.01) compared to the second-best performer.

表 3: 在涉及隐式语义关系的复杂图预测任务中,与最先进的大语言模型的性能比较。标记为 * 的结果表示与第二佳表现者相比具有统计显著性 (p < 0.01)。

| 方法 | 模型大小 | Arxiv-Papers Mi-F1 | Arxiv-Papers Ma-F1 | Arxiv-Papers AUC | ICLR-PeerReviews Mi-F1 | ICLR-PeerReviews Ma-F1 | ICLR-PeerReviews AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 开源大语言模型 | | | | | | | |
| Llama3-8b | 8B | 0.514 | 0.289 | 0.527 | 0.402 | 0.394 | 0.502 |
| Mistral-Nemo | 12B | 0.510 | 0.292 | 0.615 | 0.272 | 0.246 | 0.380 |
| Llama3-70b | 70B | 0.630 | 0.330 | 0.635 | 0.434 | 0.421 | 0.551 |
| Qwen2-72b | 72B | 0.632 | 0.472 | 0.700 | 0.344 | 0.277 | 0.509 |
| 基于API的商业大语言模型 | | | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 0.746 | 0.580 | 0.757 | 0.362 | 0.312 | 0.516 |
| GPT4o-mini | – | 0.592 | 0.343 | 0.634 | 0.692* | 0.592 | 0.591 |
| Gemini-1.5-Flash | – | 0.748 | 0.504 | 0.714 | 0.684 | 0.487 | 0.533 |
| 微调的大语言模型 | | | | | | | |
| Llama3-8b Finetuned | 8B | 0.794 | 0.593 | 0.736 | 0.620 | 0.554 | 0.553 |
| GraphRAG 实现 | | | | | | | |
| Llama3-8b + GraphRAG | 8B | 0.516 | 0.288 | 0.601 | 0.430 | 0.427 | 0.517 |
| Llama3-70b + GraphRAG | 70B | 0.603 | 0.324 | 0.623 | 0.308 | 0.296 | 0.401 |
| GraphAgent-TaskExpert | 8B | 0.820 | 0.620 | 0.768 | 0.686 | 0.620* | 0.615* |
| GraphAgent-General | 8B | 0.840* | 0.621* | 0.769* | 0.667 | 0.604 | 0.607 |
| GraphAgent-Zero-Shot | 8B | 0.739 | 0.512 | 0.701 | 0.538 | 0.531 | 0.563 |

We evaluate GraphAgent’s effectiveness on predictive tasks that require understanding complex semantic interdependencies, comparing against state-of-the-art LLMs. For these tasks, GraphAgent constructs semantic knowledge graphs (SKGs) by extracting implicit relational patterns through its dual-agent system of task planning and graph generation. The resulting SKG nodes act as semantic anchors, enriching the input representation through embedded and tokenized forms. Our empirical evaluation on Arxiv-Papers and ICLR-Peer Reviews datasets (Table 3) demonstrates GraphAgent’s capabilities across three configurations: task-specific (GraphAgent-Task Expert), comprehensive (GraphAgent-General), and zero-shot generalization (GraphAgent-Zero-Shot). Unlike conventional GNNs and GraphLLMs that require explicit graph structures, GraphAgent competes directly with leading LLMs of various scales, including fine-tuned and GraphRAG-augmented variants. The experimental results reveal three distinct advantages of our approach:

我们在需要理解复杂语义依赖关系的预测任务上评估了 GraphAgent 的有效性,并与最先进的大语言模型进行了比较。在这些任务中,GraphAgent 通过其任务规划和图生成的双智能体系统提取隐式关系模式,构建语义知识图谱 (SKG)。生成的 SKG 节点作为语义锚点,通过嵌入和 Token 化的形式丰富了输入表示。我们在 Arxiv-Papers 和 ICLR-Peer Reviews 数据集上的实证评估(表 3)展示了 GraphAgent 在三种配置下的能力:任务特定(GraphAgent-Task Expert)、全面(GraphAgent-General)和零样本泛化(GraphAgent-Zero-Shot)。与需要显式图结构的传统 GNN 和 GraphLLM 不同,GraphAgent 直接与各种规模的领先大语言模型竞争,包括微调和 GraphRAG 增强的变体。实验结果揭示了我们方法的三个显著优势:

• Superior Performance with Smaller Model Size. Despite having only 8B parameters, GraphAgent consistently outperforms larger LLMs, including Llama3-70b and Qwen2-72b, achieving a 31.9% improvement across all metrics on both datasets. By explicitly capturing complex interdependencies via semantic graph structures while maintaining contextual awareness across different semantic levels, GraphAgent effectively integrates both local and global information patterns. This architectural approach enables robust handling of intricate reasoning tasks, where both detailed semantic relationships and broader contextual coherence are crucial for accurate predictions.

• 小模型尺寸下的卓越性能。尽管仅有 8B 参数,GraphAgent 始终优于更大的大语言模型,包括 Llama3-70b 和 Qwen2-72b,在两个数据集的所有指标上实现了 31.9% 的提升。通过语义图结构显式捕捉复杂的相互依赖关系,同时保持跨不同语义层次的上下文感知,GraphAgent 有效地整合了局部和全局信息模式。这种架构方法使其能够稳健地处理复杂的推理任务,在这些任务中,详细的语义关系和更广泛的上下文连贯性对于准确预测至关重要。

• Robust Generalization Through Multi-task and Zero-shot Learning. GraphAgent exhibits exceptional adaptability and robust performance across different learning scenarios. The multitask variant, GraphAgent-General, demonstrates superior performance compared to task-specific models on Arxiv-Papers, showcasing enhanced comprehension and reasoning capabilities over text-graph pairs through self-constructed SKGs. While there is a modest performance trade-off on ICLR-Peer Reviews, the multi-task model maintains competitive results comparable to specialized versions. Notably, GraphAgent shows impressive zero-shot generalization: even with domain transfer challenges, our 8B model achieves performance parity with state-of-the-art LLMs like Deepseek-Chat-V2 and Gemini-1.5-Flash. These findings demonstrate how our approach of integrating semantic knowledge graphs and specialized tuning techniques can significantly enhance model capabilities through structured knowledge representation.

• 通过多任务和零样本学习实现鲁棒泛化。GraphAgent 在不同的学习场景中表现出卓越的适应性和鲁棒性能。多任务变体 GraphAgent-General 在 Arxiv-Papers 上展示了优于特定任务模型的性能,通过自构建的 SKG(语义知识图谱)增强了对文本-图对的理解和推理能力。尽管在 ICLR-Peer Reviews 上存在轻微的性能折衷,但多任务模型仍保持了与专用版本相当的竞争力。值得注意的是,GraphAgent 展示了令人印象深刻的零样本泛化能力:即使在面临领域转移挑战时,我们的 8B 模型也能与 Deepseek-Chat-V2 和 Gemini-1.5-Flash 等最先进的大语言模型达到性能持平。这些发现表明,通过整合语义知识图谱和专用调优技术,我们的方法能够通过结构化知识表示显著增强模型能力。

• Superior Performance over Vanilla SFT and GraphRAG. Comparative experiments demonstrate GraphAgent’s significant advantages over both vanilla supervised fine-tuning (SFT) LLMs and GraphRAG implementations. This performance gain can be attributed to two key factors: First, compared to vanilla SFT LLMs, GraphAgent effectively leverages the LLM’s knowledge base through our semantic KG integration paradigm, leading to enhanced performance. Second, while GraphRAG uses the same knowledge references, GraphAgent’s graph embedding token approach provides a more efficient and consolidated knowledge representation. This not only reduces input token overhead but also helps mitigate LLM hallucination through structured knowledge encoding, ultimately resulting in more reliable and robust performance.

• 相较于普通 SFT 和 GraphRAG 的卓越性能。对比实验表明,GraphAgent 相较于普通的监督微调 (SFT) 大语言模型和 GraphRAG 实现具有显著优势。这一性能提升可归因于两个关键因素:首先,与普通的监督微调 SFT 大语言模型相比,GraphAgent 通过我们的语义知识图谱 (KG) 集成范式有效利用了大语言模型的知识库,从而提升了性能。其次,尽管 GraphRAG 使用了相同的知识参考,但 GraphAgent 的图嵌入 Token 方法提供了更高效和整合的知识表示。这不仅减少了输入 Token 的开销,还通过结构化知识编码帮助减轻了大语言模型的幻觉问题,最终实现了更可靠和稳健的性能。
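
To make the graph-token idea above more concrete, the following is a minimal sketch of how SKG node embeddings could be projected into the LLM embedding space and prepended as soft tokens, which is how a small, fixed number of graph tokens can stand in for a long textual knowledge dump. The module name, dimensions, and toy tensors are illustrative assumptions; this is not the released GraphAgent implementation.

```python
# Minimal sketch (not the released GraphAgent code) of turning SKG node embeddings
# into "graph tokens" that are prepended to the LLM's embedded prompt.
# All module names, dimensions, and the toy tensors below are illustrative assumptions.
import torch
import torch.nn as nn

class SKGTokenInjector(nn.Module):
    def __init__(self, node_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # Lightweight projector mapping SKG node embeddings into the LLM embedding space.
        self.projector = nn.Linear(node_dim, llm_dim)

    def forward(self, node_emb: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
        """node_emb:   [num_nodes, node_dim]  embeddings of SKG anchor nodes
        prompt_emb: [seq_len, llm_dim]      embedded prompt tokens
        Returns the fused sequence fed to the LLM: graph tokens first, then text."""
        graph_tokens = self.projector(node_emb)              # [num_nodes, llm_dim]
        return torch.cat([graph_tokens, prompt_emb], dim=0)  # [num_nodes + seq_len, llm_dim]

# Toy usage: 12 SKG anchor nodes and a 64-token prompt.
injector = SKGTokenInjector()
fused = injector(torch.randn(12, 768), torch.randn(64, 4096))
print(fused.shape)  # torch.Size([76, 4096])
```

Because the SKG contributes only a fixed, small number of embedding positions rather than verbatim retrieved text, the prompt length stays nearly constant regardless of how much knowledge the graph encodes, which is the intuition behind the reduced token overhead noted above.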

3.4 GRAPH-ENHANCED TEXT GENERATION (RQ3)

3.4 图增强文本生成 (RQ3)

We evaluate GraphAgent’s performance on graph-enhanced text generation tasks using both perplexity (PPL) metrics and LLM-based assessment. Results for our evaluated text generation tasks are presented in Table 4 and Figure 6, while zero-shot generalization results on GovReport data are shown in Table 5.

我们使用困惑度 (PPL) 指标和基于大语言模型的评估方法,评估了 GraphAgent 在图增强文本生成任务中的表现。评估的文本生成任务结果如表 4 和图 6 所示,而在 GovReport 数据上的零样本泛化结果如表 5 所示。
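
For reference, the perplexity numbers reported below are obtained by scoring each generated passage with large evaluator LMs (Llama3-70b and Qwen2-72b). A minimal sketch of such scoring with Hugging Face `transformers` is shown here; the checkpoint name and the exact scoring protocol are assumptions and may differ from the paper's setup.

```python
# Sketch of scoring a generated passage with an evaluator LM's perplexity.
# The checkpoint name is a placeholder; the paper's exact scoring protocol may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(text: str, model_name: str = "meta-llama/Meta-Llama-3-70B") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # With labels == input_ids, the returned loss is the mean token-level
        # negative log-likelihood, so exp(loss) is the perplexity of the passage.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Example (requires access to the evaluator checkpoint):
# print(perplexity("GraphAgent integrates explicit and implicit graph structures ..."))
```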

• Enhanced Generation Quality via Lower Perplexity. Table 4 demonstrates GraphAgent’s superior performance with lower perplexity scores as compared to baselines, as validated by both Llama3-70b and Qwen2-72b. The generated content exhibits enhanced fluency and clarity compared to larger LLMs. We observe that both SFT and GraphRAG variants show performance degradation, indicating that neither simple input-output fine-tuning nor direct knowledge injection through prompts can effectively

• 通过降低困惑度提升生成质量。表 4 展示了 GraphAgent 在困惑度得分上相较于基线模型的优越性能,这一点已通过 Llama3-70b 和 Qwen2-72b 的验证。与更大的大语言模型相比,生成的内容表现出更高的流畅性和清晰度。我们观察到,无论是 SFT 还是 GraphRAG 变体,其性能均有所下降,这表明简单的输入输出微调或通过提示直接注入知识都无法有效提升生成质量。

Table 4: Performance on ACL-EMNLP related-works content generation. Light grey denotes that the score is computed with a same-family model.

表 4: ACL-EMNLP 相关工作内容生成的性能。浅灰色表示分数是用同系列模型计算的。

| 方法 | 模型大小 | PPL-Llama3-70b 均值 | PPL-Llama3-70b 最大值 | PPL-Qwen2-72b 均值 | PPL-Qwen2-72b 最大值 |
| --- | --- | --- | --- | --- | --- |
| 开源大语言模型 | | | | | |
| Llama3-8b | 8B | 7.016 | 13.061 | 7.491 | 12.787 |
| Mistral-Nemo | 12B | 7.367 | 15.967 | 6.872 | 12.065 |
| Llama3-70b | 70B | 6.168 | 14.436 | 5.877 | 12.897 |
| Qwen2-72b | 72B | 6.043 | 11.675 | 5.325 | 11.302 |
| 基于API的商业大语言模型 | | | | | |
| Deepseek-Chat-V2 | 236B→21B | 5.632 | 13.483 | 5.144 | 10.337 |
| GPT4o-mini | - | 7.277 | 15.480 | 6.818 | 13.267 |
| Gemini-1.5-Flash | - | 5.188 | 10.399 | 5.377 | 10.779 |
| 微调的大语言模型 | | | | | |
| Llama3-8b 微调 | 8B | 7.682 | 19.452 | 7.629 | 18.757 |
| GraphRAG 实现 | | | | | |
| Llama3-8b + GraphRAG | 8B | 7.098 | 18.092 | 6.539 | 14.722 |
| Llama3-70b + GraphRAG | 70B | 6.590 | 14.827 | 6.135 | 14.163 |
| GraphAgent-TaskExpert | 8B | 3.805 | 10.316 | 4.069 | 11.685 |
| GraphAgent-General | 8B | 3.618* | 8.000* | 3.867* | 8.775* |

capture the complex reasoning patterns required for understanding intricate contextual relationships. In contrast, our approach leverages automatically constructed semantic knowledge graphs to substantially enhance the model’s reasoning and comprehension capabilities.

捕捉理解复杂上下文关系所需的复杂推理模式。相比之下,我们的方法利用自动构建的语义知识图谱,显著增强了模型的推理和理解能力。

• Superior Generation Quality via LLM-based Evaluation. To rigorously validate our model’s alignment with human preferences for the generated content, we employed the LLM-as-judge methodology Zheng et al. (2024), which demonstrates stronger correlation with human judgment compared to traditional metrics like BLEU Papineni et al. (2002) and ROUGE Lin (2004). Using GPT-4 as the judge (evaluation prompts detailed in Table 4), we compared GraphAgent against several strong baselines: Llama3-8b, Llama3-8b fine-tuned, Mistral Nemo, and Llama3-70b.

• 通过基于大语言模型的评估实现卓越生成质量。为了严格验证我们的模型与人类对生成内容偏好的对齐程度,我们采用了 Zheng 等人 (2024) 提出的 LLM-as-judge 方法,该方法相比传统的 BLEU Papineni 等人 (2002) 和 ROUGE Lin (2004) 等指标,与人类判断的相关性更强。我们使用 GPT-4 作为评判者(评估提示详见表 4),将 GraphAgent 与几个强大的基线模型进行了比较:Llama3-8b、Llama3-8b 微调版、Mistral Nemo 和 Llama3-70b。

Evaluation on 200 samples from the text generation test set (Figure 6) demonstrates GraphAgent’s superior performance: achieving a $114%$ quality improvement over Llama3-8b and $45%$ over Llama3-70b. GraphAgent generates higher-quality content in $67%$ of cases compared to same-sized models and outperforms leading open-source models in $58%$ of instances, despite having only 8B parameters and requiring minimal additional input overhead. These results validate our GraphAgent’s effectiveness in leveraging semantic knowledge graphs for enhanced text generation capabilities.

在文本生成测试集的 200 个样本上的评估(图 6)展示了 GraphAgent 的卓越性能:相比 Llama3-8b 实现了 114% 的质量提升,相比 Llama3-70b 提升了 45%。GraphAgent 在 67% 的情况下生成的内容质量优于同规模模型,并且在 58% 的实例中表现优于领先的开源模型,尽管它仅有 8B 参数且需要极少的额外输入开销。这些结果验证了 GraphAgent 在利用语义知识图谱增强文本生成能力方面的有效性。
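
The GPT-4 pairwise judging used in this comparison can be sketched roughly as follows; the prompt wording, the `gpt-4o` identifier, and the verdict format are illustrative assumptions rather than the paper's exact evaluation prompt.

```python
# Hedged sketch of GPT-4 pairwise judging in the spirit of LLM-as-judge (Zheng et al., 2024).
# The prompt wording, the "gpt-4o" identifier, and the verdict format are assumptions,
# not the paper's exact evaluation prompt.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

JUDGE_TEMPLATE = """You are an impartial judge. Compare the two responses to the task below.
Answer with exactly "A", "B", or "Tie", followed by a one-sentence justification.

[Task]
{task}

[Response A]
{a}

[Response B]
{b}
"""

def judge(task: str, response_a: str, response_b: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": JUDGE_TEMPLATE.format(task=task, a=response_a, b=response_b)}],
        temperature=0,  # deterministic judging reduces variance across the 200 samples
    )
    return completion.choices[0].message.content

# verdict = judge("Write the related-work section ...", graphagent_output, llama3_70b_output)
```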

• Cross-domain Performance on Document Summarization. The effectiveness of GraphAgent extends beyond academic writing to document summarization tasks, as demonstrated in our graph-enhanced text generation evaluation on GovReport data (Table 7 shown in Appendix). Notably, without any task-specific optimization, GraphAgent exhibits strong structural reasoning abilities by generating well-organized summaries (highlighted in green). This successful transfer of capabilities across domains underscores the model’s robust generalization potential.

• 跨领域文档摘要性能。GraphAgent 的有效性不仅限于学术写作,还扩展到文档摘要任务,正如我们在 GovReport 数据上的图增强文本生成评估中所展示的那样(附录中的表 7)。值得注意的是,在没有进行任何任务特定优化的情况下,GraphAgent 通过生成结构良好的摘要(以绿色突出显示)展示了强大的结构推理能力。这种跨领域能力的成功转移突显了模型的强大泛化潜力。

Table 5: GovReport summarization performance. Evaluation scores are presented with same-family model comparisons highlighted in light grey.

表 5: GovReport 摘要性能。评估分数以浅灰色突出显示同系列模型的比较。

| 方法 | 模型大小 | PPL-Llama3-70b 均值 | PPL-Llama3-70b 最大值 | PPL-Qwen2-72b 均值 | PPL-Qwen2-72b 最大值 |
| --- | --- | --- | --- | --- | --- |
| Llama3-8b | 8B | 9.476 | 25.355 | 7.564 | 17.443 |
| Mistral-Nemo | 12B | 9.333 | 28.537 | 7.194 | 19.347 |
| Llama3-70b | 70B | 6.473 | 14.724 | 5.629 | 11.813 |
| Qwen2-72b | 72B | 7.134 | 16.075 | 5.494 | 11.294 |
| Deepseek-Chat-V2 | 236B→21B | 8.246 | 21.176 | 7.311 | 18.092 |
| GPT4o-mini | - | 10.332 | 23.300 | 6.576 | 10.213 |
| Gemini-1.5-Flash | - | 7.374 | 18.408 | 6.133 | 9.237 |
| GraphAgent-General | 8B | 6.736 | 20.362 | 5.936 | 27.196 |

Experimental results in Table 5 demonstrate GraphAgent’s competitive performance in zero-shot generative tasks with graphs. The model achieves significantly lower perplexity (PPL) scores than same-sized counterparts like Llama3-8b and even the larger Mistral-Nemo. Moreover, GraphAgent matches the fluency levels of leading closed-source and open-source LLMs in generating GovReport summaries. These findings suggest that our approach of automatically extracting and leveraging semantic knowledge graphs from input content, combined with diverse multi-task graph-based training, enables robust zero-shot performance.

表 5 中的实验结果(见附录)展示了 GraphAgent 在图生成任务中的零样本竞争性能。与相同规模的模型(如 Llama3-8b)甚至更大的 Mistral-Nemo 相比,该模型实现了显著更低的困惑度 (PPL) 分数。此外,GraphAgent 在生成 GovReport 摘要时,与领先的闭源和开源大语言模型的流畅度相当。这些发现表明,我们通过从输入内容中自动提取并利用语义知识图谱,结合多样化的多任务图训练,能够实现强大的零样本性能。

3.5 QUALITATIVE ANALYSIS OF GRAPH-ENHANCED TEXT GENERATION TASKS

3.5 图增强文本生成任务的定性分析

We evaluated GraphAgent against Llama3-8b and Llama3-70b on two distinct graph-enhanced text generation tasks, with results presented in Tables 8 and 7 (Appendix). The experiments demonstrate GraphAgent’s significant performance advantages over Llama3-8b while achieving comparable results to the much larger Llama3-70b. Notably, in academic writing tasks (Table 8), GraphAgent effectively leverages knowledge graphs to capture citation relationships and research development paths, producing well-organized summaries (highlighted in green). In contrast, Llama3-8b exhibits notable limitations in both instruction following and citation formatting accuracy (highlighted).

我们在两个不同的图增强文本生成任务上对 GraphAgent 与 Llama3-8b 和 Llama3-70b 进行了评估,结果如表 8 和表 7(附录)所示。实验表明,GraphAgent 相比 Llama3-8b 具有显著的性能优势,同时在与更大的 Llama3-70b 相比时取得了相当的结果。值得注意的是,在学术写作任务中(表 8),GraphAgent 有效地利用知识图谱捕捉引用关系和研究发展路径,生成了结构良好的摘要(以绿色高亮显示)。相比之下,Llama3-8b 在指令遵循和引用格式准确性方面表现出明显的局限性(以高亮显示)。


Figure 6: Comparative evaluation results: GPT4o as judge assessing our proposed GraphAgent framework against state-of-the-art open-source LLMs.

图 6: 对比评估结果:GPT4o 作为评判者,评估我们提出的 GraphAgent 框架与最先进的开源大语言模型。

This section presents our automatically generated semantic knowledge graphs (SKGs) through two visualized examples in Tables 9 and 10 from the GovReport and Arxiv datasets. We visualize each SKG at two levels: $k=0$ hop showing high-level aspect nodes (highlighted in green) and $k=1$ hop displaying keyword nodes (highlighted in blue), along with augmented textual attributes for $k=0$ scaffold nodes. These examples demonstrate our Graph Generation Agent’s capability in extracting crucial information, revealing multi-hop relationships, and generating coherent semantic representations. For related-work text generation, the SKG contains paper-based sub-graphs, and each individual paper follows an SKG pattern similar to the Arxiv examples.

本节通过表 9 和表 10 中的两个可视化示例展示了我们从 GovReport 和 Arxiv 数据集中自动生成的语义知识图谱 (SKG)。我们在两个层次上对每个 SKG 进行可视化:$k=0$ 跳显示高层方面节点(以绿色高亮显示),$k=1$ 跳显示关键词节点(以蓝色高亮显示),同时为 $k=0$ 的支架节点增加了文本属性。这些示例展示了我们的图生成智能体在提取关键信息、揭示多跳关系以及生成连贯语义表示方面的能力。对于相关工作文本生成,虽然包含基于论文的 SKG 作为多个子图,但个别论文遵循与 Arxiv 示例类似的 SKG 模式。
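
To illustrate the two-level view described above, the sketch below builds a tiny SKG with `networkx` and extracts the $k=0$ aspect node together with its $k=1$ keyword neighbours; the node roles and attribute names are assumptions for illustration, not the paper's data schema.

```python
# Illustrative sketch of the two-level SKG view: a k=0 aspect (scaffold) node and its
# k=1 keyword neighbours. Node roles and attribute names are assumptions, not the paper's schema.
import networkx as nx

skg = nx.Graph()
skg.add_node("scalable graph learning", role="aspect", summary="High-level theme of the paper.")
skg.add_node("message passing", role="keyword")
skg.add_node("mini-batch sampling", role="keyword")
skg.add_edges_from([
    ("scalable graph learning", "message passing"),
    ("scalable graph learning", "mini-batch sampling"),
])

def two_level_view(graph: nx.Graph, center: str):
    """Return the k=0 scaffold node(s) and the k=1 keyword neighbours around `center`."""
    ego = nx.ego_graph(graph, center, radius=1)
    aspects = [n for n, d in ego.nodes(data=True) if d.get("role") == "aspect"]
    keywords = [n for n, d in ego.nodes(data=True) if d.get("role") == "keyword"]
    return aspects, keywords

print(two_level_view(skg, "scalable graph learning"))
# (['scalable graph learning'], ['message passing', 'mini-batch sampling'])
```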

Table 3: Illustrative Examples of GraphAgent’s Multi-Agent Pipeline.

表 3: GraphAgent 多智能体管道的示例

[Agent (Task Planning Agent)]

[智能体 (任务规划智能体)]

Sure! I will first decompose your request as follows for further actions: <graph source: [uploaded files]>, <task type: predictive-predefined>, <user annotation>: <"Can you tell me which category the movie with the target node belongs to? Is it action, comedy or drama?">

好的!我将首先分解您的请求以便进一步操作:<图来源:[上传的文件]>,<任务类型:预测预定义>,<用户注释>:<"你能告诉我目标节点的电影属于哪个类别吗?是动作、喜剧还是剧情?">

[Agent (Task Planning Agent)]

[智能体 (任务规划智能体)]

Since pre-defined graphs are given, I will try to ground the edge and node list files into readable objects. Then I will encode the graph and index the neighbors for node ID [7] for further actions.

由于预定义的图已经给出,我将尝试将边和节点列表文件转换为可读对象。然后我将对图进行编码,并为节点 ID [7] 索引邻居,以便进一步操作。

[Function Call: <Graph Grounding> $\mathrm{GBW\_Tool}(\cdot)$] $\rightarrow \mathcal{G}^{\mathrm{exp}},\ \mathbf{E}^{\mathrm{gnn}}$

[函数调用:<图接地> $\mathrm{GBW\_Tool}(\cdot)$] $\rightarrow \mathcal{G}^{\mathrm{exp}},\ \mathbf{E}^{\mathrm{gnn}}$

[Agent (Graph Generation Agent)]

[智能体 (图生成智能体)]

Now the graph tokens for the pre-defined graph are ready. I will further try to discover a semantic knowledge graph for the target node and the sampled neighbors.

现在预定义图的图 Token 已准备就绪。我将进一步尝试为目标节点和采样的邻居发现一个语义知识图。

[Agent (Graph Action Agent)]

[智能体 (图动作智能体)]

Since we already have the pre-defined graph and the built SKG, I will construct an appropriate system prompt according to the user request.

由于我们已经有了预定义的图和构建的SKG,我将根据用户请求构建一个适当的系统提示。

[System Prompt]: You are a powerful assistant in accomplishing diverse user-required tasks with the help of graphs... [Agent (Graph Action Agent)]

[系统提示]: 你是一个强大的助手,能够借助图完成用户所需的各种任务... [智能体 (图动作智能体)]

According to the provided information, the neighbor nodes and the semantic knowledge graph nodes, the most likely category for the movie is ... The reasoning process behind is...

根据提供的信息,邻居节点和语义知识图谱节点,电影最可能的类别是……其背后的推理过程是……

Table 3 demonstrates the multi-agent collaboration workflow using the IMDB movie classification dataset as an example. The process begins when the Task Planning Agent receives a user request, performing request decomposition and task planning, while also processing graph data in scenarios with pre-defined graphs. Next, the Graph Generation Agent constructs a task-relevant semantic knowledge graph. Finally, the Multi-Task Agent automatically constructs a system prompt based on the task type and executes task inference. This workflow highlights GraphAgent’s versatility in automatically adapting to different scenarios, whether handling pre-defined graphs or generating new ones, while maintaining consistent performance across various task types.

表 3 展示了使用 IMDB 电影分类数据集作为示例的多智能体协作工作流程。该流程始于任务规划智能体接收到用户请求时,执行请求分解和任务规划,同时在有预定义图的场景中处理图数据。接下来,图生成智能体构建与任务相关的语义知识图。最后,多任务智能体根据任务类型自动构建系统提示并执行任务推理。该工作流程突显了 GraphAgent 在自动适应不同场景时的多功能性,无论是处理预定义的图还是生成新的图,同时在不同任务类型中保持一致的性能。
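
The hand-off described above can be summarised as a simple three-stage pipeline. The sketch below is a hypothetical illustration of that flow, with a structured plan object mirroring the three slots produced by the Task Planning Agent in Table 3; the class and function names do not come from the released GraphAgent code, and the graph-building and inference steps are stubbed.

```python
# Hypothetical sketch of the three-stage hand-off described above
# (Task Planning -> Graph Generation -> task execution). Names and the stubbed
# graph-building / inference steps are illustrative, not the released GraphAgent code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskPlan:
    graph_source: Optional[str]   # handle of a pre-defined graph, or None if absent
    task_type: str                # e.g. "predictive-predefined" or "generative"
    user_annotation: str          # the user's natural-language request

def plan_task(user_request: str, uploaded_graph: Optional[str]) -> TaskPlan:
    """Task Planning Agent: decompose the request into a structured plan."""
    task_type = "predictive-predefined" if uploaded_graph else "generative"
    return TaskPlan(uploaded_graph, task_type, user_request)

def build_skg(plan: TaskPlan) -> dict:
    """Graph Generation Agent: derive a semantic knowledge graph for the target context."""
    return {"aspect_nodes": ["movie genre signals"], "keyword_nodes": ["action", "comedy", "drama"]}

def execute(plan: TaskPlan, skg: dict) -> str:
    """Task Execution Agent: assemble the system prompt and run inference (stubbed here)."""
    system_prompt = (
        "You are a powerful assistant for graph-related tasks.\n"
        f"Task type: {plan.task_type}\n"
        f"SKG anchors: {', '.join(skg['aspect_nodes'] + skg['keyword_nodes'])}"
    )
    return system_prompt  # a real pipeline would call the graph-aware LLM with this prompt

plan = plan_task("Which category does the movie with the target node belong to?", "edges.csv")
print(execute(plan, build_skg(plan)))
```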

3.6 ABLATION STUDY

3.6 消融研究

To evaluate each component in GraphAgent, we conducted an ablation study with the following variants:
(-) SKG: Removes the graph generation agent and excludes semantic knowledge graph tokens from the LLM input.
(-) Alignment: Omits the graph-instruction alignment tuning described in Section 2.4.2, training directly with instruction input-output pairs.
(-) Cur. Strategy: Eliminates the curriculum learning strategy for agent task training (Section 2.4.3), instead training all tasks simultaneously across all epochs.

Figure 7 presents the comparative results between GraphAgent and its variants on both predictive and generative tasks.

为了评估 GraphAgent 中的每个组件,我们进行了以下变体的消融实验:
(-) SKG:移除图生成智能体,并从大语言模型输入中排除语义知识图谱 Token。
(-) Alignment:省略第 2.4.2 节中描述的图-指令对齐微调,直接使用指令输入-输出对进行训练。
(-) Cur. Strategy:取消智能体任务训练的课程学习策略(第 2.4.3 节),改为在所有轮次中同时训练所有任务。

图 7 展示了 GraphAgent 及其变体在实验中的对比结果。

Figure 7: Ablation study comparing GraphAgent with its variants on both graph-related prediction and graph-enhanced text generation tasks.

图 7: 对比 GraphAgent 及其变体在图相关预测和图增强文本生成任务上的消融研究。

Our analysis reveals two key findings:

我们的分析揭示了两个关键发现:

• For predictive tasks, semantic knowledge graphs generated by the graph generation agent show the strongest impact, as their supplementary information substantially enhances model performance. In contrast, for generative tasks, the alignment component proves crucial for maintaining high performance, likely because these tasks demand sophisticated reasoning capabilities, making alignment tuning essential for developing deeper graph-instruction understanding.

• 对于预测任务,图生成智能体生成的语义知识图谱显示出最强的影响,因为它们的补充信息显著提升了模型性能。相比之下,对于生成任务,对齐组件对于保持高性能至关重要,这可能是因为这些任务需要复杂的推理能力,使得对齐调优对于发展更深层次的图-指令理解至关重要。

• The curriculum training strategy shows consistent improvements across both task types. By enabling gradual progression from simpler predictive tasks to more complex generative ones, this approach allows the model to more effectively assimilate knowledge from various graph-instruction pairs, resulting in more robust overall performance.

• 课程训练策略在两种任务类型上均显示出持续的改进。通过从较简单的预测任务逐步过渡到更复杂的生成任务,这种方法使模型能够更有效地吸收来自各种图-指令对的知识,从而实现更稳健的整体性能。
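
As a rough illustration of such a curriculum, the sketch below ramps the share of generative graph-instruction samples up over training epochs; the ramp and mixing ratio are assumptions, not the exact schedule from Section 2.4.3.

```python
# Illustrative curriculum schedule: earlier epochs draw mostly from simpler predictive
# graph-instruction pairs, later epochs shift toward generative ones.
# The ramp and mixing ratio are assumptions, not the exact recipe of Section 2.4.3.
import random

def sample_task(epoch: int, total_epochs: int) -> str:
    """Pick which task pool the next training example comes from."""
    progress = epoch / max(total_epochs - 1, 1)        # 0.0 -> 1.0 over training
    p_generative = min(1.0, 0.2 + 0.8 * progress)      # ramp up the generative share
    return "generative" if random.random() < p_generative else "predictive"

counts = {"predictive": 0, "generative": 0}
for epoch in range(3):
    for _ in range(1000):
        counts[sample_task(epoch, total_epochs=3)] += 1
print(counts)  # later epochs contribute mostly generative samples
```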

4 RELATED WORK

4 相关工作

Graph Representation Learning enables analysis of complex relationships through specialized graph embedding techniques Chen et al. (2020); Wu et al. (2020). Graph Neural Networks serve as its foundation, capturing node dependencies through message-passing mechanisms Dwivedi et al. (2023); Huang et al. (2024a). Key architectures include Graph Convolutional Networks (GCNs) Kipf & Welling (2017); Jin et al. (2020); Wu et al. (2024), which use localized convolutions for neighbor aggregation, and Graph Attention Networks (GAT) Veličković et al. (2018); Zhang et al. (2022); Hao et al. (2023), which incorporate attention mechanisms to weigh neighboring nodes’ importance. In our GraphAgent, GNNs act as graph tokenizers, facilitating effective integration with LLMs.

图表示学习 (Graph Representation Learning) 通过专门的图嵌入技术实现对复杂关系的分析 Chen et al. (2020); Wu et al. (2020)。图神经网络 (Graph Neural Networks, GNNs) 是其基础,通过消息传递机制捕捉节点之间的依赖关系 Dwivedi et al. (2023); Huang et al. (2024a)。关键架构包括图卷积网络 (Graph Convolutional Networks, GCNs) Kipf & Welling (2017); Jin et al. (2020); Wu et al. (2024),它使用局部卷积进行邻居聚合;以及图注意力网络 (Graph Attention Networks, GAT) Veličković et al. (2018); Zhang et al. (2022); Hao et al. (2023),它结合注意力机制来加权邻居节点的重要性。在我们的 GraphAgent 中,GNNs 充当图分词器,促进与大语言模型的有效集成。
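
As a rough illustration of the graph-tokenizer role mentioned above, the sketch below runs a small two-layer GCN over a toy graph with `torch_geometric`, treating the resulting node embeddings as graph tokens to be projected into the LLM space; the layer sizes and toy graph are arbitrary assumptions, not the paper's encoder.

```python
# Rough illustration of a GNN acting as a "graph tokenizer": message passing produces
# node embeddings that later serve as graph tokens for the LLM. Uses torch_geometric;
# layer sizes and the toy graph are arbitrary assumptions, not the paper's encoder.
import torch
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

class GraphTokenizer(torch.nn.Module):
    def __init__(self, in_dim: int = 16, hidden: int = 32, out_dim: int = 768):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, data: Data) -> torch.Tensor:
        h = self.conv1(data.x, data.edge_index).relu()
        return self.conv2(h, data.edge_index)   # [num_nodes, out_dim] graph tokens

# Toy 4-node chain graph with bidirectional edges.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
data = Data(x=torch.randn(4, 16), edge_index=edge_index)
tokens = GraphTokenizer()(data)
print(tokens.shape)  # torch.Size([4, 768])
```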

Graph Language Models. With the success of Large Language Models (LLMs), recent studies have focused on enhancing the generalization capabilities of graph models by integrating LLMs with Graph Neural Networks (GNNs) Tang et al. (2024b). For instance, GraphGPT Tang et al. (2024a) enables LLMs to understand graph structural information by combining a graph encoder with an LLM through an alignment projector. LLaGA Chen et al. (2024b) enhances LLM capabilities for graph data by reorganizing nodes into structure-aware sequences. Additionally, ZeroG Li et al. (2024) has been developed for zero-shot transfer learning in graph learning, leveraging language models to achieve effective cross-dataset generalization. However, most current graph language models primarily focus on capturing the topological information of explicit graph connections for standard representation learning tasks. In this work, we introduce a fully automated and easy-to-use agent framework that goes beyond traditional graph language models. Our framework is designed to tackle complex real-world data scenarios, which often involve both explicit relational graph connections and implicit graph-enhanced semantic dependencies. This allows us to address various downstream applications, including both graph-related predictive and text generative tasks.

图语言模型。随着大语言模型 (LLMs) 的成功,最近的研究集中在通过将大语言模型与图神经网络 (GNNs) 结合来增强图模型的泛化能力 (Tang et al., 2024b)。例如,GraphGPT (Tang et al., 2024a) 通过将图编码器与大语言模型结合,使大语言模型能够理解图结构信息。LLaGA (Chen et al., 2024b) 通过将节点重新组织为结构感知序列,增强了大语言模型对图数据的处理能力。此外,ZeroG (Li et al., 2024) 被开发用于图学习中的零样本迁移学习,利用语言模型实现跨数据集的有效泛化。然而,当前大多数图语言模型主要关注于捕捉显式图连接的拓扑信息,以用于标准的表示学习任务。在本工作中,我们引入了一个完全自动化且易于使用的智能体框架,超越了传统的图语言模型。我们的框架旨在应对复杂的现实世界数据场景,这些场景通常涉及显式关系图连接和隐式图增强的语义依赖。这使得我们能够处理各种下游应用,包括图相关的预测任务和文本生成任务。

LLM-empowered Agents. LLM-empowered agents enhance user interactions by connecting complex data with intuitive communication. They utilize LLMs to efficiently integrate diverse information, allowing them to handle a broader range of tasks Shinn et al. (2023); Xie et al. (2023). For exa