[论文翻译]神经网络知识编辑研究综述


原文地址:https://arxiv.org/pdf/2310.19704


A SURVEY ON KNOWLEDGE EDITING OF NEURAL NETWORKS

神经网络知识编辑研究综述

A PREPRINT

预印本

Vittorio Mazzia Alexa AI, Amazon vmazzia@amazon.com

Vittorio Mazzia Alexa AI, Amazon vmazzia@amazon.com

Alessandro Pedrani Alexa AI, Amazon pedrana@amazon.com

Alessandro Pedrani Alexa AI, Amazon pedrana@amazon.com

Andrea Caciolai Alexa AI, Amazon andccl@amazon.com

Andrea Caciolai Alexa AI, Amazon andccl@amazon.com

Kay Rottmann Alexa AI, Amazon krrottm@amazon.com

Kay Rottmann Alexa AI, Amazon krrottm@amazon.com

Davide Bernardi Alexa AI, Amazon dvdbe@amazon.com

Davide Bernardi Alexa AI, Amazon dvdbe@amazon.com

ABSTRACT

摘要

Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.

深度神经网络在学术界和工业界正变得日益普遍,在众多领域及相关任务中达到甚至超越人类表现。然而,与人类相似,即便是最大规模的人工神经网络也会犯错,且随着时间推移,曾经正确的预测可能失效。通过添加纠错样本或更新数据来扩充数据集已成为实际应用中的常见解决方案。但众所周知的灾难性遗忘现象,对精确调整神经网络参数中隐式记忆的知识提出了挑战,通常需要完整重新训练模型才能实现预期行为。这种做法成本高昂、可靠性低,且与当前大规模自监督预训练的趋势不相容,因此亟需寻找更高效的方法来使神经网络模型适应动态变化的数据。

为应对这一需求,知识编辑 (knowledge editing) 正成为一个新兴研究领域,其目标是在不影响模型已学习任务表现的前提下,实现对预训练模型的可靠、数据高效且快速的修改。本综述对这一人工智能研究新领域进行了简要梳理:首先阐述神经网络编辑问题,通过统一框架进行形式化定义,并将其与持续学习等更广为人知的研究分支区分;随后系统回顾当前最相关的知识编辑方法与数据集,将现有工作归纳为四大类——正则化技术、元学习、直接模型编辑和架构策略;最后探讨该领域与其他研究的交叉点及未来潜在方向。

Keywords Knowledge Editing $\cdot$ Model Editing $\cdot$ Neural Networks Editing $\cdot$ Continual Learning

关键词 知识编辑 $\cdot$ 模型编辑 $\cdot$ 神经网络编辑 $\cdot$ 持续学习

1 Introduction

1 引言

In stark contrast to artificial neural networks (ANN), Cichon and Gan (2015), humans and other animals seem capable of learning and editing their knowledge continuously. Indeed, literature studies indicate that the mammalian brain could prevent catastrophic forgetting Ratcliff (1990) by safeguarding previously acquired knowledge, thereby reducing the plasticity of a proportion of synapses and ensuring their long-term stability Benna and Fusi (2016); Yang et al. (2009); Cichon and Gan (2015). On the contrary, ANNs not only struggle to learn new tasks in a sequential fashion Kirkpatrick et al. (2017), but also to edit acquired knowledge on the same data distribution and task Huang et al. (2023). Indeed, unlike conventional knowledge base systems that explicitly store knowledge, neural models implicitly memorize facts and tasks in their parameters, making it difficult to directly access and interpret their computation and memories Voita et al. (2019); Belinkov and Glass (2019). Making even minor modifications can lead to a decrease in performance on previously learnt tasks, or even cause the entire computation to fail due to the well-documented issue of catastrophic forgetting Ratcliff (1990). Therefore, modifying their acquired knowledge is a challenging problem.

与人工神经网络(ANN)形成鲜明对比的是,Cichon和Gan(2015)指出,人类和其他动物似乎能够持续学习和编辑知识。文献研究表明,哺乳动物大脑通过保护已获得的知识来防止灾难性遗忘(Ratcliff 1990),从而降低部分突触的可塑性并确保其长期稳定性(Benna和Fusi 2016;Yang等2009;Cichon和Gan 2015)。相反,人工神经网络不仅难以按顺序学习新任务(Kirkpatrick等2017),甚至无法在同一数据分布和任务上编辑已获得的知识(Huang等2023)。与传统显式存储知识的知识库系统不同,神经模型将事实和任务隐式记忆在参数中,这使得直接访问和解释其计算与记忆变得困难(Voita等2019;Belinkov和Glass 2019)。即使进行微小修改,也可能导致先前学习任务的性能下降,甚至因众所周知的灾难性遗忘问题(Ratcliff 1990)而导致整个计算失败。因此,修改已获得知识是一个具有挑战性的问题。

Just as humans, ANNs make mistakes and as we trust them with increasingly important decisions, the cost of such mistakes grows ever higher Sinitsin et al. (2020). Therefore, given that mistakes are inevitable, it is crucial for deep learning practitioners to possess the ability to adjust model behaviors by promptly correcting errors as they surface. Currently, practical applications employing deep learning techniques have been relying on different workarounds to tackle this problem. In particular, a full re-training using datasets augmented with samples that account for the mistakes or up-to-date information is a common choice Sinitsin et al. (2020). The endeavor needed for fine-tuning atop pre-trained models Sarzynska-Wawer et al. (2021); Devlin et al. (2018); Oquab et al. (2023); Weiss et al. (2016) is frequently substantiated by the diminished dataset size and computational resources needed. On the other hand, this is not always true and manually curated, deterministic mechanisms that overrule model predictions on problematic samples can be the preferred choice Sinitsin et al. (2020). However, while being simple, this second approach is fully localized and not robust to factors of variation of the input space (e.g., different viewpoints of the same object in computer vision or paraphrasing in natural language processing tasks). Furthermore, while these workarounds may provide a temporary solution, they can be expensive, unreliable, and incompatible with the current trend of large neural models Zhao et al. (2023); Chen et al. (2022). Indeed, these large networks are typically deployed as static artifacts, whose behavior is difficult to modify during deployment without a costly full re-training Lazaridou et al. (2021). 
Thus, in all those cases, in order to adapt to changes in the environment, or to address instances of under fitting or over fitting in the original training data, it is desirable to have the ability to quickly make targeted updates to the model’s behavior after it has been deployed De Cao et al. (2021).

正如人类一样,人工神经网络 (ANN) 也会犯错,而随着我们让其参与越来越重要的决策,这类错误的代价正变得愈发高昂 [Sinitsin et al., 2020]。因此,既然错误无法避免,深度学习从业者必须具备通过及时修正错误来调整模型行为的能力。当前,采用深度学习技术的实际应用主要依赖不同变通方案来解决该问题。其中常见做法是使用包含错误样本或最新信息的增强数据集进行完整重新训练 [Sinitsin et al., 2020]。在预训练模型基础上进行微调 [Sarzynska-Wawer et al., 2021; Devlin et al., 2018; Oquab et al., 2023; Weiss et al., 2016] 所需的工作量,通常因数据集规模和计算资源需求降低而获得合理性。但另一方面,这种方式并非总是适用,针对问题样本手动构建确定性机制来覆盖模型预测可能是更优选择 [Sinitsin et al., 2020]。然而,尽管第二种方法简单直接,但它完全局限于局部且对输入空间的变化因素(如计算机视觉中同一物体的不同视角,或自然语言处理任务中的文本改写)缺乏鲁棒性。此外,这些变通方案虽能提供临时解决方案,但可能成本高昂、不可靠,且与当前大模型趋势 [Zhao et al., 2023; Chen et al., 2022] 不兼容。事实上,这些大型网络通常作为静态成品部署,其行为在部署后难以修改,除非付出昂贵代价进行完整重新训练 [Lazaridou et al., 2021]。因此,无论是为适应环境变化,还是解决原始训练数据中的欠拟合或过拟合实例,都亟需具备在模型部署后快速进行针对性行为更新的能力 [De Cao et al., 2021]。

To address this need, knowledge editing methods have been recently proposed to efficiently change a model’s behaviors without affecting previous performance on the same task Sinitsin et al. (2020). These approaches take inspiration from several fields of artificial intelligence research and range from simple fine-tuning with regularization methods Zhu et al. (2020) to meta-learning techniques that adopt hypernetwork models to learn how to update parameters De Cao et al. (2021); Mitchell et al. (2022a). Due to its recent appearance in the literature Sinitsin et al. (2020), the field still lacks agreement on taxonomy, naming conventions, datasets, and target applications. Indeed, most of the works have been motivated by large language models (LLMs), Zhao et al. (2023); Brown et al. (2020); Soltan et al. (2022), focusing mostly on tasks such as question answering (QA), Levy et al. (2017), machine translation (MT) De Cao et al. (2021), modifying knowledge graph embeddings Cheng et al. (2024), or even simpler NLP problems Thorne et al. (2018). However, it is also possible to find applications of knowledge editing to computer vision problems Sinitsin et al. (2020). Furthermore, its potential scope is poised to expand across diverse machine learning domains, encompassing areas such as medicine Shehab et al. (2022), robotics Soori et al. (2023), and precision agriculture Sharma et al. (2020) in the future.

为满足这一需求,近期提出的知识编辑方法能在不影响模型原有任务表现的前提下高效修改其行为 (Sinitsin et al., 2020)。这些方法汲取了人工智能多个研究领域的灵感,涵盖从采用正则化方法的简单微调 (Zhu et al., 2020) 到利用超网络模型学习参数更新策略的元学习技术 (De Cao et al., 2021; Mitchell et al., 2022a)。由于该领域在文献中刚刚兴起 (Sinitsin et al., 2020),目前在分类体系、命名规范、数据集和目标应用方面尚未形成统一标准。事实上,大多数研究都受大语言模型 (LLM) (Zhao et al., 2023; Brown et al., 2020; Soltan et al., 2022) 推动,主要聚焦于问答系统 (QA) (Levy et al., 2017)、机器翻译 (MT) (De Cao et al., 2021)、知识图谱嵌入修改 (Cheng et al., 2024) 乃至更基础的NLP问题 (Thorne et al., 2018)。但知识编辑在计算机视觉领域也有应用实例 (Sinitsin et al., 2020)。未来其应用范围有望扩展到医疗 (Shehab et al., 2022)、机器人 (Soori et al., 2023) 和精准农业 (Sharma et al., 2020) 等多元机器学习领域。

1.1 Organization of the survey

1.1 综述结构

The objective of this survey is to provide a comprehensive review of existing literature on knowledge editing, formalizing the task and providing a categorization of the approaches into distinct families. To our knowledge, this is the first work to undertake such an effort, and we hope that it will facilitate future research in this increasingly important area of study. Indeed, the need for more formalization and organizational structure can already be seen by a recent study Yao et al. (2023), which attempts to benchmark and compare some knowledge editing methodologies specifically for LLMs editing.

本次综述旨在对知识编辑领域的现有文献进行全面梳理,通过形式化任务定义并将现有方法划分为不同类别。据我们所知,这是首个系统性的整理工作,我们希望这能推动这个日益重要的研究领域的未来发展。事实上,Yao等人 (2023) 近期针对大语言模型编辑的基准研究已反映出该领域对更规范体系架构的迫切需求。

The rest of the survey is organized as follows. Section 2 introduces the problem of knowledge editing, using previous works to formalize it under a common definition. Section 3 explores the tasks and datasets that are most commonly considered when solving the knowledge editing problem. Section 4 provides an overview of the most relevant knowledge editing approaches in the literature, identifying four distinct families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, Section 5 concludes the survey by discussing some intersections between knowledge editing and other fields of research, and outlining some possible future risks and directions.

本综述的其余部分结构如下。第2章介绍知识编辑 (knowledge editing) 问题,通过先前研究给出通用定义的形式化表述。第3章探讨解决知识编辑问题最常考虑的任务和数据集。第4章概述文献中最相关的知识编辑方法,将其归纳为四大类:正则化技术 (regularization techniques)、元学习 (meta-learning)、直接模型编辑 (direct model editing) 和架构策略 (architectural strategies)。最后,第5章总结全文,讨论知识编辑与其他研究领域的交叉点,并展望未来潜在风险与发展方向。

2 Overview

2 概述

This section begins by presenting an introduction to the concept of knowledge editing, which is also referred to as model editing in the literature. First, we review various definitions and interpretations of knowledge editing proposed by different works. Next, we establish a common definition of knowledge editing that generalizes to all existing works in the field.

本节首先介绍知识编辑(Knowledge Editing)的概念,该概念在文献中也被称为模型编辑(Model Editing)。我们先回顾不同研究提出的知识编辑定义与解释,随后建立一个能涵盖该领域所有现有研究的通用定义。


Figure 1: The knowledge editing problem was first proposed as the task of modifying a model based on a set of individual edit pairs, in a non-sequential manner (a), Sinitsin et al. (2020). Successive works extended the problem to batches of edits (b), sequential individual edits (c), and sequential batches of edits (d). Evaluation metrics are similar in all cases, as described in Section 2.5.

图 1: 知识编辑问题最初被提出时,是以非连续方式基于一组独立编辑对来修改模型的任务 (a),如 Sinitsin 等人 (2020) 所述。后续研究将该问题扩展到批量编辑 (b)、连续独立编辑 (c) 和连续批量编辑 (d)。如第 2.5 节所述,所有情况的评估指标均相似。

2.1 Background

2.1 背景

The concept of knowledge editing was first introduced in Sinitsin et al. (2020), which formalizes it as the task of correcting a model’s mistake on a specific sample while preserving the model’s overall behavior, akin to continuous learning. Indeed, as specified by the authors, “The problem of efficient neural network patching differs from continual learning, (...) [because] is not sequential in nature. However, correction (...) of mislabeled samples must not affect its behavior on other samples, which is close to overcoming [the] catastrophic forgetting task.”. Therefore, Sinitsin et al. (2020) define the problem as performing individual edits reliably and efficiently, not sequentially, and on the same task learned by the target model without affecting its overall behavior (i.e., being local to the edit). Concurrently, authors of Zhu et al. (2020) worked specifically on the task of modifying memories in Transformer models Vaswani et al. (2017), providing their own definition of knowledge editing. They expand the scope of the problem to a subset of knowledge, i.e., a batch of edits. Similarly, several other studies have also formalized the problem of model editing, following similar steps to Zhu et al. (2020). For instance, works by Mitchell et al. (2022a), Meng et al. (2022a), and Mitchell et al. (2022b) have defined the task as the process of performing individual or batch edits on a target model trained for a specific task. These studies emphasize the importance of ensuring that edits are resilient to factors of variation, i.e., that they are generalizable. While injecting individual changes is already a challenging and interesting task for the scientific community, multiple simultaneous general model edits represent a more realistic scenario that deserves further exploration.

知识编辑的概念最早由Sinitsin等人(2020)提出,他们将其形式化为在保持模型整体行为的同时修正模型在特定样本上的错误的任务,类似于持续学习。正如作者所述:"高效的神经网络修补问题不同于持续学习(...)[因为]本质上不是顺序性的。然而对错误标记样本的修正(...)不得影响其在其他样本上的行为,这与克服[灾难性遗忘]任务相近。"因此,Sinitsin等人(2020)将该问题定义为在不影响模型整体行为(即编辑的局部性)的前提下,对目标模型已学习的同一任务进行可靠且高效的非顺序性单独编辑。与此同时,Zhu等人(2020)的研究团队专门针对Transformer模型(Vaswani等人,2017)中的记忆修改任务开展工作,提出了他们自己的知识编辑定义。他们将问题范围扩展到知识子集,即批量编辑。类似地,其他几项研究也遵循Zhu等人(2020)的步骤形式化了模型编辑问题。例如Mitchell等人(2022a)、Meng等人(2022a)和Mitchell等人(2022b)的工作将该任务定义为对针对特定任务训练的目标模型进行单独或批量编辑的过程。这些研究强调了确保编辑对变异因素具有弹性(即可泛化性)的重要性。虽然对科学界而言注入单个变更已是具有挑战性的有趣任务,但多重同步的通用模型编辑代表了更现实的场景,值得进一步探索。

More recently, Hartvigsen et al. (2022) and Huang et al. (2023) argued that the conventional "one-mistake-fixing scenario" does not accurately reflect the complexity of real-world knowledge editing challenges. As such, they proposed to extend the scope of knowledge editing to a sequential problem to facilitate the development of more practical editing methods. While their proposal only accounts for subsequent individual edits, considering multiple simultaneous and sequential edits represents a more general case where the number of edits varies at each step. Importantly, in the case of iterative model editing, it is desirable to respect not only the new editing task constraints but also the previous ones, which is closely related to the concept of continual learning. Nevertheless, it is crucial to highlight that while the new definition of knowledge editing acknowledges a sequential process, differing from continuous learning, its scope remains limited to the modification of knowledge of the initially learned task by the model. On the contrary, continuous learning operates without such constraints, investigating methodologies that allow the model to expand to new tasks and adapt dynamically to entirely new information.

最近,Hartvigsen等人(2022) 和 Huang等人(2023) 指出传统的"单次纠错场景"无法准确反映现实世界知识编辑挑战的复杂性。为此,他们提出将知识编辑范围扩展为序列问题,以促进更实用编辑方法的发展。虽然他们的方案仅考虑后续的单个编辑,但处理多组同步和顺序编辑才是更普遍的情况——其中每一步的编辑数量都可能变化。值得注意的是,在迭代式模型编辑中,不仅需要满足新编辑任务的约束,还需兼顾先前约束,这与持续学习(continual learning)概念密切相关。但必须强调:尽管新版知识编辑定义承认了序列过程(与持续学习不同),其范围仍局限于模型对初始学习任务知识的修改;而持续学习则不受此限,致力于研究让模型拓展新任务、动态适应全新信息的方法论。

2.2 The knowledge editing problem

2.2 知识编辑问题

To introduce the problem of knowledge editing, it is possible to leverage the terminology used to define a “well-posed learning problem” Mitchell et al. (2007): an algorithm is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$ , if its performance at tasks in $T$ , as measured by $P$ , improves with experience $E$ . Then, we can say that knowledge editing is the problem of modifying the algorithm such that, given a representative set $S$ of instances of tasks in $T$ , and a subset $S_{e}\subseteq S$ of edit instances, its performance on $S_{e}$ improves as measured by $P$ , while performance on all the other instances $S\setminus S_{e}$ remains unchanged.

为引入知识编辑问题,可以借鉴Mitchell等人(2007)定义"适定学习问题"的术语:若某算法在任务类$T$中的表现(通过性能度量$P$评估)能随经验$E$提升,则称该算法能从经验$E$中学习。由此可将知识编辑定义为:给定任务类$T$的实例代表集$S$及其编辑子集$S_{e}\subseteq S$,修改算法使其在$S_{e}$上的性能(通过$P$衡量)提升,同时保持$S\setminus S_{e}$上所有实例的表现不变。

More practically, we can define the problem of knowledge editing as “the task of modifying a model based on a set of individual or batch edits, $S_{e}$ , pertaining to the same task known by the model, either in a sequential or non-sequential manner. The objective is to update the model’s knowledge representation without significantly altering its original behavior or performance over $S$ and being robust to different factor of variations of the edits.”

更实际地说,我们可以将知识编辑问题定义为"基于一组与模型已知任务相关的个体或批量编辑 $S_{e}$ (以顺序或非顺序方式)修改模型的任务。其目标是在不显著改变模型在 $S$ 上的原始行为或性能的前提下更新模型的知识表示,并对编辑的不同变化因素保持鲁棒性。"

More formally, let $\mathbb{X}$, $\mathbb{Y}$ be an input and an output space, respectively. Let $f_{0}$ be a function that maps an input $x\in\mathbb{X}$ to an output $y\in\mathbb{Y}$ , parametrized by $\theta_{0}\in\Theta$ , then

更正式地说,设$\mathbb{X}$和$\mathbb{Y}$分别为输入和输出空间。设$f_{0}$是一个将输入$x\in\mathbb{X}$映射到输出$y\in\mathbb{Y}$的函数,由参数$\theta_{0}\in\Theta$参数化。

$$
f_{0}\in\mathbb{F}:=\mathbb{Y}^{\mathbb{X}\times\Theta}
$$

$$
f_{0}\in\mathbb{F}:=\mathbb{Y}^{\mathbb{X}\times\Theta}
$$

We use the subscript zero to indicate that this is the starting model, i.e., the model we want to edit. We define an edit pair as an input-output pair $(x_{e},y_{e})\in\mathbb{X}\times\mathbb{Y}$ , such that $f_{0}(x_{e})\ne y_{e}$ . Then, in its simplest form, given an individual edit example pair $(x_{e},y_{e})$ , a knowledge editing (KE) methodology can be defined as follows

我们用下标零表示这是初始模型,即待编辑的模型。将编辑对定义为一个输入-输出对 $(x_{e},y_{e})\in\mathbb{X}\times\mathbb{Y}$ ,使得 $f_{0}(x_{e})\ne y_{e}$ 。那么在最简形式下,给定单个编辑示例对 $(x_{e},y_{e})$ ,知识编辑(Knowledge Editing, KE)方法可定义如下

$$
\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}
$$

$$
\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}
$$

i.e. a function that takes in input the starting model $f_{0}$ and the edit pair $(x_{e},y_{e})$ to produce an edited model $f_{e}$ . If the edited model is such that $f_{e}(x_{e})=y_{e}$ , then we say that the edit was successful. A KE approach can realize the transformation from $f_{0}$ to $f_{e}$ in different ways, and we identify four families of possible realizations: regularization, meta-learning, direct model editing and architectural strategies. We dive into the details of each of these families in Section 4.

即一个以初始模型 $f_{0}$ 和编辑对 $(x_{e},y_{e})$ 作为输入,生成编辑后模型 $f_{e}$ 的函数。若编辑后的模型满足 $f_{e}(x_{e})=y_{e}$ ,则称该编辑成功。知识编辑(KE)方法可通过不同方式实现从 $f_{0}$ 到 $f_{e}$ 的转换,我们将其归纳为四种实现路径:正则化(regularization)、元学习(meta-learning)、直接模型编辑(direct model editing)和架构策略(architectural strategies)。我们将在第4节详细探讨每种路径的具体实现。
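To make the KE interface above concrete, the sketch below realizes $\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}$ as naive fine-tuning of a toy linear classifier on the single edit pair. The model, data, and update rule are illustrative assumptions for exposition only; this is not one of the surveyed methodologies.

```python
# Toy sketch of the KE interface: KE(f0, x_e, y_e) -> f_e.
# The "model" is a linear binary classifier parametrized by theta;
# the editor applies perceptron-style updates on the edit pair only.

def predict(theta, x):
    score = sum(w * xi for w, xi in zip(theta, x))
    return 1 if score > 0 else 0

def knowledge_edit(theta0, x_e, y_e, lr=0.1, max_steps=1000):
    theta = list(theta0)  # copy: leave the starting model f0 untouched
    for _ in range(max_steps):
        if predict(theta, x_e) == y_e:
            break  # edit successful: f_e(x_e) = y_e
        direction = 1 if y_e == 1 else -1
        theta = [w + lr * direction * xi for w, xi in zip(theta, x_e)]
    return theta

theta0 = [1.0, -1.0]
x_e, y_e = (1.0, 2.0), 1            # f0 misclassifies this edit pair
theta_e = knowledge_edit(theta0, x_e, y_e)
print(predict(theta0, x_e), predict(theta_e, x_e))  # 0 1
```

Note that such naive fine-tuning satisfies only the success condition $f_{e}(x_{e})=y_{e}$; it gives no guarantee on the rest of the input space, which is precisely what motivates the four families of approaches discussed later.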

2.3 Taxonomy of edits

2.3 编辑分类

Often, it is desirable to edit a model by applying multiple edits at once, a sequence of edits, or a sequence of batches of edits.

在实践中,我们往往需要一次性应用多个编辑、一系列连续编辑或连续的批量编辑来修改模型。

The definition provided in section 2.2 has been given for the simplest case, that we call a single non-successive edit: we only want to change the model for one edit pair. Conversely, for multiple successive edits we can formally define a list of edit requests as:

2.2 节中给出的定义针对最简单的情况,我们称之为单次非连续编辑:只需针对一个编辑对修改模型。相反,对于多次连续编辑,我们可以将编辑请求列表正式定义为:

$$
\mathcal{E}=\left\{(x_{e},y_{e})^{(i)}\ \mathrm{s.t.}\ \forall i,j:x_{e}^{(i)}=x_{e}^{(j)}\Rightarrow y_{e}^{(i)}=y_{e}^{(j)}\right\}
$$

$$
\mathcal{E}=\left\{(x_{e},y_{e})^{(i)}\ \mathrm{s.t.}\ \forall i,j:x_{e}^{(i)}=x_{e}^{(j)}\Rightarrow y_{e}^{(i)}=y_{e}^{(j)}\right\}
$$

where the logical constraint ensures that there are no conflicting requests, as suggested by Meng et al. (2022a). Individual edits can also be grouped together and form $N$ batches of successive edits each with $B_{e}^{(i)}$ edit pairs, such as

逻辑约束确保不存在冲突请求,如Meng等人 (2022a) 所建议。单个编辑也可分组形成$N$个连续编辑批次,每批包含$B_{e}^{(i)}$个编辑对,例如

$$
\mathcal{B}_ {e}^{(i)}=\left\{(x_{e},y_{e})^{(1)},\cdots,(x_{e},y_{e})^{(B_{e}^{(i)})}\right\}\quad\mathrm{s.t.}\quad\mathcal{E}=\bigcup_{i=1}^{N}\mathcal{B}_{e}^{(i)}
$$

$$
\mathcal{B}_ {e}^{(i)}=\left\{(x_{e},y_{e})^{(1)},\cdots,(x_{e},y_{e})^{(B_{e}^{(i)})}\right\}\quad\mathrm{s.t.}\quad\mathcal{E}=\bigcup_{i=1}^{N}\mathcal{B}_{e}^{(i)}
$$

The difference between successive individual or batch edits is that some KE methodologies can ingest an entire $\mathcal{B}_ {e}^{(i)}$ and produce the resulting $f_{e}$ implementing all the given edits, while other methodologies can only consider one individual sample in sequence at a time. In both cases (a sequence of individual edits is trivially a sequence of single-item batch edits), successive edits assume to work with a starting model $f_{t-1}$ and apply the $t$ -th change as

连续单次或批量编辑的区别在于,某些知识编辑(KE)方法能够一次性处理整个$\mathcal{B}_ {e}^{(i)}$并生成实现所有给定编辑的$f_{e}$,而其他方法每次只能依次处理单个样本。在这两种情况下(一系列单次编辑本质上就是一系列单条目批量编辑),连续编辑假设从初始模型$f_{t-1}$开始工作,并应用第$t$次变更作为

$$
f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})
$$

$$
f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})
$$

proceeding iteratively, using $f_{t}$ as a starting model for the next edit. Figure 1 summarizes the four types of edits. Finally, as with individual edits, after every change, $f_{e}$ should not only satisfy $f_{e}(x_{e})=y_{e}$ , but also a series of other properties, as discussed in the next section.

逐步迭代进行,使用 $f_{t}$ 作为下一次编辑的初始模型。图 1 总结了四种编辑类型。最后,与单个编辑一样,每次更改后 $f_{e}$ 不仅需要满足 $f_{e}(x_{e})=y_{e}$,还应满足下一节讨论的一系列其他属性。
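The iterative scheme $f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})$ can be sketched as a plain loop. The "model" below is a deliberately trivial stand-in (a dict of input-to-output overrides), chosen only to exercise the sequential interface; `edit_one` and the example edit requests are hypothetical.

```python
# Sketch of successive editing: f_t = KE(f_{t-1}, B^{(t)}),
# iterating over N batches and using f_t as the next starting model.

def edit_one(model, x_e, y_e):
    patched = dict(model)   # copy: do not mutate f_{t-1}
    patched[x_e] = y_e
    return patched

def sequential_edit(f0, batches):
    """Apply N successive batches; if the editor cannot ingest a whole
    batch at once, fall back to one edit pair at a time."""
    f_t = f0
    history = [f_t]
    for batch in batches:            # batch t plays the role of B_e^{(t)}
        for x_e, y_e in batch:
            f_t = edit_one(f_t, x_e, y_e)
        history.append(f_t)          # f_t is the starting model for t+1
    return f_t, history

f0 = {}
batches = [[("q1", "a1"), ("q2", "a2")], [("q1", "a1-updated")]]
f_N, hist = sequential_edit(f0, batches)
print(f_N["q1"], f_N["q2"])  # a1-updated a2
```

Keeping the intermediate models in `history` mirrors the evaluation setup of Section 2.5, where metrics can be tracked after each step of the sequence.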

2.4 Editing properties

2.4 编辑属性

Based on the specific task learned by function $f$ , various properties can be specifically designed. However, at a fundamental level, following Sinitsin et al. (2020); Huang et al. (2023), knowledge editing should aim at satisfying four properties, that we define below.

基于函数 $f$ 学习的具体任务,可以专门设计各种属性。然而,在基础层面上,遵循 Sinitsin et al. (2020) 和 Huang et al. (2023) 的研究,知识编辑应致力于满足以下四个属性。

Property 1 - Reliability: Given an edit pair $(x_{e},y_{e})$ , the edited model $f_{e}$ should output the desired edit:

属性1 - 可靠性:给定一个编辑对$(x_{e},y_{e})$,编辑后的模型$f_{e}$应输出期望的编辑结果:

$$
f_{e}(x_{e})=y_{e}
$$

$$
f_{e}(x_{e})=y_{e}
$$

Property 2 - Generality: The edited model $f_{e}$ should be able to generalize to similar examples to the edit pair. This can be formalized by defining an equivalence neighborhood $N(x_{e})\subset\mathbb{X}$ and requiring that the edited model $f_{e}$ satisfies:

属性2 - 通用性:编辑后的模型$f_{e}$应能泛化到与编辑对相似的示例。这可以通过定义一个等价邻域$N(x_{e})\subset\mathbb{X}$并要求编辑后的模型$f_{e}$满足以下条件来形式化:

$$
f_{e}(\tilde{x}_ {e})=y_{e}\qquad\forall\,\tilde{x}_ {e}\in N(x_{e})
$$

$$
f_{e}(\tilde{x}_ {e})=y_{e}\qquad\forall\,\tilde{x}_ {e}\in N(x_{e})
$$

Property 3 - Locality: The edited model $f_{e}$ should not alter the output of examples that are not similar to the edit pair. This can be formalized by defining a locality set

属性3 - 局部性:编辑后的模型$f_{e}$不应改变与编辑对不相似样本的输出。这可以通过定义局部性集合来形式化。

$$
L(x_{e})=\left\{(x_{\mathrm{loc}},y_{\mathrm{loc}})\in\mathbb{X}\times\mathbb{Y}\mid x_{\mathrm{loc}}\notin N(x_{e})\land f_{0}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\right\}
$$

$$
L(x_{e})=\left\{(x_{\mathrm{loc}},y_{\mathrm{loc}})\in\mathbb{X}\times\mathbb{Y}\mid x_{\mathrm{loc}}\notin N(x_{e})\land f_{0}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\right\}
$$

and require that the edited model $f_{e}$ satisfies:

并要求编辑后的模型 $f_{e}$ 满足:

$$
f_{e}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\qquad\forall\,(x_{\mathrm{loc}},y_{\mathrm{loc}})\in L(x_{e})
$$

$$
f_{e}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\qquad\forall\,(x_{\mathrm{loc}},y_{\mathrm{loc}})\in L(x_{e})
$$

Property 4 - Efficiency: The proposed knowledge editing methodology should aim to achieve efficient editing, characterized by low compute and space complexity.

属性4 - 效率性:所提出的知识编辑方法应追求高效编辑,具备低计算复杂度和低空间复杂度的特性。
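The first three properties translate directly into boolean checks of an edited model against the starting one. A minimal sketch follows; the models, the equivalence neighborhood, and the locality inputs are illustrative assumptions, and efficiency (Property 4) is omitted since it concerns the cost of producing $f_{e}$ rather than its outputs.

```python
# Check reliability, generality and locality for an edited model fe
# against the starting model f0. Models are plain Python callables.

def check_properties(f0, fe, x_e, y_e, neighborhood, locality_inputs):
    reliability = fe(x_e) == y_e
    # Generality: the edit also holds on the equivalence neighborhood N(x_e)
    generality = all(fe(x) == y_e for x in neighborhood)
    # Locality: outside N(x_e), outputs must match the starting model f0
    locality = all(fe(x) == f0(x) for x in locality_inputs)
    return {"reliability": reliability,
            "generality": generality,
            "locality": locality}

# Illustrative models: f0 rounds its input; fe additionally maps
# inputs near 2 to the edited label 0.
f0 = lambda x: round(x)
fe = lambda x: 0 if abs(x - 2) < 0.25 else round(x)
report = check_properties(f0, fe, x_e=2.0, y_e=0,
                          neighborhood=[1.9, 2.1],
                          locality_inputs=[0.0, 5.0, 7.4])
print(report)  # {'reliability': True, 'generality': True, 'locality': True}
```

In practice the neighborhood and locality sets are sampled (e.g., paraphrases of an edited prompt and unrelated held-out examples), so these checks become the rates reported in Section 2.5 rather than strict booleans.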

It is worth noting that some works in the literature have defined these same properties, or a subset of them, with different names and notations. Specifically, reliability may also be referred to as efficacy, locality as specificity, and generality as paraphrase, Meng et al. (2022b); De Cao et al. (2021). Moreover, as previously mentioned, depending on the specific field of application, other properties and related evaluation metrics can arise, such as fluency and consistency Meng et al. (2022b) when editing Language Models. Finally, the last property, namely efficiency, tends to be disregarded in academic literature. However, it is one of the main reasons KE is appealing over a simpler re-training of the base model. Furthermore, it plays a pivotal role in performing comparative analyses with baseline models and showcasing the ability of a particular methodology to modify a neural network with a substantial number of parameters. We encourage future works to consider (at least) all four properties when measuring the effectiveness of a proposed KE methodology.

值得注意的是,文献中的部分研究对这些相同属性(或其子集)采用了不同的命名和符号表示。具体而言,可靠性(reliability)可能被称为效能(efficacy),局部性(locality)可能被称为特异性(specificity),而通用性(generality)可能被称为改写(paraphrase) [Meng et al., 2022b; De Cao et al., 2021]。此外,如前所述,根据具体应用领域的不同,还可能衍生出其他属性及相关评估指标,例如在编辑语言模型时涉及的流畅性(fluency)和一致性(consistency) [Meng et al., 2022b]。最后一项属性——效率(efficiency)往往在学术文献中被忽视,但它正是知识编辑(KE)相较于简单的基础模型重训练更具吸引力的主要原因之一。该属性在进行基线模型对比分析时具有关键作用,能展示特定方法论修改海量参数神经网络的能力。我们建议未来研究在评估知识编辑方法论有效性时至少考虑这四项属性。

2.5 Evaluation metrics

2.5 评估指标

Given a knowledge editing algorithm KE, a list of edit requests $\mathcal{E}$ , a starting model $f_{0}$ and a test set $\mathcal{D}_{t e s t}$ , it is possible to measure the extent to which the properties above hold. We consider working with $N$ batches of successive edits, each comprised of $B$ individual edits, without loss of generality (as mentioned above, if $B=1$ , we have successive individual edits). Moreover, if the KE methodology is unable to process all $B$ edits concurrently, it can apply each edit individually in a non-sequential manner.

给定一个知识编辑算法 KE、一个编辑请求列表 $\mathcal{E}$、一个初始模型 $f_{0}$ 和一个测试集 $\mathcal{D}_{t e s t}$,就可以衡量上述属性在多大程度上成立。在不失一般性的情况下,我们考虑对 $N$ 批连续编辑进行处理,每批包含 $B$ 个单独编辑(如上所述,如果 $B=1$,则表示连续单独编辑)。此外,如果 KE 方法无法同时处理所有 $B$ 个编辑,则可以以非顺序方式逐个应用每个编辑。

Following Huang et al. (2023), we represent with $I$ the indicator function, and define the following metrics. Success Rate (SR): It is used to evaluate reliability, and it is simply the proportion of edits for which the methodology succeeds in changing the knowledge of a starting model $f_{t}$ .

遵循Huang等人(2023)的研究,我们用$I$表示指示函数,并定义以下指标。成功率(SR):用于评估可靠性,即方法成功改变初始模型$f_{t}$知识的编辑比例。

$$
\mathrm{SR}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)\quad\mathrm{s.t.}\quad\bigcup_{n=1}^{N}\bigcup_{b=1}^{B}(x_{e},y_{e})_{n,b}=\mathcal{E}
$$

$$
\mathrm{SR}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)\quad\mathrm{s.t.}\quad\bigcup_{n=1}^{N}\bigcup_{b=1}^{B}(x_{e},y_{e})_{n,b}=\mathcal{E}
$$

In the case of non-sequential individual edits, $f_{n,B}=f_{n,b}$ . Moreover, Eq. 10 provides an overall value after $N$ successive edits, but it can be of interest to measure SR every $n$ edits, tracking changes over the sequence.

对于非连续的单次编辑,$f_{n,B}=f_{n,b}$。此外,式10给出了$N$次连续编辑后的整体值,但也可以每隔$n$次编辑测量一次SR,以跟踪序列中的变化。

Generalization Rate (GR): It is used to evaluate generality, testing the post-edit model $f_{e}$ , on the equivalence neighborhood set $N(x_{e;n,b},y_{e;n,b})$ , where $(x_{e;n,b},y_{e;n,b})$ is the $n$ -th batch, $b$ -th edit pair. GR can be written as,

泛化率 (GR): 用于评估泛化性,在等价邻域集 $N(x_{e;n,b},y_{e;n,b})$ 上测试编辑后模型 $f_{e}$,其中 $(x_{e;n,b},y_{e;n,b})$ 是第 $n$ 批次第 $b$ 个编辑对。GR可表示为:

$$
\mathrm{GR}=\frac{1}{N}\frac{1}{B}\frac{1}{N_{b}}\sum_{n=1}^{N}\sum_{b=1}^{B}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\quad\mathrm{s.t.}\quad\forall n,b:\bigcup_{i=1}^{N_{b}}(\tilde{x}_ {e},\tilde{y}_ {e})_ {n,b,i}\subseteq N(x_{e},y_{e})_{n,b}
$$

$$
\mathrm{GR}=\frac{1}{N}\frac{1}{B}\frac{1}{N_{b}}\sum_{n=1}^{N}\sum_{b=1}^{B}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\quad\mathrm{s.t.}\quad\forall n,b:\bigcup_{i=1}^{N_{b}}(\tilde{x}_ {e},\tilde{y}_ {e})_ {n,b,i}\subseteq N(x_{e},y_{e})_{n,b}
$$

where $N_{b}$ is the number of equivalent samples of the $b$ -th edit pair. Following Mitchell et al. (2022a), we can also define Edit Success (ES) to summarize both SR and GR. It can be computed as the average accuracy of the edited model $f_{e}$ on the edit input(s), as well as inputs drawn from the equivalence neighborhood(s), that is,

其中 $N_{b}$ 是第 $b$ 个编辑对的等效样本数。根据 Mitchell 等人 (2022a) 的研究,我们还可以定义编辑成功率 (Edit Success, ES) 来综合衡量 SR 和 GR。其计算方式为编辑后模型 $f_{e}$ 在编辑输入及其等效邻域采样输入上的平均准确率,即

$$
\mathrm{ES}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}\Bigl(I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)+\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\Bigr)
$$

$$
\mathrm{ES}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}\Bigl(I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)+\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\Bigr)
$$

where the same conditions for SR and GR hold and have been omitted for brevity.

其中SR与GR的约束条件同样成立,为简洁起见在此省略。

Drawdown (DD): It is used to evaluate locality, and it is defined as the performance degradation of the edited model over $\mathcal{D}_ {t e s t}$ . It is computed using the final edited model, $f_{N}$ , that in case of successive edits is the result of $N$ steps.

回撤 (DD): 用于评估局部性, 定义为编辑后模型在 $\mathcal{D}_ {t e s t}$ 上的性能下降。该指标通过最终编辑模型 $f_{N}$ 计算得出, 在连续编辑场景下该模型是经过 $N$ 次编辑后的结果。

$$
\mathrm{DD}=1-\frac{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{N}(x)=y)}{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{0}(x)=y)}
$$

$$
\mathrm{DD}=1-\frac{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{N}(x)=y)}{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{0}(x)=y)}
$$

Finally, as suggested by Huang et al. (2023) in the case of multiple successive edits, it is also important to evaluate SR and GR using the final model $f_{N}$ , in order to assess how past edits are retained. Therefore, it is possible to define three additional metrics, Success Retain Rate (SRR), Generalization Retain Rate (GRR), and Edit Success Retain (ESR), simply using in Eq. 10 and 11, $f_{N}$ instead of $f_{n,B}$ .

最后,正如Huang等人 (2023) 所建议的,在多次连续编辑的情况下,使用最终模型 $f_{N}$ 来评估SR和GR也很重要,以便评估过去编辑的保留情况。因此,可以定义三个额外的指标:成功保留率 (SRR)、泛化保留率 (GRR) 和编辑成功保留 (ESR),只需在公式10和11中使用 $f_{N}$ 代替 $f_{n,B}$。
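For the non-sequential single-batch case ($N=1$), the metrics above reduce to simple averages over the edits, neighborhoods, and test set. The sketch below computes SR, GR and DD with hypothetical toy models and data (not tied to any benchmark); the indicator $I$ becomes an equality check.

```python
# Toy computation of SR, GR and DD for a single batch of edits (N = 1).
# Models are plain callables; edits, neighborhoods and the test set are
# illustrative stand-ins.

def success_rate(fe, edits):
    # SR: fraction of edit pairs the edited model gets right
    return sum(fe(x) == y for x, y in edits) / len(edits)

def generalization_rate(fe, neighborhoods):
    # GR: per edit, accuracy over its equivalence neighborhood, averaged
    hits = [sum(fe(x) == y for x in xs) / len(xs) for xs, y in neighborhoods]
    return sum(hits) / len(hits)

def drawdown(f0, fN, test_set):
    # DD: 1 - (accuracy of final edited model / accuracy of f0)
    correct_0 = sum(f0(x) == y for x, y in test_set)
    correct_N = sum(fN(x) == y for x, y in test_set)
    return 1 - correct_N / correct_0

f0 = lambda x: x % 2                    # starting model: parity
fN = lambda x: 0 if x == 3 else x % 2   # edited model: forces f(3) = 0
edits = [(3, 0)]
neighborhoods = [([3], 0)]              # trivial neighborhood for the edit
test_set = [(0, 0), (1, 1), (2, 0), (4, 0)]
print(success_rate(fN, edits), drawdown(f0, fN, test_set))  # 1.0 0.0
```

The retain-rate variants (SRR, GRR, ESR) reuse the same functions, simply passing the final model $f_{N}$ in place of each intermediate $f_{n,B}$.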

3 Tasks and Datasets

3 任务与数据集

The formalization of the knowledge editing problem provided in the previous section is general, and many applications of knowledge editing to different tasks encompassing various fields can be formulated within that framework. The brief but rich history of the field has so far seen applications mainly to two broad fields: Computer Vision and Natural Language Processing. Indeed, Sinitsin et al. (2020) provides experimental results on image classification and machine translation, and almost all the works that come after (and even before, Garnelo et al. (2018); Kirkpatrick et al. (2017)) demonstrate the effectiveness of their proposed approaches in one or more applications in these two fields.

前一节提供的知识编辑问题形式化具有普遍性,在该框架下可以制定知识编辑应用于不同领域各种任务的多类场景。该领域虽发展历史短暂但成果丰硕,目前主要应用于两大方向:计算机视觉 (Computer Vision) 与自然语言处理 (Natural Language Processing)。Sinitsin等 (2020) 通过图像分类和机器翻译任务验证了其有效性,此后几乎所有研究 (甚至早于Garnelo等 (2018) 和Kirkpatrick等 (2017) 的工作) 都在这两个领域的一个或多个应用中证明了所提方法的有效性。

Even though the knowledge editing framework can be defined independently of the target domain and task, each specific application has its unique challenges and intricacies, which we explore in this section. Section 3.1 covers the most common tasks in the Computer Vision domain, as well as the datasets on which the tasks are usually addressed, and how the knowledge editing problem can be instantiated in this context. Section 3.2 provides a similar overview for applications to the Natural Language Processing domain. Finally, Section 3.3 describes tasks and datasets that do not strictly fit in either of the two domains above.

尽管知识编辑框架可以独立于目标领域和任务进行定义,但每个具体应用都有其独特的挑战和复杂性,我们将在本节探讨这些内容。3.1节涵盖计算机视觉领域最常见的任务、通常用于解决这些任务的数据集,以及在此背景下如何实例化知识编辑问题。3.2节为自然语言处理领域的应用提供了类似的概述。最后,3.3节描述了不完全属于上述两个领域的任务和数据集。

3.1 Computer Vision

3.1 计算机视觉

Computer Vision is a broad field with a long history, which generally attempts to extract meaningful representations from visual media to derive an understanding of the represented scenes Szeliski (2022). Over the last years, deep learning methods have been shown to outperform previous state-of-the-art techniques in several applications in the field Voulodimos et al. (2018), and while more “traditional” techniques are still relevant for some applications, neural networks-based approaches have become the de facto standard for many others O’Mahony et al. (2020). Due to the importance and breadth of the field, and the relevance of neural networks therein, knowledge editing literature has found fertile grounds in Computer Vision, and has so far gravitated towards two primary tasks: Image Classification and Image Completion. A number of datasets are customarily used to test approaches to solve these tasks. They vary in terms of number of examples, classes and channels, as well as resolution of the representative images; Table 1 provides an overview of the most commonly used ones.

计算机视觉是一个历史悠久且广泛的领域,其核心目标是从视觉媒体中提取有意义的表征,进而理解所呈现的场景 Szeliski (2022)。近年来,深度学习方法在该领域的多个应用中展现出超越传统技术的性能 Voulodimos et al. (2018),虽然某些场景仍适用"传统"技术,但基于神经网络的方法已成为多数任务的实际标准 O’Mahony et al. (2020)。鉴于该领域的重要性与广泛性,以及神经网络在其中扮演的关键角色,知识编辑研究在计算机视觉领域获得了丰沃土壤,目前主要聚焦于两大任务:图像分类与图像补全。为测试这些任务的解决方案,研究者通常使用多种数据集,这些数据集在样本数量、类别、通道数以及图像分辨率等方面存在差异;表1列举了最常用的数据集概况。

Image Classification The task of image classification is straightforward: we wish to label a complete image (or a predetermined portion of it) with its most likely semantic category, e.g., horse, cat, or car Szeliski (2022). In this context, an example is an image and its semantic label. The image, of predefined dimension (or resolution), is encoded as a 3D tensor $\boldsymbol{x}^{(i)}\in\mathbb{R}^{W\times H\times C}$ , where $W$ is the width of the image, $H$ the height of the image, and $C$ the number of channels, depending on whether the image is grayscale (1), RGB (3), or something else (e.g., RGBD Firman (2016), with an additional depth channel). The editing task is then often formulated by artificially corrupting a subset of either the images or the labels. The latter is the more prevalent approach, and usually involves randomly altering the labels within a withheld set to create an edit set $(x_{e}^{(i)},y_{e}^{(i)})_ {i=1}^{N}$ , where in the original dataset $y^{(i)}\neq y_{e}^{(i)}$ . Other works, such as Sotoudeh and Thakur (2021), instead corrupt a subset of the images, e.g. with motion blur or fog Mu and Gilmer (2019), creating an edit set where originally $\boldsymbol{x}^{(i)}\neq\boldsymbol{x}_{e}^{(i)}$ .

图像分类
图像分类的任务很直观,我们希望用最可能的语义类别(如马、猫或汽车 Szeliski (2022))标注完整图像(或预定部分)。在此背景下,一个示例就是图像及其语义标签。图像以预定义尺寸(或分辨率)编码为三维张量 $\boldsymbol{x}^{(i)}\in\mathbb{R}^{W\times H\times C}$,其中 $W$ 为图像宽度,$H$ 为图像高度,$C$ 为通道数,具体取决于图像是灰度(1)、RGB(3)还是其他格式(例如带额外深度通道的 RGBD Firman (2016))。编辑任务通常通过人为破坏部分图像或标签来构建,例如随机修改标签以生成训练集 $(x_{e}^{(i)},y_{e}^{(i)})_ {i=1}^{N}$,其中原始标签满足 $y^{(i)}\neq y_{e}^{(i)}$。其他工作如 Sotoudeh 和 Thakur (2021) 会引入运动模糊或雾效 Mu 和 Gilmer (2019) 来破坏部分图像,构建编辑集使得原始图像满足 $\boldsymbol{x}^{(i)}\neq\boldsymbol{x}_{e}^{(i)}$。
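The label-corruption recipe above can be sketched as follows (a hypothetical helper, not tied to any specific dataset; the stand-in "images" are just string identifiers):

```python
import random

def make_edit_set(examples, num_classes, n_edits, seed=0):
    """Corrupt the labels of a random subset of (x, y) pairs so that
    y_e != y, yielding edit pairs (x_e, y_e) as described above."""
    rng = random.Random(seed)
    subset = rng.sample(examples, n_edits)
    edit_set = []
    for x, y in subset:
        # pick a wrong label uniformly among the remaining classes
        y_e = rng.choice([c for c in range(num_classes) if c != y])
        edit_set.append((x, y_e))
    return edit_set

data = [(f"img_{i}", i % 10) for i in range(100)]  # stand-in labeled images
edits = make_edit_set(data, num_classes=10, n_edits=5)
assert all(y_e != dict(data)[x] for x, y_e in edits)
```

The model is then asked to produce $y_{e}^{(i)}$ on the edit set while its predictions elsewhere stay unchanged.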

Several datasets support knowledge editing experiments for this task, and a distinction is often made among “toy” and “large-scale” datasets. Well-known datasets like MNIST LeCun et al. (2010) and CIFAR-10 Krizhevsky (2009), which are widely used in the literature, are frequently employed for experimentation and belong to the former category. For more challenging and realistic scenarios, researchers turn to larger scale datasets, of which the most popular is surely the extensive ImageNet Database Deng et al. (2009a), which now encompasses over 10 million labeled images, spanning more than 20,000 object categories. Specifically, studies such as Sinitsin et al. (2020) and Lee et al. (2019) explore datasets derived from the ImageNet Large Scale Visual Recognition Challenges (ILSVRC) Russakovsky et al. (2015). To further accentuate the complexity, Sinitsin et al. (2020) introduces a highly challenging configuration, leveraging the Natural Adversarial Examples (NAE) dataset Hendrycks et al. (2021), consisting of 7500 natural images notorious for their arduous classification nature, on which pre-trained models exhibit correct prediction rates of less than $1\%$ .

多个数据集支持该任务的知识编辑实验,通常分为"玩具级"和"大规模"数据集。文献中广泛使用的经典数据集如MNIST (LeCun et al., 2010) 和CIFAR-10 (Krizhevsky, 2009) 常被用于实验,属于前一类。针对更具挑战性的现实场景,研究者转向更大规模的数据集,其中最流行的当属包含1000多万张标注图像、涵盖2万多个物体类别的ImageNet数据库 (Deng et al., 2009a)。具体而言,Sinitsin et al. (2020) 和Lee et al. (2019) 等研究探索了源自ImageNet大规模视觉识别挑战赛 (ILSVRC) (Russakovsky et al., 2015) 的数据集。为突显复杂性,Sinitsin et al. (2020) 引入了极具挑战性的配置方案,采用以分类困难著称的自然对抗样本 (NAE) 数据集 (Hendrycks et al., 2021)——该数据集包含7500张自然图像,预训练模型对其正确预测率不足$1\%$。

| Dataset | Tasks | #Examples | #Classes | #Channels | Resolution |
| --- | --- | --- | --- | --- | --- |
| MNIST | Classification, Inpainting | 70k | 10 | 1 | 28x28 |
| CIFAR-10 | Classification | 60k | 10 | 3 | 32x32 |
| CIFAR-100 | Classification | 60k | 100 | 3 | 32x32 |
| ImageNet | Classification | 1.2M | 1000+ | 3 | 224x224+ |
| NAE | Classification | 7.5k | 200+ | 3 | 224x224+ |
| CelebA | Inpainting | 200k | n/a | 3 | 178x218 |

Table 1: Most important datasets used in Computer Vision for Knowledge editing. MNIST, CIFAR-10, CIFAR-100 are generally regarded as “toy” datasets while ImageNet, NAE, CelebA as more challenging testbeds.

| 数据集 | 任务 | 样本数 | 类别数 | 通道数 | 分辨率 |
| --- | --- | --- | --- | --- | --- |
| MNIST | 分类、图像修复 | 70k | 10 | 1 | 28x28 |
| CIFAR-10 | 分类 | 60k | 10 | 3 | 32x32 |
| CIFAR-100 | 分类 | 60k | 100 | 3 | 32x32 |
| ImageNet | 分类 | 1.2M | 1000+ | 3 | 224x224+ |
| NAE | 分类 | 7.5k | 200+ | 3 | 224x224+ |
| CelebA | 图像修复 | 200k | n/a | 3 | 178x218 |

表 1: 知识编辑领域最重要的计算机视觉数据集。MNIST、CIFAR-10、CIFAR-100通常被视为"玩具"数据集,而ImageNet、NAE、CelebA则被视为更具挑战性的测试平台。

Image Inpainting Image inpainting, also known as image completion, is the task of reconstructing missing regions in an image Szeliski (2022). The problem is formalized as a regression over functions mapping pixel coordinates within $[0,1]^{2}$ to pixel values in $[0,1]$ (grayscale) or in $[0,1]^{3}$ (RGB). This task has so far received less attention from the knowledge editing community; Garnelo et al. (2018) address it leveraging both the MNIST dataset, once again serving as a rudimentary example, and the CelebFaces Attributes Dataset (CelebA) Liu et al. (2015) for more challenging scenarios. The CelebA dataset presents a more demanding testbed, offering over 200,000 celebrity images, each accompanied by 40 attribute annotations.

图像修复
图像修复(Image Inpainting),也称为图像补全,是指重建图像中缺失区域的任务 Szeliski (2022)。该问题被形式化为一个回归问题,即从像素坐标空间 $[0,1]^{2}$ 映射到灰度值 [0, 1] 或 RGB 值 $[0,1]^{3}$ 的函数回归。目前知识编辑领域对此任务的关注较少 Garnelo et al. (2018),主要利用包含 MNIST 数据集(作为基础示例)和 CelebFaces Attributes Dataset (CelebA) Liu et al. (2015) 的数据集进行挑战性场景研究。CelebA 数据集提供了超过 20 万张名人图像,每张图像附带 40 个属性标注,是一个具有挑战性且全面的探索数据集。
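Under this formalization, an image becomes a set of (coordinate, value) pairs, and inpainting amounts to regressing pixel values at the hidden coordinates. A minimal sketch of this data representation (an illustrative assumption, not the pipeline of any surveyed work; images are assumed to be at least 2x2):

```python
import random

def image_to_regression_pairs(image, mask_fraction=0.3, seed=0):
    """Turn an H x W grayscale image (values in [0, 1]) into context and
    target sets: coordinates are normalized to [0, 1]^2, and a random
    fraction of pixels is hidden as the regression target."""
    h, w = len(image), len(image[0])
    pairs = [((i / (h - 1), j / (w - 1)), image[i][j])
             for i in range(h) for j in range(w)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_hidden = int(mask_fraction * len(pairs))
    targets, context = pairs[:n_hidden], pairs[n_hidden:]
    return context, targets  # fit on context, predict values at target coords

img = [[0.0, 0.5], [0.5, 1.0]]  # a tiny 2x2 "image"
context, targets = image_to_regression_pairs(img, mask_fraction=0.25)
```

A regressor fitted on the context pairs is then evaluated on how well it reconstructs the hidden target values.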

3.2 Natural Language Processing

3.2 自然语言处理

Natural Language Processing (NLP) is also a broad field, concerned with giving computers the ability to process and understand human language Eisenstein (2019). Like in computer vision, in recent years researchers and practitioners in the field have leveraged the power of neural networks with many outstanding results Otter et al. (2020); Soltan et al. (2022); FitzGerald et al. (2022). With the recent paradigm shift from supervised learning to pre-training followed by fine-tuning Wang et al. (2022), and the trend towards larger and larger models Zhao et al. (2023), the ability to perform a cheap edit of a model instead of an expensive fine-tuning has motivated an intense interest from the knowledge editing community. Undoubtedly, within the NLP realm, the most widely targeted tasks for knowledge editing are Fact-Checking when dealing with classification, and (closed-book) Question Answering for language generation. Some recent works also explore open Text Generation, editing of factual relations with ad hoc datasets, and Document Classification. Table 2 provides an overview of the datasets commonly used for those tasks.

自然语言处理 (NLP) 同样是一个广泛的领域,致力于让计算机具备处理和理解人类语言的能力 Eisenstein (2019)。与计算机视觉类似,近年来该领域的研究人员和从业者利用神经网络的力量取得了许多杰出成果 Otter et al. (2020); Soltan et al. (2022); FitzGerald et al. (2022)。随着最近从监督学习转向预训练加微调的模式转变 Wang et al. (2022),以及模型规模越来越大的趋势 Zhao et al. (2023),以低成本编辑模型而非昂贵微调的能力激发了知识编辑社区的浓厚兴趣。毫无疑问,在 NLP 领域,知识编辑最广泛针对的任务是分类场景下的事实核查 (Fact-Checking) 和语言生成场景下的 (闭卷) 问答。一些最新研究还探索了开放文本生成、使用特定数据集编辑事实关系以及文档分类。表 2 概述了这些任务常用的数据集。

Fact-checking Fact-checking is the task of assessing whether claims made in written or spoken language are true, often addressed as a binary classification task Guo et al. (2022). In this setting, examples are natural language claims coupled with binary labels, even though occasionally a third neutral option is available. The claim $\boldsymbol{x}^{(i)}$ is encoded with some tokenization scheme (or pre-trained word embedding) as a sequence of integers (or semantic vectors), while the label $y^{(i)}$ can take one of two values, positive or negative (optionally a third, aforementioned neutral value). One can then have a neural network predict this value with an explicit classification layer, or alternatively a language model producing special tokens (e.g., True/False or Supports/Refutes) for the considered truth values, when prompted with the claim under consideration. The Fact-Checking task has been considered particularly appealing by the knowledge editing community De Cao et al. (2021); Mitchell et al. (2022a,b); Huang et al. (2023) for at least a couple of reasons. First, the recent phenomenal success of language models also highlighted their proneness to generate reasonable but factually wrong natural language text Ortega et al. (2021); Ji et al. (2023). This degrades system performance and fails to meet user expectations in many real-world scenarios, leading to a great interest in the ability to mitigate these hallucinations. Furthermore, reasonable edit sets are fairly easy to obtain, e.g. by randomly flipping the labels of claims from pre-existing datasets from Supports to Refutes and vice versa. The most widely used datasets for this task are FEVER Thorne et al. (2018) and VitaminC Schuster et al. (2021). Both of them are extracted from Wikipedia and annotated by human experts, to arrive at (evidence, wikipage, claim, label) tuples.
In both cases, the label can be Supports, Refutes or Not Enough Info depending on whether the evidence supports or not the claim. To construct proper editing datasets ${(x_{e}^{(i)},y_{e}^{(i)})_{i=1}^{N}}$ out of them, De Cao et al. (2021) (for FEVER) and Mitchell et al. (2022b)

事实核查
事实核查是评估书面或口头陈述真实性的任务,通常被视为二元分类任务 Guo et al. (2022)。在此设定中,样本为自然语言陈述及其二元标签,偶尔会包含第三个中性选项。陈述 $\boldsymbol{x}^{(i)}$ 通过某种 token 化方案(或预训练词嵌入)编码为整数序列(或语义向量),而标签 $y^{(i)}$ 可取正值或负值(可选地包含前文提及的中性值)。随后可通过神经网络的显式分类层预测该值,或由大语言模型针对待验证陈述生成特殊 token(如 True/False 或 Supports/Refutes)。

事实核查任务尤其受到知识编辑领域关注 De Cao et al. (2021); Mitchell et al. (2022a,b); Huang et al. (2023),主要原因包括:首先,大语言模型近年来的显著成功也暴露了其易生成合理但事实错误的自然文本的倾向 Ortega et al. (2021); Ji et al. (2023)。这种现象会降低系统性能,并在现实场景中无法满足用户预期,因此如何缓解这类幻觉引发了广泛兴趣。此外,合理的编辑集较易获取,例如将现有数据集中陈述的标签随机从 Supports 翻转为 Refutes,反之亦然。

该任务最广泛使用的数据集是 FEVER Thorne et al. (2018) 和 VitaminC Schuster et al. (2021)。两者均从维基百科提取并由专家标注,形成(证据、维基页面、陈述、标签)四元组。两种数据集的标签均为 Supports、Refutes 或 Not Enough Info,取决于证据是否支持陈述。为构建编辑数据集 ${(x_{e}^{(i)},y_{e}^{(i)})_{i=1}^{N}}$,De Cao et al. (2021)(针对 FEVER)和 Mitchell et al. (2022b)

(for VitaminC) grouped facts based on the same pages, augmented each fact $x_{e}^{(i)}$ with some rephrases $\tilde{x}_{e}^{(i)}$ to assess generality, and randomly flipped the labels so that $y_{e}^{(i)}\neq y^{(i)}$ in each group.

(针对 VitaminC) 将基于相同页面的事实分组,通过添加一些改写版本 $\tilde{x}_ {e}^{(i)}$ 来增强每个事实 $x_{e}^{(i)}$ ,以评估泛化性,并在每组中随机翻转标签,使得 $y_{e}^{(i)}\neq y^{(i)}$。
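The grouping-and-flipping recipe can be sketched as follows (a simplified stand-in for the FEVER/VitaminC preprocessing; field names are illustrative and rephrases are assumed to be given rather than generated):

```python
import random

def build_edit_groups(claims, seed=0):
    """claims: list of dicts with 'page', 'claim', 'rephrases', 'label'
    (label in {'SUPPORTS', 'REFUTES'}). Groups facts by page and flips
    the label of one fact per page, so y_e != y; the rephrases of the
    flipped fact probe generality of the edit."""
    rng = random.Random(seed)
    flip = {"SUPPORTS": "REFUTES", "REFUTES": "SUPPORTS"}
    groups = {}
    for c in claims:
        groups.setdefault(c["page"], []).append(c)
    edit_sets = []
    for page, facts in groups.items():
        fact = rng.choice(facts)
        edit_sets.append({
            "x_e": fact["claim"],
            "x_e_rephrases": fact["rephrases"],
            "y_e": flip[fact["label"]],
        })
    return edit_sets

claims = [
    {"page": "Rome", "claim": "Rome is in Italy.",
     "rephrases": ["Rome lies in Italy."], "label": "SUPPORTS"},
]
assert build_edit_groups(claims)[0]["y_e"] == "REFUTES"
```

An edit is then considered general if the model also outputs the flipped label on the rephrases $\tilde{x}_{e}^{(i)}$.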

Question Answering The task of training a model to provide a correct natural language answer to a given question is referred to as question answering; more specifically, closed-book question answering is restricted to the case when the model is only fed the question, and not a selection of possible answers, or a supporting corpus from which an answer can be extracted Roberts et al. (2020). In this case, both the input $x$ and the target label $y$ are natural language sequences, and we test the extent to which the model parameters implicitly store the required knowledge to provide correct answers. The most widely adopted dataset is surely the zero-shot Relation Extraction dataset (zsRE) Levy et al. (2017), used in particular by Zhu et al. (2020); De Cao et al. (2021); Mitchell et al. (2022a,b); Meng et al. (2022a,b); Huang et al. (2023); Hartvigsen et al. (2022). This task and its datasets are particularly appealing for knowledge editing. Indeed, closed-book question answering can benefit greatly from knowledge editing, as pipelines that solve the task usually leverage factual knowledge to answer questions; in the case of neural networks, this knowledge is acquired during training and implicitly stored in the networks’ parameters, and it is unclear how to tweak these parameters to correct wrong or undesired answers, especially as the networks grow bigger in size. Meng et al. (2022b) hypothesize that this factual knowledge takes the form of (relation, subject, object) triples, with intermediate layers acting as key-value storage units. This formalization lends itself nicely to the definition of an editing objective, rather than directly targeting the open-ended natural language generation task. Furthermore, Levy et al. (2017) demonstrate that it is possible to reduce relation extraction to the problem of answering simple reading comprehension questions, and provide in their dataset multiple templates for each relation.
For example, the triple (occupation, s, o) can be naturally extracted by answering one of the following questions: What did s do for a living?, What is s’s job?, What is the profession of $s$? The subject $s$ can then be modified to create editing examples. Section 4.3 further discusses factual knowledge and how different works have modeled it for improving knowledge editing. Besides zsRE, knowledge editing of models solving Question Answering has been studied leveraging also additional datasets such as T-REx Elsahar et al. (2018) and Natural Questions (NQ) Kwiatkowski et al. (2019). Finally, as a more challenging flavor of the same task with added counterfactual information, Meng et al. (2022b) introduced a new dataset called CounterFact.

问答
训练模型为给定问题提供正确的自然语言答案的任务被称为问答;更具体地说,闭卷问答限制于模型仅接收问题本身,而不提供可能的答案选项或可提取答案的支持语料的情况 (Roberts et al., 2020)。此时,输入 $x$ 和目标标签 $y$ 均为自然语言序列,我们测试模型参数隐式存储所需知识以提供正确答案的程度。最广泛采用的数据集无疑是零样本关系抽取数据集 (zsRE) (Levy et al., 2017),被 Zhu et al. (2020)、De Cao et al. (2021)、Mitchell et al. (2022a,b)、Meng et al. (2022a,b)、Huang et al. (2023)、Hartvigsen et al. (2022) 等研究特别使用。该任务和数据集对知识编辑特别有吸引力。实际上,闭卷问答可以极大受益于知识编辑,因为解决该任务的流程通常利用事实知识来回答问题;对于神经网络,这些知识在训练期间获得并隐式存储于网络参数中,而如何调整这些参数以修正错误或不想要的答案尚不明确,尤其是随着网络规模增大。Meng et al. (2022b) 假设这种事实知识以 (关系, 主语, 宾语) 三元组的形式存在,中间层充当键值存储单元。这种形式化很好地服务于编辑目标的定义,而非直接开放式的自然语言生成任务。此外,Levy et al. (2017) 证明可以将关系抽取简化为回答简单阅读理解问题,并在数据集中为每个关系提供了多个模板。例如,三元组 (职业, s, o) 可以通过回答以下问题之一自然提取:s 以什么为生?s 的工作是什么?s 的职业是什么?然后可以修改主语 $s$ 以创建编辑示例。第4.3节进一步讨论事实知识以及不同工作如何建模以改进知识编辑。除 zsRE 外,还利用 T-REx (Elsahar et al., 2018) 和自然问题 (NQ) (Kwiatkowski et al., 2019) 等额外数据集研究了问答模型的知识编辑。最后,作为同一任务的更具挑战性变体,Meng et al. (2022b) 引入了一个包含反事实信息的新数据集 CounterFact。
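The reduction of relation triples to reading-comprehension questions, and the creation of editing examples by swapping the subject or answer, can be sketched like this (the templates are the occupation examples quoted above; the function and names are illustrative):

```python
OCCUPATION_TEMPLATES = [
    "What did {s} do for a living?",
    "What is {s}'s job?",
    "What is the profession of {s}?",
]

def triple_to_qa(subject, obj, templates=OCCUPATION_TEMPLATES):
    """Instantiate every template for a (relation, subject, object) triple,
    yielding (question, answer) pairs; swapping the answer (or subject)
    produces edit pairs (x_e, y_e)."""
    return [(t.format(s=subject), obj) for t in templates]

qa = triple_to_qa("Marie Curie", "physicist")
edit_pairs = [(q, "chemist") for q, _ in qa]  # an edit: change the answer
print(qa[0])  # ('What did Marie Curie do for a living?', 'physicist')
```

Because all templates share the same underlying triple, the rephrasings of one question naturally serve as the generality set for the corresponding edit.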

Further NLP tasks Beside the two popular tasks outlined above, Mitchell et al. (2022a) tested editing text generation from autoregressive GPT-like models on a special version of WikiText-103 Merity et al. (2016), where they consider as prompts $(x_{e})$ passages sampled from WikiText itself and as edit targets $(y_{e})$ 10-token samples from a pre-trained distilGPT-2 model. This constitutes a challenging editing setup, since greedy 10-token predictions of their target model agree with these edit targets on $<1\%$ of the extracted examples. Finally, more recently, Hartvigsen et al. (2022) tested their methodology on a novel task for knowledge editing using the SCOTUS dataset from Chalkidis et al. (2022). The classification task is to categorize U.S. Supreme Court documents over multiple decades into 11 topics. What makes this task interesting is that, over time, categorization rules change, so that label distributions shift. We note how this setting is particularly realistic for knowledge editing: much of the world knowledge memorized by networks evolves over time just like these label shifts in the dataset, and the target of knowledge editing can be seen as keeping such world knowledge up to date.

其他NLP任务
除了上述两个常见任务外,Mitchell等人(2022a)在WikiText-103 Merity等人(2016)的特殊版本上测试了自回归GPT类模型的文本生成编辑。他们将来自WikiText本身的段落样本作为提示$(x_{e})$,并将预训练distilGPT-2模型生成的10个token样本作为编辑目标$(y_{e})$。这对他们而言是一个有效的挑战性编辑设置,因为在目标模型中,贪婪的10-token预测与这些编辑目标的一致性仅占提取样本的$<1\%$。最近,Hartvigsen等人(2022)在Chalkidis等人(2022)的SCOTUS数据集上测试了其知识编辑新任务的方法。该分类任务需将数十年的美国最高法院文件归类至11个主题。该任务的独特之处在于分类规则会随时间变化,导致标签分布发生偏移。我们注意到,这种设置对知识编辑特别真实,因为网络记忆的大部分世界知识会像数据集中标签偏移那样随时间演变,而知识编辑的目标可视为保持此类世界知识的更新。

| Dataset | Tasks | Format | #Examples | #Classes |
| --- | --- | --- | --- | --- |
| FEVER | Fact Checking | (evidence, wikipage, claim, label) | 420k | 3 |
| VitaminC | Fact Checking | (evidence, wikipage, claim, label) | 450k | 3 |
| zsRE | Question Answering | (subject, relation, object) | 120M | n/a |
| T-REx | Question Answering | (subject, relation, object) | 11M | n/a |
| NQ | Question Answering | (question, answer) | 320k | n/a |
| CounterFact | Question Answering | (subject, relation, true object, false object) | 22k | n/a |
| WikiText | Text Generation | tokens | 100M | n/a |
| SCOTUS | Document Classification | (date, text, label) | 9.2k | 11 |

Table 2: Most important datasets used for knowledge editing in NLP. We report the characteristics of the original datasets, even though for knowledge editing ad hoc versions, preprocessed to make editing meaningful, are often used.

| 数据集 | 任务 | 格式 | 样本数量 | 类别数 |
| --- | --- | --- | --- | --- |
| FEVER | 事实核查 | (证据, 维基页面, 声明, 标签) | 420k | 3 |
| VitaminC | 事实核查 | (证据, 维基页面, 声明, 标签) | 450k | 3 |
| zsRE | 问答 | (主语, 关系, 宾语) | 120M | n/a |
| T-REx | 问答 | (主语, 关系, 宾语) | 11M | n/a |
| NQ | 问答 | (问题, 答案) | 320k | n/a |
| CounterFact | 问答 | (主语, 关系, 真实宾语, 虚假宾语) | 22k | n/a |
| WikiText | 文本生成 | tokens | 100M | n/a |
| SCOTUS | 文档分类 | (日期, 文本, 标签) | 9.2k | 11 |

表 2: 自然语言处理中知识编辑使用的最重要数据集。我们报告了原始数据集的特性,尽管知识编辑通常使用经过预处理以使其有意义的特定版本。

3.3 Other Applications

3.3 其他应用

Even though the majority of works in the knowledge editing literature has focused on the Computer Vision and Natural Language Processing fields, as described above, the general nature of the editing problem yielded interesting results also in other fields, and will likely yield more in more diverse fields and applications in the years to come. Among these, to the best of our knowledge, the most notable examples are applications in safety-critical scenarios and to graph neural networks; in the following we briefly review works from both.

尽管知识编辑领域的大多数研究集中在计算机视觉和自然语言处理领域,但如上所述,编辑问题的普适性也在其他领域产生了有趣成果,并有望在未来更广泛的领域和应用中催生更多突破。据我们所知,其中最显著的案例是安全关键场景的应用和图神经网络的应用。下文我们将简要回顾这两类工作。

Safety-critical Systems Safety-critical systems are those systems whose failure may lead to consequences that are determined to be unacceptable, such as significant damage to properties, the environment, or people Knight (2002). Deep neural networks have grown in popularity over the past decade and are now being used in safety-critical domains such as self-driving cars Gupta et al. (2021), healthcare Tekkesin et al. (2019) and aviation Sridhar (2020). Clearly, in such critical scenarios, being able to find and correct unsafe neural network behavior becomes a crucial objective. This has motivated a line of research within the knowledge editing community, that so far has only touched the aviation domain, specifically the aircraft collision avoidance problem. The systems currently employed provide safety guarantees at the cost of being poorly data efficient, and efforts have been made to integrate neural networks into the pipeline Julian et al. (2016); Julian and Kochenderfer (2019). As a consequence, several subsequent works Sotoudeh and Thakur (2021); Fu and Li (2021); Liang et al. (2023) from the knowledge editing community have proposed approaches for fixing unsafe behavior of neural networks integrated in safety-critical pipelines.

安全关键系统
安全关键系统是指那些一旦失效可能导致不可接受后果的系统,例如对财产、环境或人员造成重大损害 [Knight (2002)]。过去十年中,深度神经网络日益普及,现已被应用于自动驾驶汽车 [Gupta et al. (2021)]、医疗保健 [Tekkesin et al. (2019)] 和航空 [Sridhar (2020)] 等安全关键领域。显然,在此类关键场景中,发现并修正神经网络的不安全行为成为至关重要的目标。这推动了知识编辑领域的一系列研究,目前仅涉及航空领域,特别是飞机防撞问题。现有系统以数据效率低下为代价提供安全保障,而学界已尝试将神经网络整合到流程中 [Julian et al. (2016); Julian and Kochenderfer (2019)]。因此,知识编辑领域的后续研究 [Sotoudeh and Thakur (2021); Fu and Li (2021); Liang et al. (2023)] 提出了修正安全关键流程中神经网络不安全行为的方法。

Developing a robust collision avoidance algorithm that reliably prevents collision without alerting excessively is challenging due to sensor error and uncertainty in the future behavior of the aircraft. The Airborne Collision Avoidance System X (ACAS X) Kochenderfer and Chryssanthacopoulos (2011) family of collision avoidance systems formulates the problem of collision avoidance as a partially observable Markov decision process. The variant for unmanned aircraft, ACAS Xu, uses dynamic programming (DP) to then find a solution in terms of resolution advisories that avoid collisions while minimizing disruptive alerts. The DP process makes use of a massive lookup table, which makes storage costly and certification time-consuming for certified avionics systems. Therefore, Julian et al. (2016) propose using a deep neural network for compressing the table without loss of performance, as measured by a set of safety and operational metrics. There are seven real-valued state variables that define an aircraft encounter, describing its geometry in terms of the two aircraft involved: (1) distance from ownship to intruder, (2) angle to intruder relative to ownship heading direction, (3) heading angle of intruder relative to ownship heading direction, (4) speed of ownship, (5) speed of intruder, (6) time until loss of vertical separation, and (7) previous advisory action. There are then five possible horizontal maneuver advisories that the system can produce: clear-of-conflict, or adjusting course by turning left or right at one of two fixed angles (hence 4 more possibilities). The state variables are usually discretized, arriving at $\approx120$ million points, and the aforementioned lookup table associates scores to all pairs of 120 million states and five actions.
This table is what makes up the ACAS Xu dataset: $(\boldsymbol{x}^{(i)},\boldsymbol{y}^{(i)})_ {i=1}^{N}$ with $N=5\times120$ million, where $\boldsymbol{x}^{(i)}$ represents a discretized seven-dimensional state, and $y^{(i)}\in\mathbb{R}^{5}$ is the vector of scores associated to each of the five possible actions in that state. With 600 million floating-point numbers, the table requires over 2 GB of storage. The task for the neural network is to regress this table, minimizing the parametric knowledge required (i.e., the number of parameters) and the error with respect to the table. It is interesting to note that this is an atypical regression problem, since we aim for the guarantee that the optimal advisory remains the same. When the difference between the scores of the first and second-best advisories is relatively small, simple regression techniques (e.g., minimizing the Mean Squared Error) can lead to the network realizing a different strategy from that of the original table. This is reflected in the design of the loss function. The network, or its further refinements Julian and Kochenderfer (2019), is then verified via tools such as Wang et al. (2018); Katz et al. (2017), that are able to prove several input-output-based security properties, e.g., that a clear-of-conflict advisory will always be issued if the intruder is sufficiently far away, thus providing formal guarantees about DNN behavior. These properties are formalized as implications of the form $\forall x,x\in B\implies f_{\theta}(x)\in C$ , where $f_{\theta}$ is the function approximator realized by the DNN with parameters $\theta$ , $B$ is a bounded region of the input space and $C$ a bounded region of the output space.
The ACAS Xu case study has so far been of great interest for the knowledge editing community, since one such security property has been found to not be satisfied by the original network, exposing an input on which the network was inconsistent with the lookup table (property $\phi_{8}$ in Katz et al. (2017)). This discrepancy would then be addressed by retraining the DNN, thus leading to the central question of knowledge editing: how to fix the network behavior on a limited set of points without affecting its behavior on unrelated points. Each work mentioned at the beginning of the paragraph has addressed this problem differently, but sharing the same setup: once a failing security property for a network is identified, and one is able to generate counter-examples, i.e., pairs $(x^{(i)},y^{(i)})$ such that $\boldsymbol{x}^{(i)}\in\boldsymbol{B}$ but $y^{(i)}\notin C$ , a certain strategy for defining candidate edit pairs can be formalized, i.e., how to assign a corrected target $\bar{y}^{(i)}$ to each such $x^{(i)}$ . Then, usually a subset of this becomes the edit set, while the remaining portion is chosen as the generality set; finally, a locality set is defined by points correctly classified by the network, i.e., input-output pairs for which the properties under scrutiny hold true.

由于传感器误差和飞行器未来行为的不确定性,开发一种既能可靠防撞又不会过度告警的鲁棒避碰算法具有挑战性。机载防撞系统X (ACAS X) [Kochenderfer和Chryssanthacopoulos (2011)] 系列将避碰问题建模为部分可观测马尔可夫决策过程。其无人飞行器版本ACAS Xu采用动态规划(DP)生成解决方案,在避免碰撞的同时最小化干扰性告警。该DP过程依赖庞大的查找表,导致航电系统存储成本高且认证耗时长。为此,Julian等人(2016)提出使用深度神经网络无损压缩该表,并通过安全性与操作性指标验证性能。

飞行器遭遇场景由七个实值状态变量定义,描述两架飞行器的几何关系:(1) 本机与入侵者的距离 (2) 相对于本机航向的入侵者方位角 (3) 相对于本机航向的入侵者航向角 (4) 本机速度 (5) 入侵者速度 (6) 垂直间隔丧失倒计时 (7) 先前告警动作。系统可生成五种水平机动告警:冲突解除,或以两种固定角度左转/右转调整航向(共4种可能)。状态变量经离散化后形成约1.2亿个点,查找表为所有状态-动作对关联评分,构成ACAS Xu数据集:${(\boldsymbol{x}^{(i)},\boldsymbol{y}^{(i)})_{i=1}^{N}}$($N=6$亿),其中$\boldsymbol{x}^{(i)}$为离散化七维状态,$y^{(i)}\in\mathbb{R}^{5}$是该状态下五个动作的评分向量。该表含6亿浮点数,存储需超2GB。

神经网络的任务是以最小参数量回归该表,同时保持最优告警策略不变。当最优与次优告警评分接近时,传统回归方法(如最小化均方误差)可能导致策略偏移,这体现在损失函数设计中。经Wang等人(2018)、Katz等人(2017)等工具验证,该网络[Julian和Kochenderfer (2019)]能保证形式化安全属性,例如$\forall x,x\in B\implies f_{\theta}(x)\in C$,其中$f_{\theta}$是DNN实现的函数近似器,$B$和$C$分别为输入/输出空间的限定区域。

ACAS Xu案例因原始网络未满足某项安全属性[Katz等人(2017)中的$\phi_{8}$不一致性]而成为知识编辑领域的研究热点。该差异需通过DNN重训练解决,引出了知识编辑的核心问题:如何修正有限点集上的网络行为而不影响无关点。相关研究均遵循相同框架:发现失效属性并生成反例$(x^{(i)},y^{(i)})$后,定义候选编辑对策略(如分配$\bar{y}^{(i)}$),将其分为编辑集与泛化集,并建立由正确分类点组成的局部性验证集。
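The shared setup described above, where counterexamples violating a property $\forall x\in B\implies f_{\theta}(x)\in C$ are split into edit and generality sets while correctly handled points form the locality set, can be sketched generically. The predicate and repair functions below are placeholders, not the actual ACAS Xu property or any published repair strategy:

```python
def build_editing_sets(model, inputs, in_B, in_C, repair, edit_fraction=0.5):
    """Partition points into counterexamples (x in B but f(x) not in C)
    and safe points; counterexamples get repaired targets y_e = repair(x),
    then split into an edit set and a held-out generality set."""
    counterexamples, locality_set = [], []
    for x in inputs:
        y = model(x)
        if in_B(x) and not in_C(y):
            counterexamples.append((x, repair(x)))  # candidate edit pair
        else:
            locality_set.append((x, y))  # behavior to preserve
    n_edit = int(edit_fraction * len(counterexamples))
    edit_set = counterexamples[:n_edit]
    generality_set = counterexamples[n_edit:]
    return edit_set, generality_set, locality_set

# Toy 1-D illustration of the property "x > 5 implies f(x) > 0"
model = lambda x: x - 7
edit, gen, loc = build_editing_sets(
    model, inputs=range(10),
    in_B=lambda x: x > 5, in_C=lambda y: y > 0,
    repair=lambda x: 1.0)
```

A successful edit then fixes the model on the edit set, generalizes to the held-out counterexamples, and leaves the locality set untouched.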

Graph Neural Networks Deep learning models have been particularly successful when dealing with signals such as speech, images, or video, in which there is an underlying Euclidean structure; however, recently, there has been a growing interest in trying to apply learning on non-Euclidean geometric data, for instance represented in the form of graphs Bronstein et al. (2017). Graph Neural Networks (GNNs) learn node representations by applying shared permutation invariant functions over local neighborhoods of nodes in the graph Bronstein et al. (2021). These representations can then be used for tasks like node classification, for instance assigning a category to each paper in a citation graph Wu et al. (2020). GNNs have achieved prominent results in learning features and topology of graph data; however, knowledge editing for GNNs is rarely explored, despite their widespread applicability. Therefore, Liu et al. (2023) propose a method to edit these models, restricting to the aforementioned node classification task.

图神经网络 (Graph Neural Networks)
深度学习模型在处理具有欧几里得结构的信号(如语音、图像或视频)时取得了显著成功。然而,近年来人们越来越关注在非欧几里得几何数据(例如以图形式表示的数据)上应用学习技术 Bronstein et al. (2017)。图神经网络通过在图节点的局部邻域上应用共享的置换不变函数来学习节点表示 Bronstein et al. (2021)。这些表示可用于节点分类等任务,例如为引文图中的每篇论文分配类别 Wu et al. (2020)。虽然GNN在学习图数据的特征和拓扑结构方面取得了突出成果,但尽管其应用广泛,针对GNN的知识编辑研究却很少。为此,Liu et al. (2023)提出了一种编辑这些模型的方法,但仅限于上述节点分类任务。

The task can be formalized as follows: let $G=(V,E)$ be an undirected graph with $V=\left(v_{1},\ldots,v_{|V|}\right)$ and $E=(e_{1},\dots,e_{|E|})$ being the set of nodes and edges, respectively. Given a feature space $\mathcal{X}$ (e.g., the space of real-valued $d$ -dimensional feature vectors $\mathbb{R}^{d}$ ), a node feature tensor $X\in\mathcal{X}^{\vert V\vert}$ and a label space $\mathcal{Y}$ , the goal is to learn a representation $h_{v}$ from which a label $y_{v}\in\mathcal{Y}$ for each node $v\in V$ can be easily predicted. Many datasets for node classification exist in the literature, comprised of data from various domains like citation networks and social networks, and we find again the distinction between small-scale and large-scale datasets Wu et al. (2020). Among the former we find datasets like Cora, which contains a selection of Machine Learning papers collected from the web, and the references automatically parsed from their bibliographies McCallum et al. (2000). In particular, the network contains $|V|=2708$ nodes (articles) and $|E|=5429$ links (citations); the feature space is $\mathcal{X}=\{0,1\}^{1433}$ , i.e., each node is described in terms of the presence (or absence) of certain keywords, taken from a dictionary of 1433 unique words (bag-of-words content representation); the label space is $\mathcal{Y}=\{1,\ldots,7\}$ , i.e. the task is to predict to which of seven classes (e.g., Theory or Reinforcement Learning) each publication belongs. The Reddit dataset is instead a popular representative of large-scale node classification datasets. Hamilton et al. (2017) constructed a graph dataset from Reddit posts made in the month of September 2014. The node label $y_{v}\in\mathcal{Y}$ in this case is the community, or “subreddit”, that a post belongs to, considering $|\mathcal{Y}|=50$ large communities. A post-to-post graph is constructed connecting posts if the same user comments on both.
In total, this dataset contains $\lvert V\rvert=232{,}965$ posts with an average degree of 492 ( $|E|$ is in the order of 100 million edges). In both of these cases, and the many other instances of node classification datasets, constructing an edit dataset is fairly straightforward, and done in the same manner as for the image classification task in Computer Vision. After training the model $f_{0}(\cdot)$ under consideration on a subgraph, one evaluates it on the whole graph: each pair $(x^{(i)},y^{(i)})$ for which the prediction $\hat{y}^{(i)}=\arg\operatorname*{max}f_{0}(x^{(i)})$ is incorrect becomes an edit pair $(x_{e}^{(i)},y_{e}^{(i)})$ . The geometry of graphs lends itself nicely to defining also generality and locality sets: indeed, since the task under consideration is node classification, as we have seen a single example $x^{(i)}$ describes a node $v$ ; one can then define its neighborhood $N(x^{(i)})$ to be its actual neighborhood in the graph, $N_{G}(v)=\{w\in V\mid (v,w)\in E\}$ ; from this definition, generality and locality sets follow consequently, as seen in earlier sections.

该任务可形式化描述如下:设无向图$G=(V,E)$中$V=\left(v_{1},\ldots,v_{|V|}\right)$为节点集合,$E=(e_{1},\dots,e_{|E|})$为边集合。给定特征空间$\mathcal{X}$(例如实值$d$维特征向量空间$\mathbb{R}^{d}$)、节点特征张量$X\in\mathcal{X}^{\vert V\vert}$和标签空间$\mathcal{V}$,目标是学习表征$h_{v}$以轻松预测每个节点$v\in V$的标签$y_{v}\in\mathcal{V}$。现有文献包含众多节点分类数据集,涵盖引文网络、社交网络等多个领域,并延续了Wu等人(2020)提出的小规模与大规模数据集划分标准。

小规模数据集代表如Cora,收录了从网络收集的机器学习论文及其参考文献自动解析结果(McCallum等, 2000)。该网络包含$|V|=2708$个节点(论文)和$|E|=5429$条边(引用),特征空间为$\mathcal{X}=\{0,1\}^{1433}$(基于1433个关键词的词袋表示),标签空间$\mathcal{Y}=\{1,\ldots,7\}$对应七类研究主题(如理论或强化学习)。

Reddit数据集则是大规模节点分类的典型代表(Hamilton等, 2017),构建自2014年9月的Reddit发帖数据。节点标签$y_{v}\in\mathcal{Y}$表示帖子所属社区(subreddit),共考虑$|\mathcal{Y}|=50$个大型社区,通过用户评论关系构建帖子关联图。该数据集共含$\lvert V\rvert=232,965$个帖子,平均度数492(边数$|E|$约1亿条)。

对于此类节点分类数据集,编辑数据集的构建方式与计算机视觉中的图像分类任务类似:在子图上训练目标模型$f_{0}(\cdot)$后,在全图评估时将所有预测$\hat{y}^{(i)}=\arg\max f(x^{(i)})$错误的样本对$(x^{(i)},y^{(i)})$转为编辑对$(x_{e}^{(i)},y_{e}^{(i)})$。图结构特性天然支持泛化集与局部集的定义:由于任务涉及节点分类,单个样本$x^{(i)}$对应节点$v$时,可将其邻域$N(x^{(i)})$定义为图结构邻域$N_{G}(v)=\{w\in V\mid (v,w)\in E\}$,进而推导出泛化集与局部集(如前述章节所示)。
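As a concrete illustration, the edit-set construction described above can be sketched in a few lines of numpy. The tiny graph, the random features, and the linear stand-in classifier `f` below are all invented for the example and do not come from any of the cited datasets.

```python
import numpy as np

# Toy undirected graph: 6 nodes on a path, 2 classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
neighbors = {v: set() for v in range(6)}
for u, w in edges:
    neighbors[u].add(w)
    neighbors[w].add(u)

X = np.random.RandomState(0).randn(6, 4)   # node features (hypothetical)
y = np.array([0, 0, 1, 1, 0, 1])           # ground-truth node labels

W = np.random.RandomState(1).randn(4, 2)   # "pretrained" weights (stand-in)

def f(x, W):
    """Stand-in node classifier: linear class scores over features."""
    return x @ W

# Every node the model gets wrong becomes an edit pair (x_e, y_e).
preds = f(X, W).argmax(axis=1)
edit_pairs = [(v, int(y[v])) for v in range(6) if preds[v] != y[v]]

# Generality/locality sets follow from graph structure: for an edit on
# node v, use its graph neighborhood N_G(v) = {w | (v, w) in E}.
generality_sets = {v: neighbors[v] for v, _ in edit_pairs}
```

The same recipe applies unchanged to real datasets such as Cora or Reddit, only with the trained GNN in place of `f` and the dataset's adjacency structure in place of the toy `neighbors` map.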

4 Knowledge Editing Methodologies

4 知识编辑方法论

In recent times, several "knowledge editing" methods have been introduced to effectively modify the behaviors of models while maintaining their previous performance on the same task Sinitsin et al. (2020). These approaches draw inspiration from various fields of artificial intelligence research and can be broadly categorized into four distinct families: regularization techniques, meta-learning, direct model editing, and architectural strategies.

近来,多种"知识编辑"方法被提出,旨在有效修改模型行为的同时保持其原有任务性能 [20]。这些方法汲取了人工智能研究不同领域的灵感,可大致分为四类:正则化技术、元学习、直接模型编辑和架构策略。

Regularization techniques utilize various forms of regularization to guide the model's learning process during fine-tuning, encouraging it to incorporate the desired edits while retaining its original capabilities Zhu et al. (2020). Meta-learning approaches, on the other hand, employ hypernetwork models to learn parameter updates, enabling efficient adaptation to new tasks or knowledge De Cao et al. (2021); Mitchell et al. (2022a). Direct model editing methods involve directly modifying the model's parameters or representations to incorporate the desired changes. These techniques can range from simple parameter updates to more complex approaches that leverage the model's internal representations. Finally, architectural strategies explore modifying the model's architecture itself, either by introducing new components or restructuring existing ones, to facilitate the integration of new knowledge or behaviors. In the upcoming sections, we will provide detailed discussions on each of these families of knowledge editing methodologies, highlighting their respective areas of application, advantages, limitations, and notable techniques within each category.

正则化技术利用各种形式的正则化来指导模型在微调过程中的学习,鼓励其融入所需的编辑内容同时保留原始能力 (Zhu et al., 2020) 。元学习方法则采用超网络模型来学习参数更新,从而高效适应新任务或新知识 (De Cao et al., 2021; Mitchell et al., 2022a) 。直接模型编辑方法通过直接修改模型参数或表征来实现目标变更,其技术跨度从简单参数更新到利用模型内部表征的复杂方法。最后,架构策略通过引入新组件或重组现有结构来修改模型架构本身,以促进新知识或行为的整合。后续章节将详细讨论这些知识编辑方法族,重点阐述其应用领域、优势、局限及各分类下的代表性技术。

The objective of this section is to categorize the various knowledge editing techniques discussed in the literature into the four distinct families mentioned above. All presented works have different characteristics, target different areas of application and types of edits, and adopt diverse experimentation strategies. Nevertheless, the objective of all the works reported can be formulated within the formal framework given in Section 2.2. A comparison of the most notable KE methodologies at the time of writing can be found in Table 3, while Table 4 presents a comparison on non-sequential single-batch edits of factual knowledge.

本节的目标是将文献中讨论的各种知识编辑技术归类到上述四个不同的类别中。所有呈现的工作具有不同的特点,针对不同的应用领域、编辑类型,并采用多样化的实验策略。尽管如此,所有工作的目标都可以在2.2节给出的形式化框架内表述。截至撰写本文时,最著名的知识编辑方法比较见表3;表4则对比了各方法在非顺序单批次事实知识编辑上的表现。

| KE Methodology | KE Category | Training Required |
|---|---|---|
| FT + L2 | Regularization | False |
| FT + KL | Regularization | False |
| EWC | Regularization | False |
| CNP | Architectural | True |
| ENN | Meta-Learning | True |
| KnowledgeEditor | Meta-Learning | True |
| MEND | Meta-Learning | True |
| MALMEN | Meta-Learning | True |
| ROME | Direct Editing | False |
| MEMIT | Direct Editing | False |
| PMET | Direct Editing | False |
| SERAC | Architectural | True |
| CaliNet | Architectural | True |
| T-Patcher | Architectural | False |
| GRACE | Architectural | True |

| KE方法论 | KE类别 | 需要训练 |
|---|---|---|
| FT + L2 | 正则化 | 否 |
| FT + KL | 正则化 | 否 |
| EWC | 正则化 | 否 |
| CNP | 架构调整 | 是 |
| ENN | 元学习 | 是 |
| KnowledgeEditor | 元学习 | 是 |
| MEND | 元学习 | 是 |
| MALMEN | 元学习 | 是 |
| ROME | 直接编辑 | 否 |
| MEMIT | 直接编辑 | 否 |
| PMET | 直接编辑 | 否 |
| SERAC | 架构调整 | 是 |
| CaliNet | 架构调整 | 是 |
| T-Patcher | 架构调整 | 否 |
| GRACE | 架构调整 | 是 |

Table 3: Comparison of the most notable KE methodologies in the literature. Different characteristics are reported for each approach, highlighting the main advantages and disadvantages. For all approaches, we report: the category and whether it requires training of an auxiliary model; whether it preserves the architecture of the edited model or requires the introduction of new components; whether it needs only the edit pair $(x_{e},y_{e})$, or requires additional input to perform the edit; whether it is able to handle single non-successive edits (SNS), batched non-successive edits (BNS), single successive edits (SSE), and batched successive edits (BSE). Finally, we report whether it can scale to Large Models (LM), that, following the definition in Zhao et al. (2023), are models with more than 10B parameters.

表 3: 文献中最显著的知识编辑(KE)方法对比。每种方法均标注了不同特征,突出主要优缺点。所有方法均包含以下维度:所属类别、是否需要训练辅助模型;是否保留被编辑模型的原始架构或需引入新组件;仅需编辑对$(x_{e},y_{e})$或需额外输入执行编辑;能否处理单次非连续编辑(SNS)、批量非连续编辑(BNS)、单次连续编辑(SSE)及批量连续编辑(BSE)。最后标注是否可扩展至大模型(LM)——根据Zhao等(2023)的定义,即参数量超过100亿的模型。

4.1 Regularization Techniques

4.1 正则化技术

Catastrophic forgetting Kemker et al. (2018) is a well-known phenomenon in the literature, fundamentally limiting the flexibility of editing networks once trained or fine-tuned Lee et al. (2019). Indeed, in the absence of any regularization process, the regular fine-tuning signal can easily execute a specific edit, albeit with a tendency to over-fit to the provided edit examples. However, this approach fails to provide generality to the edit and has a negative impact on locality, owing to the absence of a plasticity mechanism Sinitsin et al. (2020). Similar to continual learning, regularization techniques for knowledge editing aim to modify the standard fine-tuning signal of the target edit to ensure reliability and locality. Therefore, for regularization techniques, KE is not parametrized and does not require any pre-training, but is nothing more than gradient descent computed with the given edits and some specific regularization terms. While not all of these techniques were originally developed for the specific task of knowledge editing, they have proven to be in some way effective and are commonly used as useful baselines for comparison. Moreover, due to their simplicity, they can easily adapt to work with different types of edits: from single non-successive edits to batches of successive edits. However, depending on the number of layers fine-tuned, they rarely scale to models with a large number of parameters such as Large Language Models (LLMs), that, according to Zhao et al. (2023), are models with more than 10B parameters. In these cases, the methodologies discussed in the following sections may be better suited for efficiently editing large-scale models with constrained resources.

灾难性遗忘 (catastrophic forgetting) 是文献中广为人知的现象 (Kemker et al., 2018) ,从根本上限制了网络在训练或微调后的可编辑性 (Lee et al., 2019) 。事实上,在缺乏任何正则化过程的情况下,常规微调信号虽然能轻松执行特定编辑,但往往会过度拟合提供的编辑样本。由于缺乏可塑性机制 (Sinitsin et al., 2020) ,这种方法既无法保证编辑的泛化性,也会对局部性产生负面影响。

与持续学习类似,知识编辑的正则化技术旨在调整目标编辑的标准微调信号,以确保可靠性和局部性。因此对于正则化技术而言,知识编辑 (KE) 不需要参数化,也无需任何预训练,本质上只是结合给定编辑样本和特定正则项进行的梯度下降。虽然这些技术并非专为知识编辑任务开发,但已被证明具有一定效果,常被用作比较基线。得益于其简洁性,它们能轻松适配不同类型的编辑:从单次非连续编辑到批量连续编辑。

然而根据微调层数的不同,这些技术很难扩展到参数量超过100亿的大语言模型 (LLM) (Zhao et al., 2023) 。针对此类情况,后续章节讨论的方法可能更适合在有限资源下高效编辑大规模模型。

For instance, the authors of Sotoudeh and Thakur (2021) introduce a technique called "Neural Network Patching" that allows for the correction of deep neural networks without retraining from scratch. They present the concept of neural network patching, which involves identifying and correcting the faulty or damaged components of a network. The proposed method uses a set of repair operators to identify and replace damaged components of the network, identifying the minimum $L_{2}$ norm parameter update that reliably edits the model output, while minimizing the deviation from the original target model. Conversely, Zhu et al. (2020) utilize a constrained optimization process instead of penalizing an updated model for deviating from the original one. Their method employs either an $L_{2}$ or $L_{\infty}$ constraint between the original model parameters and the edited ones, highlighting the importance of selecting a subset of parameters to be updated. Additionally, they demonstrate how their method is similar to Elastic Weight Consolidation (EWC) Lee et al. (2019), which involves computing a penalty term for each weight in the neural network based on how much it contributes to the performance of the original task. Similarly to Zhu et al. (2020), losses enriched with the Kullback–Leibler (KL) divergence have been proposed in order to regularize the network's weights based on the KL divergence between the network's output on the old task and its output on the new task. This encourages the network to maintain similar weights for the parts of the network that are relevant to both tasks Yoon et al. (2017); Serra et al. (2018); Mitchell et al. (2022a); Huang et al. (2023).

例如,Sotoudeh 和 Thakur (2021) 提出了一种称为“神经网络修补 (Neural Network Patching)”的技术,无需从头训练即可修正深度神经网络。他们阐述了神经网络修补的概念,即识别并修复网络中故障或损坏的组件。该方法通过一组修复算子识别并替换网络中受损的组件,寻找能可靠修正模型输出且与原目标模型偏差最小的最小 $L_{2}$ 范数参数更新。

相反,Zhu 等人 (2020) 采用约束优化过程而非惩罚更新模型与原模型的偏差。他们的方法在原模型参数与编辑后参数之间施加 $L_{2}$ 或 $L_{\infty}$ 约束,强调选择待更新参数子集的重要性。此外,他们指出该方法与弹性权重巩固 (Elastic Weight Consolidation, EWC) (Lee 等人, 2019) 的相似性——后者通过计算神经网络中每个权重对原任务性能的贡献度来确定其惩罚项。

与 Zhu 等人 (2020) 类似,基于旧任务与新任务网络输出的 KL 散度 (Kullback-Leibler divergence),研究者提出用 KL 散度增强损失函数来正则化网络权重 (Yoon 等人, 2017; Serra 等人, 2018; Mitchell 等人, 2022a; Huang 等人, 2023),从而促使网络对双任务相关部分保持权重一致性。
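The regularized fine-tuning recipe discussed above can be sketched in a few lines: gradient descent on the edit's cross-entropy, plus an $L_2$ pull back toward the original parameters that discourages drift and thereby protects locality. The two-class linear model, the penalty weight `lam`, and the step counts below are toy assumptions, not values from the cited papers.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.RandomState(0)
W0 = rng.randn(2, 4)            # "pretrained" weights (toy stand-in)
x_e, y_e = rng.randn(4), 1      # a single edit pair (x_e, y_e)
p_before = softmax(W0 @ x_e)[y_e]

W, lam, lr = W0.copy(), 0.1, 0.1
for _ in range(500):
    p = softmax(W @ x_e)
    # Gradient of the cross-entropy on the edit example ...
    grad_ce = np.outer(p - np.eye(2)[y_e], x_e)
    # ... plus the L2 pull toward the original parameters W0.
    grad = grad_ce + lam * (W - W0)
    W -= lr * grad

p_after = softmax(W @ x_e)[y_e]
```

Raising `lam` makes the edit more conservative (better locality, weaker reliability); the constrained variants of Zhu et al. (2020) replace the penalty with an explicit $L_2$ or $L_\infty$ bound on the same deviation.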

Table 4: Comparison of some of the most notable KE methodologies on zsRE Levy et al. (2017) and CounterFact Meng et al. (2022b) with non-sequential single-batch edits of factual knowledge. For each methodology, we report success rate (SR), generalization rate (GR), and drawdown (DD) metrics. Results adapted from Yao et al. (2023).

| Dataset | Model | Metric | FT | KE | MEND | ROME | MEMIT | SERAC | T-Patcher |
|---|---|---|---|---|---|---|---|---|---|
| zsRE | GPT-J (6B) | SR | 54.70 | 6.60 | 45.60 | 99.18 | 99.23 | 90.16 | 97.12 |
| zsRE | GPT-J (6B) | GR | 49.20 | 7.80 | 48.00 | 94.90 | 87.16 | 89.96 | 94.95 |
| zsRE | GPT-J (6B) | DD | 62.76 | 5.82 | 11.79 | 0 | 0 | 0.1 | 3.76 |
| CounterFact | GPT-J (6B) | SR | 99.90 | 13.40 | 73.80 | 99.80 | 99.90 | 99.78 | 100.00 |
| CounterFact | GPT-J (6B) | GR | 97.53 | 11.00 | 74.20 | 86.63 | 73.13 | 99.41 | 83.98 |
| CounterFact | GPT-J (6B) | DD | 98.98 | 5.62 | 96.25 | 0 | 0 | 1.11 | 91.63 |

表 4: 在 zsRE Levy et al. (2017) 和 CounterFact Meng et al. (2022b) 数据集上对非连续单批次事实知识编辑的部分最显著知识编辑 (KE) 方法比较。每种方法均报告成功率 (SR)、泛化率 (GR) 和回撤率 (DD) 指标。结果改编自 Yao et al. (2023)。

| 数据集 | 模型 | 评估指标 | FT | KE | MEND | ROME | MEMIT | SERAC | T-Patcher |
|---|---|---|---|---|---|---|---|---|---|
| zsRE | GPT-J (6B) | SR | 54.70 | 6.60 | 45.60 | 99.18 | 99.23 | 90.16 | 97.12 |
| zsRE | GPT-J (6B) | GR | 49.20 | 7.80 | 48.00 | 94.90 | 87.16 | 89.96 | 94.95 |
| zsRE | GPT-J (6B) | DD | 62.76 | 5.82 | 11.79 | 0 | 0 | 0.1 | 3.76 |
| CounterFact | GPT-J (6B) | SR | 99.90 | 13.40 | 73.80 | 99.80 | 99.90 | 99.78 | 100.00 |
| CounterFact | GPT-J (6B) | GR | 97.53 | 11.00 | 74.20 | 86.63 | 73.13 | 99.41 | 83.98 |
| CounterFact | GPT-J (6B) | DD | 98.98 | 5.62 | 96.25 | 0 | 0 | 1.11 | 91.63 |

As previously mentioned, the techniques outlined in this section have been primarily utilized as baseline approaches in knowledge editing and heavily intersect with continual learning works Mundt et al. (2023). However, in various experiments cited in De Cao et al. (2021); Mitchell et al. (2022a); Huang et al. (2023), these techniques have demonstrated limited efficacy in maintaining model accuracy with regard to previous knowledge. In fact, as noted by De Cao et al. (2021), regularization techniques overlook the highly nonlinear nature of large models and the significant impact that even minor changes in parameter space can have on the output of numerous data points. For those reasons, more elaborate approaches have been proposed in the literature, such as the ones discussed in the following sections.

如前所述,本节所述技术主要用作知识编辑的基线方法,并与持续学习工作 Mundt et al. (2023) 存在大量交集。然而,在 De Cao et al. (2021)、Mitchell et al. (2022a)、Huang et al. (2023) 引用的多项实验中,这些技术在维持模型对先前知识的准确性方面表现有限。事实上,如 De Cao et al. (2021) 所指出的,正则化技术忽略了大模型的高度非线性特性,以及参数空间中即使微小变化也可能对大量数据点输出产生的重大影响。基于这些原因,后续章节将介绍文献中提出的更精细的工作。

4.2 Meta-Learning and Hypernetworks

4.2 元学习与超网络

Meta-learning techniques refer to a set of algorithms and approaches that enable machines to learn how to learn Finn et al. (2017). These techniques have proven to be particularly useful in scenarios where a model needs to adapt quickly to new tasks or environments with limited data. For instance, in computer vision Ren et al. (2018), meta-learning can be employed to rapidly fine-tune a pre-trained model to recognize new object categories or adapt to different imaging conditions. Similarly, in natural language processing Gu et al. (2020), meta-learning can facilitate the adaptation of language models to different domains or styles of text with minimal additional training data. Robotics Finn et al. (2017) is another domain where meta-learning has shown promise, enabling robots to rapidly acquire new skills or adapt to changes in their environment. In reinforcement learning Houthooft et al. (2018), meta-learning can help agents generalize their learned policies to new environments or task variations, improving their sample efficiency and adaptability.

元学习技术指的是一组让机器学会如何学习的算法和方法 [Finn et al. (2017)]。这些技术在模型需要快速适应新任务或数据有限的环境时特别有效。例如,在计算机视觉领域 [Ren et al. (2018)],元学习可用于快速微调预训练模型以识别新物体类别或适应不同的成像条件。同样,在自然语言处理领域 [Gu et al. (2020)],元学习能帮助语言模型以最少的额外训练数据适应不同领域或文本风格。机器人技术 [Finn et al. (2017)] 是元学习展现出潜力的另一个领域,它使机器人能够快速掌握新技能或适应环境变化。在强化学习领域 [Houthooft et al. (2018)],元学习可以帮助智能体将学习到的策略泛化到新环境或任务变体中,从而提高样本效率和适应性。

The key advantage of meta-learning techniques lies in their ability to leverage prior knowledge and experience to learn new tasks or adapt to new environments more efficiently. By learning how to learn, these techniques can rapidly incorporate necessary modifications and adapt to new scenarios with significantly reduced data requirements, making them well-suited for applications where data is scarce, or the environment is dynamically changing.

元学习技术的关键优势在于其能够利用先验知识和经验更高效地学习新任务或适应新环境。通过学习如何学习,这些技术可以快速整合必要的修改,并以显著减少的数据需求适应新场景,使其非常适合数据稀缺或环境动态变化的应用场景。

They can be broadly categorized into two types: model-based and optimization-based meta-learning. In the context of knowledge editing, model-based meta-learning focuses on learning a model that can be used to adapt to new data efficiently. This involves learning the structure of the model and its parameters such that it can be generalized to new data. In optimization-based meta-learning, the focus is on learning how to optimize the parameters of a model to adapt to new knowledge.

它们大致可分为两类:基于模型(model-based)和基于优化(optimization-based)的元学习。在知识编辑的背景下,基于模型的元学习侧重于学习一个能够高效适应新数据的模型,这涉及学习模型的结构及其参数,以便泛化到新数据。而基于优化的元学习则专注于学习如何优化模型参数以适应新知识。

In the literature, the authors of "Editable Neural Networks" (ENN) Sinitsin et al. (2020) first exploit the meta-learning paradigm for knowledge editing to "learn to allow effective editing". The core idea behind "Editable Training" is to enforce the model parameters to be "prepared" for the editor function KE (Section 2.2), which in their experimentation is defined as Stochastic Gradient Descent (SGD) with up to $k$ steps and learning rate $\alpha$. In particular, they propose to train the starting model $f_{0}$ with a loss that simultaneously encourages reliability and locality. That is obtained with two additional terms to the base loss: one that measures the success of an edit, a cross-entropy, and another that assesses the distance in output probability between the edited model and the original one, a KL divergence. They prove to be successful both on computer vision datasets for classification Krizhevsky et al. (2009); Deng et al. (2009b) and Natural Adversarial Examples (NAE) Hendrycks et al. (2021), and on machine translation Cettolo et al. (2014). They work with single non-successive and successive edits, with good performance on both types of edits. Nevertheless, as pointed out by Mitchell et al. (2022a), the methodology requires further training of the base model before an edit with a pre-training step. That can be critical in scenarios with large models (LM) or when the available memory is a constraint.

在文献中,《可编辑神经网络》(Editable Neural Networks, ENN) 的作者 Sinitsin 等人 (2020) 首次利用元学习范式进行知识编辑,以"学会允许有效编辑"。"可编辑训练"的核心思想是强制模型参数为编辑函数 KE (第 2.2 节) 做好"准备",在他们的实验中,该函数被定义为最多 $k$ 步和学习率 $\alpha$ 的随机梯度下降 (Stochastic Gradient Descent, SGD)。特别是,他们提出用同时鼓励可靠性和局部性的损失函数来训练初始模型 $f_{0}$。这是通过在基础损失函数上增加两个额外项实现的:一项衡量编辑成功的交叉熵,另一项评估编辑后模型与原始模型输出概率之间距离的 KL 散度。他们在计算机视觉分类数据集 Krizhevsky 等人 (2009); Deng 等人 (2009b) 和自然对抗样本 (Natural Adversarial Examples, NAE) Hendrycks 等人 (2021) 以及机器翻译 Cettolo 等人 (2014) 上都证明了该方法的成功。他们处理了单次非连续和连续编辑,在两种编辑类型上都表现良好。然而,正如 Mitchell 等人 (2022a) 所指出的,该方法需要在编辑前通过预训练步骤进一步训练基础模型。这对于大模型 (LM) 或内存受限的场景可能至关重要。
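A minimal sketch of the shape of the Editable Training loss, assuming toy logits in place of real model outputs: the base task loss is augmented with a cross-entropy term on the edit (reliability) and a KL term tying the edited model's outputs to the original's on unrelated inputs (locality). All numeric values below are hypothetical placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, y):
    """Negative log-probability of the target class y."""
    return float(-np.log(softmax(logits)[y]))

def kl_div(p_logits, q_logits):
    """KL(p || q) between the two softmax distributions."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float((p * (np.log(p) - np.log(q))).sum())

# Hypothetical logits standing in for real model outputs.
base_loss = 0.42                          # task loss on a training batch
edit_logits = np.array([0.2, 2.5])        # f_edited(x_e), target class 1
ref_logits_orig = np.array([1.0, -1.0])   # f_0(x) on an unrelated input x
ref_logits_edit = np.array([0.9, -0.8])   # f_edited(x) on the same input

c_edit, c_loc = 1.0, 10.0                 # illustrative weights
editable_loss = (base_loss
                 + c_edit * cross_entropy(edit_logits, 1)              # reliability
                 + c_loc * kl_div(ref_logits_orig, ref_logits_edit))   # locality
```

The KL term vanishes exactly when the edited model matches the original on the reference input, so minimizing this combined loss trains parameters that accept edits without drifting elsewhere.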

Conversely, De Cao et al. (2021) propose an optimization-based meta-learning approach, first employing a hypernetwork Ha et al. (2016), dubbed KnowledgeEditor, with the objective to "learn to update" the parameters of another network. Since the knowledge editing task requires locality everywhere except for the edit, they frame the learning task as a constrained optimization problem using a Bidirectional-LSTM Schmidhuber et al. (1997) and some downstream Feed-Forward Neural Networks (FFNN). Once trained, only the input edit feeds the hypernetwork, which predicts vectors to gate and shift the edit fine-tuning gradient of the starting network with respect to a certain weight matrix. Therefore, we can say that KnowledgeEditor learns how to modify the gradient in order to provide the properties enumerated in Section 2.4. On the other hand, the training itself of the hypernetwork does not change the weights of the starting model as in Sinitsin et al. (2020), but requires some original training samples to estimate the original output probability distribution. Indeed, as in Sinitsin et al. (2020), the hypernetwork is trained with the sum of two loss components: the first aims at providing reliability and generality with a cross-entropy between semantically equivalent edits and predictions of the edited model, and the second provides locality with a KL divergence, as in Sinitsin et al. (2020). In addition, they propose to add a margin to the KL term and iteratively reduce it to progressively make the edit more local. They test KnowledgeEditor on different NLP tasks, from fact-checking (FC) with a fine-tuned BERT model Devlin et al. (2018) on the FEVER dataset Thorne et al. (2018), to closed-book question answering (QA) with a BART model Lewis et al. (2020) on the Zero-Shot Relation Extraction (zsRE) dataset Levy et al. (2017). However, they only experimented with single non-successive changes. Nevertheless, the authors of Huang et al. (2023) adopt KnowledgeEditor as a baseline, and their experiments on FEVER and zsRE show that it fails to implement more than a couple of successive single edits. Indeed, as hypothesized by the authors, the KnowledgeEditor hypernetwork is trained with the starting model $f_{0}$ and is thus strongly coupled with the original parameters. As the editing proceeds, the model drifts further from the initial one, resulting in failure.

相反,De Cao等人(2021)提出了一种基于优化的元学习方法,首次采用Ha等人(2016)提出的超网络(称为KnowledgeEditor),其目标是"学习更新"另一个网络的参数。由于知识编辑任务要求除编辑外保持局部性,他们使用双向LSTM (Schmidhuber等人,1997)和一些下游前馈神经网络(FFNN)将该学习任务构建为约束优化问题。训练完成后,仅输入编辑内容给超网络,该网络会预测向量来门控和偏移起始网络针对特定权重矩阵的编辑微调梯度。因此,可以说KnowledgeEditor学会了如何修改梯度以提供第2.4节列举的特性。另一方面,超网络本身的训练不会像Sinitsin等人(2020)那样改变起始模型的权重,但需要一些原始训练样本来估计原始输出概率分布。实际上,与Sinitsin等人(2020)类似,超网络通过两个损失分量的总和进行训练:第一个分量旨在通过语义等效编辑与编辑模型预测之间的交叉熵提供可靠性和通用性,第二个分量则像Sinitsin等人(2020)那样通过KL散度提供局部性。此外,他们还提出为KL添加边界并迭代减小该边界,逐步使编辑更加局部化。他们在不同NLP任务上测试了KnowledgeEditor,从使用微调BERT模型(Devlin等人,2018)和FEVER数据集(Thorne等人,2018)的事实核查(FC),到使用BART模型(Lewis等人,2020)在零样本关系抽取(zsRE)数据集(Levy等人,2017)上的闭卷问答(QA)。然而,他们仅实验了单个非连续更改。尽管如此,Huang等人(2023)的作者采用KnowledgeEditor作为基线,在FEVER和zsRE上的实验表明它无法实现超过几次连续的单次编辑。正如作者假设的那样,KnowledgeEditor的超网络是与起始模型$f_{0}$一起训练的,因此与原参数强耦合。随着编辑的进行,模型与初始模型的差异越来越大,导致其失败。

Building on De Cao et al. (2021), the authors of Mitchell et al. (2022a) leverage hypernetworks too in order to learn how to update the weights of a starting model. However, while KnowledgeEditor trains a recurrent neural network to map the edit example into a rank-1 mask over the gradient, the Mitchell et al. (2022a) hypernetwork, named MEND, directly maps the gradient into a new parameter update, retaining tractability by leveraging the low-rank form of the gradient. Indeed, the input of a MEND network is a decomposed gradient and the output is formed by some pseudo-activations and pseudo-deltas that should encapsulate reliability, generality, and locality. Indeed, as De Cao et al. (2021), they make use of a cross-entropy between semantically equivalent edits and predictions of the edited model to enforce generality and reliability (edit loss), and a KL divergence for locality, without the margin (locality loss). In addition, they propose two further hyperparameters to perform a weighted sum of the two losses and to make the edit learning rate coefficient learnable. Each layer of the network itself consists of two consecutive blocks, both initialized to compute the exact identity function of the normalized decomposed gradient, using four matrices initialized with zero or Xavier uniform initialization Glorot and Bengio (2010). Finally, in order to edit multiple layers of the starting model with the same matrix dimensions, they propose to use the same MEND network, while applying a learned layer-specific scale and offset, similar to Perez et al. (2018). As De Cao et al. (2021), they experiment with their methodology only on NLP tasks, using the FEVER Thorne et al. (2018) and zsRE Levy et al. (2017) datasets. In addition, they also evaluate GPT-style models Radford et al. (2018), working with a custom version of Wikitext-103 Merity et al. (2016), named Wikitext Generation.
They experiment with both non-sequential single and batch types of edits, showing large regression over 100 simultaneous edits. Finally, it is important to point out that, as KnowledgeEditor, MEND has the same limitations with successive edits, being strictly tied to the weights of the starting model. In fact, if pre-training is conducted on $f_{t}$, KE will exhibit significantly poorer performance with $f_{t+1}$. Moreover, as the weights of the edited model diverge from those of the pre-training, MEND will gradually deteriorate, ultimately losing all its editing capabilities.

基于De Cao等人(2021)的研究,Mitchell等人(2022a)的作者同样利用超网络来学习如何更新初始模型的权重。然而,Knowledge Editor通过训练循环神经网络将编辑样本映射为梯度上的秩1掩码,而Mitchell等人(2022a)提出的MEND超网络则直接将梯度映射为新的参数更新,通过利用梯度的低秩形式保持可追踪性。具体而言,MEND网络的输入是分解后的梯度,输出由若干伪激活和伪增量组成,这些输出需要封装可靠性、通用性和局部性。与De Cao等人(2021)类似,他们使用语义等价编辑与编辑模型预测之间的交叉熵来增强通用性和可靠性(编辑损失),并使用不带边际的KL散度来衡量局部性(局部性损失)。此外,他们还提出了两个超参数:用于对两种损失进行加权求和,以及使编辑学习率系数可学习。网络每层包含两个连续模块,均初始化为计算归一化分解梯度的恒等函数,使用四个经零初始化或Xavier均匀初始化(Glorot和Bengio,2010)的矩阵。最后,为使用相同维度矩阵编辑初始模型的多层结构,他们提出采用同一MEND网络,同时应用类似Perez等人(2018)的层特定学习缩放和偏移。与De Cao等人(2021)相同,他们仅使用FEVER(Thorne等人,2018)和zsRE(Levy等人,2017)数据集在NLP任务上验证方法。此外,他们还评估了GPT风格模型(Radford等人,2018),使用定制版Wikitext-103(Merity等人,2016)即Wikitext Generation进行测试。实验涵盖非连续单次或批量编辑类型,显示超过100次同步编辑会出现显著性能衰退。需要指出的是,与Knowledge Editor类似,MEND在连续编辑时也存在相同局限——严格依赖初始模型权重。若在$f_{t}$上进行预训练,KE在$f_{t+1}$上表现将显著下降。随着编辑模型权重与预训练权重的偏离,MEND性能会逐步恶化,最终完全丧失编辑能力。
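The low-rank structure MEND exploits can be checked directly: for a linear layer, a single example's weight gradient is the outer product of the back-propagated delta and the layer input, so a hypernetwork can consume the pair $(\delta, x)$ instead of the full $d_{out}\times d_{in}$ matrix. The `mend_like_update` function below is a hypothetical fixed stand-in for the learned MEND transform, shown only to illustrate the input/output shapes.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4)        # input activation to the edited linear layer
delta = rng.randn(3)    # gradient of the loss w.r.t. the layer's pre-activations

# For a single example, the weight gradient factorizes as a rank-1
# outer product: grad_W = delta x^T.
grad_W = np.outer(delta, x)

def mend_like_update(delta, x):
    """Hypothetical stand-in for MEND: map (delta, x) to pseudo-deltas
    and pseudo-activations, then return the rank-1 parameter update.
    The real mapping is a learned network, not a fixed scaling."""
    pseudo_delta = 0.5 * delta   # placeholder for the learned transform
    pseudo_x = 0.5 * x
    return np.outer(pseudo_delta, pseudo_x)

W = rng.randn(3, 4)                         # weights of the edited layer
W_edited = W - 1.0 * mend_like_update(delta, x)
```

Because both the input pair and the output update are rank-1 factors, the hypernetwork's size grows with the layer width rather than with the full parameter count, which is what keeps MEND tractable on large models.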

Similar to MEND, MALMEN Tan et al. (2023) utilizes a hypernetwork to generate parameter shifts that can be applied to the language model. However, MALMEN distinguishes itself by formulating the parameter shift aggregation as a least-squares problem, which it solves using the normal equation. This allows MALMEN to effectively combine the parameter shifts corresponding to different facts, mitigating the potential cancellation effects that can arise when simply summing the shifts. Additionally, MALMEN separates the computation between the hypernetwork and the language model, enabling the use of arbitrary batch sizes on both components. This memory-economic training strategy permits MALMEN to scale to editing thousands of facts simultaneously, substantially outperforming the editing capabilities of MEND. The paper provides thorough empirical evaluations demonstrating MALMEN's superior scalability and effectiveness across various language model architectures and knowledge-intensive NLP tasks, including closed-book fact verification on the FEVER dataset Thorne et al. (2018) and question answering on the zsRE dataset Levy et al. (2017) for BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B). Finally, it is important to point out that, as KnowledgeEditor and MEND, MALMEN has the same limitations with successive edits, being strictly tied to the weights of the starting model.

与MEND类似,MALMEN (Tan et al., 2023) 采用超网络生成可应用于语言模型的参数偏移量。但MALMEN的创新点在于将参数偏移聚合表述为最小二乘问题,并通过正规方程求解。这种方法能有效融合不同事实对应的参数偏移,避免简单求和可能导致的抵消效应。此外,MALMEN实现了超网络与语言模型的解耦计算,支持对两个组件使用任意批处理大小。这种内存经济的训练策略使MALMEN能同时编辑数千条事实,性能显著超越MEND的编辑能力。论文通过全面实验验证了MALMEN在多种语言模型架构(包括BERT-base、GPT-2、T5-XL (2.8B) 和GPT-J (6B))及知识密集型NLP任务(如FEVER数据集 (Thorne et al., 2018) 的闭卷事实核查、zsRE数据集 (Levy et al., 2017) 的问答任务)中的卓越扩展性和有效性。最后需指出,与Knowledge Editor和MEND相同,MALMEN在连续编辑时也存在相同局限——严格依赖初始模型的权重参数。
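The least-squares aggregation idea can be sketched with the normal equation: find a single shift matrix that best fits every per-fact key/shift pair at once, instead of summing per-fact updates that may cancel. The dimensions and the random keys and shifts below are invented for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_facts, d_in, d_out = 32, 8, 4
K = rng.randn(d_in, n_facts)    # one key column per fact to edit (toy values)
D = rng.randn(d_out, n_facts)   # desired output shift for each fact (toy values)

# Least-squares aggregation via the normal equation:
#   minimize ||dW K - D||_F^2   =>   dW = D K^T (K K^T)^{-1}
dW = D @ K.T @ np.linalg.inv(K @ K.T)

# Naive alternative: sum the per-fact rank-1 shifts, which can cancel out.
dW_naive = sum(np.outer(D[:, i], K[:, i]) / (K[:, i] @ K[:, i])
               for i in range(n_facts))

err_ls = np.linalg.norm(dW @ K - D)        # residual of the aggregated solve
err_naive = np.linalg.norm(dW_naive @ K - D)
```

Since the least-squares solution minimizes the Frobenius residual by construction, its fit error is never worse than the naive sum's, which is the cancellation-mitigation property the paper argues for.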


Figure 2: Scaling curves showing three different evaluation metrics with an increased number of non-successive batch edits for three different KE methodologies: MEND, ROME, and MEMIT. Results are computed using COUNTERFACT and GPT-J. Locality is shown not as drawdown (DD), but as its complementary specificity over a neighborhood of samples Meng et al. (2022b). ROME and MEND perform well up to ten edits, but rapidly degrade, losing almost all SR before batches of 1k. On the other hand, MEMIT performs well with considerably large batches of edits. Adapted from Meng et al. (2022a).

图 2: 扩展曲线展示了三种不同KE方法(MEND、ROME和MEMIT)在非连续批量编辑数量增加时的三种评估指标。结果使用COUNTERFACT和GPT-J计算。局部性未以下降值(DD)显示,而是通过样本邻域的互补特异性呈现(Meng et al., 2022b)。ROME和MEND在十次编辑内表现良好,但随后迅速退化,在1k批量编辑前几乎失去所有SR。相比之下,MEMIT在大批量编辑时仍保持良好性能。改编自Meng et al. (2022a)。

4.3 Direct Model Editing

4.3 直接模型编辑

This family of approaches, known as direct model editing, aims to directly edit the weights or parameters of the target neural network, enabling effective implementation of a predefined set of changes while minimizing computational overhead. These methodologies allow injecting knowledge by modifying only certain parameters of the model, making them particularly suitable for foundational models Chang et al. (2023). By directly manipulating the variables of a model, most direct model editing techniques build upon efforts to localize and understand the internal mechanisms within models, striving to attribute knowledge acquired during training to specific neurons or parameters in the network Elhage et al. (2021); Dar et al. (2023); Mitchell et al. (2022b). Consequently, these approaches endeavor to edit the activations or values of identified neurons or parameters to reflect the desired changes.

这类方法被称为直接模型编辑 (direct model editing),旨在直接修改目标神经网络的权重或参数,既能有效实现预定义的变更集,又能将计算开销降至最低。这些方法仅需调整模型的特定参数即可注入知识,因此特别适合基础模型 [Chang et al., 2023]。通过直接操控模型变量,大多数直接模型编辑技术都建立在模型内部机制定位与理解的研究基础上,试图将训练获得的知识归因于网络中特定神经元或参数 [Elhage et al., 2021; Dar et al., 2023; Mitchell et al., 2022b]。因此,这类方法通过编辑已识别神经元或参数的激活值来反映预期变更。

However, it is crucial to recognize that the applicability and effectiveness of direct model editing techniques may vary depending on the underlying neural network architecture and the nature of the knowledge or changes to be incorporated. While extensive research has been conducted on specific architectures such as Multilayer Perceptrons (MLPs) and Transformers, the generalization of these techniques to alternative architectures or problem domains warrants further investigation. Therefore, an in-depth analysis and exploration of the applicable scenarios for direct model editing techniques is beneficial, considering factors such as the model architecture, the type of knowledge or changes to be incorporated, and the computational constraints of the target environment.

然而,必须认识到直接模型编辑技术 (direct model editing) 的适用性和有效性可能因底层神经网络架构、待整合知识或修改的性质而有所差异。虽然针对多层感知机 (MLPs) 和 Transformer 等特定架构已开展大量研究,但这些技术在其他架构或问题领域的泛化能力仍需进一步验证。因此,结合模型架构、待整合知识类型、目标环境计算限制等因素,深入分析并探索直接模型编辑技术的适用场景具有重要意义。

Geva et al. (2021) first identify the MLP layers in a masked LM transformer as key-value memories of entities and information associated with those entities Sukhbaatar et al. (2015). Building on this finding, Dai et al. (2022) demonstrate a method to edit facts in BERT models Devlin et al. (2018); they propose the concept of knowledge neurons to study how factual knowledge is stored in pretrained Transformers. Specifically, the authors examine the fill-in-the-blank cloze task for BERT and propose a knowledge attribution method to identify the neurons that express a given relational fact. They find that the activation of such knowledge neurons is positively correlated with the expression of their corresponding facts. Therefore, they present a preliminary methodology that leverages knowledge neurons to edit factual knowledge in Transformers, even without any fine-tuning. The authors perform a knowledge surgery on pretrained Transformers by directly modifying the corresponding parameters in feed-forward networks. Such surgery shows promising results, keeping a moderate influence on other knowledge. The methodology proposed can be used to perform both single and multiple edits at the same time. However, the authors only experimented with single edits in their paper, focusing specifically on factual knowledge.

Geva等人(2021)首次将掩码语言模型Transformer中的MLP层识别为实体及其关联信息的键值记忆存储(Sukhbaatar等人,2015)。基于这一发现,Dai等人(2022)提出了一种编辑BERT模型(Devlin等人,2018)中事实知识的方法,他们提出知识神经元(knowledge neurons)概念来研究预训练Transformer如何存储事实知识。具体而言,作者通过BERT的完形填空任务提出知识归因方法,用于识别表达特定关系事实的神经元。研究发现这些知识神经元的激活强度与其对应事实的表达呈正相关。因此,他们提出了一种无需微调即可编辑Transformer中事实知识的初步方法,通过直接修改前馈网络中的对应参数对预训练Transformer实施"知识手术"。该方法对其他知识影响可控,且能同时实现单条或多条知识编辑,但论文仅针对单条事实知识编辑进行了实验验证。
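The knowledge-attribution idea can be sketched as an integrated-gradients-style score over a neuron's activation: scale the observed activations from zero up to their full value and accumulate the gradient of the correct answer's probability along the way. The readout matrix `R` below is a toy stand-in for the real network, and all values are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.RandomState(0)
a_bar = rng.rand(6)    # observed FFN activations for the prompt (toy values)
R = rng.randn(2, 6)    # hypothetical readout from activations to answer logits
y = 1                  # index of the correct answer

# Riemann approximation of an integrated-gradients attribution per neuron:
#   attr_i = a_bar_i * (1/m) * sum_s dP(y | a = (s/m) a_bar) / da_i
m = 20
grads = np.zeros_like(a_bar)
for s in range(1, m + 1):
    p = softmax(R @ ((s / m) * a_bar))
    # Analytic gradient of P(y) w.r.t. the activations for a softmax readout.
    grads += p[y] * (R[y] - p @ R)
attribution = a_bar * grads / m
```

Neurons with large positive `attribution` are the candidate "knowledge neurons" for this fact; in Dai et al. (2022) the same scan is run over real FFN activations rather than a toy readout.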

Factual knowledge is also the focus of the work of Meng et al. (2022b), which exclusively experiments on autoregressive language modeling, such as GPT-style models Radford et al. (2018). The study explores the storage and direct manipulation of factual associations in transformer models. These associations are modeled as tuples, represented as $t=(s,r,o)$, comprising a subject $s$, a relation $r$, and an object $o$, which connect to form a factual relationship. The research focuses on understanding how these tuples are stored within transformer models and investigates methods to facilitate their direct editing. Firstly, they trace the causal effects of hidden state activations within GPT using causal mediation analysis Pearl (2001), as previously done by other works Vig et al. (2020), to identify the specific modules that mediate recall of a fact about a subject. Their analysis reveals that feed-forward MLPs at a range of middle layers are decisive when processing the last token of the subject name. Secondly, they test their previous findings by introducing a novel direct model editing methodology: the Rank-One Model Editing method (ROME). The algorithm aims at producing single, non-successive edits by altering the parameters that determine a feed-forward layer's behavior. As stated by the authors, despite the simplicity of the intervention, ROME is similarly effective to other model-editing approaches, achieving good generalization and locality simultaneously, whereas previous approaches sacrifice one or the other. Furthermore, they introduce a new dataset, CounterFact (derived from a set of true facts from WikiData), for evaluating counterfactual edits in language models. It contains 21,919 records with a diverse set of subjects, relations, and linguistic variations.
Finally, they propose to monitor a further metric, fluency, to evaluate text generation's repetitiveness by measuring the weighted average of bi- and tri-gram entropies Zhang et al. (2018).

事实性知识也是Meng Kevin等人研究的重点,Meng等人(2022b)的工作专门针对自回归语言建模(如GPT风格模型)(Radford等人,2018)进行实验。该研究探讨了Transformer模型中事实关联的存储和直接操作。这些关联被建模为元组,表示为$t=(s,r,o)$,包含主体$s$、关系$r$和客体$o$,三者连接形成事实关系。研究重点在于理解这些元组如何存储在Transformer模型中,并探索促进其直接编辑的方法。

首先,他们使用因果中介分析(Pearl,2001)追踪GPT中隐藏状态激活的因果效应,如Vig等人(2020)先前所做的那样,以识别中介主体事实回忆的特定模块。分析表明,在处理主体名称最后一个token时,中间层的多层感知机(MLP)起决定性作用。

其次,他们通过引入一种新颖的直接模型编辑方法——Rank-One模型编辑方法(ROME)来验证先前的发现。该算法旨在通过改变决定前馈层行为的参数来产生单次(非连续)编辑。作者指出,尽管干预方式简单,ROME与其他模型编辑方法同样有效,能同时实现良好的泛化性和局部性,而先前方法往往需要牺牲其中一项。

此外,他们引入了一个新数据集Counter Fact(源自WikiData的真实事实集),用于评估语言模型中的反事实编辑。该数据集包含21,919条记录,涵盖多样化的主体、关系和语言变体。

最后,他们提出监测流畅度这一额外指标,通过测量双元和三元组熵的加权平均值(Zhang等人,2018)来评估文本生成的重复性。
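
The core of ROME's intervention can be illustrated numerically. The sketch below is a simplified, un-regularized variant of a rank-one edit: it solves for the minimal-norm update that makes a linear layer map a key vector $k^*$ (derived from the edited subject) exactly to a target value $v^*$ (encoding the new fact). ROME itself derives $k^*$ and $v^*$ from the network and constrains the update with a key-covariance estimate, which this toy version omits.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star):
    """Return W' = W + (v* - W k*) k*^T / (k*^T k*): the minimal rank-one
    update that makes the layer map the key k* exactly to the value v*."""
    residual = v_star - W @ k_star                    # what the layer currently gets wrong
    update = np.outer(residual, k_star) / (k_star @ k_star)
    return W + update

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))     # toy stand-in for an MLP projection matrix
k = rng.normal(size=4)          # key vector for the edited subject
v = rng.normal(size=8)          # desired value encoding the new fact

W_new = rank_one_edit(W, k, v)
print(np.allclose(W_new @ k, v))   # True: the edited key now yields the target value
```

Because the update lies entirely in the direction of $k^*$, inputs orthogonal to the key are unaffected, which is the intuition behind ROME's locality.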

ROME results are comparable with other meta-learning-based knowledge editing techniques. Nevertheless, the methodology is limited to single, non-successive edits. To address this limitation, the same authors modified the original algorithm Meng et al. (2022a). The resulting method, called MEMIT, is able to scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by an order of magnitude. Like ROME, MEMIT works by first identifying the critical layers in the LLM that are responsible for storing the memories that need to be updated. Then, it uses a rank-one modification of the MLP weights of these layers to directly write the new memories into the model. This process is repeated for each memory that needs to be updated. The authors evaluated MEMIT on a variety of tasks, including factual question answering, natural language inference, and summarization. They found that MEMIT was able to significantly improve the performance of the LLM on these tasks, even when the number of updated memories was large. In addition to its scalability, MEMIT has several other advantages over previous methods for updating LLMs. First, it is relatively efficient, requiring only a few minutes to update a large model with thousands of memories. Second, it is very accurate, with the updated models achieving near-perfect accuracy on the evaluated tasks. Third, MEMIT is generalizable, with the updated models able to perform well on a variety of tasks. Figure 2 compares ROME, MEMIT, and one of the meta-learning techniques described in Section 4.2. It shows the improvement brought by MEMIT and the clear collapse of the MEND hypernetwork-based editor with large batches. Overall, MEMIT is a valuable tool for improving the performance of these models. However, it is important to point out that, even though some works try to expand the scope of the methodology Gupta et al. (2023), MEMIT is limited to modifying factual associations, and there is no clear path to scaling it to different knowledge types. The same applies to PMET Li et al. (2024), which goes a step further by also simultaneously optimizing the hidden states of the Multi-Head Self-Attention (MHSA) component. This is based on the insight that MHSA encodes certain general knowledge extraction patterns that can be leveraged to enable more precise editing of the model. On the other hand, it is worth noting that recent research suggests that causal tracing and other knowledge localization methodologies for identifying which parameters to update are surprisingly unreliable, even for factual knowledge Hase et al. (2023). The paper argues that locating the source of a fact does not necessarily translate to effective editing, as modifying that specific location may not change the model's output. This surprising finding indicates our understanding of how knowledge is stored in complex models is incomplete.

ROME 的效果与其他知识编辑元学习技术相当。不过,该方法仅限于处理单一非连续编辑。为解决这一问题,原作者改进了算法 Meng et al. (2022a) ,新方法 MEMIT 可支持 GPT-J (6B) 和 GPT-NeoX (20B) 数千条关联的编辑,规模超过先前工作一个数量级。与 ROME 类似,MEMIT 首先定位大语言模型中存储待更新记忆的关键层,然后通过对这些层的 MLP 权重进行秩一修正,直接将新记忆写入模型,该过程对每条待更新记忆重复执行。作者在事实问答、自然语言推理和摘要等任务上评估 MEMIT,发现即使更新大量记忆,模型性能仍能显著提升。除可扩展性外,MEMIT 相较先前方法还有三大优势:高效性(数千条记忆的模型更新仅需数分钟)、准确性(评估任务接近完美准确率)和泛化性(更新后模型在多任务表现良好)。图 2 对比了 ROME、MEMIT 和 4.2 节描述的元学习技术,可见 MEMIT 的改进效果,以及基于 MEND 超网络的编辑器在大批量编辑时的明显失效。尽管 MEMIT 是提升模型性能的有力工具,但需注意其目前仅适用于事实关联修改,且缺乏向其他知识类型扩展的明确路径 Gupta et al. (2023) 。类似局限也存在于 PMET Li et al. (2024) ,虽然该方法通过同时优化多头自注意力 (MHSA) 组件的隐藏状态更进一步——其理论基础是 MHSA 编码的通用知识提取模式可实现更精确的模型编辑。值得注意的是,最新研究表明因果追踪等定位待更新参数的方法对事实知识也出人意料地不可靠 Hase et al. (2023) ,因为定位事实存储位置未必能有效改变模型输出,这揭示我们对复杂模型中知识存储机制的认知仍不完善。
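
The step from one edit to thousands can be pictured as moving from a single rank-one constraint to a batched least-squares one. The sketch below is an illustrative simplification: it applies the minimal-norm update satisfying a whole batch of key-value constraints on one matrix, whereas MEMIT additionally spreads such updates across several critical layers and weights the keys by a covariance estimate.

```python
import numpy as np

def batch_edit(W, K, V):
    """Minimal-norm update so that W' K = V for a whole batch of edits.
    K: (d_in, n) stacked key vectors; V: (d_out, n) stacked target values.
    Uses the Moore-Penrose pseudo-inverse of the key matrix."""
    return W + (V - W @ K) @ np.linalg.pinv(K)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
K = rng.normal(size=(16, 5))   # five edited subjects at once
V = rng.normal(size=(8, 5))    # their new target values

W_new = batch_edit(W, K, V)
print(np.allclose(W_new @ K, V))   # True: all five edits hold simultaneously
```

As long as the keys are linearly independent (full column rank), all constraints are satisfied exactly; with many more edits than the layer's width, they can only be satisfied in the least-squares sense, which is one motivation for distributing edits over multiple layers.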

4.4 Architectural Strategies

4.4 架构策略

Architecture strategies represent a distinct family of methodologies that diverge from approaches that directly or indirectly modify the weights of the target model. Instead, these strategies concentrate on manipulating or augmenting the original architecture to patch and modify the network's existing knowledge. This approach can be advantageous in scenarios where the set of original weights is inaccessible or when it is deemed unsafe to directly manipulate the main model. This indirect manipulation can be particularly useful when dealing with highly sensitive or proprietary models, where direct weight modification may be prohibited or could potentially compromise the model's integrity. Additionally, architecture strategies can offer a more interpretable and controllable means of knowledge adaptation, as the introduced architectural changes can often be mapped to specific functional aspects of the model. However, it is always crucial to consider the trade-offs between the potential benefits of architectural manipulation and the computational overhead it introduces in terms of latency.

架构策略代表了一类独特的方法论,其区别于直接或间接修改目标模型权重的传统方案。这类策略专注于通过操纵或扩展原始架构来修补和调整网络的现有知识。当无法获取原始权重集或直接操作主模型存在安全隐患时,该方法具有显著优势。这种间接操作方式在处理高度敏感或专有模型时尤为实用,因为直接修改权重可能被禁止或损害模型完整性。此外,架构策略能提供更具可解释性和可控性的知识适应手段,因为引入的架构变更通常可映射到模型的特定功能维度。但必须始终权衡架构操作带来的潜在收益与由此产生的计算开销(如延迟)之间的平衡。

A first editor function of this type was proposed by Sinitsin et al. (2020): taking inspiration from Conditional Neural Processes (CNP) Garnelo et al. (2018), they propose a specialized architecture that performs edits by adding a special condition vector to intermediate activations. The vector is generated by an additional encoder layer and provides information to effectively steer the final prediction of the network. This approach is very similar to more generic memory-augmented models that introduce memory mechanisms to enhance neural networks Graves et al. (2014); Santoro et al. (2016). Along the same line of experimentation, Mitchell et al. (2022b) proposed an approach, called SERAC (Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model), which stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed. This allows SERAC to be more expressive than other model editors, and to produce more accurate predictions for test inputs that are not closely related to the edits. That is achieved using three components: a memory, a classifier, and a counterfactual model. Users add edits to the memory, and the classifier decides whether the memory contains edits relevant to processing a given input. If the classifier determines that a relevant edit example exists, the input and edit example are passed to the counterfactual model, which is responsible for making the prediction. They evaluate SERAC on numerous tasks, focusing exclusively on large language models with single and batch non-successive edits.

Sinitsin等人(2020)首次提出了这类编辑器功能,他们受条件神经过程(Conditional Neural Processes,CNP) Garnelo等人(2018)的启发,提出了一种通过向中间激活添加特殊条件向量来执行编辑的专用架构。该向量由额外的编码器层生成,并提供信息以有效引导网络的最终预测。这种方法与引入记忆机制来增强神经网络性能的通用记忆增强模型非常相似 Graves等人(2014);Santoro等人(2016)。在相同实验方向上,Mitchell等人(2022b)提出了一种名为SERAC(基于检索增强反事实模型的半参数化编辑)的方法,该方法将编辑存储在显式记忆中,并学习对其进行推理以根据需要调整基础模型的预测。这使得SERAC比其他模型编辑器更具表现力,并能对与编辑不密切相关的测试输入产生更准确的预测。该功能通过三个组件实现:记忆模块、分类器和反事实模型。用户将编辑添加到记忆模块中,分类器决定记忆是否包含与处理相关的输入。如果分类器确定存在相关编辑示例,则将输入和编辑示例传递给负责进行预测的反事实模型。他们在多项任务上评估了SERAC,重点关注支持单次和批量非连续编辑的大语言模型。
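
SERAC's three-component routing can be sketched in a few lines. The class below is a toy illustration: in SERAC the scope classifier and the counterfactual model are learned networks, whereas here they are hypothetical stand-in callables chosen only to make the control flow concrete.

```python
class SeracSketch:
    """Toy SERAC-style editor: an explicit edit memory, a scope classifier,
    and a counterfactual model that answers only in-scope queries."""

    def __init__(self, base_model, counterfactual_model, in_scope):
        self.base = base_model           # frozen target model
        self.cf = counterfactual_model   # small editable model
        self.in_scope = in_scope         # classifier: (query, edit) -> bool
        self.memory = []                 # explicit store of edit examples

    def add_edit(self, edit):
        self.memory.append(edit)

    def __call__(self, query):
        for edit in self.memory:
            if self.in_scope(query, edit):
                return self.cf(query, edit)   # route to the counterfactual model
        return self.base(query)               # unrelated input: base model untouched


# Hypothetical stand-ins for the learned components
base = lambda q: "Paris"
cf = lambda q, e: e[1]                  # read the answer off the stored edit
in_scope = lambda q, e: e[0] in q       # naive substring-based scope check

editor = SeracSketch(base, cf, in_scope)
editor.add_edit(("capital of France", "Lyon"))
print(editor("What is the capital of France?"))  # in scope: edited answer "Lyon"
print(editor("What is 2 + 2?"))                  # out of scope: base answer "Paris"
```

The base model's weights are never touched, which is what makes this family attractive when direct weight modification is unsafe or impossible.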

Along the same line as SERAC, the authors of Dong et al. (2022) presented a memory-based methodology called CaliNet, a method for calibrating specifically factual knowledge in pretrained language models (PLMs) without re-training from scratch. CaliNet first detects whether a PLM has learned a fact correctly by comparing its scores for the correct fact and a set of distractor facts. If the PLM has not learned the fact correctly, CaliNet then uses a lightweight method to add and adapt new parameters to the PLM for that specific fact. CaliNet has two main components. First, a contrastive knowledge assessment (CKA) module detects whether a PLM has learned a fact correctly by comparing the PLM's scores for the correct fact and a set of distractor facts: if the PLM assigns a higher score to the correct fact than to the distractor facts, the CKA module concludes that the PLM has learned the fact correctly. Second, a factual knowledge calibration (FKC) module adds and adapts new parameters to the PLM for a specific fact. The FKC module first creates a new set of parameters for the PLM that are specific to the given fact, initializes them with the PLM's original parameters for that fact, and then fine-tunes the new parameters on a dataset of factual examples that include the given fact. Using a custom dataset based on the ParaRel set Elazar et al. (2021), CaliNet has been shown to be effective at calibrating factual knowledge in PLMs. In experiments on the knowledge probing task, CaliNet was able to improve the factual accuracy of PLMs by up to 20%.

与SERAC思路相似,Dong等人(2022)提出了一种基于记忆的方法CaliNet。该方法专门用于校准预训练语言模型(PLM)中的事实知识,而无需从头开始重新训练。CaliNet首先通过比较PLM对正确事实和一组干扰事实的评分,检测PLM是否正确学习了某个事实。如果PLM未能正确学习该事实,CaliNet会使用轻量级方法为该特定事实添加并调整新参数。

CaliNet包含两个核心组件:第一是对比知识评估(CKA)模块,用于检测PLM是否正确学习事实。该模块通过比较PLM对正确事实与干扰事实的评分进行判断——若正确事实的评分高于干扰事实,则判定该事实已被正确掌握。第二是事实知识校准(FKC)模块,负责为特定事实添加和调整新参数。该模块首先创建专属于目标事实的新参数集,并用PLM原有参数初始化,随后在包含该事实的示例数据集上进行微调。

基于ParaRel数据集(Elazar等人,2021)构建的定制实验表明,CaliNet能有效校准PLM中的事实知识。在知识探测任务中,该方法将PLM的事实准确率最高提升了20%。
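
The CKA check reduces to a simple comparison. The sketch below assumes a plausibility-scoring callable standing in for the PLM (here faked with a lookup table, which is an illustrative assumption, not CaliNet's actual scoring).

```python
def contrastive_knowledge_assessment(score, fact, distractors):
    """CKA-style check: a fact counts as 'learned' only if the model scores
    the correct object strictly above every distractor object."""
    s, r, o = fact
    correct = score(s, r, o)
    return all(correct > score(s, r, d) for d in distractors)


# Hypothetical plausibility table standing in for a PLM's scores
table = {
    ("Dante", "born_in", "Florence"): 0.9,
    ("Dante", "born_in", "Rome"): 0.4,
    ("Dante", "born_in", "Venice"): 0.2,
}
score = lambda s, r, o: table.get((s, r, o), 0.0)

learned = contrastive_knowledge_assessment(
    score, ("Dante", "born_in", "Florence"), ["Rome", "Venice"])
print(learned)  # True: the correct fact outranks all distractors
```

Only facts that fail this check are handed to the FKC module, which keeps the number of added parameters small.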

All previous works have tackled only non-successive edits, making only one edit or batch of edits at a time. However, as pointed out by Huang et al. (2023), the one-mistake-fixing scenario is not an accurate abstraction of the real-world challenge. Therefore, they extend the scenario into successive edits, introducing a novel model editor, Transformer-Patcher (T-Patcher), that can shift the behavior of transformer-based models by simply adding and training a few neurons in the last Feed-Forward Network (FFN) layer. Being at the end of the model, the new neurons have access to the output of all the previous layers. This allows the new neurons to learn how to correct mistakes that are made in earlier layers of the model. Overall, the proposed methodology allows for targeted shifts in the behavior of the model, akin to other fine-tuning-oriented methodologies like LoRA Hu et al. (2022). Transformer-Patcher is able to successively correct up to thousands of errors while generalizing to their equivalent inputs and retaining the model's accuracy on irrelevant inputs. This is in contrast to previous methods, which either fail to make a sequence of edits or fail to remember previous edits. The work evaluates Transformer-Patcher on both classification and generation tasks, showing that it can achieve state-of-the-art performance for successive single edits. However, despite the expansive scope of the methodology, their approach and implementation are highly architecture-specific, relying heavily on large sources of data and unrelated inputs.

以往的研究都只针对非连续编辑,每次仅进行一次或批量编辑。然而,正如Huang等人(2023)指出的,单错误修正场景并不能准确抽象现实世界的挑战。因此,他们将场景扩展至连续编辑,提出了一种新型模型编辑器TransformerPatcher(T-Patcher),通过仅在最后一个前馈网络(Feed-Forward Network)层添加并训练少量神经元,即可改变基于Transformer模型的行为。由于位于模型末端,新增神经元能够获取之前所有层的输出,从而学习如何修正模型前几层产生的错误。总体而言,该方法能实现模型行为的定向调整,类似于LoRA(Hu等人,2022)等其他面向微调的方法。Transformer-Patcher可连续修正多达数千个错误,同时泛化至等效输入并保持模型在无关输入上的准确性。这与先前方法形成鲜明对比——后者要么无法实现连续编辑,要么无法记住先前编辑。该研究在分类和生成任务上评估了Transformer-Patcher,表明其在单次连续编辑中能达到最先进性能。但尽管方法适用范围广泛,其实现高度依赖特定架构,同时也需要大量数据源和无关输入的支持。
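
Appending a patch neuron to an FFN layer can be sketched as adding one row to the input projection and one column to the output projection. In the toy version below the neuron's key, bias, and value are set by hand so that it fires only on the erroneous hidden state; in T-Patcher these are trained, so treat the construction as an illustrative assumption.

```python
import numpy as np

def add_patch_neuron(W_in, b_in, W_out, key, value, threshold=0.9):
    """Append one neuron to an FFN h -> W_out @ relu(W_in @ h + b_in).
    The new neuron activates (approximately) only when the hidden state
    aligns with `key`, and then adds a multiple of `value` to the output."""
    key = key / np.linalg.norm(key)
    W_in_new = np.vstack([W_in, key[None, :]])     # one extra input row
    b_in_new = np.append(b_in, -threshold)         # fires only when key . h > threshold
    W_out_new = np.hstack([W_out, value[:, None]]) # one extra output column
    return W_in_new, b_in_new, W_out_new

def ffn(W_in, b_in, W_out, h):
    return W_out @ np.maximum(W_in @ h + b_in, 0.0)

rng = np.random.default_rng(2)
W_in, b_in, W_out = rng.normal(size=(6, 4)), np.zeros(6), rng.normal(size=(3, 6))
h_err = rng.normal(size=4)
h_err /= np.linalg.norm(h_err)        # hidden state on which the model errs
v = np.array([5.0, 0.0, 0.0])         # correction direction for the output

W_in2, b_in2, W_out2 = add_patch_neuron(W_in, b_in, W_out, h_err, v)
delta = ffn(W_in2, b_in2, W_out2, h_err) - ffn(W_in, b_in, W_out, h_err)
print(np.allclose(delta, 0.1 * v))    # True: only the patch neuron contributes
```

Because the original rows and columns are untouched, hidden states that do not align with the key leave the patch neuron below its firing threshold, preserving behavior on unrelated inputs.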

To this end, Hartvigsen et al. (2022) propose GRACE, a methodology that requires only samples of the edits and that can perform successive edits, ensuring minimal impact on unrelated inputs. GRACE works by caching a chosen layer's activations in an adaptive codebook as edits stream in. When a new edit is received, GRACE retrieves the corresponding activations from the codebook and uses them to update the model's predictions. This allows GRACE to make targeted edits to the model's behavior without affecting its overall performance. The authors evaluated GRACE on a variety of tasks, including text classification, natural language inference, and question answering. They showed that GRACE was able to significantly improve the performance of LLMs on streaming errors, while minimally affecting their performance on unrelated inputs. Finally, the authors do not explicitly discuss the impact of GRACE on latency; however, GRACE could have a small impact on latency, as it requires caching activations in an adaptive codebook and querying it at inference time.

为此,Hartvigsen、Thomas等人(Hartvigsen et al., 2022)提出了GRACE方法,该方法仅需编辑样本即可执行连续编辑,并确保对无关输入的影响最小。GRACE的工作原理是在编辑流输入时,将选定层的激活值(s)缓存到自适应码本中。当收到新编辑时,GRACE会从码本中检索相应的激活值(s),并用它们更新模型的预测结果。这使得GRACE能够有针对性地调整模型行为,而不影响其整体性能。作者在文本分类、自然语言推理和问答等多种任务上评估了GRACE,结果表明GRACE能显著提升大语言模型在流式错误上的表现,同时对无关输入的性能影响极小。最后,作者未明确提及GRACE对延迟的影响,但由于需要将激活值(s)缓存到自适应码本中,GRACE可能会对延迟产生轻微影响。
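
The codebook mechanism can be sketched as a keyed override on one layer's activation. The toy adaptor below uses a fixed distance threshold and hand-set values; real GRACE learns the stored values and adapts each key's deferral radius as edits stream in, so this is a simplification.

```python
import numpy as np

class GraceLayerSketch:
    """Toy GRACE-style adaptor wrapping one layer: cache (key, value) pairs
    for edited inputs; at inference, if the incoming hidden state lies
    within `eps` of a cached key, substitute the stored value for the
    layer's activation, otherwise run the layer unchanged."""

    def __init__(self, layer, eps=0.5):
        self.layer = layer
        self.eps = eps
        self.keys, self.values = [], []

    def add_edit(self, h, target_activation):
        self.keys.append(h)
        self.values.append(target_activation)

    def __call__(self, h):
        for k, v in zip(self.keys, self.values):
            if np.linalg.norm(h - k) < self.eps:   # inside an edit's region
                return v
        return self.layer(h)                        # unrelated input: untouched

layer = lambda h: 2.0 * h                # stand-in for the wrapped layer
g = GraceLayerSketch(layer, eps=0.5)
h_edit = np.array([1.0, 0.0])
g.add_edit(h_edit, np.array([0.0, 3.0]))

print(g(h_edit))                  # overridden activation [0. 3.]
print(g(np.array([4.0, 4.0])))    # far from any key: original behaviour [8. 8.]
```

The linear scan over keys is where the latency concern mentioned above comes from; a larger codebook means more lookups per forward pass.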

5 Conclusion and Future Directions

5 结论与未来方向

In this survey, we organize recent progress in the rapidly growing field of knowledge editing. We begin by formalizing it under a common definition that seamlessly connects the slightly different facets of each work presented in the literature so far. Indeed, being a very recent branch of research, each author has bent definitions differently to better accommodate their methodologies. Following the first definitions given by Sinitsin et al. (2020) and the more recent adaptation by Huang et al. (2023), we formally define knowledge editing as the task of modifying the knowledge of a target model in a non-sequential or sequential manner, without significantly affecting its original performance. This makes knowledge editing closely related to the much more well-known branch of continual learning, or lifelong learning, as well as the emerging fields of machine unlearning Bourtoule et al. (2021); Nguyen et al. (2022); Si et al. (2023) and parameter-efficient fine-tuning Fu et al. (2023); Hu et al. (2022); Han et al. (2024); Yu et al. (2024), including notable variants such as model editing via task arithmetic Ilharco et al. (2022); Hendel et al. (2023). In the following section, we briefly discuss the intersections and connections between knowledge editing and related disciplines.

在本综述中,我们系统梳理了知识编辑(knowledge editing)这一快速发展的领域最新进展。通过建立统一的形式化定义,将文献中略有差异的研究视角有机衔接。作为新兴研究方向,早期研究者往往根据自身方法论调整定义边界。因此,我们综合Sinitsin等人(2020)的初始定义与Huang等人(2023)的最新阐释,将知识编辑正式定义为:以非连续或连续方式修改目标模型知识,同时保持其原始性能不受显著影响的任务。这使得知识编辑与持续学习(终身学习)、新兴的机器遗忘[20][21][22]及参数高效微调[23][24][25][26]领域密切相关,特别包括通过任务算术进行模型编辑等典型方法[27][28]。下一节将简要探讨知识编辑与相关学科的交叉关联。

5.1 Knowledge editing and related fields of research

5.1 知识编辑及相关研究领域

Machine unlearning research focuses on removing specific data samples or knowledge from a pre-trained model without retraining the entire model from scratch. This task is gaining relevance due to its applications in data privacy Chen et al. (2021), security Cao and Yang (2015), and compliance with regulations such as the GDPR Sai et al. (2023). While knowledge editing primarily focuses on modifying or adding knowledge to a model, machine unlearning can be seen as a complementary task that aims to selectively remove knowledge from a model. Both tasks share the common goal of avoiding complete retraining while maintaining the model's predictive capability, but they track different metrics and evaluation benchmarks.

机器遗忘研究专注于从预训练模型中移除特定数据样本或知识,而无需从头开始重新训练整个模型。由于其在数据隐私 Chen等人 (2021)、安全 Cao和Yang (2015) 以及GDPR等法规合规性 Sai等人 (2023) 方面的应用,该任务正变得越来越重要。然而,知识编辑主要关注修改或向模型添加知识,而机器遗忘可被视为一种互补任务,旨在选择性地从模型中移除知识。这两个任务的共同目标是避免完全重新训练,同时保持模型的预测能力,但追踪的指标和评估基准有所不同。

Knowledge editing and machine unlearning can be viewed as specialized cases of continual learning Mundt et al. (2023), focusing on applying targeted, often non-uniformly distributed edits to a network's existing knowledge Henn et al. (2021). In contrast, continual learning encompasses a broader scope, seeking general methodologies to incrementally expand a network's knowledge base, enabling it to perform an increasing array of tasks and skills over time. The key distinction lies in their objectives: knowledge editing aims for precise modifications to specific knowledge elements, while continual learning emphasizes the gradual accumulation of knowledge and skills without compromising previously learned information. Despite these differences, both fields share significant challenges, particularly in modifying neural networks without disrupting existing capabilities. Both contend with issues such as catastrophic forgetting Ratcliff (1990), where new learning can overwrite previously acquired knowledge. However, the approaches to addressing these challenges often diverge. Knowledge editing techniques typically employ localized, precise modifications, while continual learning methods often utilize more global strategies to balance retention of old knowledge with acquisition of new information. The intersection of these fields is particularly evident in scenarios where techniques from one inspire developments in the other. For example, some knowledge editing approaches adapt continual learning strategies for mitigating catastrophic forgetting (Section 4.1), tailoring them for more targeted edits. Conversely, insights from knowledge editing about precise neural network modifications can inform the development of more fine-grained continual learning algorithms.

知识编辑 (knowledge editing) 和机器遗忘 (machine unlearning) 可视为持续学习 (continual learning) 的特例 [Mundt et al., 2023],专注于对网络现有知识进行针对性、通常非均匀分布的修改 [Henn et al., 2021]。相比之下,持续学习的范畴更广,旨在寻找逐步扩展网络知识库的通用方法,使其能够随时间掌握更多任务和技能。二者的核心区别在于目标:知识编辑追求对特定知识元素的精确修改,而持续学习强调在不损害已学信息的前提下逐步积累知识与技能。

尽管存在差异,这两个领域都面临着重大挑战,特别是在修改神经网络时不破坏现有能力方面。它们都需要应对灾难性遗忘 (catastrophic forgetting) [Ratcliff, 1990] 等问题——即新知识可能覆盖先前习得的知识。然而,解决这些挑战的方法往往不同:知识编辑技术通常采用局部精确的修改,而持续学习方法则多使用全局策略来平衡新旧知识的保留与获取。

这两个领域的交叉点尤为明显,其中一方的技术常会启发另一方的发展。例如,某些知识编辑方法会采用持续学习策略来缓解灾难性遗忘(第4.1节),并将其调整为更具针对性的修改方式。反之,知识编辑关于神经网络精确修改的见解,也能促进更细粒度持续学习算法的开发。

On the other hand, parameter-efficient fine-tuning (PET) techniques are another related field, involving the modification of pre-trained models to create new models with desired capabilities. PET aims to enable efficient adaptation by updating only a minimal subset of model parameters, rather than fine-tuning all parameters. Notable PET techniques include addition-based methods like adapters Houlsby et al. (2019), reparameterization-based methods like LoRA Hu et al. (2022), and specification-based methods. An interesting variant of PET is task arithmetic Ilharco et al. (2022), which involves operations such as model addition, subtraction, multiplication, and permutation to modify or reshape the knowledge encoded in pre-trained models. By manipulating entire model representations in this arithmetic fashion, task arithmetic provides a way to create tailored models for specific tasks or domains without retraining from scratch, similar to knowledge editing techniques. However, task arithmetic, and PET more generally, are typically applied to enhance task performance rather than to edit knowledge specifically. The efficacy of existing PET methods for knowledge editing remains largely unexplored. Indeed, while task arithmetic and other PET techniques hold promise for efficient model adaptation, they differ from knowledge editing in their primary focus on improving downstream task metrics rather than directly modifying a model's acquired knowledge. Investigating how to leverage PET methods for precise and targeted knowledge updates presents an interesting direction for future work in the knowledge editing space.

另一方面,参数高效微调 (parameter-efficient fine-tuning, PET) 是另一个相关领域,它涉及修改预训练模型以创建具备所需能力的新模型。PET 旨在通过仅更新最小规模的模型参数子集(而非微调所有参数)来实现高效适配。值得注意的 PET 技术包括基于添加的方法(如适配器 [Houlsby et al. (2019)])、基于规范的方法(如 LoRA [Hu et al. (2022)])以及基于重参数化的方法。PET 的一个有趣变体是任务算术 [Ilharco et al. (2022)],它通过模型加法、减法、乘法和置换等操作来修改或重塑预训练模型中编码的知识。通过这种算术方式操控整个模型表征,任务算术提供了一种为特定任务或领域定制模型的方法(无需从头训练),这与知识编辑技术类似。然而任务算术及更广义的 PET 通常用于提升任务性能,而非专门编辑知识。现有 PET 方法在知识编辑方面的有效性仍有待探索。事实上,尽管任务算术和其他 PET 技术有望实现高效模型适配,但它们与知识编辑的核心区别在于:前者主要关注改进下游任务指标,而非直接修改模型已习得的知识。如何利用 PET 方法实现精准定向的知识更新,是知识编辑领域未来工作的一个有趣方向。
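
Task arithmetic is easy to sketch: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and adding or subtracting it (optionally scaled) reshapes the model. The models below are plain dicts of scalars purely for illustration; in practice the same operations are applied tensor-by-tensor to full state dicts.

```python
def task_vector(pretrained, finetuned):
    """Task vector = fine-tuned weights minus pre-trained weights, per tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_vector(pretrained, vector, alpha=1.0):
    """Add (alpha > 0) or subtract/negate (alpha < 0) a scaled task vector."""
    return {k: pretrained[k] + alpha * vector[k] for k in pretrained}

base = {"w1": 1.0, "w2": -2.0}
tuned = {"w1": 1.5, "w2": -1.0}      # hypothetical model fine-tuned on one task

tv = task_vector(base, tuned)
print(apply_vector(base, tv, alpha=1.0))    # recovers the fine-tuned model
print(apply_vector(base, tv, alpha=-1.0))   # "forgets" the task by subtraction
```

Vectors from different tasks can also be summed to compose capabilities, which is what makes the arithmetic view loosely analogous to knowledge editing.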

5.2 Future Directions and Risks

5.2 未来方向与风险

Building on this more formal definition, we presented a distilled summary of the most relevant works in the literature at the time of writing. We proposed to categorize these works into four families: regularization techniques, meta-learning, direct model editing, and architectural strategies. We discussed each family in turn, highlighting its intrinsic characteristics and limitations. We also summarized the most frequent fields of application, tasks, and datasets that have been tackled in each family, for quick reference. We did not specifically deep-dive into works where knowledge editing could emerge as an additional benefit of the proposed methodologies Hewitt et al. (2023), but it is worth noting that future expansions of similar survey research could encompass these aspects for a more comprehensive analysis.

基于更正式的定义,我们对撰写时文献中最相关的工作进行了提炼总结。我们建议将这些工作分为四类:正则化 (regularization) 技术、元学习 (metalearning)、直接模型编辑和架构策略。我们依次讨论了每一类方法,重点分析了其内在特性和局限性。同时汇总了各类方法最常应用的领域、任务和数据集,以便快速查阅。我们未专门深入探讨那些可能将知识编辑作为方法论附加优势的研究 [如 Hewitt et al. (2023)],但值得注意的是,未来类似的综述研究可纳入这些方面以实现更全面的分析。

Overall, we have presented a rapidly expanding field of research driven by the current trend of foundational models Zhou et al. (2023). The advancements in this area have led to a significant increase in the development of tools and methodologies to effectively harness the intrinsic knowledge of these models. As we move forward, knowledge editing is poised to become a critical factor in leveraging the power of these models for various industrial applications. However, several key challenges and future directions remain to be addressed:

总体而言,我们展示了一个由基础模型 (foundational models) 当前趋势驱动而快速扩展的研究领域 [20]。该领域的进步显著推动了工具和方法的发展,以有效利用这些模型的内在知识。随着研究的深入,知识编辑有望成为将这些模型能力应用于各类工业场景的关键因素。然而,仍有若干关键挑战和未来方向亟待解决:

• Improving the efficiency and scalability of knowledge editing techniques to handle large-scale foundational models with constrained computational resources. This challenge is particularly pressing as models continue to grow in size and complexity Dubey et al. (2024). Future research could explore techniques such as sparse editing or hierarchical knowledge representations to enable more efficient updates.

• 提升知识编辑技术的效率和可扩展性,以便在有限计算资源下处理大规模基础模型。随着模型规模和复杂度的持续增长,这一挑战尤为紧迫 [20]。未来研究可探索稀疏编辑或分层知识表征等技术,以实现更高效的更新。

Finally, it is vital to acknowledge that the power of knowledge editing also brings inherent risks that must not be overlooked. While editing models can correct their behavior and improve their utility, it can also be exploited for harmful purposes. In particular, sophisticated editing algorithms may enable malicious actors to deliberately incorporate backdoors, vulnerabilities, hidden behaviors, or harmful tendencies into the models. This concern becomes even more critical for methodologies that edit weights without providing sufficient interpretability of the applied changes. This dual use is a common risk for many machine learning technologies, and only proactive efforts to develop robust security measures and ethical guidelines can help to mitigate these potential risks. Future research should focus on developing "edit verification" techniques that can detect and prevent malicious edits, as well as establishing standardized protocols for auditing and certifying edited models.

最后,必须认识到知识编辑的力量也带来了不容忽视的内在风险。虽然编辑模型可以纠正其行为并提升实用性,但它也可能被用于有害目的。特别是复杂的编辑算法可能让恶意行为者故意在模型中植入后门、漏洞、隐藏行为或有害倾向。对于那些不提供足够可解释性就直接修改权重的方法,这一风险尤为严峻。这种双刃剑效应是许多机器学习技术共有的风险,只有通过主动制定强健的安全措施和伦理准则,才能帮助缓解这些潜在威胁。未来研究应聚焦于开发能检测和预防恶意编辑的"编辑验证"技术,并建立审核认证已编辑模型的标准协议。

In conclusion, knowledge editing in AI has emerged as a critical field, offering transformative potential for enhancing AI capabilities while simultaneously raising significant challenges and ethical considerations. As researchers and practitioners, we must strive to balance the pursuit of technological advancements with a strong awareness of their broader societal impacts, ensuring that the evolution of knowledge editing techniques contributes positively to the development of safe, reliable, and beneficial AI systems.

总之,AI知识编辑已成为一个关键领域,在提升AI能力方面展现出变革性潜力,同时也带来了重大挑战和伦理考量。作为研究者和实践者,我们必须努力在追求技术进步与深刻认知其广泛社会影响之间取得平衡,确保知识编辑技术的发展能为构建安全、可靠且有益的AI系统作出积极贡献。

References

参考文献
