[论文翻译]神经网络知识编辑研究综述


原文地址:https://arxiv.org/pdf/2310.19704


A SURVEY ON KNOWLEDGE EDITING OF NEURAL NETWORKS

神经网络知识编辑研究综述

A PREPRINT

预印本

Vittorio Mazzia Alexa AI, Amazon vmazzia@amazon.com

Vittorio Mazzia Alexa AI, Amazon vmazzia@amazon.com

Alessandro Pedrani Alexa AI, Amazon pedrana@amazon.com

Alessandro Pedrani Alexa AI, Amazon pedrana@amazon.com

Andrea Caciolai Alexa AI, Amazon andccl@amazon.com

Andrea Caciolai Alexa AI, Amazon andccl@amazon.com

Kay Rottmann Alexa AI, Amazon krrottm@amazon.com

Kay Rottmann Alexa AI, Amazon krrottm@amazon.com

Davide Bernardi Alexa AI, Amazon dvdbe@amazon.com

Davide Bernardi Alexa AI, Amazon dvdbe@amazon.com

ABSTRACT

摘要

Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.

深度神经网络在学术界和工业界正变得日益普遍,在众多领域及相关任务中达到甚至超越人类表现。然而,与人类相似,即便是最大规模的人工神经网络也会犯错,且随着时间推移,曾经正确的预测可能失效。通过添加纠错样本或更新数据来扩充数据集已成为实际应用中的常见解决方案。但众所周知的灾难性遗忘现象,对精确调整神经网络参数中隐式记忆的知识提出了挑战,通常需要完整重新训练模型才能实现预期行为。这种做法成本高昂、可靠性低,且与当前大规模自监督预训练的趋势不相容,因此亟需寻找更高效的方法来使神经网络模型适应动态变化的数据。

为应对这一需求,知识编辑 (knowledge editing) 正成为一个新兴研究领域,其目标是在不影响模型已学习任务表现的前提下,实现对预训练模型的可靠、数据高效且快速的修改。本综述对这一人工智能研究新领域进行了简要梳理:首先阐述神经网络编辑问题,通过统一框架进行形式化定义,并将其与持续学习等更广为人知的研究分支区分;随后系统回顾当前最相关的知识编辑方法与数据集,将现有工作归纳为四大类——正则化技术、元学习、直接模型编辑和架构策略;最后探讨该领域与其他研究的交叉点及未来潜在方向。

Keywords Knowledge Editing $\cdot$ Model Editing $\cdot$ Neural Networks Editing $\cdot$ Continual Learning

关键词 知识编辑 $\cdot$ 模型编辑 $\cdot$ 神经网络编辑 $\cdot$ 持续学习

1 Introduction

1 引言

In stark contrast to artificial neural networks (ANN), Cichon and Gan (2015), humans and other animals seem capable of learning and editing their knowledge continuously. Indeed, literature studies indicate that the mammalian brain could prevent catastrophic forgetting Ratcliff (1990) by safeguarding previously acquired knowledge, thereby reducing the plasticity of a proportion of synapses and ensuring their long-term stability Benna and Fusi (2016); Yang et al. (2009); Cichon and Gan (2015). On the contrary, ANNs not only struggle to learn new tasks in a sequential fashion Kirkpatrick et al. (2017), but also to edit acquired knowledge on the same data distribution and task Huang et al. (2023). Indeed, unlike conventional knowledge base systems that explicitly store knowledge, neural models implicitly memorize facts and tasks in their parameters, making it difficult to directly access and interpret their computation and memories Voita et al. (2019); Belinkov and Glass (2019). Making even minor modifications can lead to a decrease in performance on previously learnt tasks, or even cause the entire computation to fail due to the well-documented issue of catastrophic forgetting Ratcliff (1990). Therefore, modifying their acquired knowledge is a challenging problem.

与人工神经网络(ANN)形成鲜明对比的是,Cichon和Gan(2015)指出,人类和其他动物似乎能够持续学习和编辑知识。文献研究表明,哺乳动物大脑通过保护已获得的知识来防止灾难性遗忘(Ratcliff 1990),从而降低部分突触的可塑性并确保其长期稳定性(Benna和Fusi 2016;Yang等2009;Cichon和Gan 2015)。相反,人工神经网络不仅难以按顺序学习新任务(Kirkpatrick等2017),甚至无法在同一数据分布和任务上编辑已获得的知识(Huang等2023)。与传统显式存储知识的知识库系统不同,神经模型将事实和任务隐式记忆在参数中,这使得直接访问和解释其计算与记忆变得困难(Voita等2019;Belinkov和Glass 2019)。即使进行微小修改,也可能导致先前学习任务的性能下降,甚至因众所周知的灾难性遗忘问题(Ratcliff 1990)而导致整个计算失败。因此,修改已获得知识是一个具有挑战性的问题。

Just as humans, ANNs make mistakes and as we trust them with increasingly important decisions, the cost of such mistakes grows ever higher Sinitsin et al. (2020). Therefore, given that mistakes are inevitable, it is crucial for deep learning practitioners to possess the ability to adjust model behaviors by promptly correcting errors as they surface. Currently, practical applications employing deep learning techniques have been relying on different workarounds to tackle this problem. In particular, a full re-training using datasets augmented with samples that account for the mistakes or up-to-date information is a common choice Sinitsin et al. (2020). The endeavor needed for fine-tuning atop pre-trained models Sarzynska-Wawer et al. (2021); Devlin et al. (2018); Oquab et al. (2023); Weiss et al. (2016) is frequently substantiated by the diminished dataset size and computational resources needed. On the other hand, this is not always true and manually curated, deterministic mechanisms that overrule model predictions on problematic samples can be the preferred choice Sinitsin et al. (2020). However, while being simple, this second approach is fully localized and not robust to factors of variation of the input space (e.g., different viewpoints of the same object in computer vision or paraphrasing in natural language processing tasks). Furthermore, while these workarounds may provide a temporary solution, they can be expensive, unreliable, and incompatible with the current trend of large neural models Zhao et al. (2023); Chen et al. (2022). Indeed, these large networks are typically deployed as static artifacts, whose behavior is difficult to modify during deployment without a costly full re-training Lazaridou et al. (2021). 
Thus, in all those cases, in order to adapt to changes in the environment, or to address instances of under fitting or over fitting in the original training data, it is desirable to have the ability to quickly make targeted updates to the model’s behavior after it has been deployed De Cao et al. (2021).

正如人类一样,人工神经网络 (ANN) 也会犯错,而随着我们让其参与越来越重要的决策,这类错误的代价正变得愈发高昂 [Sinitsin et al., 2020]。因此,既然错误无法避免,深度学习从业者必须具备通过及时修正错误来调整模型行为的能力。当前,采用深度学习技术的实际应用主要依赖不同变通方案来解决该问题。其中常见做法是使用包含错误样本或最新信息的增强数据集进行完整重新训练 [Sinitsin et al., 2020]。在预训练模型基础上进行微调 [Sarzynska-Wawer et al., 2021; Devlin et al., 2018; Oquab et al., 2023; Weiss et al., 2016] 所需的工作量,通常因数据集规模和计算资源需求降低而获得合理性。但另一方面,这种方式并非总是适用,针对问题样本手动构建确定性机制来覆盖模型预测可能是更优选择 [Sinitsin et al., 2020]。然而,尽管第二种方法简单直接,但它完全局限于局部且对输入空间的变化因素(如计算机视觉中同一物体的不同视角,或自然语言处理任务中的文本改写)缺乏鲁棒性。此外,这些变通方案虽能提供临时解决方案,但可能成本高昂、不可靠,且与当前大模型趋势 [Zhao et al., 2023; Chen et al., 2022] 不兼容。事实上,这些大型网络通常作为静态成品部署,其行为在部署后难以修改,除非付出昂贵代价进行完整重新训练 [Lazaridou et al., 2021]。因此,无论是为适应环境变化,还是解决原始训练数据中的欠拟合或过拟合实例,都亟需具备在模型部署后快速进行针对性行为更新的能力 [De Cao et al., 2021]。

To address this need, knowledge editing methods have been recently proposed to efficiently change a model’s behaviors without affecting previous performance on the same task Sinitsin et al. (2020). These approaches take inspiration from several fields of artificial intelligence research and range from simple fine-tuning with regularization methods Zhu et al. (2020) to meta-learning techniques that adopt hypernetwork models to learn how to update parameters De Cao et al. (2021); Mitchell et al. (2022a). Due to its recent appearance in the literature Sinitsin et al. (2020), the field still lacks agreement on taxonomy, naming conventions, datasets, and target applications. Indeed, most of the works have been motivated by large language models (LLMs), Zhao et al. (2023); Brown et al. (2020); Soltan et al. (2022), focusing mostly on tasks such as question answering (QA), Levy et al. (2017), machine translation (MT) De Cao et al. (2021), modifying knowledge graph embeddings Cheng et al. (2024), or even simpler NLP problems Thorne et al. (2018). However, it is also possible to find applications of knowledge editing to computer vision problems Sinitsin et al. (2020). Furthermore, its potential scope is poised to expand across diverse machine learning domains, encompassing areas such as medicine Shehab et al. (2022), robotics Soori et al. (2023), and precision agriculture Sharma et al. (2020) in the future.

为满足这一需求,近期提出的知识编辑方法能在不影响模型原有任务表现的前提下高效修改其行为 (Sinitsin et al., 2020)。这些方法汲取了人工智能多个研究领域的灵感,涵盖从采用正则化方法的简单微调 (Zhu et al., 2020) 到利用超网络模型学习参数更新策略的元学习技术 (De Cao et al., 2021; Mitchell et al., 2022a)。由于该领域在文献中刚刚兴起 (Sinitsin et al., 2020),目前在分类体系、命名规范、数据集和目标应用方面尚未形成统一标准。事实上,大多数研究都受大语言模型 (LLM) (Zhao et al., 2023; Brown et al., 2020; Soltan et al., 2022) 推动,主要聚焦于问答系统 (QA) (Levy et al., 2017)、机器翻译 (MT) (De Cao et al., 2021)、知识图谱嵌入修改 (Cheng et al., 2024) 乃至更基础的NLP问题 (Thorne et al., 2018)。但知识编辑在计算机视觉领域也有应用实例 (Sinitsin et al., 2020)。未来其应用范围有望扩展到医疗 (Shehab et al., 2022)、机器人 (Soori et al., 2023) 和精准农业 (Sharma et al., 2020) 等多元机器学习领域。

1.1 Organization of the survey

1.1 综述结构

The objective of this survey is to provide a comprehensive review of existing literature on knowledge editing, formalizing the task and providing a categorization of the approaches into distinct families. To our knowledge, this is the first work to undertake such an effort, and we hope that it will facilitate future research in this increasingly important area of study. Indeed, the need for more formalization and organizational structure can already be seen by a recent study Yao et al. (2023), which attempts to benchmark and compare some knowledge editing methodologies specifically for LLMs editing.

本次综述旨在对知识编辑领域的现有文献进行全面梳理,通过形式化任务定义并将现有方法划分为不同类别。据我们所知,这是首个系统性的整理工作,我们希望这能推动这个日益重要的研究领域的未来发展。事实上,Yao等人 (2023) 近期针对大语言模型编辑的基准研究已反映出该领域对更规范体系架构的迫切需求。

The rest of the survey is organized as follows. Section 2 introduces the problem of knowledge editing, using previous works to formalize it under a common definition. Section 3 explores the tasks and datasets that are most commonly considered when solving the knowledge editing problem. Section 4 provides an overview of the most relevant knowledge editing approaches in the literature, identifying four distinct families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, Section 5 concludes the survey by discussing some intersections between knowledge editing and other fields of research, and outlining some possible future risks and directions.

本综述的其余部分结构如下。第2章介绍知识编辑 (knowledge editing) 问题,通过先前研究给出通用定义的形式化表述。第3章探讨解决知识编辑问题最常考虑的任务和数据集。第4章概述文献中最相关的知识编辑方法,将其归纳为四大类:正则化技术 (regularization techniques)、元学习 (meta-learning)、直接模型编辑 (direct model editing) 和架构策略 (architectural strategies)。最后,第5章总结全文,讨论知识编辑与其他研究领域的交叉点,并展望未来潜在风险与发展方向。

2 Overview

2 概述

This section begins by presenting an introduction to the concept of knowledge editing, which is also referred to as model editing in the literature. First, we review various definitions and interpretations of knowledge editing proposed by different works. Next, we establish a common definition of knowledge editing that generalizes to all existing works in the field.

本节首先介绍知识编辑(Knowledge Editing)的概念,该概念在文献中也被称为模型编辑(Model Editing)。我们先回顾不同研究提出的知识编辑定义与解释,随后建立一个能涵盖该领域所有现有研究的通用定义。


Figure 1: The knowledge editing problem was first proposed as the task of modifying a model based on a set of individual edit pairs, in a non-sequential manner (a), Sinitsin et al. (2020). Successive works extended the problem to batches of edits (b), sequential individual edits (c), and sequential batches of edits (d). Evaluation metrics are similar in all cases, as described in Section 2.5.

图 1: 知识编辑问题最初被提出时,是以非连续方式基于一组独立编辑对来修改模型的任务 (a),如 Sinitsin 等人 (2020) 所述。后续研究将该问题扩展到批量编辑 (b)、连续独立编辑 (c) 和连续批量编辑 (d)。如第 2.5 节所述,所有情况的评估指标均相似。

2.1 Background

2.1 背景

The concept of knowledge editing was first introduced in Sinitsin et al. (2020), which formalizes it as the task of correcting a model’s mistake on a specific sample while preserving the model’s overall behavior, akin to continuous learning. Indeed, as specified by the authors, “The problem of efficient neural network patching differs from continual learning, (...) [because] is not sequential in nature. However, correction (...) of mislabeled samples must not affect its behavior on other samples, which is close to overcoming [the] catastrophic forgetting task.”. Therefore, Sinitsin et al. (2020) define the problem as performing individual edits reliably and efficiently, not sequentially, and on the same task learned by the target model without affecting its overall behavior (i.e., being local to the edit). Concurrently, authors of Zhu et al. (2020) worked specifically on the task of modifying memories in Transformer models Vaswani et al. (2017), providing their own definition of knowledge editing. They expand the scope of the problem to a subset of knowledge, i.e., a batch of edits. Similarly, several other studies have also formalized the problem of model editing, following similar steps to Zhu et al. (2020). For instance, works by Mitchell et al. (2022a), Meng et al. (2022a), and Mitchell et al. (2022b) have defined the task as the process of performing individual or batch edits on a target model trained for a specific task. These studies emphasize the importance of ensuring that edits are resilient to factors of variation, i.e., that they are generalizable. While injecting individual changes is already a challenging and interesting task for the scientific community, multiple simultaneous general model edits represent a more realistic scenario that deserves further exploration.

知识编辑的概念最早由Sinitsin等人(2020)提出,他们将其形式化为在保持模型整体行为的同时修正模型在特定样本上的错误的任务,类似于持续学习。正如作者所述:"高效的神经网络修补问题不同于持续学习(...)[因为]本质上不是顺序性的。然而对错误标记样本的修正(...)不得影响其在其他样本上的行为,这与克服[灾难性遗忘]任务相近。"因此,Sinitsin等人(2020)将该问题定义为在不影响模型整体行为(即编辑的局部性)的前提下,对目标模型已学习的同一任务进行可靠且高效的非顺序性单独编辑。与此同时,Zhu等人(2020)的研究团队专门针对Transformer模型(Vaswani等人,2017)中的记忆修改任务开展工作,提出了他们自己的知识编辑定义。他们将问题范围扩展到知识子集,即批量编辑。类似地,其他几项研究也遵循Zhu等人(2020)的步骤形式化了模型编辑问题。例如Mitchell等人(2022a)、Meng等人(2022a)和Mitchell等人(2022b)的工作将该任务定义为对针对特定任务训练的目标模型进行单独或批量编辑的过程。这些研究强调了确保编辑对变异因素具有弹性(即可泛化性)的重要性。虽然对科学界而言注入单个变更已是具有挑战性的有趣任务,但多重同步的通用模型编辑代表了更现实的场景,值得进一步探索。

More recently, Hartvigsen et al. (2022) and Huang et al. (2023) argued that the conventional "one-mistake-fixing scenario" does not accurately reflect the complexity of real-world knowledge editing challenges. As such, they proposed to extend the scope of knowledge editing to a sequential problem to facilitate the development of more practical editing methods. While their proposal only accounts for subsequent individual edits, considering multiple simultaneous and sequential edits represents a more general case where the number of edits varies at each step. Importantly, in the case of iterative model editing, it is desirable to respect not only the new editing task constraints but also the previous ones, which is closely related to the concept of continual learning. Nevertheless, it is crucial to highlight that while the new definition of knowledge editing acknowledges a sequential process, differing from continuous learning, its scope remains limited to the modification of knowledge of the initially learned task by the model. On the contrary, continuous learning operates without such constraints, investigating methodologies that allow the model to expand to new tasks and adapt dynamically to entirely new information.

最近,Hartvigsen等人(2022) 和 Huang等人(2023) 指出传统的"单次纠错场景"无法准确反映现实世界知识编辑挑战的复杂性。为此,他们提出将知识编辑范围扩展为序列问题,以促进更实用编辑方法的发展。虽然他们的方案仅考虑后续的单个编辑,但处理多组同步和顺序编辑才是更普遍的情况——其中每一步的编辑数量都可能变化。值得注意的是,在迭代式模型编辑中,不仅需要满足新编辑任务的约束,还需兼顾先前约束,这与持续学习(continual learning)概念密切相关。但必须强调:尽管新版知识编辑定义承认了序列过程(与持续学习不同),其范围仍局限于模型对初始学习任务知识的修改;而持续学习则不受此限,致力于研究让模型拓展新任务、动态适应全新信息的方法论。

2.2 The knowledge editing problem

2.2 知识编辑问题

To introduce the problem of knowledge editing, it is possible to leverage the terminology used to define a “well-posed learning problem” Mitchell et al. (2007): an algorithm is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$ , if its performance at tasks in $T$ , as measured by $P$ , improves with experience $E$ . Then, we can say that knowledge editing is the problem of modifying the algorithm such that, given a representative set $S$ of instances of tasks in $T$ , and a subset $S_{e}\subseteq S$ of edit instances, its performance on $S_{e}$ improves as measured by $P$ , while performance on all the other instances $S\setminus S_{e}$ remains unchanged.

为引入知识编辑问题,可以借鉴Mitchell等人(2007)定义"适定学习问题"的术语:若某算法在任务类$T$中的表现(通过性能度量$P$评估)能随经验$E$提升,则称该算法能从经验$E$中学习。由此可将知识编辑定义为:给定任务类$T$的实例代表集$S$及其编辑子集$S_{e}\subseteq S$,修改算法使其在$S_{e}$上的性能(通过$P$衡量)提升,同时保持$S\setminus S_{e}$上所有实例的表现不变。

More practically, we can define the problem of knowledge editing as “the task of modifying a model based on a set of individual or batch edits, $S_{e}$ , pertaining to the same task known by the model, either in a sequential or non-sequential manner. The objective is to update the model’s knowledge representation without significantly altering its original behavior or performance over $S$ and being robust to different factor of variations of the edits.”

更实际地说,我们可以将知识编辑问题定义为"基于一组与模型已知任务相关的个体或批量编辑 $S_{e}$ (以顺序或非顺序方式)修改模型的任务。其目标是在不显著改变模型在 $S$ 上的原始行为或性能的前提下更新模型的知识表示,并对编辑的不同变化因素保持鲁棒性。"

More formally, let $\mathbb{X}$, $\mathbb{Y}$ be an input and an output space, respectively. Let $f_{0}$ be a function that maps an input $x\in\mathbb{X}$ to an output $y\in\mathbb{Y}$ , parametrized by $\theta_{0}\in\Theta$ , then

更正式地说,设$\mathbb{X}$和$\mathbb{Y}$分别为输入和输出空间。设$f_{0}$是一个将输入$x\in\mathbb{X}$映射到输出$y\in\mathbb{Y}$的函数,由参数$\theta_{0}\in\Theta$参数化。

$$
f_{0}\in\mathbb{F}:=\mathbb{Y}^{\mathbb{X}\times\Theta}
$$

$$
f_{0}\in\mathbb{F}:=\mathbb{Y}^{\mathbb{X}\times\Theta}
$$

We use the subscript zero to indicate that this is the starting model, i.e., the model we want to edit. We define an edit pair as an input-output pair $(x_{e},y_{e})\in\mathbb{X}\times\mathbb{Y}$ , such that $f_{0}(x_{e})\ne y_{e}$ . Then, in its simplest form, given an individual edit example pair $(x_{e},y_{e})$ , a knowledge editing (KE) methodology can be defined as follows

我们用下标零表示这是初始模型,即待编辑的模型。将编辑对定义为一个输入-输出对 $(x_{e},y_{e})\in\mathbb{X}\times\mathbb{Y}$ ,使得 $f_{0}(x_{e})\ne y_{e}$ 。那么在最简形式下,给定单个编辑示例对 $(x_{e},y_{e})$ ,知识编辑(Knowledge Editing, KE)方法可定义如下

$$
\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}
$$

$$
\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}
$$

i.e. a function that takes in input the starting model $f_{0}$ and the edit pair $(x_{e},y_{e})$ to produce an edited model $f_{e}$ . If the edited model is such that $f_{e}(x_{e})=y_{e}$ , then we say that the edit was successful. A KE approach can realize the transformation from $f_{0}$ to $f_{e}$ in different ways, and we identify four families of possible realizations: regularization, meta-learning, direct model editing and architectural strategies. We dive into the details of each of these families in Section 4.

即一个以初始模型 $f_{0}$ 和编辑对 $(x_{e},y_{e})$ 作为输入,生成编辑后模型 $f_{e}$ 的函数。若编辑后的模型满足 $f_{e}(x_{e})=y_{e}$ ,则称该编辑成功。知识编辑(KE)方法可通过不同方式实现从 $f_{0}$ 到 $f_{e}$ 的转换,我们将其归纳为四种实现路径:正则化(regularization)、元学习(meta-learning)、直接模型编辑(direct model editing)和架构策略(architectural strategies)。我们将在第4节详细探讨每种路径的具体实现。
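To make the KE interface above concrete, the sketch below realizes $\mathrm{KE}:\mathbb{F}\times\mathbb{X}\times\mathbb{Y}\rightarrow\mathbb{F}$ as naive fine-tuning of a toy linear classifier on the single edit pair. The model, data, and update rule are illustrative assumptions for exposition only; this is not one of the surveyed methodologies.

```python
# Toy sketch of the KE interface: KE(f0, x_e, y_e) -> f_e.
# The "model" is a linear binary classifier parametrized by theta;
# the editor applies perceptron-style updates on the edit pair only.

def predict(theta, x):
    score = sum(w * xi for w, xi in zip(theta, x))
    return 1 if score > 0 else 0

def knowledge_edit(theta0, x_e, y_e, lr=0.1, max_steps=1000):
    theta = list(theta0)  # copy: leave the starting model f0 untouched
    for _ in range(max_steps):
        if predict(theta, x_e) == y_e:
            break  # edit successful: f_e(x_e) = y_e
        direction = 1 if y_e == 1 else -1
        theta = [w + lr * direction * xi for w, xi in zip(theta, x_e)]
    return theta

theta0 = [1.0, -1.0]
x_e, y_e = (1.0, 2.0), 1            # f0 misclassifies this edit pair
theta_e = knowledge_edit(theta0, x_e, y_e)
print(predict(theta0, x_e), predict(theta_e, x_e))  # 0 1
```

Note that such naive fine-tuning satisfies only the success condition $f_{e}(x_{e})=y_{e}$; it gives no guarantee on the rest of the input space, which is precisely what motivates the four families of approaches discussed later.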

2.3 Taxonomy of edits

2.3 编辑分类

Often, it is desirable to edit a model by applying multiple edits at once, a sequence of edits, or a sequence of batches of edits.

在实践中,我们往往需要一次性应用多个编辑、一系列连续编辑或连续的批量编辑来修改模型。

The definition provided in section 2.2 has been given for the simplest case, that we call a single non-successive edit: we only want to change the model for one edit pair. Conversely, for multiple successive edits we can formally define a list of edit requests as:

2.2 节中给出的定义针对最简单的情况,我们称之为单次非连续编辑:只需针对一个编辑对修改模型。相反,对于多次连续编辑,我们可以将编辑请求列表正式定义为:

$$
\mathcal{E}=\left\{(x_{e},y_{e})^{(i)}\ \mathrm{s.t.}\ \forall i,j:x_{e}^{(i)}=x_{e}^{(j)}\Rightarrow y_{e}^{(i)}=y_{e}^{(j)}\right\}
$$

$$
\mathcal{E}=\left\{(x_{e},y_{e})^{(i)}\ \mathrm{s.t.}\ \forall i,j:x_{e}^{(i)}=x_{e}^{(j)}\Rightarrow y_{e}^{(i)}=y_{e}^{(j)}\right\}
$$

where the logical constraint ensures that there are no conflicting requests, as suggested by Meng et al. (2022a). Individual edits can also be grouped together and form $N$ batches of successive edits each with $B_{e}^{(i)}$ edit pairs, such as

逻辑约束确保不存在冲突请求,如Meng等人 (2022a) 所建议。单个编辑也可分组形成$N$个连续编辑批次,每批包含$B_{e}^{(i)}$个编辑对,例如

$$
\mathcal{B}_ {e}^{(i)}=\left\{(x_{e},y_{e})^{(1)},\cdots,(x_{e},y_{e})^{(B_{e}^{(i)})}\right\}\quad\mathrm{s.t.}\quad\mathcal{E}=\bigcup_{i=1}^{N}\mathcal{B}_{e}^{(i)}
$$

$$
\mathcal{B}_ {e}^{(i)}=\left\{(x_{e},y_{e})^{(1)},\cdots,(x_{e},y_{e})^{(B_{e}^{(i)})}\right\}\quad\mathrm{s.t.}\quad\mathcal{E}=\bigcup_{i=1}^{N}\mathcal{B}_{e}^{(i)}
$$

The difference between successive individual or batch edits is that some KE methodologies can ingest an entire $\mathcal{B}_ {e}^{(i)}$ and produce the resulting $f_{e}$ implementing all the given edits, while other methodologies can only consider one individual sample in sequence at a time. In both cases (a sequence of individual edits is trivially a sequence of single-item batch edits), successive edits assume to work with a starting model $f_{t-1}$ and apply the $t$ -th change as

连续单次或批量编辑的区别在于,某些知识编辑(KE)方法能够一次性处理整个$\mathcal{B}_ {e}^{(i)}$并生成实现所有给定编辑的$f_{e}$,而其他方法每次只能依次处理单个样本。在这两种情况下(一系列单次编辑本质上就是一系列单条目批量编辑),连续编辑假设从初始模型$f_{t-1}$开始工作,并应用第$t$次变更作为

$$
f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})
$$

$$
f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})
$$

proceeding iteratively, using $f_{t}$ as a starting model for the next edit. Figure 1 summarizes the four types of edits. Finally, as with individual edits, after every change, $f_{e}$ should not only satisfy $f_{e}(x_{e})=y_{e}$ , but also a series of other properties, as discussed in the next section.

逐步迭代进行,使用 $f_{t}$ 作为下一次编辑的初始模型。图 1 总结了四种编辑类型。最后,与单个编辑一样,每次更改后 $f_{e}$ 不仅需要满足 $f_{e}(x_{e})=y_{e}$,还应满足下一节讨论的一系列其他属性。
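The iterative scheme $f_{t}=\mathrm{KE}(f_{t-1},\mathcal{B}_{e}^{(t)})$ can be sketched as a plain loop. The "model" below is a deliberately trivial stand-in (a dict of input-to-output overrides), chosen only to exercise the sequential interface; `edit_one` and the example edit requests are hypothetical.

```python
# Sketch of successive editing: f_t = KE(f_{t-1}, B^{(t)}),
# iterating over N batches and using f_t as the next starting model.

def edit_one(model, x_e, y_e):
    patched = dict(model)   # copy: do not mutate f_{t-1}
    patched[x_e] = y_e
    return patched

def sequential_edit(f0, batches):
    """Apply N successive batches; if the editor cannot ingest a whole
    batch at once, fall back to one edit pair at a time."""
    f_t = f0
    history = [f_t]
    for batch in batches:            # batch t plays the role of B_e^{(t)}
        for x_e, y_e in batch:
            f_t = edit_one(f_t, x_e, y_e)
        history.append(f_t)          # f_t is the starting model for t+1
    return f_t, history

f0 = {}
batches = [[("q1", "a1"), ("q2", "a2")], [("q1", "a1-updated")]]
f_N, hist = sequential_edit(f0, batches)
print(f_N["q1"], f_N["q2"])  # a1-updated a2
```

Keeping the intermediate models in `history` mirrors the evaluation setup of Section 2.5, where metrics can be tracked after each step of the sequence.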

2.4 Editing properties

2.4 编辑属性

Based on the specific task learned by function $f$ , various properties can be specifically designed. However, at a fundamental level, following Sinitsin et al. (2020); Huang et al. (2023), knowledge editing should aim at satisfying four properties, that we define below.

基于函数 $f$ 学习的具体任务,可以专门设计各种属性。然而,在基础层面上,遵循 Sinitsin et al. (2020) 和 Huang et al. (2023) 的研究,知识编辑应致力于满足以下四个属性。

Property 1 - Reliability: Given an edit pair $(x_{e},y_{e})$ , the edited model $f_{e}$ should output the desired edit:

属性1 - 可靠性:给定一个编辑对$(x_{e},y_{e})$,编辑后的模型$f_{e}$应输出期望的编辑结果:

$$
f_{e}(x_{e})=y_{e}
$$

$$
f_{e}(x_{e})=y_{e}
$$

Property 2 - Generality: The edited model $f_{e}$ should be able to generalize to similar examples to the edit pair. This can be formalized by defining an equivalence neighborhood $N(x_{e})\subset\mathbb{X}$ and requiring that the edited model $f_{e}$ satisfies:

属性2 - 通用性:编辑后的模型$f_{e}$应能泛化到与编辑对相似的示例。这可以通过定义一个等价邻域$N(x_{e})\subset\mathbb{X}$并要求编辑后的模型$f_{e}$满足以下条件来形式化:

$$
f_{e}(\tilde{x}_ {e})=y_{e}\qquad\forall\,\tilde{x}_ {e}\in N(x_{e})
$$

$$
f_{e}(\tilde{x}_ {e})=y_{e}\qquad\forall\,\tilde{x}_ {e}\in N(x_{e})
$$

Property 3 - Locality: The edited model $f_{e}$ should not alter the output of examples that are not similar to the edit pair. This can be formalized by defining a locality set

属性3 - 局部性:编辑后的模型$f_{e}$不应改变与编辑对不相似样本的输出。这可以通过定义局部性集合来形式化。

$$
L(x_{e})=\left\{(x_{\mathrm{loc}},y_{\mathrm{loc}})\in\mathbb{X}\times\mathbb{Y}\mid x_{\mathrm{loc}}\notin N(x_{e})\land f_{0}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\right\}
$$

$$
L(x_{e})=\left\{(x_{\mathrm{loc}},y_{\mathrm{loc}})\in\mathbb{X}\times\mathbb{Y}\mid x_{\mathrm{loc}}\notin N(x_{e})\land f_{0}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\right\}
$$

and require that the edited model $f_{e}$ satisfies:

并要求编辑后的模型 $f_{e}$ 满足:

$$
f_{e}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\qquad\forall\,(x_{\mathrm{loc}},y_{\mathrm{loc}})\in L(x_{e})
$$

$$
f_{e}(x_{\mathrm{loc}})=y_{\mathrm{loc}}\qquad\forall\,(x_{\mathrm{loc}},y_{\mathrm{loc}})\in L(x_{e})
$$

Property 4 - Efficiency: The proposed knowledge editing methodology should aim to achieve efficient editing, characterized by low compute and space complexity.

属性4 - 效率性:所提出的知识编辑方法应追求高效编辑,具备低计算复杂度和低空间复杂度的特性。
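The first three properties translate directly into boolean checks of an edited model against the starting one. A minimal sketch follows; the models, the equivalence neighborhood, and the locality inputs are illustrative assumptions, and efficiency (Property 4) is omitted since it concerns the cost of producing $f_{e}$ rather than its outputs.

```python
# Check reliability, generality and locality for an edited model fe
# against the starting model f0. Models are plain Python callables.

def check_properties(f0, fe, x_e, y_e, neighborhood, locality_inputs):
    reliability = fe(x_e) == y_e
    # Generality: the edit also holds on the equivalence neighborhood N(x_e)
    generality = all(fe(x) == y_e for x in neighborhood)
    # Locality: outside N(x_e), outputs must match the starting model f0
    locality = all(fe(x) == f0(x) for x in locality_inputs)
    return {"reliability": reliability,
            "generality": generality,
            "locality": locality}

# Illustrative models: f0 rounds its input; fe additionally maps
# inputs near 2 to the edited label 0.
f0 = lambda x: round(x)
fe = lambda x: 0 if abs(x - 2) < 0.25 else round(x)
report = check_properties(f0, fe, x_e=2.0, y_e=0,
                          neighborhood=[1.9, 2.1],
                          locality_inputs=[0.0, 5.0, 7.4])
print(report)  # {'reliability': True, 'generality': True, 'locality': True}
```

In practice the neighborhood and locality sets are sampled (e.g., paraphrases of an edited prompt and unrelated held-out examples), so these checks become the rates reported in Section 2.5 rather than strict booleans.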

It is worth noting that some works in the literature have defined these same properties, or a subset of them, with different names and notations. Specifically, reliability may also be referred to as efficacy, locality as specificity, and generality as paraphrase, Meng et al. (2022b); De Cao et al. (2021). Moreover, as previously mentioned, depending on the specific field of application, other properties and related evaluation metrics can arise, such as fluency and consistency Meng et al. (2022b) when editing Language Models. Finally, the last property, namely efficiency, tends to be disregarded in academic literature. However, it is one of the main reasons KE is appealing over a simpler re-training of the base model. Furthermore, it plays a pivotal role in performing comparative analyses with baseline models and showcasing the ability of a particular methodology to modify a neural network with a substantial number of parameters. We encourage future works to consider (at least) all four properties when measuring the effectiveness of a proposed KE methodology.

值得注意的是,文献中的部分研究对这些相同属性(或其子集)采用了不同的命名和符号表示。具体而言,可靠性(reliability)可能被称为效能(efficacy),局部性(locality)可能被称为特异性(specificity),而通用性(generality)可能被称为改写(paraphrase) [Meng et al., 2022b; De Cao et al., 2021]。此外,如前所述,根据具体应用领域的不同,还可能衍生出其他属性及相关评估指标,例如在编辑语言模型时涉及的流畅性(fluency)和一致性(consistency) [Meng et al., 2022b]。最后一项属性——效率(efficiency)往往在学术文献中被忽视,但它正是知识编辑(KE)相较于简单的基础模型重训练更具吸引力的主要原因之一。该属性在进行基线模型对比分析时具有关键作用,能展示特定方法论修改海量参数神经网络的能力。我们建议未来研究在评估知识编辑方法论有效性时至少考虑这四项属性。

2.5 Evaluation metrics

2.5 评估指标

Given a knowledge editing algorithm KE, a list of edit requests $\mathcal{E}$ , a starting model $f_{0}$ and a test set $\mathcal{D}_{t e s t}$ , it is possible to measure the extent to which the properties above hold. We consider working with $N$ batches of successive edits, each comprised of $B$ individual edits, without loss of generality (as mentioned above, if $B=1$ , we have successive individual edits). Moreover, if the KE methodology is unable to process all $B$ edits concurrently, it can apply each edit individually in a non-sequential manner.

给定一个知识编辑算法 KE、一个编辑请求列表 $\mathcal{E}$、一个初始模型 $f_{0}$ 和一个测试集 $\mathcal{D}_{t e s t}$,就可以衡量上述属性在多大程度上成立。在不失一般性的情况下,我们考虑对 $N$ 批连续编辑进行处理,每批包含 $B$ 个单独编辑(如上所述,如果 $B=1$,则表示连续单独编辑)。此外,如果 KE 方法无法同时处理所有 $B$ 个编辑,则可以以非顺序方式逐个应用每个编辑。

Following Huang et al. (2023), we represent with $I$ the indicator function, and define the following metrics. Success Rate (SR): It is used to evaluate reliability, and it is simply the proportion of edits for which the methodology succeeds in changing the knowledge of a starting model $f_{t}$ .

遵循Huang等人(2023)的研究,我们用$I$表示指示函数,并定义以下指标。成功率(SR):用于评估可靠性,即方法成功改变初始模型$f_{t}$知识的编辑比例。

$$
\mathrm{SR}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)\quad\mathrm{s.t.}\quad\bigcup_{n=1}^{N}\bigcup_{b=1}^{B}(x_{e},y_{e})_{n,b}=\mathcal{E}
$$

$$
\mathrm{SR}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)\quad\mathrm{s.t.}\quad\bigcup_{n=1}^{N}\bigcup_{b=1}^{B}(x_{e},y_{e})_{n,b}=\mathcal{E}
$$

In the case of non-sequential individual edits, $f_{n,B}=f_{n,b}$ . Moreover, Eq. 10 provides an overall value after $N$ successive edits, but it can be of interest to measure SR every $n$ edits, tracking changes over the sequence.

对于非连续的单次编辑,$f_{n,B}=f_{n,b}$。此外,式10给出了$N$次连续编辑后的整体值,但也可以每隔$n$次编辑测量一次SR,以跟踪序列中的变化。

Generalization Rate (GR): It is used to evaluate generality, testing the post-edit model $f_{e}$ , on the equivalence neighborhood set $N(x_{e;n,b},y_{e;n,b})$ , where $(x_{e;n,b},y_{e;n,b})$ is the $n$ -th batch, $b$ -th edit pair. GR can be written as,

泛化率 (GR): 用于评估泛化性,在等价邻域集 $N(x_{e;n,b},y_{e;n,b})$ 上测试编辑后模型 $f_{e}$,其中 $(x_{e;n,b},y_{e;n,b})$ 是第 $n$ 批次第 $b$ 个编辑对。GR可表示为:

$$
\mathrm{GR}=\frac{1}{N}\frac{1}{B}\frac{1}{N_{b}}\sum_{n=1}^{N}\sum_{b=1}^{B}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\quad\mathrm{s.t.}\quad\forall n,b:\bigcup_{i=1}^{N_{b}}(\tilde{x}_ {e},\tilde{y}_ {e})_ {n,b,i}\subseteq N(x_{e},y_{e})_{n,b}
$$

$$
\mathrm{GR}=\frac{1}{N}\frac{1}{B}\frac{1}{N_{b}}\sum_{n=1}^{N}\sum_{b=1}^{B}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\quad\mathrm{s.t.}\quad\forall n,b:\bigcup_{i=1}^{N_{b}}(\tilde{x}_ {e},\tilde{y}_ {e})_ {n,b,i}\subseteq N(x_{e},y_{e})_{n,b}
$$

where $N_{b}$ is the number of equivalent samples of the $b$ -th edit pair. Following Mitchell et al. (2022a), we can also define Edit Success (ES) to summarize both SR and GR. It can be computed as the average accuracy of the edited model $f_{e}$ on the edit input(s), as well as inputs drawn from the equivalence neighborhood(s), that is,

其中 $N_{b}$ 是第 $b$ 个编辑对的等效样本数。根据 Mitchell 等人 (2022a) 的研究,我们还可以定义编辑成功率 (Edit Success, ES) 来综合衡量 SR 和 GR。其计算方式为编辑后模型 $f_{e}$ 在编辑输入及其等效邻域采样输入上的平均准确率,即

$$
\mathrm{ES}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}\Bigl(I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)+\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\Bigr)
$$

$$
\mathrm{ES}=\frac{1}{N}\frac{1}{B}\sum_{n=1}^{N}\sum_{b=1}^{B}\Bigl(I\bigl(f_{n,B}(x_{e;n,b})=y_{e;n,b}\bigr)+\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}I\bigl(f_{n,B}(\tilde{x}_ {e;n,b,i})=\tilde{y}_ {e;n,b,i}\bigr)\Bigr)
$$

where the same conditions for SR and GR hold and have been omitted for brevity.

其中SR与GR的约束条件同样成立,为简洁起见在此省略。

Drawdown (DD): It is used to evaluate locality, and it is defined as the performance degradation of the edited model over $\mathcal{D}_ {t e s t}$ . It is computed using the final edited model, $f_{N}$ , that in case of successive edits is the result of $N$ steps.

回撤 (DD): 用于评估局部性, 定义为编辑后模型在 $\mathcal{D}_ {t e s t}$ 上的性能下降。该指标通过最终编辑模型 $f_{N}$ 计算得出, 在连续编辑场景下该模型是经过 $N$ 次编辑后的结果。

$$
\mathrm{DD}=1-\frac{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{N}(x)=y)}{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{0}(x)=y)}
$$

$$
\mathrm{DD}=1-\frac{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{N}(x)=y)}{\sum_{(x,y)\in\mathcal{D}_ {t e s t}}I(f_{0}(x)=y)}
$$

Finally, as suggested by Huang et al. (2023) in the case of multiple successive edits, it is also important to evaluate SR and GR using the final model $f_{N}$ , in order to assess how past edits are retained. Therefore, it is possible to define three additional metrics, Success Retain Rate (SRR), Generalization Retain Rate (GRR), and Edit Success Retain (ESR), simply using in Eq. 10 and 11, $f_{N}$ instead of $f_{n,B}$ .

最后,正如Huang等人 (2023) 所建议的,在多次连续编辑的情况下,使用最终模型 $f_{N}$ 来评估SR和GR也很重要,以便评估过去编辑的保留情况。因此,可以定义三个额外的指标:成功保留率 (SRR)、泛化保留率 (GRR) 和编辑成功保留 (ESR),只需在公式10和11中使用 $f_{N}$ 代替 $f_{n,B}$。
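For the non-sequential single-batch case ($N=1$), the metrics above reduce to simple averages over the edits, neighborhoods, and test set. The sketch below computes SR, GR and DD with hypothetical toy models and data (not tied to any benchmark); the indicator $I$ becomes an equality check.

```python
# Toy computation of SR, GR and DD for a single batch of edits (N = 1).
# Models are plain callables; edits, neighborhoods and the test set are
# illustrative stand-ins.

def success_rate(fe, edits):
    # SR: fraction of edit pairs the edited model gets right
    return sum(fe(x) == y for x, y in edits) / len(edits)

def generalization_rate(fe, neighborhoods):
    # GR: per edit, accuracy over its equivalence neighborhood, averaged
    hits = [sum(fe(x) == y for x in xs) / len(xs) for xs, y in neighborhoods]
    return sum(hits) / len(hits)

def drawdown(f0, fN, test_set):
    # DD: 1 - (accuracy of final edited model / accuracy of f0)
    correct_0 = sum(f0(x) == y for x, y in test_set)
    correct_N = sum(fN(x) == y for x, y in test_set)
    return 1 - correct_N / correct_0

f0 = lambda x: x % 2                    # starting model: parity
fN = lambda x: 0 if x == 3 else x % 2   # edited model: forces f(3) = 0
edits = [(3, 0)]
neighborhoods = [([3], 0)]              # trivial neighborhood for the edit
test_set = [(0, 0), (1, 1), (2, 0), (4, 0)]
print(success_rate(fN, edits), drawdown(f0, fN, test_set))  # 1.0 0.0
```

The retain-rate variants (SRR, GRR, ESR) reuse the same functions, simply passing the final model $f_{N}$ in place of each intermediate $f_{n,B}$.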

3 Tasks and Datasets

3 任务与数据集

The formalization of the knowledge editing problem provided in the previous section is general, and many applications of knowledge editing to different tasks encompassing various fields can be formulated within that framework. The brief but rich history of the field has so far seen applications mainly to two broad fields: Computer Vision and Natural Language Processing. Indeed, Sinitsin et al. (2020) provides experimental results on image classification and machine translation, and almost all the works that come after (and even before, Garnelo et al. (2018); Kirkpatrick et al. (2017)) demonstrate the effectiveness of their proposed approaches in one or more applications in these two fields.

前一节提供的知识编辑问题形式化具有普遍性,在该框架下可以制定知识编辑应用于不同领域各种任务的多类场景。该领域虽发展历史短暂但成果丰硕,目前主要应用于两大方向:计算机视觉 (Computer Vision) 与自然语言处理 (Natural Language Processing)。Sinitsin等 (2020) 通过图像分类和机器翻译任务验证了其有效性,此后几乎所有研究 (甚至早于Garnelo等 (2018) 和Kirkpatrick等 (2017) 的工作) 都在这两个领域的一个或多个应用中证明了所提方法的有效性。

Even though the knowledge editing framework can be defined independently of the target domain and task, each specific application has its unique challenges and intricacies, which we explore in this section. Section 3.1 covers the most common tasks in the Computer Vision domain, as well as the datasets on which the tasks are usually addressed, and how the knowledge editing problem can be instantiated in this context. Section 3.2 provides a similar overview for applications to the Natural Language Processing domain. Finally, Section 3.3 describes tasks and datasets that do not strictly fit in either of the two domains above.

尽管知识编辑框架可以独立于目标领域和任务进行定义,但每个具体应用都有其独特的挑战和复杂性,我们将在本节探讨这些内容。3.1节涵盖计算机视觉领域最常见的任务、通常用于解决这些任务的数据集,以及在此背景下如何实例化知识编辑问题。3.2节为自然语言处理领域的应用提供了类似的概述。最后,3.3节描述了不完全属于上述两个领域的任务和数据集。

3.1 Computer Vision

3.1 计算机视觉

Computer Vision is a broad field with a long history, which generally attempts to extract meaningful representations from visual media to derive an understanding of the represented scenes Szeliski (2022). Over the last years, deep learning methods have been shown to outperform previous state-of-the-art techniques in several applications in the field Voulodimos et al. (2018), and while more “traditional” techniques are still relevant for some applications, neural networks-based approaches have become the de facto standard for many others O’Mahony et al. (2020). Due to the importance and breadth of the field, and the relevance of neural networks therein, knowledge editing literature has found fertile grounds in Computer Vision, and has so far gravitated towards two primary tasks: Image Classification and Image Completion. A number of datasets are customarily used to test approaches to solve these tasks. They vary in terms of number of examples, classes and channels, as well as resolution of the representative images; Table 1 provides an overview of the most commonly used ones.

计算机视觉是一个历史悠久且广泛的领域,其核心目标是从视觉媒体中提取有意义的表征,进而理解所呈现的场景 Szeliski (2022)。近年来,深度学习方法在该领域的多个应用中展现出超越传统技术的性能 Voulodimos et al. (2018),虽然某些场景仍适用"传统"技术,但基于神经网络的方法已成为多数任务的实际标准 O’Mahony et al. (2020)。鉴于该领域的重要性与广泛性,以及神经网络在其中扮演的关键角色,知识编辑研究在计算机视觉领域获得了丰沃土壤,目前主要聚焦于两大任务:图像分类与图像补全。为测试这些任务的解决方案,研究者通常使用多种数据集,这些数据集在样本数量、类别、通道数以及图像分辨率等方面存在差异;表1列举了最常用的数据集概况。

Image Classification The task of image classification is straightforward: we wish to label a complete image (or a predetermined portion of it) with its most likely semantic category, e.g., horse, cat, or car Szeliski (2022). In this context, an example is an image and its semantic label. The image, of predefined dimension (or resolution), is encoded as a 3D tensor $\boldsymbol{x}^{(i)}\in\mathbb{R}^{W\times H\times C}$ , where $W$ is the width of the image, $H$ the height of the image, and $C$ the number of channels, depending on whether the image is grayscale (1), RGB (3), or something else (e.g., RGBD Firman (2016), with an additional depth channel). The editing task is then often formulated by artificially corrupting a subset of either the images or the labels. The latter is the more prevalent approach, and usually involves randomly altering the labels within a withheld set to create an edit set $(x_{e}^{(i)},y_{e}^{(i)})_ {i=1}^{N}$ , where in the original dataset $y^{(i)}\neq y_{e}^{(i)}$ . Other works, such as Sotoudeh and Thakur (2021), instead corrupt a subset of the images, e.g. with motion blur or fog Mu and Gilmer (2019), creating an edit set where originally $\boldsymbol{x}^{(i)}\neq\boldsymbol{x}_{e}^{(i)}$ .

图像分类
图像分类的任务很直观,我们希望用最可能的语义类别(如马、猫或汽车 Szeliski (2022))标注完整图像(或预定部分)。在此背景下,一个示例就是图像及其语义标签。图像以预定义尺寸(或分辨率)编码为三维张量 $\boldsymbol{x}^{(i)}\in\mathbb{R}^{W\times H\times C}$,其中 $W$ 为图像宽度,$H$ 为图像高度,$C$ 为通道数,具体取决于图像是灰度(1)、RGB(3)还是其他格式(例如带额外深度通道的 RGBD Firman (2016))。编辑任务通常通过人为破坏部分图像或标签来构建,例如随机修改标签以生成训练集 $(x_{e}^{(i)},y_{e}^{(i)})_ {i=1}^{N}$,其中原始标签满足 $y^{(i)}\neq y_{e}^{(i)}$。其他工作如 Sotoudeh 和 Thakur (2021) 会引入运动模糊或雾效 Mu 和 Gilmer (2019) 来破坏部分图像,构建编辑集使得原始图像满足 $\boldsymbol{x}^{(i)}\neq\boldsymbol{x}_{e}^{(i)}$。
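The label-corruption recipe above can be sketched as follows (a hypothetical helper, not tied to any specific dataset; the stand-in "images" are just string identifiers):

```python
import random

def make_edit_set(examples, num_classes, n_edits, seed=0):
    """Corrupt the labels of a random subset of (x, y) pairs so that
    y_e != y, yielding edit pairs (x_e, y_e) as described above."""
    rng = random.Random(seed)
    subset = rng.sample(examples, n_edits)
    edit_set = []
    for x, y in subset:
        # pick a wrong label uniformly among the remaining classes
        y_e = rng.choice([c for c in range(num_classes) if c != y])
        edit_set.append((x, y_e))
    return edit_set

data = [(f"img_{i}", i % 10) for i in range(100)]  # stand-in labeled images
edits = make_edit_set(data, num_classes=10, n_edits=5)
assert all(y_e != dict(data)[x] for x, y_e in edits)
```

The model is then asked to produce $y_{e}^{(i)}$ on the edit set while its predictions elsewhere stay unchanged.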

Several datasets support knowledge editing experiments for this task, and a distinction is often made among “toy” and “large-scale” datasets. Well-known datasets like MNIST LeCun et al. (2010) and CIFAR-10 Krizhevsky (2009), which are widely used in the literature, are frequently employed for experimentation and belong to the former category. For more challenging and realistic scenarios, researchers turn to larger scale datasets, of which the most popular is surely the extensive ImageNet Database Deng et al. (2009a), which now encompasses over 10 million labeled images, spanning more than 20,000 object categories. Specifically, studies such as Sinitsin et al. (2020) and Lee et al. (2019) explore datasets derived from the ImageNet Large Scale Visual Recognition Challenges (ILSVRC) Russakovsky et al. (2015). To further accentuate the complexity, Sinitsin et al. (2020) introduces a highly challenging configuration, leveraging the Natural Adversarial Examples (NAE) dataset Hendrycks et al. (2021), consisting of 7500 natural images notorious for their arduous classification nature, on which pre-trained models exhibit correct prediction rates of less than $1\%$ .

多个数据集支持该任务的知识编辑实验,通常分为"玩具级"和"大规模"数据集。文献中广泛使用的经典数据集如MNIST (LeCun et al., 2010) 和CIFAR-10 (Krizhevsky, 2009) 常被用于实验,属于前一类。针对更具挑战性的现实场景,研究者转向更大规模的数据集,其中最流行的当属包含1000多万张标注图像、涵盖2万多个物体类别的ImageNet数据库 (Deng et al., 2009a)。具体而言,Sinitsin et al. (2020) 和Lee et al. (2019) 等研究探索了源自ImageNet大规模视觉识别挑战赛 (ILSVRC) (Russakovsky et al., 2015) 的数据集。为突显复杂性,Sinitsin et al. (2020) 引入了极具挑战性的配置方案,采用以分类困难著称的自然对抗样本 (NAE) 数据集 (Hendrycks et al., 2021)——该数据集包含7500张自然图像,预训练模型对其正确预测率不足$1\%$。

| Dataset | Tasks | #Examples | #Classes | #Channels | Resolution |
| --- | --- | --- | --- | --- | --- |
| MNIST | Classification, Inpainting | 70k | 10 | 1 | 28x28 |
| CIFAR-10 | Classification | 60k | 10 | 3 | 32x32 |
| CIFAR-100 | Classification | 60k | 100 | 3 | 32x32 |
| ImageNet | Classification | 1.2M | 1000+ | 3 | 224x224+ |
| NAE | Classification | 7.5k | 200+ | 3 | 224x224+ |
| CelebA | Inpainting | 200k | n/a | 3 | 178x218 |

Table 1: Most important datasets used in Computer Vision for Knowledge editing. MNIST, CIFAR-10, CIFAR-100 are generally regarded as “toy” datasets while ImageNet, NAE, CelebA as more challenging testbeds.

| 数据集 | 任务 | 样本数 | 类别数 | 通道数 | 分辨率 |
| --- | --- | --- | --- | --- | --- |
| MNIST | 分类、图像修复 | 70k | 10 | 1 | 28x28 |
| CIFAR-10 | 分类 | 60k | 10 | 3 | 32x32 |
| CIFAR-100 | 分类 | 60k | 100 | 3 | 32x32 |
| ImageNet | 分类 | 1.2M | 1000+ | 3 | 224x224+ |
| NAE | 分类 | 7.5k | 200+ | 3 | 224x224+ |
| CelebA | 图像修复 | 200k | n/a | 3 | 178x218 |

表 1: 知识编辑领域最重要的计算机视觉数据集。MNIST、CIFAR-10、CIFAR-100通常被视为"玩具"数据集,而ImageNet、NAE、CelebA则被视为更具挑战性的测试平台。

Image Inpainting Image inpainting, also known as image completion, is the task of reconstructing missing regions in an image Szeliski (2022). The problem is formalized as a regression over functions mapping pixel coordinates within $[0,1]^{2}$ to pixel values in $[0,1]$ (grayscale) or in $[0,1]^{3}$ (RGB). This task has so far received less attention from the knowledge editing community; Garnelo et al. (2018) address it leveraging both the MNIST dataset, once again serving as a rudimentary example, and the CelebFaces Attributes Dataset (CelebA) Liu et al. (2015) for more challenging scenarios. The CelebA dataset presents a more demanding testbed, offering over 200,000 celebrity images, each accompanied by 40 attribute annotations.

图像修复
图像修复(Image Inpainting),也称为图像补全,是指重建图像中缺失区域的任务 Szeliski (2022)。该问题被形式化为一个回归问题,即从像素坐标空间 $[0,1]^{2}$ 映射到灰度值 [0, 1] 或 RGB 值 $[0,1]^{3}$ 的函数回归。目前知识编辑领域对此任务的关注较少 Garnelo et al. (2018),主要利用包含 MNIST 数据集(作为基础示例)和 CelebFaces Attributes Dataset (CelebA) Liu et al. (2015) 的数据集进行挑战性场景研究。CelebA 数据集提供了超过 20 万张名人图像,每张图像附带 40 个属性标注,是一个具有挑战性且全面的探索数据集。
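Under this formalization, an image becomes a set of (coordinate, value) pairs, and inpainting amounts to regressing pixel values at the hidden coordinates. A minimal sketch of this data representation (an illustrative assumption, not the pipeline of any surveyed work; images are assumed to be at least 2x2):

```python
import random

def image_to_regression_pairs(image, mask_fraction=0.3, seed=0):
    """Turn an H x W grayscale image (values in [0, 1]) into context and
    target sets: coordinates are normalized to [0, 1]^2, and a random
    fraction of pixels is hidden as the regression target."""
    h, w = len(image), len(image[0])
    pairs = [((i / (h - 1), j / (w - 1)), image[i][j])
             for i in range(h) for j in range(w)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    n_hidden = int(mask_fraction * len(pairs))
    targets, context = pairs[:n_hidden], pairs[n_hidden:]
    return context, targets  # fit on context, predict values at target coords

img = [[0.0, 0.5], [0.5, 1.0]]  # a tiny 2x2 "image"
context, targets = image_to_regression_pairs(img, mask_fraction=0.25)
```

A regressor fitted on the context pairs is then evaluated on how well it reconstructs the hidden target values.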

3.2 Natural Language Processing

3.2 自然语言处理

Natural Language Processing (NLP) is also a broad field, concerned with giving computers the ability to process and understand human language Eisenstein (2019). Like in computer vision, in recent years researchers and practitioners in the field have leveraged the power of neural networks with many outstanding results Otter et al. (2020); Soltan et al. (2022); FitzGerald et al. (2022). With the recent paradigm shift from supervised learning to pre-training followed by fine-tuning Wang et al. (2022), and the trend towards larger and larger models Zhao et al. (2023), the ability to perform a cheap edit of a model instead of an expensive fine-tuning has motivated an intense interest from the knowledge editing community. Undoubtedly, within the NLP realm, the most widely targeted tasks for knowledge editing are Fact-Checking when dealing with classification, and (closed-book) Question Answering for language generation. Some recent works also explore open Text Generation, editing of factual relations with ad hoc datasets, and Document Classification. Table 2 provides an overview of the datasets commonly used for those tasks.

自然语言处理 (NLP) 同样是一个广泛的领域,致力于让计算机具备处理和理解人类语言的能力 Eisenstein (2019)。与计算机视觉类似,近年来该领域的研究人员和从业者利用神经网络的力量取得了许多杰出成果 Otter et al. (2020); Soltan et al. (2022); FitzGerald et al. (2022)。随着最近从监督学习转向预训练加微调的模式转变 Wang et al. (2022),以及模型规模越来越大的趋势 Zhao et al. (2023),以低成本编辑模型而非昂贵微调的能力激发了知识编辑社区的浓厚兴趣。毫无疑问,在 NLP 领域,知识编辑最广泛针对的任务是分类场景下的事实核查 (Fact-Checking) 和语言生成场景下的 (闭卷) 问答。一些最新研究还探索了开放文本生成、使用特定数据集编辑事实关系以及文档分类。表 2 概述了这些任务常用的数据集。

Fact-checking Fact-checking is the task of assessing whether claims made in written or spoken language are true, often addressed as a binary classification task Guo et al. (2022). In this setting, examples are natural language claims coupled with binary labels, even though occasionally a third neutral option is available. The claim $\boldsymbol{x}^{(i)}$ is encoded with some tokenization scheme (or pre-trained word embedding) as a sequence of integers (or semantic vectors), while the label $y^{(i)}$ can take one of two values, positive or negative (optionally a third, aforementioned neutral value). One can then have a neural network predict this value with an explicit classification layer, or alternatively a language model producing special tokens (e.g., True/False or Supports/Refutes) for the considered truth values, when prompted with the claim under consideration. The Fact-Checking task has been considered particularly appealing by the knowledge editing community De Cao et al. (2021); Mitchell et al. (2022a,b); Huang et al. (2023) for at least a couple of reasons. First, the recent phenomenal success of language models also highlighted their proneness to generate reasonable but factually wrong natural language text Ortega et al. (2021); Ji et al. (2023). This degrades system performance and fails to meet user expectations in many real-world scenarios, leading to a great interest in the ability to mitigate these hallucinations. Furthermore, reasonable edit sets are fairly easy to obtain, e.g. by randomly flipping the labels of claims from pre-existing datasets from Supports to Refutes and vice versa. The most widely used datasets for this task are FEVER Thorne et al. (2018) and VitaminC Schuster et al. (2021). Both of them are extracted from Wikipedia and annotated by human experts, to arrive at (evidence, wikipage, claim, label) tuples.
In both cases, the label can be Supports, Refutes or Not Enough Info depending on whether the evidence supports or not the claim. To construct proper editing datasets ${(x_{e}^{(i)},y_{e}^{(i)})_{i=1}^{N}}$ out of them, De Cao et al. (2021) (for FEVER) and Mitchell et al. (2022b)

事实核查
事实核查是评估书面或口头陈述真实性的任务,通常被视为二元分类任务 Guo et al. (2022)。在此设定中,样本为自然语言陈述及其二元标签,偶尔会包含第三个中性选项。陈述 $\boldsymbol{x}^{(i)}$ 通过某种 token 化方案(或预训练词嵌入)编码为整数序列(或语义向量),而标签 $y^{(i)}$ 可取正值或负值(可选地包含前文提及的中性值)。随后可通过神经网络的显式分类层预测该值,或由大语言模型针对待验证陈述生成特殊 token(如 True/False 或 Supports/Refutes)。

事实核查任务尤其受到知识编辑领域关注 De Cao et al. (2021); Mitchell et al. (2022a,b); Huang et al. (2023),主要原因包括:首先,大语言模型近年来的显著成功也暴露了其易生成合理但事实错误的自然文本的倾向 Ortega et al. (2021); Ji et al. (2023)。这种现象会降低系统性能,并在现实场景中无法满足用户预期,因此如何缓解这类幻觉引发了广泛兴趣。此外,合理的编辑集较易获取,例如将现有数据集中陈述的标签随机从 Supports 翻转为 Refutes,反之亦然。

该任务最广泛使用的数据集是 FEVER Thorne et al. (2018) 和 VitaminC Schuster et al. (2021)。两者均从维基百科提取并由专家标注,形成(证据、维基页面、陈述、标签)四元组。两种数据集的标签均为 Supports、Refutes 或 Not Enough Info,取决于证据是否支持陈述。为构建编辑数据集 ${(x_{e}^{(i)},y_{e}^{(i)})_{i=1}^{N}}$,De Cao et al. (2021)(针对 FEVER)和 Mitchell et al. (2022b)

(for VitaminC) grouped facts based on the same pages, augmented each fact $x_{e}^{(i)}$ with some rephrases $\tilde{x}_{e}^{(i)}$ to assess generality, and randomly flipped the labels so that $y_{e}^{(i)}\neq y^{(i)}$ in each group.

(针对 VitaminC) 将基于相同页面的事实分组,通过添加一些改写版本 $\tilde{x}_ {e}^{(i)}$ 来增强每个事实 $x_{e}^{(i)}$ ,以评估泛化性,并在每组中随机翻转标签,使得 $y_{e}^{(i)}\neq y^{(i)}$。
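The grouping-and-flipping recipe can be sketched as follows (a simplified stand-in for the FEVER/VitaminC preprocessing; field names are illustrative and rephrases are assumed to be given rather than generated):

```python
import random

def build_edit_groups(claims, seed=0):
    """claims: list of dicts with 'page', 'claim', 'rephrases', 'label'
    (label in {'SUPPORTS', 'REFUTES'}). Groups facts by page and flips
    the label of one fact per page, so y_e != y; the rephrases of the
    flipped fact probe generality of the edit."""
    rng = random.Random(seed)
    flip = {"SUPPORTS": "REFUTES", "REFUTES": "SUPPORTS"}
    groups = {}
    for c in claims:
        groups.setdefault(c["page"], []).append(c)
    edit_sets = []
    for page, facts in groups.items():
        fact = rng.choice(facts)
        edit_sets.append({
            "x_e": fact["claim"],
            "x_e_rephrases": fact["rephrases"],
            "y_e": flip[fact["label"]],
        })
    return edit_sets

claims = [
    {"page": "Rome", "claim": "Rome is in Italy.",
     "rephrases": ["Rome lies in Italy."], "label": "SUPPORTS"},
]
assert build_edit_groups(claims)[0]["y_e"] == "REFUTES"
```

An edit is then considered general if the model also outputs the flipped label on the rephrases $\tilde{x}_{e}^{(i)}$.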

Question Answering The task of training a model to provide a correct natural language answer to a given question is referred to as question answering; more specifically, closed-book question answering is restricted to the case when the model is only fed the question, and not a selection of possible answers, or a supporting corpus from which an answer can be extracted Roberts et al. (2020). In this case, both the input $x$ and the target label $y$ are natural language sequences, and we test the extent to which the model parameters implicitly store the required knowledge to provide correct answers. The most widely adopted dataset is surely the zero-shot Relation Extraction dataset (zsRE) Levy et al. (2017), used in particular by Zhu et al. (2020); De Cao et al. (2021); Mitchell et al. (2022a,b); Meng et al. (2022a,b); Huang et al. (2023); Hartvigsen et al. (2022). This task and its datasets are particularly appealing for knowledge editing. Indeed, closed-book question answering can benefit greatly from knowledge editing, as pipelines that solve the task usually leverage factual knowledge to answer questions; in the case of neural networks, this knowledge is acquired during training and implicitly stored in the networks’ parameters, and it is unclear how to tweak these parameters to correct wrong or undesired answers, especially as the networks grow bigger in size. Meng et al. (2022b) hypothesize that this factual knowledge takes the form of (relation, subject, object) triples, with intermediate layers acting as key-value storage units. This formalization lends itself nicely to the definition of an editing objective, rather than directly targeting the open-ended natural language generation task. Furthermore, Levy et al. (2017) demonstrate that it is possible to reduce relation extraction to the problem of answering simple reading comprehension questions, and provide in their dataset multiple templates for each relation.
For example, the triple (occupation, s, o) can be naturally extracted by answering one of the following questions: What did s do for a living?, What is s’s job?, What is the profession of $s$? The subject $s$ can then be modified to create editing examples. Section 4.3 further discusses factual knowledge and how different works have modeled it for improving knowledge editing. Besides zsRE, knowledge editing of models solving Question Answering has been studied leveraging also additional datasets such as T-REx Elsahar et al. (2018) and Natural Questions (NQ) Kwiatkowski et al. (2019). Finally, as a more challenging flavor of the same task with added counterfactual information, Meng et al. (2022b) introduced a new dataset called CounterFact.

问答
训练模型为给定问题提供正确的自然语言答案的任务被称为问答;更具体地说,闭卷问答限制于模型仅接收问题本身,而不提供可能的答案选项或可提取答案的支持语料的情况 (Roberts et al., 2020)。此时,输入 $x$ 和目标标签 $y$ 均为自然语言序列,我们测试模型参数隐式存储所需知识以提供正确答案的程度。最广泛采用的数据集无疑是零样本关系抽取数据集 (zsRE) (Levy et al., 2017),被 Zhu et al. (2020)、De Cao et al. (2021)、Mitchell et al. (2022a,b)、Meng et al. (2022a,b)、Huang et al. (2023)、Hartvigsen et al. (2022) 等研究特别使用。该任务和数据集对知识编辑特别有吸引力。实际上,闭卷问答可以极大受益于知识编辑,因为解决该任务的流程通常利用事实知识来回答问题;对于神经网络,这些知识在训练期间获得并隐式存储于网络参数中,而如何调整这些参数以修正错误或不想要的答案尚不明确,尤其是随着网络规模增大。Meng et al. (2022b) 假设这种事实知识以 (关系, 主语, 宾语) 三元组的形式存在,中间层充当键值存储单元。这种形式化很好地服务于编辑目标的定义,而非直接开放式的自然语言生成任务。此外,Levy et al. (2017) 证明可以将关系抽取简化为回答简单阅读理解问题,并在数据集中为每个关系提供了多个模板。例如,三元组 (职业, s, o) 可以通过回答以下问题之一自然提取:s 以什么为生?s 的工作是什么?s 的职业是什么?然后可以修改主语 $s$ 以创建编辑示例。第4.3节进一步讨论事实知识以及不同工作如何建模以改进知识编辑。除 zsRE 外,还利用 T-REx (Elsahar et al., 2018) 和自然问题 (NQ) (Kwiatkowski et al., 2019) 等额外数据集研究了问答模型的知识编辑。最后,作为同一任务的更具挑战性变体,Meng et al. (2022b) 引入了一个包含反事实信息的新数据集 CounterFact。
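The reduction of relation triples to reading-comprehension questions, and the creation of editing examples by swapping the subject or answer, can be sketched like this (the templates are the occupation examples quoted above; the function and names are illustrative):

```python
OCCUPATION_TEMPLATES = [
    "What did {s} do for a living?",
    "What is {s}'s job?",
    "What is the profession of {s}?",
]

def triple_to_qa(subject, obj, templates=OCCUPATION_TEMPLATES):
    """Instantiate every template for a (relation, subject, object) triple,
    yielding (question, answer) pairs; swapping the answer (or subject)
    produces edit pairs (x_e, y_e)."""
    return [(t.format(s=subject), obj) for t in templates]

qa = triple_to_qa("Marie Curie", "physicist")
edit_pairs = [(q, "chemist") for q, _ in qa]  # an edit: change the answer
print(qa[0])  # ('What did Marie Curie do for a living?', 'physicist')
```

Because all templates share the same underlying triple, the rephrasings of one question naturally serve as the generality set for the corresponding edit.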

Further NLP tasks Beside the two popular tasks outlined above, Mitchell et al. (2022a) tested editing text generation from autoregressive GPT-like models on a special version of WikiText-103 Merity et al. (2016), where they consider as prompts $(x_{e})$ passages sampled from WikiText itself and as edit targets $(y_{e})$ 10-token samples from a pre-trained distilGPT-2 model. This constitutes a challenging editing setup, since greedy 10-token predictions of their target model agree with these edit targets on $<1\%$ of the extracted examples. Finally, more recently, Hartvigsen et al. (2022) tested their methodology on a novel task for knowledge editing using the SCOTUS dataset from Chalkidis et al. (2022). The classification task is to categorize U.S. Supreme Court documents over multiple decades into 11 topics. What makes this task interesting is that, over time, categorization rules change, so that label distributions shift. We note how this setting is particularly realistic for knowledge editing: much of the world knowledge memorized by networks evolves over time just like these label shifts in the dataset, and the target of knowledge editing can be seen as keeping such world knowledge up to date.

其他NLP任务
除了上述两个常见任务外,Mitchell等人(2022a)在WikiText-103 Merity等人(2016)的特殊版本上测试了自回归GPT类模型的文本生成编辑。他们将来自WikiText本身的段落样本作为提示$(x_{e})$,并将预训练distilGPT-2模型生成的10个token样本作为编辑目标$(y_{e})$。这对他们而言是一个有效的挑战性编辑设置,因为在目标模型中,贪婪的10-token预测与这些编辑目标的一致性仅占提取样本的$<1\%$。最近,Hartvigsen等人(2022)在Chalkidis等人(2022)的SCOTUS数据集上测试了其知识编辑新任务的方法。该分类任务需将数十年的美国最高法院文件归类至11个主题。该任务的独特之处在于分类规则会随时间变化,导致标签分布发生偏移。我们注意到,这种设置对知识编辑特别真实,因为网络记忆的大部分世界知识会像数据集中标签偏移那样随时间演变,而知识编辑的目标可视为保持此类世界知识的更新。

| Dataset | Tasks | Format | #Examples | #Classes |
| --- | --- | --- | --- | --- |
| FEVER | Fact Checking | (evidence, wikipage, claim, label) | 420k | 3 |
| VitaminC | Fact Checking | (evidence, wikipage, claim, label) | 450k | 3 |
| zsRE | Question Answering | (subject, relation, object) | 120M | n/a |
| T-REx | Question Answering | (subject, relation, object) | 11M | n/a |
| NQ | Question Answering | (question, answer) | 320k | n/a |
| CounterFact | Question Answering | (subject, relation, true object, false object) | 22k | n/a |
| WikiText | Text Generation | tokens | 100M | n/a |
| SCOTUS | Document Classification | (date, text, label) | 9.2k | 11 |

Table 2: Most important datasets used for knowledge editing in NLP. We report the characteristics of the original datasets, even though for knowledge editing ad hoc versions, preprocessed to make editing meaningful, are often used.

| 数据集 | 任务 | 格式 | 样本数量 | 类别数 |
| --- | --- | --- | --- | --- |
| FEVER | 事实核查 | (证据, 维基页面, 声明, 标签) | 420k | 3 |
| VitaminC | 事实核查 | (证据, 维基页面, 声明, 标签) | 450k | 3 |
| zsRE | 问答 | (主语, 关系, 宾语) | 120M | n/a |
| T-REx | 问答 | (主语, 关系, 宾语) | 11M | n/a |
| NQ | 问答 | (问题, 答案) | 320k | n/a |
| CounterFact | 问答 | (主语, 关系, 真实宾语, 虚假宾语) | 22k | n/a |
| WikiText | 文本生成 | tokens | 100M | n/a |
| SCOTUS | 文档分类 | (日期, 文本, 标签) | 9.2k | 11 |

表 2: 自然语言处理中知识编辑使用的最重要数据集。我们报告了原始数据集的特性,尽管知识编辑通常使用经过预处理以使其有意义的特定版本。

3.3 Other Applications

3.3 其他应用

Even though the majority of works in the knowledge editing literature has focused on the Computer Vision and Natural Language Processing fields, as described above, the general nature of the editing problem yielded interesting results also in other fields, and will likely yield more in more diverse fields and applications in the years to come. Among these, to the best of our knowledge, the most notable examples are applications in safety-critical scenarios and to graph neural networks; in the following we briefly review works from both.

尽管知识编辑领域的大多数研究集中在计算机视觉和自然语言处理领域,但如上所述,编辑问题的普适性也在其他领域产生了有趣成果,并有望在未来更广泛的领域和应用中催生更多突破。据我们所知,其中最显著的案例是安全关键场景的应用和图神经网络的应用。下文我们将简要回顾这两类工作。

Safety-critical Systems Safety-critical systems are those systems whose failure may lead to consequences that are determined to be unacceptable, such as significant damage to properties, the environment, or people Knight (2002). Deep neural networks have grown in popularity over the past decade and are now being used in safety-critical domains such as self-driving cars Gupta et al. (2021), healthcare Tekkesin et al. (2019) and aviation Sridhar (2020). Clearly, in such critical scenarios, being able to find and correct unsafe neural network behavior becomes a crucial objective. This has motivated a line of research within the knowledge editing community, that so far has only touched the aviation domain, specifically the aircraft collision avoidance problem. The systems currently employed provide safety guarantees at the cost of being poorly data efficient, and efforts have been made to integrate neural networks into the pipeline Julian et al. (2016); Julian and Kochenderfer (2019). As a consequence, several subsequent works Sotoudeh and Thakur (2021); Fu and Li (2021); Liang et al. (2023) from the knowledge editing community have proposed approaches for fixing unsafe behavior of neural networks integrated in safety-critical pipelines.

安全关键系统
安全关键系统是指那些一旦失效可能导致不可接受后果的系统,例如对财产、环境或人员造成重大损害 [Knight (2002)]。过去十年中,深度神经网络日益普及,现已被应用于自动驾驶汽车 [Gupta et al. (2021)]、医疗保健 [Tekkesin et al. (2019)] 和航空 [Sridhar (2020)] 等安全关键领域。显然,在此类关键场景中,发现并修正神经网络的不安全行为成为至关重要的目标。这推动了知识编辑领域的一系列研究,目前仅涉及航空领域,特别是飞机防撞问题。现有系统以数据效率低下为代价提供安全保障,而学界已尝试将神经网络整合到流程中 [Julian et al. (2016); Julian and Kochenderfer (2019)]。因此,知识编辑领域的后续研究 [Sotoudeh and Thakur (2021); Fu and Li (2021); Liang et al. (2023)] 提出了修正安全关键流程中神经网络不安全行为的方法。

Developing a robust collision avoidance algorithm that reliably prevents collision without alerting excessively is challenging due to sensor error and uncertainty in the future behavior of the aircraft. The Airborne Collision Avoidance System X (ACAS X) Kochenderfer and Chryssanthacopoulos (2011) family of collision avoidance systems formulates the problem of collision avoidance as a partially observable Markov decision process. The variant for unmanned aircraft, ACAS Xu, uses dynamic programming (DP) to then find a solution in terms of resolution advisories that avoid collisions while minimizing disruptive alerts. The DP process makes use of a massive lookup table, which makes storage costly and certification time-consuming for certified avionics systems. Therefore, Julian et al. (2016) propose using a deep neural network for compressing the table without loss of performance, as measured by a set of safety and operational metrics. There are seven real-valued state variables that define an aircraft encounter, describing its geometry in terms of the two aircraft involved: (1) distance from ownship to intruder, (2) angle to intruder relative to ownship heading direction, (3) heading angle of intruder relative to ownship heading direction, (4) speed of ownship, (5) speed of intruder, (6) time until loss of vertical separation, and (7) previous advisory action. There are then five possible horizontal maneuver advisories that the system can produce: clear-of-conflict, or adjusting course by turning left or right at one of two fixed angles (hence 4 more possibilities). The state variables are usually discretized, arriving at $\approx120$ million points, and the aforementioned lookup table associates scores to all pairs of 120 million states and five actions.
This table is what makes up the ACAS Xu dataset: $(\boldsymbol{x}^{(i)},\boldsymbol{y}^{(i)})_ {i=1}^{N}$ with $N=5\times120$ million, where $\boldsymbol{x}^{(i)}$ represents a discretized seven-dimensional state, and $y^{(i)}\in\mathbb{R}^{5}$ is the vector of scores associated to each of the five possible actions in that state. With 600 million floating-point numbers, the table requires over 2 GB of storage. The task for the neural network is to regress this table, minimizing the parametric knowledge required (i.e., the number of parameters) and the error with respect to the table. It is interesting to note that this is an atypical regression problem, since we aim for the guarantee that the optimal advisory remains the same. When the difference between the scores of the first and second-best advisories is relatively small, simple regression techniques (e.g., minimizing the Mean Squared Error) can lead to the network realizing a different strategy from that of the original table. This is reflected in the design of the loss function. The network, or its further refinements Julian and Kochenderfer (2019), is then verified via tools such as Wang et al. (2018); Katz et al. (2017), that are able to prove several input-output-based security properties, e.g., that a clear-of-conflict advisory will always be issued if the intruder is sufficiently far away, thus providing formal guarantees about DNN behavior. These properties are formalized as implications of the form $\forall x,x\in B\implies f_{\theta}(x)\in C$ , where $f_{\theta}$ is the function approximator realized by the DNN with parameters $\theta$ , $B$ is a bounded region of the input space and $C$ a bounded region of the output space.
The ACAS Xu case study has so far been of great interest for the knowledge editing community, since one such security property has been found to not be satisfied by the original network, exposing an input on which the network was inconsistent with the lookup table (property $\phi_{8}$ in Katz et al. (2017)). This discrepancy would then be addressed by retraining the DNN, thus leading to the central question of knowledge editing: how to fix the network behavior on a limited set of points without affecting its behavior on unrelated points. Each work mentioned at the beginning of the paragraph has addressed this problem differently, but sharing the same setup: once a failing security property for a network is identified, and one is able to generate counter-examples, i.e., pairs $(x^{(i)},y^{(i)})$ such that $\boldsymbol{x}^{(i)}\in\boldsymbol{B}$ but $y^{(i)}\notin C$ , a certain strategy for defining candidate edit pairs can be formalized, i.e., how to assign a corrected target $\bar{y}^{(i)}$ to each such $x^{(i)}$ . Then, usually a subset of this becomes the edit set, while the remaining portion is chosen as the generality set; finally, a locality set is defined by points correctly classified by the network, i.e., input-output pairs for which the properties under scrutiny hold true.

由于传感器误差和飞行器未来行为的不确定性,开发一种既能可靠防撞又不会过度告警的鲁棒避碰算法具有挑战性。机载防撞系统X (ACAS X) [Kochenderfer和Chryssanthacopoulos (2011)] 系列将避碰问题建模为部分可观测马尔可夫决策过程。其无人飞行器版本ACAS Xu采用动态规划(DP)生成解决方案,在避免碰撞的同时最小化干扰性告警。该DP过程依赖庞大的查找表,导致航电系统存储成本高且认证耗时长。为此,Julian等人(2016)提出使用深度神经网络无损压缩该表,并通过安全性与操作性指标验证性能。

飞行器遭遇场景由七个实值状态变量定义,描述两架飞行器的几何关系:(1) 本机与入侵者的距离 (2) 相对于本机航向的入侵者方位角 (3) 相对于本机航向的入侵者航向角 (4) 本机速度 (5) 入侵者速度 (6) 垂直间隔丧失倒计时 (7) 先前告警动作。系统可生成五种水平机动告警:冲突解除,或以两种固定角度左转/右转调整航向(共4种可能)。状态变量经离散化后形成约1.2亿个点,查找表为所有状态-动作对关联评分,构成ACAS Xu数据集:${(\boldsymbol{x}^{(i)},\boldsymbol{y}^{(i)})_{i=1}^{N}}$($N=6$亿),其中$\boldsymbol{x}^{(i)}$为离散化七维状态,$y^{(i)}\in\mathbb{R}^{5}$是该状态下五个动作的评分向量。该表含6亿浮点数,存储需超2GB。

神经网络的任务是以最小参数量回归该表,同时保持最优告警策略不变。当最优与次优告警评分接近时,传统回归方法(如最小化均方误差)可能导致策略偏移,这体现在损失函数设计中。经Wang等人(2018)、Katz等人(2017)等工具验证,该网络[Julian和Kochenderfer (2019)]能保证形式化安全属性,例如$\forall x,x\in B\implies f_{\theta}(x)\in C$,其中$f_{\theta}$是DNN实现的函数近似器,$B$和$C$分别为输入/输出空间的限定区域。

ACAS Xu案例因原始网络未满足某项安全属性[Katz等人(2017)中的$\phi_{8}$不一致性]而成为知识编辑领域的研究热点。该差异需通过DNN重训练解决,引出了知识编辑的核心问题:如何修正有限点集上的网络行为而不影响无关点。相关研究均遵循相同框架:发现失效属性并生成反例$(x^{(i)},y^{(i)})$后,定义候选编辑对策略(如分配$\bar{y}^{(i)}$),将其分为编辑集与泛化集,并建立由正确分类点组成的局部性验证集。
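The shared setup described above, where counterexamples violating a property $\forall x\in B\implies f_{\theta}(x)\in C$ are split into edit and generality sets while correctly handled points form the locality set, can be sketched generically. The predicate and repair functions below are placeholders, not the actual ACAS Xu property or any published repair strategy:

```python
def build_editing_sets(model, inputs, in_B, in_C, repair, edit_fraction=0.5):
    """Partition points into counterexamples (x in B but f(x) not in C)
    and safe points; counterexamples get repaired targets y_e = repair(x),
    then split into an edit set and a held-out generality set."""
    counterexamples, locality_set = [], []
    for x in inputs:
        y = model(x)
        if in_B(x) and not in_C(y):
            counterexamples.append((x, repair(x)))  # candidate edit pair
        else:
            locality_set.append((x, y))  # behavior to preserve
    n_edit = int(edit_fraction * len(counterexamples))
    edit_set = counterexamples[:n_edit]
    generality_set = counterexamples[n_edit:]
    return edit_set, generality_set, locality_set

# Toy 1-D illustration of the property "x > 5 implies f(x) > 0"
model = lambda x: x - 7
edit, gen, loc = build_editing_sets(
    model, inputs=range(10),
    in_B=lambda x: x > 5, in_C=lambda y: y > 0,
    repair=lambda x: 1.0)
```

A successful edit then fixes the model on the edit set, generalizes to the held-out counterexamples, and leaves the locality set untouched.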

Graph Neural Networks Deep learning models have been particularly successful when dealing with signals such as speech, images, or video, in which there is an underlying Euclidean structure; however, recently, there has been a growing interest in trying to apply learning on non-Euclidean geometric data, for instance represented in the form of graphs Bronstein et al. (2017). Graph Neural Networks (GNNs) learn node representations by applying shared permutation invariant functions over local neighborhoods of nodes in the graph Bronstein et al. (2021). These representations can then be used for tasks like node classification, for instance assigning a category to each paper in a citation graph Wu et al. (2020). GNNs have achieved prominent results in learning features and topology of graph data; however, knowledge editing for GNNs is rarely explored, despite their widespread applicability. Therefore, Liu et al. (2023) propose a method to edit these models, restricting to the aforementioned node classification task.

图神经网络 (Graph Neural Networks)
深度学习模型在处理具有欧几里得结构的信号(如语音、图像或视频)时取得了显著成功。然而,近年来人们越来越关注在非欧几里得几何数据(例如以图形式表示的数据)上应用学习技术 Bronstein et al. (2017)。图神经网络通过在图节点的局部邻域上应用共享的置换不变函数来学习节点表示 Bronstein et al. (2021)。这些表示可用于节点分类等任务,例如为引文图中的每篇论文分配类别 Wu et al. (2020)。虽然GNN在学习图数据的特征和拓扑结构方面取得了突出成果,但尽管其应用广泛,针对GNN的知识编辑研究却很少。为此,Liu et al. (2023)提出了一种编辑这些模型的方法,但仅限于上述节点分类任务。

The task can be formalized as follows: let $G=(V,E)$ be an undirected graph with $V=\left(v_{1},\ldots,v_{|V|}\right)$ and $E=(e_{1},\dots,e_{|E|})$ being the set of nodes and edges, respectively. Given a feature space $\mathcal{X}$ (e.g., the space of real-valued $d$ -dimensional feature vectors $\mathbb{R}^{d}$ ), a node feature tensor $X\in\mathcal{X}^{\vert V\vert}$ and a label space $\mathcal{Y}$ , the goal is to learn a representation $h_{v}$ from which a label $y_{v}\in\mathcal{Y}$ for each node $v\in V$ can be easily predicted. Many datasets for node classification exist in the literature, comprised of data from various domains like citation networks and social networks, and we find again the distinction between small-scale and large-scale datasets Wu et al. (2020). Among the former we find datasets like Cora, which contains a selection of Machine Learning papers collected from the web, and the references automatically parsed from their bibliographies McCallum et al. (2000). In particular, the network contains $|V|=2708$ nodes (articles) and $|E|=5429$ links (citations); the feature space is $\mathcal{X}=\{0,1\}^{1433}$ , i.e., each node is described in terms of the presence (or absence) of certain keywords, taken from a dictionary of 1433 unique words (bag-of-words content representation); the label space is $\mathcal{Y}=\{1,\ldots,7\}$ , i.e. the task is to predict to which of seven classes (e.g., Theory or Reinforcement Learning) each publication belongs. The Reddit dataset is instead a popular representative of large-scale node classification datasets. Hamilton et al. (2017) constructed a graph dataset from Reddit posts made in the month of September 2014. The node label $y_{v}\in\mathcal{Y}$ in this case is the community, or “subreddit”, that a post belongs to, considering $|\mathcal{Y}|=50$ large communities. A post-to-post graph is constructed connecting posts if the same user comments on both.
In total, this dataset contains $\lvert V\rvert=232{,}965$ posts with an average degree of 492 ( $|E|$ is in the order of 100 million edges). In both of these cases, and the many other instances of node classification datasets, constructing an edit dataset is fairly straightforward, and done in the same manner as for the image classification task in Computer Vision. After training the model $f_{0}(\cdot)$ under consideration on a subgraph, one evaluates it on the whole graph: each pair $(x^{(i)},y^{(i)})$ for which the prediction $\hat{y}^{(i)}=\arg\operatorname*{max}f_{0}(x^{(i)})$ is incorrect becomes an edit pair $(x_{e}^{(i)},y_{e}^{(i)})$ . The geometry of graphs lends itself nicely to defining also generality and locality sets: indeed, since the task under consideration is node classification, as we have seen a single example $x^{(i)}$ describes a node $v$ ; one can then define its neighborhood $N(x^{(i)})$ to be its actual neighborhood in the graph, $N_{G}(v)=\{w\in V\mid (v,w)\in E\}$ ; from this definition, generality and locality sets follow consequently, as seen in earlier sections.

该任务可形式化描述如下:设无向图$G=(V,E)$中$V=\left(v_{1},\ldots,v_{|V|}\right)$为节点集合,$E=(e_{1},\dots,e_{|E|})$为边集合。给定特征空间$\mathcal{X}$(例如实值$d$维特征向量空间$\mathbb{R}^{d}$)、节点特征张量$X\in\mathcal{X}^{\vert V\vert}$和标签空间$\mathcal{V}$,目标是学习表征$h_{v}$以轻松预测每个节点$v\in V$的标签$y_{v}\in\mathcal{V}$。现有文献包含众多节点分类数据集,涵盖引文网络、社交网络等多个领域,并延续了Wu等人(2020)提出的小规模与大规模数据集划分标准。

小规模数据集代表如Cora,收录了从网络收集的机器学习论文及其参考文献自动解析结果(McCallum等, 2000)。该网络包含$|V|=2708$个节点(论文)和$|E|=5429$条边(引用),特征空间为$\mathcal{X}=\{0,1\}^{1433}$(基于1433个关键词的词袋表示),标签空间$\mathcal{Y}=\{1,\ldots,7\}$对应七类研究主题(如理论或强化学习)。

Reddit数据集则是大规模节点分类的典型代表(Hamilton等, 2017),构建自2014年9月的Reddit发帖数据。节点标签$y_{v}\in\mathcal{Y}$表示帖子所属社区(subreddit),共考虑$|\mathcal{Y}|=50$个大型社区,通过用户评论关系构建帖子关联图。该数据集共含$\lvert V\rvert=232,965$个帖子,平均度数492(边数$|E|$约1亿条)。

对于此类节点分类数据集,编辑数据集的构建方式与计算机视觉中的图像分类任务类似:在子图上训练目标模型$f_{0}(\cdot)$后,在全图评估时将所有预测$\hat{y}^{(i)}=\arg\max f(x^{(i)})$错误的样本对$(x^{(i)},y^{(i)})$转为编辑对$(x_{e}^{(i)},y_{e}^{(i)})$。图结构特性天然支持泛化集与局部集的定义:由于任务涉及节点分类,单个样本$x^{(i)}$对应节点$v$时,可将其邻域$N(x^{(i)})$定义为图结构邻域$N_{G}(v)=\{w\in V\mid (v,w)\in E\}$,进而推导出泛化集与局部集(如前述章节所示)。
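As a concrete illustration, the edit-set construction described above can be sketched in a few lines of numpy. The tiny graph, the random features, and the linear stand-in classifier `f` below are all invented for the example and do not come from any of the cited datasets.

```python
import numpy as np

# Toy undirected graph: 6 nodes on a path, 2 classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
neighbors = {v: set() for v in range(6)}
for u, w in edges:
    neighbors[u].add(w)
    neighbors[w].add(u)

X = np.random.RandomState(0).randn(6, 4)   # node features (hypothetical)
y = np.array([0, 0, 1, 1, 0, 1])           # ground-truth node labels

W = np.random.RandomState(1).randn(4, 2)   # "pretrained" weights (stand-in)

def f(x, W):
    """Stand-in node classifier: linear class scores over features."""
    return x @ W

# Every node the model gets wrong becomes an edit pair (x_e, y_e).
preds = f(X, W).argmax(axis=1)
edit_pairs = [(v, int(y[v])) for v in range(6) if preds[v] != y[v]]

# Generality/locality sets follow from graph structure: for an edit on
# node v, use its graph neighborhood N_G(v) = {w | (v, w) in E}.
generality_sets = {v: neighbors[v] for v, _ in edit_pairs}
```

The same recipe applies unchanged to real datasets such as Cora or Reddit, only with the trained GNN in place of `f` and the dataset's adjacency structure in place of the toy `neighbors` map.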

4 Knowledge Editing Methodologies

4 知识编辑方法论

In recent times, several "knowledge editing" methods have been introduced to effectively modify the behaviors of models while maintaining their previous performance on the same task Sinitsin et al. (2020). These approaches draw inspiration from various fields of artificial intelligence research and can be broadly categorized into four distinct families: regularization techniques, meta-learning, direct model editing, and architectural strategies.

近来,多种"知识编辑"方法被提出,旨在有效修改模型行为的同时保持其原有任务性能 [20]。这些方法汲取了人工智能研究不同领域的灵感,可大致分为四类:正则化技术、元学习、直接模型编辑和架构策略。

Regularization techniques utilize various forms of regularization to guide the model's learning process during fine-tuning, encouraging it to incorporate the desired edits while retaining its original capabilities Zhu et al. (2020). Meta-learning approaches, on the other hand, employ hypernetwork models to learn parameter updates, enabling efficient adaptation to new tasks or knowledge De Cao et al. (2021); Mitchell et al. (2022a). Direct model editing methods involve directly modifying the model's parameters or representations to incorporate the desired changes. These techniques can range from simple parameter updates to more complex approaches that leverage the model's internal representations. Finally, architectural strategies explore modifying the model's architecture itself, either by introducing new components or restructuring existing ones, to facilitate the integration of new knowledge or behaviors. In the upcoming sections, we will provide detailed discussions on each of these families of knowledge editing methodologies, highlighting their respective areas of application, advantages, limitations, and notable techniques within each category.

正则化技术利用各种形式的正则化来指导模型在微调过程中的学习,鼓励其融入所需的编辑内容同时保留原始能力 (Zhu et al., 2020) 。元学习方法则采用超网络模型来学习参数更新,从而高效适应新任务或新知识 (De Cao et al., 2021; Mitchell et al., 2022a) 。直接模型编辑方法通过直接修改模型参数或表征来实现目标变更,其技术跨度从简单参数更新到利用模型内部表征的复杂方法。最后,架构策略通过引入新组件或重组现有结构来修改模型架构本身,以促进新知识或行为的整合。后续章节将详细讨论这些知识编辑方法族,重点阐述其应用领域、优势、局限及各分类下的代表性技术。

The objective of this section is to categorize the various knowledge editing techniques discussed in the literature into the four distinct families mentioned above. All presented works have different characteristics, target different areas of application and types of edits, and adopt diverse experimentation strategies. Nevertheless, the objective of all the works reported can be formulated within the formal framework given in Section 2.2. A comparison of the most notable KE methodologies at the time of writing can be found in Table 3, while Table 4 presents a comparison on non-sequential single-batch edits of factual knowledge.

本节的目标是将文献中讨论的各种知识编辑技术归类到上述四个不同的类别中。所有呈现的工作具有不同的特点,针对不同的应用领域、编辑类型,并采用多样化的实验策略。尽管如此,所有工作的目标都可以在2.2节给出的形式化框架内表述。截至撰写本文时,最著名的知识编辑方法比较见表3;表4则对比了各方法在非顺序单批次事实知识编辑上的表现。

| KE Methodology | KE Category | Training Required |
|---|---|---|
| FT + L2 | Regularization | False |
| FT + KL | Regularization | False |
| EWC | Regularization | False |
| CNP | Architectural | True |
| ENN | Meta-Learning | True |
| KnowledgeEditor | Meta-Learning | True |
| MEND | Meta-Learning | True |
| MALMEN | Meta-Learning | True |
| ROME | Direct Editing | False |
| MEMIT | Direct Editing | False |
| PMET | Direct Editing | False |
| SERAC | Architectural | True |
| CaliNet | Architectural | True |
| T-Patcher | Architectural | False |
| GRACE | Architectural | True |

| KE方法论 | KE类别 | 需要训练 |
|---|---|---|
| FT + L2 | 正则化 | 否 |
| FT + KL | 正则化 | 否 |
| EWC | 正则化 | 否 |
| CNP | 架构调整 | 是 |
| ENN | 元学习 | 是 |
| KnowledgeEditor | 元学习 | 是 |
| MEND | 元学习 | 是 |
| MALMEN | 元学习 | 是 |
| ROME | 直接编辑 | 否 |
| MEMIT | 直接编辑 | 否 |
| PMET | 直接编辑 | 否 |
| SERAC | 架构调整 | 是 |
| CaliNet | 架构调整 | 是 |
| T-Patcher | 架构调整 | 否 |
| GRACE | 架构调整 | 是 |

Table 3: Comparison of the most notable KE methodologies in the literature. Different characteristics are reported for each approach, highlighting the main advantages and disadvantages. For all approaches, we report: the category and whether it requires training of an auxiliary model; whether it preserves the architecture of the edited model or requires the introduction of new components; whether it needs only the edit pair $(x_{e},y_{e})$, or requires additional input to perform the edit; whether it is able to handle single non-successive edits (SNS), batched non-successive edits (BNS), single successive edits (SSE), and batched successive edits (BSE). Finally, we report whether it can scale to Large Models (LM), that, following the definition in Zhao et al. (2023), are models with more than 10B parameters.

表 3: 文献中最显著的知识编辑(KE)方法对比。每种方法均标注了不同特征,突出主要优缺点。所有方法均包含以下维度:所属类别、是否需要训练辅助模型;是否保留被编辑模型的原始架构或需引入新组件;仅需编辑对$(x_{e},y_{e})$或需额外输入执行编辑;能否处理单次非连续编辑(SNS)、批量非连续编辑(BNS)、单次连续编辑(SSE)及批量连续编辑(BSE)。最后标注是否可扩展至大模型(LM)——根据Zhao等(2023)的定义,即参数量超过100亿的模型。

4.1 Regularization Techniques

4.1 正则化技术

Catastrophic forgetting Kemker et al. (2018) is a well-known phenomenon in the literature, fundamentally limiting the flexibility of editing networks once trained or fine-tuned Lee et al. (2019). Indeed, in the absence of any regularization process, the regular fine-tuning signal can easily execute a specific edit, albeit with a tendency to over-fit to the provided edit examples. However, this approach fails to provide generality to the edit and has a negative impact on locality, owing to the absence of a plasticity mechanism Sinitsin et al. (2020). Similar to continual learning, regularization techniques for knowledge editing aim to modify the standard fine-tuning signal of the target edit to ensure reliability and locality. Therefore, for regularization techniques, KE is not parametrized and does not require any pre-training, but is nothing more than gradient descent computed with the given edits and some specific regularization terms. While not all of these techniques were originally developed for the specific task of knowledge editing, they have proven to be in some way effective and are commonly used as useful baselines for comparison. Moreover, due to their simplicity, they can easily adapt to work with different types of edits: from single non-successive edits to batches of successive edits. However, depending on the number of layers fine-tuned, they rarely scale to models with a large number of parameters such as Large Language Models (LLMs), that, according to Zhao et al. (2023), are models with more than 10B parameters. In these cases, the methodologies discussed in the following sections may be better suited for efficiently editing large-scale models with constrained resources.

灾难性遗忘 (catastrophic forgetting) 是文献中广为人知的现象 (Kemker et al., 2018) ,从根本上限制了网络在训练或微调后的可编辑性 (Lee et al., 2019) 。事实上,在缺乏任何正则化过程的情况下,常规微调信号虽然能轻松执行特定编辑,但往往会过度拟合提供的编辑样本。由于缺乏可塑性机制 (Sinitsin et al., 2020) ,这种方法既无法保证编辑的泛化性,也会对局部性产生负面影响。

与持续学习类似,知识编辑的正则化技术旨在调整目标编辑的标准微调信号,以确保可靠性和局部性。因此对于正则化技术而言,知识编辑 (KE) 不需要参数化,也无需任何预训练,本质上只是结合给定编辑样本和特定正则项进行的梯度下降。虽然这些技术并非专为知识编辑任务开发,但已被证明具有一定效果,常被用作比较基线。得益于其简洁性,它们能轻松适配不同类型的编辑:从单次非连续编辑到批量连续编辑。

然而根据微调层数的不同,这些技术很难扩展到参数量超过100亿的大语言模型 (LLM) (Zhao et al., 2023) 。针对此类情况,后续章节讨论的方法可能更适合在有限资源下高效编辑大规模模型。

For instance, the authors of Sotoudeh and Thakur (2021) introduce a technique called "Neural Network Patching" that allows for the correction of deep neural networks without retraining from scratch. They present the concept of neural network patching, which involves identifying and correcting the faulty or damaged components of a network. The proposed method uses a set of repair operators to identify and replace damaged components of the network, identifying the minimum $L_{2}$ norm parameter update that reliably edits the model output, while minimizing the deviation from the original target model. Conversely, Zhu et al. (2020) utilize a constrained optimization process instead of penalizing an updated model for deviating from the original one. Their method employs either an $L_{2}$ or $L_{\infty}$ constraint between the original model parameters and the edited ones, highlighting the importance of selecting a subset of parameters to be updated. Additionally, they demonstrate how their method is similar to Elastic Weight Consolidation (EWC) Lee et al. (2019), which involves computing a penalty term for each weight in the neural network based on how much it contributes to the performance of the original task. Similarly to Zhu et al. (2020), losses enriched with the Kullback–Leibler (KL) divergence have been proposed in order to regularize the network's weights based on the KL divergence between the network's output on the old task and its output on the new task. This encourages the network to maintain similar weights for the parts of the network that are relevant to both tasks Yoon et al. (2017); Serra et al. (2018); Mitchell et al. (2022a); Huang et al. (2023).

例如,Sotoudeh 和 Thakur (2021) 提出了一种称为“神经网络修补 (Neural Network Patching)”的技术,无需从头训练即可修正深度神经网络。他们阐述了神经网络修补的概念,即识别并修复网络中故障或损坏的组件。该方法通过一组修复算子识别并替换网络中受损的组件,寻找能可靠修正模型输出且与原目标模型偏差最小的最小 $L_{2}$ 范数参数更新。

相反,Zhu 等人 (2020) 采用约束优化过程而非惩罚更新模型与原模型的偏差。他们的方法在原模型参数与编辑后参数之间施加 $L_{2}$ 或 $L_{\infty}$ 约束,强调选择待更新参数子集的重要性。此外,他们指出该方法与弹性权重巩固 (Elastic Weight Consolidation, EWC) (Lee 等人, 2019) 的相似性——后者通过计算神经网络中每个权重对原任务性能的贡献度来确定其惩罚项。

与 Zhu 等人 (2020) 类似,基于旧任务与新任务网络输出的 KL 散度 (Kullback-Leibler divergence),研究者提出用 KL 散度增强损失函数来正则化网络权重 (Yoon 等人, 2017; Serra 等人, 2018; Mitchell 等人, 2022a; Huang 等人, 2023),从而促使网络对双任务相关部分保持权重一致性。
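The regularized fine-tuning recipe discussed above can be sketched in a few lines: gradient descent on the edit's cross-entropy, plus an $L_2$ pull back toward the original parameters that discourages drift and thereby protects locality. The two-class linear model, the penalty weight `lam`, and the step counts below are toy assumptions, not values from the cited papers.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.RandomState(0)
W0 = rng.randn(2, 4)            # "pretrained" weights (toy stand-in)
x_e, y_e = rng.randn(4), 1      # a single edit pair (x_e, y_e)
p_before = softmax(W0 @ x_e)[y_e]

W, lam, lr = W0.copy(), 0.1, 0.1
for _ in range(500):
    p = softmax(W @ x_e)
    # Gradient of the cross-entropy on the edit example ...
    grad_ce = np.outer(p - np.eye(2)[y_e], x_e)
    # ... plus the L2 pull toward the original parameters W0.
    grad = grad_ce + lam * (W - W0)
    W -= lr * grad

p_after = softmax(W @ x_e)[y_e]
```

Raising `lam` makes the edit more conservative (better locality, weaker reliability); the constrained variants of Zhu et al. (2020) replace the penalty with an explicit $L_2$ or $L_\infty$ bound on the same deviation.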

Table 4: Comparison of some of the most notable KE methodologies on zsRE Levy et al. (2017) and CounterFact Meng et al. (2022b) with non-sequential single-batch edits of factual knowledge. For each methodology, we report success rate (SR), generalization rate (GR), and drawdown (DD) metrics. Results adapted from Yao et al. (2023).

| Dataset | Model | Metric | FT | KE | MEND | ROME | MEMIT | SERAC | T-Patcher |
|---|---|---|---|---|---|---|---|---|---|
| zsRE | GPT-J (6B) | SR | 54.70 | 6.60 | 45.60 | 99.18 | 99.23 | 90.16 | 97.12 |
| zsRE | GPT-J (6B) | GR | 49.20 | 7.80 | 48.00 | 94.90 | 87.16 | 89.96 | 94.95 |
| zsRE | GPT-J (6B) | DD | 62.76 | 5.82 | 11.79 | 0 | 0 | 0.1 | 3.76 |
| CounterFact | GPT-J (6B) | SR | 99.90 | 13.40 | 73.80 | 99.80 | 99.90 | 99.78 | 100.00 |
| CounterFact | GPT-J (6B) | GR | 97.53 | 11.00 | 74.20 | 86.63 | 73.13 | 99.41 | 83.98 |
| CounterFact | GPT-J (6B) | DD | 98.98 | 5.62 | 96.25 | 0 | 0 | 1.11 | 91.63 |

表 4: 在 zsRE Levy et al. (2017) 和 CounterFact Meng et al. (2022b) 数据集上对非连续单批次事实知识编辑的部分最显著知识编辑 (KE) 方法比较。每种方法均报告成功率 (SR)、泛化率 (GR) 和回撤率 (DD) 指标。结果改编自 Yao et al. (2023)。

| 数据集 | 模型 | 评估指标 | FT | KE | MEND | ROME | MEMIT | SERAC | T-Patcher |
|---|---|---|---|---|---|---|---|---|---|
| zsRE | GPT-J (6B) | SR | 54.70 | 6.60 | 45.60 | 99.18 | 99.23 | 90.16 | 97.12 |
| zsRE | GPT-J (6B) | GR | 49.20 | 7.80 | 48.00 | 94.90 | 87.16 | 89.96 | 94.95 |
| zsRE | GPT-J (6B) | DD | 62.76 | 5.82 | 11.79 | 0 | 0 | 0.1 | 3.76 |
| CounterFact | GPT-J (6B) | SR | 99.90 | 13.40 | 73.80 | 99.80 | 99.90 | 99.78 | 100.00 |
| CounterFact | GPT-J (6B) | GR | 97.53 | 11.00 | 74.20 | 86.63 | 73.13 | 99.41 | 83.98 |
| CounterFact | GPT-J (6B) | DD | 98.98 | 5.62 | 96.25 | 0 | 0 | 1.11 | 91.63 |

As previously mentioned, the techniques outlined in this section have been primarily utilized as baseline approaches in knowledge editing and heavily intersect with continual learning works Mundt et al. (2023). However, in various experiments cited in De Cao et al. (2021); Mitchell et al. (2022a); Huang et al. (2023), these techniques have demonstrated limited efficacy in maintaining model accuracy with regard to previous knowledge. In fact, as noted by De Cao et al. (2021), regularization techniques overlook the highly nonlinear nature of large models and the significant impact that even minor changes in parameter space can have on the output of numerous data points. For those reasons, more elaborate approaches have been proposed in the literature, such as the ones discussed in the following sections.

如前所述,本节所述技术主要用作知识编辑的基线方法,并与持续学习工作 Mundt et al. (2023) 存在大量交集。然而,在 De Cao et al. (2021)、Mitchell et al. (2022a)、Huang et al. (2023) 引用的多项实验中,这些技术在维持模型对先前知识的准确性方面表现有限。事实上,如 De Cao et al. (2021) 所指出的,正则化技术忽略了大模型的高度非线性特性,以及参数空间中即使微小变化也可能对大量数据点输出产生的重大影响。基于这些原因,后续章节将介绍文献中提出的更精细的工作。

4.2 Meta-Learning and Hypernetworks

4.2 元学习与超网络

Meta-learning techniques refer to a set of algorithms and approaches that enable machines to learn how to learn Finn et al. (2017). These techniques have proven to be particularly useful in scenarios where a model needs to adapt quickly to new tasks or environments with limited data. For instance, in computer vision Ren et al. (2018), meta-learning can be employed to rapidly fine-tune a pre-trained model to recognize new object categories or adapt to different imaging conditions. Similarly, in natural language processing Gu et al. (2020), meta-learning can facilitate the adaptation of language models to different domains or styles of text with minimal additional training data. Robotics Finn et al. (2017) is another domain where meta-learning has shown promise, enabling robots to rapidly acquire new skills or adapt to changes in their environment. In reinforcement learning Houthooft et al. (2018), meta-learning can help agents generalize their learned policies to new environments or task variations, improving their sample efficiency and adaptability.

元学习技术指的是一组让机器学会如何学习的算法和方法 [Finn et al. (2017)]。这些技术在模型需要快速适应新任务或数据有限的环境时特别有效。例如,在计算机视觉领域 [Ren et al. (2018)],元学习可用于快速微调预训练模型以识别新物体类别或适应不同的成像条件。同样,在自然语言处理领域 [Gu et al. (2020)],元学习能帮助语言模型以最少的额外训练数据适应不同领域或文本风格。机器人技术 [Finn et al. (2017)] 是元学习展现出潜力的另一个领域,它使机器人能够快速掌握新技能或适应环境变化。在强化学习领域 [Houthooft et al. (2018)],元学习可以帮助智能体将学习到的策略泛化到新环境或任务变体中,从而提高样本效率和适应性。

The key advantage of meta-learning techniques lies in their ability to leverage prior knowledge and experience to learn new tasks or adapt to new environments more efficiently. By learning how to learn, these techniques can rapidly incorporate necessary modifications and adapt to new scenarios with significantly reduced data requirements, making them well-suited for applications where data is scarce, or the environment is dynamically changing.

元学习技术的关键优势在于其能够利用先验知识和经验更高效地学习新任务或适应新环境。通过学习如何学习,这些技术可以快速整合必要的修改,并以显著减少的数据需求适应新场景,使其非常适合数据稀缺或环境动态变化的应用场景。

They can be broadly categorized into two types: model-based and optimization-based meta-learning. In the context of knowledge editing, model-based meta-learning focuses on learning a model that can be used to adapt to new data efficiently. This involves learning the structure of the model and its parameters such that it can be generalized to new data. In optimization-based meta-learning, the focus is on learning how to optimize the parameters of a model to adapt to new knowledge.

它们大致可分为两类:基于模型(model-based)和基于优化(optimization-based)的元学习。在知识编辑的背景下,基于模型的元学习侧重于学习一个能够高效适应新数据的模型,这涉及学习模型的结构及其参数,以便泛化到新数据。而基于优化的元学习则专注于学习如何优化模型参数以适应新知识。

In the literature, the authors of "Editable Neural Networks" (ENN) Sinitsin et al. (2020) first exploit the meta-learning paradigm for knowledge editing to "learn to allow effective editing". The core idea behind "Editable Training" is to enforce the model parameters to be "prepared" for the editor function KE (Section 2.2), which in their experimentation is defined as Stochastic Gradient Descent (SGD) with up to $k$ steps and learning rate $\alpha$. In particular, they propose to train the starting model $f_{0}$ with a loss that simultaneously encourages reliability and locality. That is obtained with two additional terms to the base loss: one that measures the success of an edit, a cross-entropy, and another that assesses the distance in output probability between the edited model and the original one, a KL divergence. They prove to be successful both on computer vision datasets for classification Krizhevsky et al. (2009); Deng et al. (2009b) and Natural Adversarial Examples (NAE) Hendrycks et al. (2021), and on machine translation Cettolo et al. (2014). They work with single non-successive and successive edits, with good performance on both types of edits. Nevertheless, as pointed out by Mitchell et al. (2022a), the methodology requires further training of the base model before an edit with a pre-training step. That can be critical in scenarios with large models (LM) or when the available memory is a constraint.

在文献中,《可编辑神经网络》(Editable Neural Networks, ENN) 的作者 Sinitsin 等人 (2020) 首次利用元学习范式进行知识编辑,以"学会允许有效编辑"。"可编辑训练"的核心思想是强制模型参数为编辑函数 KE (第 2.2 节) 做好"准备",在他们的实验中,该函数被定义为最多 $k$ 步和学习率 $\alpha$ 的随机梯度下降 (Stochastic Gradient Descent, SGD)。特别是,他们提出用同时鼓励可靠性和局部性的损失函数来训练初始模型 $f_{0}$。这是通过在基础损失函数上增加两个额外项实现的:一项衡量编辑成功的交叉熵,另一项评估编辑后模型与原始模型输出概率之间距离的 KL 散度。他们在计算机视觉分类数据集 Krizhevsky 等人 (2009); Deng 等人 (2009b) 和自然对抗样本 (Natural Adversarial Examples, NAE) Hendrycks 等人 (2021) 以及机器翻译 Cettolo 等人 (2014) 上都证明了该方法的成功。他们处理了单次非连续和连续编辑,在两种编辑类型上都表现良好。然而,正如 Mitchell 等人 (2022a) 所指出的,该方法需要在编辑前通过预训练步骤进一步训练基础模型。这对于大模型 (LM) 或内存受限的场景可能至关重要。
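A minimal sketch of the shape of the Editable Training loss, assuming toy logits in place of real model outputs: the base task loss is augmented with a cross-entropy term on the edit (reliability) and a KL term tying the edited model's outputs to the original's on unrelated inputs (locality). All numeric values below are hypothetical placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, y):
    """Negative log-probability of the target class y."""
    return float(-np.log(softmax(logits)[y]))

def kl_div(p_logits, q_logits):
    """KL(p || q) between the two softmax distributions."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float((p * (np.log(p) - np.log(q))).sum())

# Hypothetical logits standing in for real model outputs.
base_loss = 0.42                          # task loss on a training batch
edit_logits = np.array([0.2, 2.5])        # f_edited(x_e), target class 1
ref_logits_orig = np.array([1.0, -1.0])   # f_0(x) on an unrelated input x
ref_logits_edit = np.array([0.9, -0.8])   # f_edited(x) on the same input

c_edit, c_loc = 1.0, 10.0                 # illustrative weights
editable_loss = (base_loss
                 + c_edit * cross_entropy(edit_logits, 1)              # reliability
                 + c_loc * kl_div(ref_logits_orig, ref_logits_edit))   # locality
```

The KL term vanishes exactly when the edited model matches the original on the reference input, so minimizing this combined loss trains parameters that accept edits without drifting elsewhere.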

Conversely, De Cao et al. (2021) propose an optimization-based meta-learning approach, first employing a hypernetwork Ha et al. (2016), dubbed KnowledgeEditor, with the objective to "learn to update" the parameters of another network. Since the knowledge editing task requires locality everywhere except for the edit, they frame the learning task as a constrained optimization problem using a Bidirectional-LSTM Schmidhuber et al. (1997) and some downstream Feed-Forward Neural Networks (FFNN). Once trained, only the input edit feeds the hypernetwork, which predicts vectors to gate and shift the edit fine-tuning gradient of the starting network with respect to a certain weight matrix. Therefore, we can say that KnowledgeEditor learns how to modify the gradient in order to provide the properties enumerated in Section 2.4. On the other hand, the training itself of the hypernetwork does not change the weights of the starting model as in Sinitsin et al. (2020), but requires some original training samples to estimate the original output probability distribution. Indeed, as in Sinitsin et al. (2020), the hypernetwork is trained with the sum of two loss components: the first aims at providing reliability and generality with a cross-entropy between semantically equivalent edits and predictions of the edited model, and the second provides locality with a KL divergence, as in Sinitsin et al. (2020). In addition, they propose to add a margin to the KL term and iteratively reduce it to progressively make the edit more local. They test KnowledgeEditor on different NLP tasks, from fact-checking (FC) with a fine-tuned BERT model Devlin et al. (2018) on the FEVER dataset Thorne et al. (2018), to closed-book question answering (QA) with a BART model Lewis et al. (2020) on the Zero-Shot Relation Extraction (zsRE) dataset Levy et al. (2017). However, they only experimented with single non-successive changes. Nevertheless, the authors of Huang et al. (2023) adopt KnowledgeEditor as a baseline, and their experiments on FEVER and zsRE show that it fails to implement more than a couple of successive single edits. Indeed, as hypothesized by the authors, the KnowledgeEditor hypernetwork is trained with the starting model $f_{0}$ and is thus strongly coupled with the original parameters. As the editing proceeds, the model drifts further from the initial one, resulting in failure.

相反,De Cao等人(2021)提出了一种基于优化的元学习方法,首次采用Ha等人(2016)提出的超网络(称为KnowledgeEditor),其目标是"学习更新"另一个网络的参数。由于知识编辑任务要求除编辑外保持局部性,他们使用双向LSTM (Schmidhuber等人,1997)和一些下游前馈神经网络(FFNN)将该学习任务构建为约束优化问题。训练完成后,仅输入编辑内容给超网络,该网络会预测向量来门控和偏移起始网络针对特定权重矩阵的编辑微调梯度。因此,可以说KnowledgeEditor学会了如何修改梯度以提供第2.4节列举的特性。另一方面,超网络本身的训练不会像Sinitsin等人(2020)那样改变起始模型的权重,但需要一些原始训练样本来估计原始输出概率分布。实际上,与Sinitsin等人(2020)类似,超网络通过两个损失分量的总和进行训练:第一个分量旨在通过语义等效编辑与编辑模型预测之间的交叉熵提供可靠性和通用性,第二个分量则像Sinitsin等人(2020)那样通过KL散度提供局部性。此外,他们还提出为KL添加边界并迭代减小该边界,逐步使编辑更加局部化。他们在不同NLP任务上测试了KnowledgeEditor,从使用微调BERT模型(Devlin等人,2018)和FEVER数据集(Thorne等人,2018)的事实核查(FC),到使用BART模型(Lewis等人,2020)在零样本关系抽取(zsRE)数据集(Levy等人,2017)上的闭卷问答(QA)。然而,他们仅实验了单个非连续更改。尽管如此,Huang等人(2023)的作者采用KnowledgeEditor作为基线,在FEVER和zsRE上的实验表明它无法实现超过几次连续的单次编辑。正如作者假设的那样,KnowledgeEditor的超网络是与起始模型$f_{0}$一起训练的,因此与原参数强耦合。随着编辑的进行,模型与初始模型的差异越来越大,导致其失败。

Building on De Cao et al. (2021), the authors of Mitchell et al. (2022a) leverage hypernetworks too in order to learn how to update the weights of a starting model. However, while KnowledgeEditor trains a recurrent neural network to map the edit example into a rank-1 mask over the gradient, the Mitchell et al. (2022a) hypernetwork, named MEND, directly maps the gradient into a new parameter update, retaining tractability by leveraging the low-rank form of the gradient. Indeed, the input of a MEND network is a decomposed gradient and the output is formed by some pseudo-activations and pseudo-deltas that should encapsulate reliability, generality, and locality. Indeed, as De Cao et al. (2021), they make use of a cross-entropy between semantically equivalent edits and predictions of the edited model to enforce generality and reliability (edit loss), and a KL divergence for locality, without the margin (locality loss). In addition, they propose two further hyperparameters to perform a weighted sum of the two losses and to make the edit learning rate coefficient learnable. Each layer of the network itself consists of two consecutive blocks, both initialized to compute the exact identity function of the normalized decomposed gradient, using four matrices initialized with zero or Xavier uniform initialization Glorot and Bengio (2010). Finally, in order to edit multiple layers of the starting model with the same matrix dimensions, they propose to use the same MEND network, while applying a learned layer-specific scale and offset, similar to Perez et al. (2018). As De Cao et al. (2021), they experiment with their methodology only on NLP tasks, using the FEVER Thorne et al. (2018) and zsRE Levy et al. (2017) datasets. In addition, they also evaluate GPT-style models Radford et al. (2018), working with a custom version of Wikitext-103 Merity et al. (2016), named Wikitext Generation.
They experiment with both non-sequential single and batch types of edits, showing large regression over 100 simultaneous edits. Finally, it is important to point out that, as KnowledgeEditor, MEND has the same limitations with successive edits, being strictly tied to the weights of the starting model. In fact, if pre-training is conducted on $f_{t}$, KE will exhibit significantly poorer performance with $f_{t+1}$. Moreover, as the weights of the edited model diverge from those of the pre-training, MEND will gradually deteriorate, ultimately losing all its editing capabilities.

基于De Cao等人(2021)的研究,Mitchell等人(2022a)的作者同样利用超网络来学习如何更新初始模型的权重。然而,Knowledge Editor通过训练循环神经网络将编辑样本映射为梯度上的秩1掩码,而Mitchell等人(2022a)提出的MEND超网络则直接将梯度映射为新的参数更新,通过利用梯度的低秩形式保持可追踪性。具体而言,MEND网络的输入是分解后的梯度,输出由若干伪激活和伪增量组成,这些输出需要封装可靠性、通用性和局部性。与De Cao等人(2021)类似,他们使用语义等价编辑与编辑模型预测之间的交叉熵来增强通用性和可靠性(编辑损失),并使用不带边际的KL散度来衡量局部性(局部性损失)。此外,他们还提出了两个超参数:用于对两种损失进行加权求和,以及使编辑学习率系数可学习。网络每层包含两个连续模块,均初始化为计算归一化分解梯度的恒等函数,使用四个经零初始化或Xavier均匀初始化(Glorot和Bengio,2010)的矩阵。最后,为使用相同维度矩阵编辑初始模型的多层结构,他们提出采用同一MEND网络,同时应用类似Perez等人(2018)的层特定学习缩放和偏移。与De Cao等人(2021)相同,他们仅使用FEVER(Thorne等人,2018)和zsRE(Levy等人,2017)数据集在NLP任务上验证方法。此外,他们还评估了GPT风格模型(Radford等人,2018),使用定制版Wikitext-103(Merity等人,2016)即Wikitext Generation进行测试。实验涵盖非连续单次或批量编辑类型,显示超过100次同步编辑会出现显著性能衰退。需要指出的是,与Knowledge Editor类似,MEND在连续编辑时也存在相同局限——严格依赖初始模型权重。若在$f_{t}$上进行预训练,KE在$f_{t+1}$上表现将显著下降。随着编辑模型权重与预训练权重的偏离,MEND性能会逐步恶化,最终完全丧失编辑能力。
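The low-rank structure MEND exploits can be checked directly: for a linear layer, a single example's weight gradient is the outer product of the back-propagated delta and the layer input, so a hypernetwork can consume the pair $(\delta, x)$ instead of the full $d_{out}\times d_{in}$ matrix. The `mend_like_update` function below is a hypothetical fixed stand-in for the learned MEND transform, shown only to illustrate the input/output shapes.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4)        # input activation to the edited linear layer
delta = rng.randn(3)    # gradient of the loss w.r.t. the layer's pre-activations

# For a single example, the weight gradient factorizes as a rank-1
# outer product: grad_W = delta x^T.
grad_W = np.outer(delta, x)

def mend_like_update(delta, x):
    """Hypothetical stand-in for MEND: map (delta, x) to pseudo-deltas
    and pseudo-activations, then return the rank-1 parameter update.
    The real mapping is a learned network, not a fixed scaling."""
    pseudo_delta = 0.5 * delta   # placeholder for the learned transform
    pseudo_x = 0.5 * x
    return np.outer(pseudo_delta, pseudo_x)

W = rng.randn(3, 4)                         # weights of the edited layer
W_edited = W - 1.0 * mend_like_update(delta, x)
```

Because both the input pair and the output update are rank-1 factors, the hypernetwork's size grows with the layer width rather than with the full parameter count, which is what keeps MEND tractable on large models.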

Similar to MEND, MALMEN Tan et al. (2023) utilizes a hypernetwork to generate parameter shifts that can be applied to the language model. However, MALMEN distinguishes itself by formulating the parameter shift aggregation as a least-squares problem, which it solves using the normal equation. This allows MALMEN to effectively combine the parameter shifts corresponding to different facts, mitigating the potential cancellation effects that can arise when simply summing the shifts. Additionally, MALMEN separates the computation between the hypernetwork and the language model, enabling the use of arbitrary batch sizes on both components. This memory-economic training strategy permits MALMEN to scale to editing thousands of facts simultaneously, substantially outperforming the editing capabilities of MEND. The paper provides thorough empirical evaluations demonstrating MALMEN's superior scalability and effectiveness across various language model architectures and knowledge-intensive NLP tasks, including closed-book fact verification on the FEVER dataset Thorne et al. (2018) and question answering on the zsRE dataset Levy et al. (2017) for BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B). Finally, it is important to point out that, as KnowledgeEditor and MEND, MALMEN has the same limitations with successive edits, being strictly tied to the weights of the starting model.

与MEND类似,MALMEN (Tan et al., 2023) 采用超网络生成可应用于语言模型的参数偏移量。但MALMEN的创新点在于将参数偏移聚合表述为最小二乘问题,并通过正规方程求解。这种方法能有效融合不同事实对应的参数偏移,避免简单求和可能导致的抵消效应。此外,MALMEN实现了超网络与语言模型的解耦计算,支持对两个组件使用任意批处理大小。这种内存经济的训练策略使MALMEN能同时编辑数千条事实,性能显著超越MEND的编辑能力。论文通过全面实验验证了MALMEN在多种语言模型架构(包括BERT-base、GPT-2、T5-XL (2.8B) 和GPT-J (6B))及知识密集型NLP任务(如FEVER数据集 (Thorne et al., 2018) 的闭卷事实核查、zsRE数据集 (Levy et al., 2017) 的问答任务)中的卓越扩展性和有效性。最后需指出,与Knowledge Editor和MEND相同,MALMEN在连续编辑时也存在相同局限——严格依赖初始模型的权重参数。
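The least-squares aggregation idea can be sketched with the normal equation: find a single shift matrix that best fits every per-fact key/shift pair at once, instead of summing per-fact updates that may cancel. The dimensions and the random keys and shifts below are invented for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_facts, d_in, d_out = 32, 8, 4
K = rng.randn(d_in, n_facts)    # one key column per fact to edit (toy values)
D = rng.randn(d_out, n_facts)   # desired output shift for each fact (toy values)

# Least-squares aggregation via the normal equation:
#   minimize ||dW K - D||_F^2   =>   dW = D K^T (K K^T)^{-1}
dW = D @ K.T @ np.linalg.inv(K @ K.T)

# Naive alternative: sum the per-fact rank-1 shifts, which can cancel out.
dW_naive = sum(np.outer(D[:, i], K[:, i]) / (K[:, i] @ K[:, i])
               for i in range(n_facts))

err_ls = np.linalg.norm(dW @ K - D)        # residual of the aggregated solve
err_naive = np.linalg.norm(dW_naive @ K - D)
```

Since the least-squares solution minimizes the Frobenius residual by construction, its fit error is never worse than the naive sum's, which is the cancellation-mitigation property the paper argues for.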


Figure 2: Scaling curves showing three different evaluation metrics with an increased number of non-successive batch edits for three different KE methodologies: MEND, ROME, and MEMIT. Results are computed using COUNTERFACT and GPT-J. Locality is shown not as drawdown (DD), but as its complementary specificity over a neighborhood of samples Meng et al. (2022b). ROME and MEND perform well up to ten edits, but rapidly degrade, losing almost all SR before batches of 1k. On the other hand, MEMIT performs well with considerably large batches of edits. Adapted from Meng et al. (2022a).

图 2: 扩展曲线展示了三种不同KE方法(MEND、ROME和MEMIT)在非连续批量编辑数量增加时的三种评估指标。结果使用COUNTERFACT和GPT-J计算。局部性未以下降值(DD)显示,而是通过样本邻域的互补特异性呈现(Meng et al., 2022b)。ROME和MEND在十次编辑内表现良好,但随后迅速退化,在1k批量编辑前几乎失去所有SR。相比之下,MEMIT在大批量编辑时仍保持良好性能。改编自Meng et al. (2022a)。

4.3 Direct Model Editing

4.3 直接模型编辑

This family of approaches, known as direct model editing, aims to directly edit the weights or parameters of the target neural network, enabling effective implementation of a predefined set of changes while minimizing computational overhead. These methodologies allow injecting knowledge by modifying only certain parameters of the model, making them particularly suitable for foundational models Chang et al. (2023). By directly manipulating the variables of a model, most direct model editing techniques build upon efforts to localize and understand the internal mechanisms within models, striving to attribute knowledge acquired during training to specific neurons or parameters in the network Elhage et al. (2021); Dar et al. (2023); Mitchell et al. (2022b). Consequently, these approaches endeavor to edit the activations or values of identified neurons or parameters to reflect the desired changes.

这类方法被称为直接模型编辑 (direct model editing),旨在直接修改目标神经网络的权重或参数,既能有效实现预定义的变更集,又能将计算开销降至最低。这些方法仅需调整模型的特定参数即可注入知识,因此特别适合基础模型 [Chang et al., 2023]。通过直接操控模型变量,大多数直接模型编辑技术都建立在模型内部机制定位与理解的研究基础上,试图将训练获得的知识归因于网络中特定神经元或参数 [Elhage et al., 2021; Dar et al., 2023; Mitchell et al., 2022b]。因此,这类方法通过编辑已识别神经元或参数的激活值来反映预期变更。

However, it is crucial to recognize that the applicability and effectiveness of direct model editing techniques may vary depending on the underlying neural network architecture and the nature of the knowledge or changes to be incorporated. While extensive research has been conducted on specific architectures such as Multilayer Perceptrons (MLPs) and Transformers, the generalization of these techniques to alternative architectures or problem domains warrants further investigation. Therefore, an in-depth analysis and exploration of the applicable scenarios for direct model editing techniques is beneficial, considering factors such as the model architecture, the type of knowledge or changes to be incorporated, and the computational constraints of the target environment.

然而,必须认识到直接模型编辑技术 (direct model editing) 的适用性和有效性可能因底层神经网络架构、待整合知识或修改的性质而有所差异。虽然针对多层感知机 (MLPs) 和 Transformer 等特定架构已开展大量研究,但这些技术在其他架构或问题领域的泛化能力仍需进一步验证。因此,结合模型架构、待整合知识类型、目标环境计算限制等因素,深入分析并探索直接模型编辑技术的适用场景具有重要意义。

Geva et al. (2021) first identify the MLP layers in a masked LM transformer as key-value memories of entities and information associated with those entities Sukhbaatar et al. (2015). Building on this finding, Dai et al. (2022) demonstrate a method to edit facts in BERT models Devlin et al. (2018); they propose the concept of knowledge neurons to study how factual knowledge is stored in pretrained Transformers. Specifically, the authors examine the fill-in-the-blank cloze task for BERT and propose a knowledge attribution method to identify the neurons that express a given relational fact. They find that the activation of such knowledge neurons is positively correlated with the expression of their corresponding facts. Therefore, they present a preliminary methodology that leverages knowledge neurons to edit factual knowledge in Transformers, even without any fine-tuning. The authors perform a knowledge surgery on pretrained Transformers by directly modifying the corresponding parameters in feed-forward networks. Such surgery shows promising results, keeping a moderate influence on other knowledge. The methodology proposed can be used to perform both single and multiple edits at the same time. However, the authors only experimented with single edits in their paper, focusing specifically on factual knowledge.

Geva等人(2021)首次将掩码语言模型Transformer中的MLP层识别为实体及其关联信息的键值记忆存储(Sukhbaatar等人,2015)。基于这一发现,Dai等人(2022)提出了一种编辑BERT模型(Devlin等人,2018)中事实知识的方法,他们提出知识神经元(knowledge neurons)概念来研究预训练Transformer如何存储事实知识。具体而言,作者通过BERT的完形填空任务提出知识归因方法,用于识别表达特定关系事实的神经元。研究发现这些知识神经元的激活强度与其对应事实的表达呈正相关。因此,他们提出了一种无需微调即可编辑Transformer中事实知识的初步方法,通过直接修改前馈网络中的对应参数对预训练Transformer实施"知识手术"。该方法对其他知识影响可控,且能同时实现单条或多条知识编辑,但论文仅针对单条事实知识编辑进行了实验验证。
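The knowledge-attribution idea can be sketched as an integrated-gradients-style score over a neuron's activation: scale the observed activations from zero up to their full value and accumulate the gradient of the correct answer's probability along the way. The readout matrix `R` below is a toy stand-in for the real network, and all values are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.RandomState(0)
a_bar = rng.rand(6)    # observed FFN activations for the prompt (toy values)
R = rng.randn(2, 6)    # hypothetical readout from activations to answer logits
y = 1                  # index of the correct answer

# Riemann approximation of an integrated-gradients attribution per neuron:
#   attr_i = a_bar_i * (1/m) * sum_s dP(y | a = (s/m) a_bar) / da_i
m = 20
grads = np.zeros_like(a_bar)
for s in range(1, m + 1):
    p = softmax(R @ ((s / m) * a_bar))
    # Analytic gradient of P(y) w.r.t. the activations for a softmax readout.
    grads += p[y] * (R[y] - p @ R)
attribution = a_bar * grads / m
```

Neurons with large positive `attribution` are the candidate "knowledge neurons" for this fact; in Dai et al. (2022) the same scan is run over real FFN activations rather than a toy readout.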

Factual knowledge is also the focus of the work of Meng et al. (2022b), which exclusively experiments on autoregressive language modeling, such as GPT-style models Radford et al. (2018). The study explores the storage and direct manipulation of factual associations in transformer models. These associations are modeled as tuples, represented as $t=(s,r,o)$, comprising a subject $s$, a relation $r$, and an object $o$, which connect to form a factual relationship. The research focuses on understanding how these tuples are stored within transformer models and investigates methods to facilitate their direct editing. Firstly, they trace the causal effects of hidden state activations within GPT using causal mediation analysis Pearl (2001), as previously done by other works Vig et al. (2020), to identify the specific modules that mediate recall of a fact about a subject. Their analysis reveals that feed-forward MLPs at a range of middle layers are decisive when processing the last token of the subject name. Secondly, they test their previous findings by introducing a novel direct model editing methodology: the Rank-One Model Editing method (ROME). The algorithm aims at producing single, non-successive edits by altering the parameters that determine a feed-forward layer's behavior. As stated by the authors, despite the simplicity of the intervention, ROME is similarly effective to other model-editing approaches, achieving good generalization and locality simultaneously, whereas previous approaches sacrifice one or the other. Furthermore, they introduce a new dataset, CounterFact (derived from a set of true facts from WikiData), for evaluating counterfactual edits in language models. It contains 21,919 records with a diverse set of subjects, relations, and linguistic variations.
Finally, they propose to monitor a further metric, fluency, to evaluate text generation's repetitiveness by measuring the weighted average of bi- and tri-gram entropies Zhang et al. (2018).

事实性知识也是Meng Kevin等人研究的重点,Meng等人(2022b)的工作专门针对自回归语言建模(如GPT风格模型)(Radford等人,2018)进行实验。该研究探讨了Transformer模型中事实关联的存储和直接操作。这些关联被建模为元组,表示为$t=(s,r,o)$,包含主体$s$、关系$r$和客体$o$,三者连接形成事实关系。研究重点在于理解这些元组如何存储在Transformer模型中,并探索促进其直接编辑的方法。

首先,他们使用因果中介分析(Pearl,2001)追踪GPT中隐藏状态激活的因果效应,如Vig等人(2020)先前所做的那样,以识别中介主体事实回忆的特定模块。分析表明,在处理主体名称最后一个token时,中间层的多层感知机(MLP)起决定性作用。

其次,他们通过引入一种新颖的直接模型编辑方法——Rank-One模型编辑方法(ROME)来验证先前的发现。该算法旨在通过改变决定前馈层行为的参数来产生单次(非连续)编辑。作者指出,尽管干预方式简单,ROME与其他模型编辑方法同样有效,能同时实现良好的泛化性和局部性,而先前方法往往需要牺牲其中一项。

此外,他们引入了一个新数据集Counter Fact(源自WikiData的真实事实集),用于评估语言模型中的反事实编辑。该数据集包含21,919条记录,涵盖多样化的主体、关系和语言变体。

最后,他们提出监测流畅度这一额外指标,通过测量双元和三元组熵的加权平均值(Zhang等人,2018)来评估文本生成的重复性。
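
The core of ROME's intervention can be illustrated numerically. The sketch below is a simplified, un-regularized variant of a rank-one edit: it solves for the minimal-norm update that makes a linear layer map a key vector $k^*$ (derived from the edited subject) exactly to a target value $v^*$ (encoding the new fact). ROME itself derives $k^*$ and $v^*$ from the network and constrains the update with a key-covariance estimate, which this toy version omits.

```python
import numpy as np

def rank_one_edit(W, k_star, v_star):
    """Return W' = W + (v* - W k*) k*^T / (k*^T k*): the minimal rank-one
    update that makes the layer map the key k* exactly to the value v*."""
    residual = v_star - W @ k_star                    # what the layer currently gets wrong
    update = np.outer(residual, k_star) / (k_star @ k_star)
    return W + update

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))     # toy stand-in for an MLP projection matrix
k = rng.normal(size=4)          # key vector for the edited subject
v = rng.normal(size=8)          # desired value encoding the new fact

W_new = rank_one_edit(W, k, v)
print(np.allclose(W_new @ k, v))   # True: the edited key now yields the target value
```

Because the update lies entirely in the direction of $k^*$, inputs orthogonal to the key are unaffected, which is the intuition behind ROME's locality.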

ROME results are comparable with other meta-learning-based knowledge editing techniques. Nevertheless, the methodology is limited to single, non-successive edits. To address this limitation, the same authors modified the original algorithm Meng et al. (2022a). The resulting method, called MEMIT, is able to scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by an order of magnitude. Like ROME, MEMIT works by first identifying the critical layers in the LLM that are responsible for storing the memories that need to be updated. Then, it uses a rank-one modification of the MLP weights of these layers to directly write the new memories into the model. This process is repeated for each memory that needs to be updated. The authors evaluated MEMIT on a variety of tasks, including factual question answering, natural language inference, and summarization. They found that MEMIT was able to significantly improve the performance of the LLM on these tasks, even when the number of updated memories was large. In addition to its scalability, MEMIT has several other advantages over previous methods for updating LLMs. First, it is relatively efficient, requiring only a few minutes to update a large model with thousands of memories. Second, it is very accurate, with the updated models achieving near-perfect accuracy on the evaluated tasks. Third, MEMIT is generalizable, with the updated models able to perform well on a variety of tasks. Figure 2 compares ROME, MEMIT, and one of the meta-learning techniques described in Section 4.2. It shows the improvement brought by MEMIT and the clear collapse of the MEND hypernetwork-based editor with large batches. Overall, MEMIT is a valuable tool for improving the performance of these models. However, it is important to point out that, even though some works try to expand the scope of the methodology Gupta et al. (2023), MEMIT is limited to modifying factual associations, and there is no clear path to scaling it to different knowledge types. The same applies to PMET Li et al. (2024), which goes a step further by also simultaneously optimizing the hidden states of the Multi-Head Self-Attention (MHSA) component. This is based on the insight that MHSA encodes certain general knowledge extraction patterns that can be leveraged to enable more precise editing of the model. On the other hand, it is worth noting that recent research suggests that causal tracing and other knowledge localization methodologies for identifying which parameters to update are surprisingly unreliable, even for factual knowledge Hase et al. (2023). The paper argues that locating the source of a fact does not necessarily translate to effective editing, as modifying that specific location may not change the model's output. This surprising finding indicates our understanding of how knowledge is stored in complex models is incomplete.

ROME 的效果与其他知识编辑元学习技术相当。不过,该方法仅限于处理单一非连续编辑。为解决这一问题,原作者改进了算法 Meng et al. (2022a) ,新方法 MEMIT 可支持 GPT-J (6B) 和 GPT-NeoX (20B) 数千条关联的编辑,规模超过先前工作一个数量级。与 ROME 类似,MEMIT 首先定位大语言模型中存储待更新记忆的关键层,然后通过对这些层的 MLP 权重进行秩一修正,直接将新记忆写入模型,该过程对每条待更新记忆重复执行。作者在事实问答、自然语言推理和摘要等任务上评估 MEMIT,发现即使更新大量记忆,模型性能仍能显著提升。除可扩展性外,MEMIT 相较先前方法还有三大优势:高效性(数千条记忆的模型更新仅需数分钟)、准确性(评估任务接近完美准确率)和泛化性(更新后模型在多任务表现良好)。图 2 对比了 ROME、MEMIT 和 4.2 节描述的元学习技术,可见 MEMIT 的改进效果,以及基于 MEND 超网络的编辑器在大批量编辑时的明显失效。尽管 MEMIT 是提升模型性能的有力工具,但需注意其目前仅适用于事实关联修改,且缺乏向其他知识类型扩展的明确路径 Gupta et al. (2023) 。类似局限也存在于 PMET Li et al. (2024) ,虽然该方法通过同时优化多头自注意力 (MHSA) 组件的隐藏状态更进一步——其理论基础是 MHSA 编码的通用知识提取模式可实现更精确的模型编辑。值得注意的是,最新研究表明因果追踪等定位待更新参数的方法对事实知识也出人意料地不可靠 Hase et al. (2023) ,因为定位事实存储位置未必能有效改变模型输出,这揭示我们对复杂模型中知识存储机制的认知仍不完善。
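
The step from one edit to thousands can be pictured as moving from a single rank-one constraint to a batched least-squares one. The sketch below is an illustrative simplification: it applies the minimal-norm update satisfying a whole batch of key-value constraints on one matrix, whereas MEMIT additionally spreads such updates across several critical layers and weights the keys by a covariance estimate.

```python
import numpy as np

def batch_edit(W, K, V):
    """Minimal-norm update so that W' K = V for a whole batch of edits.
    K: (d_in, n) stacked key vectors; V: (d_out, n) stacked target values.
    Uses the Moore-Penrose pseudo-inverse of the key matrix."""
    return W + (V - W @ K) @ np.linalg.pinv(K)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
K = rng.normal(size=(16, 5))   # five edited subjects at once
V = rng.normal(size=(8, 5))    # their new target values

W_new = batch_edit(W, K, V)
print(np.allclose(W_new @ K, V))   # True: all five edits hold simultaneously
```

As long as the keys are linearly independent (full column rank), all constraints are satisfied exactly; with many more edits than the layer's width, they can only be satisfied in the least-squares sense, which is one motivation for distributing edits over multiple layers.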

4.4 Architectural Strategies

4.4 架构策略

Architecture strategies represent a distinct family of methodologies that diverge from approaches that directly or indirectly modify the weights of the target model. Instead, these strategies concentrate on manipulating or augmenting the original architecture to patch and modify the network's existing knowledge. This approach can be advantageous in scenarios where the set of original weights is inaccessible or when it is deemed unsafe to directly manipulate the main model. This indirect manipulation can be particularly useful when dealing with highly sensitive or proprietary models, where direct weight modification may be prohibited or could potentially compromise the model's integrity. Additionally, architecture strategies can offer a more interpretable and controllable means of knowledge adaptation, as the introduced architectural changes can often be mapped to specific functional aspects of the model. However, it is always crucial to consider the trade-offs between the potential benefits of architectural manipulation and the computational overhead it introduces in terms of latency.

架构策略代表了一类独特的方法论,其区别于直接或间接修改目标模型权重的传统方案。这类策略专注于通过操纵或扩展原始架构来修补和调整网络的现有知识。当无法获取原始权重集或直接操作主模型存在安全隐患时,该方法具有显著优势。这种间接操作方式在处理高度敏感或专有模型时尤为实用,因为直接修改权重可能被禁止或损害模型完整性。此外,架构策略能提供更具可解释性和可控性的知识适应手段,因为引入的架构变更通常可映射到模型的特定功能维度。但必须始终权衡架构操作带来的潜在收益与由此产生的计算开销(如延迟)之间的平衡。

A first editor function of this type was proposed by Sinitsin et al. (2020): taking inspiration from Conditional Neural Processes (CNP) Garnelo et al. (2018), they propose a specialized architecture that performs edits by adding a special condition vector to intermediate activations. The vector is generated by an additional encoder layer and provides information to effectively steer the final prediction of the network. This approach is very similar to more generic memory-augmented models that introduce memory mechanisms to enhance neural networks Graves et al. (2014); Santoro et al. (2016). Along the same line of experimentation, Mitchell et al. (2022b) proposed an approach, called SERAC (Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model), which stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed. This allows SERAC to be more expressive than other model editors, and to produce more accurate predictions for test inputs that are not closely related to the edits. That is achieved using three components: a memory, a classifier, and a counterfactual model. Users add edits to the memory, and the classifier decides whether the memory contains edits relevant to processing a given input. If the classifier determines that a relevant edit example exists, the input and edit example are passed to the counterfactual model, which is responsible for making the prediction. They evaluate SERAC on numerous tasks, focusing exclusively on large language models with single and batch non-successive edits.

Sinitsin等人(2020)首次提出了这类编辑器功能,他们受条件神经过程(Conditional Neural Processes,CNP) Garnelo等人(2018)的启发,提出了一种通过向中间激活添加特殊条件向量来执行编辑的专用架构。该向量由额外的编码器层生成,并提供信息以有效引导网络的最终预测。这种方法与引入记忆机制来增强神经网络性能的通用记忆增强模型非常相似 Graves等人(2014);Santoro等人(2016)。在相同实验方向上,Mitchell等人(2022b)提出了一种名为SERAC(基于检索增强反事实模型的半参数化编辑)的方法,该方法将编辑存储在显式记忆中,并学习对其进行推理以根据需要调整基础模型的预测。这使得SERAC比其他模型编辑器更具表现力,并能对与编辑不密切相关的测试输入产生更准确的预测。该功能通过三个组件实现:记忆模块、分类器和反事实模型。用户将编辑添加到记忆模块中,分类器决定记忆是否包含与处理相关的输入。如果分类器确定存在相关编辑示例,则将输入和编辑示例传递给负责进行预测的反事实模型。他们在多项任务上评估了SERAC,重点关注支持单次和批量非连续编辑的大语言模型。
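
SERAC's three-component routing can be sketched in a few lines. The class below is a toy illustration: in SERAC the scope classifier and the counterfactual model are learned networks, whereas here they are hypothetical stand-in callables chosen only to make the control flow concrete.

```python
class SeracSketch:
    """Toy SERAC-style editor: an explicit edit memory, a scope classifier,
    and a counterfactual model that answers only in-scope queries."""

    def __init__(self, base_model, counterfactual_model, in_scope):
        self.base = base_model           # frozen target model
        self.cf = counterfactual_model   # small editable model
        self.in_scope = in_scope         # classifier: (query, edit) -> bool
        self.memory = []                 # explicit store of edit examples

    def add_edit(self, edit):
        self.memory.append(edit)

    def __call__(self, query):
        for edit in self.memory:
            if self.in_scope(query, edit):
                return self.cf(query, edit)   # route to the counterfactual model
        return self.base(query)               # unrelated input: base model untouched


# Hypothetical stand-ins for the learned components
base = lambda q: "Paris"
cf = lambda q, e: e[1]                  # read the answer off the stored edit
in_scope = lambda q, e: e[0] in q       # naive substring-based scope check

editor = SeracSketch(base, cf, in_scope)
editor.add_edit(("capital of France", "Lyon"))
print(editor("What is the capital of France?"))  # in scope: edited answer "Lyon"
print(editor("What is 2 + 2?"))                  # out of scope: base answer "Paris"
```

The base model's weights are never touched, which is what makes this family attractive when direct weight modification is unsafe or impossible.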

Along the same line as SERAC, the authors of Dong et al. (2022) presented a memory-based methodology called CaliNet, a method for calibrating specifically factual knowledge in pretrained language models (PLMs) without re-training from scratch. CaliNet first detects whether a PLM has learned a fact correctly by comparing its scores for the correct fact and a set of distractor facts. If the PLM has not learned the fact correctly, CaliNet then uses a lightweight method to add and adapt new parameters to the PLM for that specific fact. CaliNet has two main components. First, a contrastive knowledge assessment (CKA) module detects whether a PLM has learned a fact correctly by comparing the PLM's scores for the correct fact and a set of distractor facts: if the PLM assigns a higher score to the correct fact than to the distractor facts, the CKA module concludes that the PLM has learned the fact correctly. Second, a factual knowledge calibration (FKC) module adds and adapts new parameters to the PLM for a specific fact. The FKC module first creates a new set of parameters for the PLM that are specific to the given fact, initializes them with the PLM's original parameters for that fact, and then fine-tunes the new parameters on a dataset of factual examples that include the given fact. Using a custom dataset based on the ParaRel set Elazar et al. (2021), CaliNet has been shown to be effective at calibrating factual knowledge in PLMs. In experiments on the knowledge probing task, CaliNet was able to improve the factual accuracy of PLMs by up to 20%.

与SERAC思路相似,Dong等人(2022)提出了一种基于记忆的方法CaliNet。该方法专门用于校准预训练语言模型(PLM)中的事实知识,而无需从头开始重新训练。CaliNet首先通过比较PLM对正确事实和一组干扰事实的评分,检测PLM是否正确学习了某个事实。如果PLM未能正确学习该事实,CaliNet会使用轻量级方法为该特定事实添加并调整新参数。

CaliNet包含两个核心组件:第一是对比知识评估(CKA)模块,用于检测PLM是否正确学习事实。该模块通过比较PLM对正确事实与干扰事实的评分进行判断——若正确事实的评分高于干扰事实,则判定该事实已被正确掌握。第二是事实知识校准(FKC)模块,负责为特定事实添加和调整新参数。该模块首先创建专属于目标事实的新参数集,并用PLM原有参数初始化,随后在包含该事实的示例数据集上进行微调。

基于ParaRel数据集(Elazar等人,2021)构建的定制实验表明,CaliNet能有效校准PLM中的事实知识。在知识探测任务中,该方法将PLM的事实准确率最高提升了20%。
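
The CKA check reduces to a simple comparison. The sketch below assumes a plausibility-scoring callable standing in for the PLM (here faked with a lookup table, which is an illustrative assumption, not CaliNet's actual scoring).

```python
def contrastive_knowledge_assessment(score, fact, distractors):
    """CKA-style check: a fact counts as 'learned' only if the model scores
    the correct object strictly above every distractor object."""
    s, r, o = fact
    correct = score(s, r, o)
    return all(correct > score(s, r, d) for d in distractors)


# Hypothetical plausibility table standing in for a PLM's scores
table = {
    ("Dante", "born_in", "Florence"): 0.9,
    ("Dante", "born_in", "Rome"): 0.4,
    ("Dante", "born_in", "Venice"): 0.2,
}
score = lambda s, r, o: table.get((s, r, o), 0.0)

learned = contrastive_knowledge_assessment(
    score, ("Dante", "born_in", "Florence"), ["Rome", "Venice"])
print(learned)  # True: the correct fact outranks all distractors
```

Only facts that fail this check are handed to the FKC module, which keeps the number of added parameters small.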

All previous works have tackled only non-successive edits, making only one edit or batch of edits at a time. However, as pointed out by Huang et al. (2023), the one-mistake-fixing scenario is not an accurate abstraction of the real-world challenge. Therefore, they extend the scenario into successive edits, introducing a novel model editor, Transformer-Patcher (T-Patcher), that can shift the behavior of transformer-based models by simply adding and training a few neurons in the last Feed-Forward Network (FFN) layer. Being at the end of the model, the new neurons have access to the output of all the previous layers. This allows the new neurons to learn how to correct mistakes that are made in earlier layers of the model. Overall, the proposed methodology allows for targeted shifts in the behavior of the model, akin to other fine-tuning-oriented methodologies like LoRA Hu et al. (2022). Transformer-Patcher is able to successively correct up to thousands of errors while generalizing to their equivalent inputs and retaining the model's accuracy on irrelevant inputs. This is in contrast to previous methods, which either fail to make a sequence of edits or fail to remember previous edits. The work evaluates Transformer-Patcher on both classification and generation tasks, showing that it can achieve state-of-the-art performance for successive single edits. However, despite the expansive scope of the methodology, their approach and implementation are highly architecture-specific, relying heavily on large sources of data and unrelated inputs.

以往的研究都只针对非连续编辑,每次仅进行一次或批量编辑。然而,正如Huang等人(2023)指出的,单错误修正场景并不能准确抽象现实世界的挑战。因此,他们将场景扩展至连续编辑,提出了一种新型模型编辑器TransformerPatcher(T-Patcher),通过仅在最后一个前馈网络(Feed-Forward Network)层添加并训练少量神经元,即可改变基于Transformer模型的行为。由于位于模型末端,新增神经元能够获取之前所有层的输出,从而学习如何修正模型前几层产生的错误。总体而言,该方法能实现模型行为的定向调整,类似于LoRA(Hu等人,2022)等其他面向微调的方法。Transformer-Patcher可连续修正多达数千个错误,同时泛化至等效输入并保持模型在无关输入上的准确性。这与先前方法形成鲜明对比——后者要么无法实现连续编辑,要么无法记住先前编辑。该研究在分类和生成任务上评估了Transformer-Patcher,表明其在单次连续编辑中能达到最先进性能。但尽管方法适用范围广泛,其实现高度依赖特定架构,同时也需要大量数据源和无关输入的支持。
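
Appending a patch neuron to an FFN layer can be sketched as adding one row to the input projection and one column to the output projection. In the toy version below the neuron's key, bias, and value are set by hand so that it fires only on the erroneous hidden state; in T-Patcher these are trained, so treat the construction as an illustrative assumption.

```python
import numpy as np

def add_patch_neuron(W_in, b_in, W_out, key, value, threshold=0.9):
    """Append one neuron to an FFN h -> W_out @ relu(W_in @ h + b_in).
    The new neuron activates (approximately) only when the hidden state
    aligns with `key`, and then adds a multiple of `value` to the output."""
    key = key / np.linalg.norm(key)
    W_in_new = np.vstack([W_in, key[None, :]])     # one extra input row
    b_in_new = np.append(b_in, -threshold)         # fires only when key . h > threshold
    W_out_new = np.hstack([W_out, value[:, None]]) # one extra output column
    return W_in_new, b_in_new, W_out_new

def ffn(W_in, b_in, W_out, h):
    return W_out @ np.maximum(W_in @ h + b_in, 0.0)

rng = np.random.default_rng(2)
W_in, b_in, W_out = rng.normal(size=(6, 4)), np.zeros(6), rng.normal(size=(3, 6))
h_err = rng.normal(size=4)
h_err /= np.linalg.norm(h_err)        # hidden state on which the model errs
v = np.array([5.0, 0.0, 0.0])         # correction direction for the output

W_in2, b_in2, W_out2 = add_patch_neuron(W_in, b_in, W_out, h_err, v)
delta = ffn(W_in2, b_in2, W_out2, h_err) - ffn(W_in, b_in, W_out, h_err)
print(np.allclose(delta, 0.1 * v))    # True: only the patch neuron contributes
```

Because the original rows and columns are untouched, hidden states that do not align with the key leave the patch neuron below its firing threshold, preserving behavior on unrelated inputs.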

To this end, Hartvigsen et al. (2022) propose GRACE, a methodology that requires only samples of the edits and that can perform successive edits, ensuring minimal impact on unrelated inputs. GRACE works by caching a chosen layer's activations in an adaptive codebook as edits stream in. When a new edit is received, GRACE retrieves the corresponding activations from the codebook and uses them to update the model's predictions. This allows GRACE to make targeted edits to the model's behavior without affecting its overall performance. The authors evaluated GRACE on a variety of tasks, including text classification, natural language inference, and question answering. They showed that GRACE was able to significantly improve the performance of LLMs on streaming errors, while minimally affecting their performance on unrelated inputs. Finally, the authors do not explicitly discuss the impact of GRACE on latency; however, GRACE could have a small impact on latency, as it requires caching activations in an adaptive codebook and querying it at inference time.

为此,Hartvigsen、Thomas等人(Hartvigsen et al., 2022)提出了GRACE方法,该方法仅需编辑样本即可执行连续编辑,并确保对无关输入的影响最小。GRACE的工作原理是在编辑流输入时,将选定层的激活值(s)缓存到自适应码本中。当收到新编辑时,GRACE会从码本中检索相应的激活值(s),并用它们更新模型的预测结果。这使得GRACE能够有针对性地调整模型行为,而不影响其整体性能。作者在文本分类、自然语言推理和问答等多种任务上评估了GRACE,结果表明GRACE能显著提升大语言模型在流式错误上的表现,同时对无关输入的性能影响极小。最后,作者未明确提及GRACE对延迟的影响,但由于需要将激活值(s)缓存到自适应码本中,GRACE可能会对延迟产生轻微影响。
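
The codebook mechanism can be sketched as a keyed override on one layer's activation. The toy adaptor below uses a fixed distance threshold and hand-set values; real GRACE learns the stored values and adapts each key's deferral radius as edits stream in, so this is a simplification.

```python
import numpy as np

class GraceLayerSketch:
    """Toy GRACE-style adaptor wrapping one layer: cache (key, value) pairs
    for edited inputs; at inference, if the incoming hidden state lies
    within `eps` of a cached key, substitute the stored value for the
    layer's activation, otherwise run the layer unchanged."""

    def __init__(self, layer, eps=0.5):
        self.layer = layer
        self.eps = eps
        self.keys, self.values = [], []

    def add_edit(self, h, target_activation):
        self.keys.append(h)
        self.values.append(target_activation)

    def __call__(self, h):
        for k, v in zip(self.keys, self.values):
            if np.linalg.norm(h - k) < self.eps:   # inside an edit's region
                return v
        return self.layer(h)                        # unrelated input: untouched

layer = lambda h: 2.0 * h                # stand-in for the wrapped layer
g = GraceLayerSketch(layer, eps=0.5)
h_edit = np.array([1.0, 0.0])
g.add_edit(h_edit, np.array([0.0, 3.0]))

print(g(h_edit))                  # overridden activation [0. 3.]
print(g(np.array([4.0, 4.0])))    # far from any key: original behaviour [8. 8.]
```

The linear scan over keys is where the latency concern mentioned above comes from; a larger codebook means more lookups per forward pass.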

5 Conclusion and Future Directions

5 结论与未来方向

In this survey, we organize recent progress in the rapidly growing field of knowledge editing. We begin by formalizing it under a common definition that seamlessly connects the slightly different facets of each work presented in the literature so far. Indeed, being a very recent branch of research, each author has bent definitions differently to better accommodate their methodologies. Following the first definitions given by Sinitsin et al. (2020) and the more recent adaptation by Huang et al. (2023), we formally define knowledge editing as the task of modifying the knowledge of a target model in a non-sequential or sequential manner, without significantly affecting its original performance. This makes knowledge editing closely related to the much more well-known branch of continual learning, or lifelong learning, as well as the emerging fields of machine unlearning Bourtoule et al. (2021); Nguyen et al. (2022); Si et al. (2023) and parameter-efficient fine-tuning Fu et al. (2023); Hu et al. (2022); Han et al. (2024); Yu et al. (2024), including notable variants such as model editing via task arithmetic Ilharco et al. (2022); Hendel et al. (2023). In the following section, we briefly discuss the intersections and connections between knowledge editing and related disciplines.

在本综述中,我们系统梳理了知识编辑(knowledge editing)这一快速发展的领域最新进展。通过建立统一的形式化定义,将文献中略有差异的研究视角有机衔接。作为新兴研究方向,早期研究者往往根据自身方法论调整定义边界。因此,我们综合Sinitsin等人(2020)的初始定义与Huang等人(2023)的最新阐释,将知识编辑正式定义为:以非连续或连续方式修改目标模型知识,同时保持其原始性能不受显著影响的任务。这使得知识编辑与持续学习(终身学习)、新兴的机器遗忘[20][21][22]及参数高效微调[23][24][25][26]领域密切相关,特别包括通过任务算术进行模型编辑等典型方法[27][28]。下一节将简要探讨知识编辑与相关学科的交叉关联。

5.1 Knowledge editing and related fields of research

5.1 知识编辑及相关研究领域

Machine unlearning research focuses on removing specific data samples or knowledge from a pre-trained model without retraining the entire model from scratch. This task is gaining relevance due to its applications in data privacy Chen et al. (2021), security Cao and Yang (2015), and compliance with regulations such as the GDPR Sai et al. (2023). While knowledge editing primarily focuses on modifying or adding knowledge to a model, machine unlearning can be seen as a complementary task that aims to selectively remove knowledge from a model. Both tasks share the common goal of avoiding complete retraining while maintaining the model's predictive capability, but they track different metrics and evaluation benchmarks.

机器遗忘研究专注于从预训练模型中移除特定数据样本或知识,而无需从头开始重新训练整个模型。由于其在数据隐私 Chen等人 (2021)、安全 Cao和Yang (2015) 以及GDPR等法规合规性 Sai等人 (2023) 方面的应用,该任务正变得越来越重要。然而,知识编辑主要关注修改或向模型添加知识,而机器遗忘可被视为一种互补任务,旨在选择性地从模型中移除知识。这两个任务的共同目标是避免完全重新训练,同时保持模型的预测能力,但追踪的指标和评估基准有所不同。

Knowledge editing and machine unlearning can be viewed as specialized cases of continual learning Mundt et al. (2023), focusing on applying targeted, often non-uniformly distributed edits to a network's existing knowledge Henn et al. (2021). In contrast, continual learning encompasses a broader scope, seeking general methodologies to incrementally expand a network's knowledge base, enabling it to perform an increasing array of tasks and skills over time. The key distinction lies in their objectives: knowledge editing aims for precise modifications to specific knowledge elements, while continual learning emphasizes the gradual accumulation of knowledge and skills without compromising previously learned information. Despite these differences, both fields share significant challenges, particularly in modifying neural networks without disrupting existing capabilities. Both contend with issues such as catastrophic forgetting Ratcliff (1990), where new learning can overwrite previously acquired knowledge. However, the approaches to addressing these challenges often diverge. Knowledge editing techniques typically employ localized, precise modifications, while continual learning methods often utilize more global strategies to balance retention of old knowledge with acquisition of new information. The intersection of these fields is particularly evident in scenarios where techniques from one inspire developments in the other. For example, some knowledge editing approaches adapt continual learning strategies for mitigating catastrophic forgetting (Section 4.1), tailoring them for more targeted edits. Conversely, insights from knowledge editing about precise neural network modifications can inform the development of more fine-grained continual learning algorithms.

知识编辑 (knowledge editing) 和机器遗忘 (machine unlearning) 可视为持续学习 (continual learning) 的特例 [Mundt et al., 2023],专注于对网络现有知识进行针对性、通常非均匀分布的修改 [Henn et al., 2021]。相比之下,持续学习的范畴更广,旨在寻找逐步扩展网络知识库的通用方法,使其能够随时间掌握更多任务和技能。二者的核心区别在于目标:知识编辑追求对特定知识元素的精确修改,而持续学习强调在不损害已学信息的前提下逐步积累知识与技能。

尽管存在差异,这两个领域都面临着重大挑战,特别是在修改神经网络时不破坏现有能力方面。它们都需要应对灾难性遗忘 (catastrophic forgetting) [Ratcliff, 1990] 等问题——即新知识可能覆盖先前习得的知识。然而,解决这些挑战的方法往往不同:知识编辑技术通常采用局部精确的修改,而持续学习方法则多使用全局策略来平衡新旧知识的保留与获取。

这两个领域的交叉点尤为明显,其中一方的技术常会启发另一方的发展。例如,某些知识编辑方法会采用持续学习策略来缓解灾难性遗忘(第4.1节),并将其调整为更具针对性的修改方式。反之,知识编辑关于神经网络精确修改的见解,也能促进更细粒度持续学习算法的开发。

On the other hand, parameter-efficient fine-tuning (PET) techniques are another related field, involving the modification of pre-trained models to create new models with desired capabilities. PET aims to enable efficient adaptation by updating only a minimal subset of model parameters, rather than fine-tuning all parameters. Notable PET techniques include addition-based methods like adapters Houlsby et al. (2019), reparameterization-based methods like LoRA Hu et al. (2022), and specification-based methods. An interesting variant of PET is task arithmetic Ilharco et al. (2022), which involves operations such as model addition, subtraction, multiplication, and permutation to modify or reshape the knowledge encoded in pre-trained models. By manipulating entire model representations in this arithmetic fashion, task arithmetic provides a way to create tailored models for specific tasks or domains without retraining from scratch, similar to knowledge editing techniques. However, task arithmetic, and PET more generally, are typically applied to enhance task performance rather than to edit knowledge specifically. The efficacy of existing PET methods for knowledge editing remains largely unexplored. Indeed, while task arithmetic and other PET techniques hold promise for efficient model adaptation, they differ from knowledge editing in their primary focus on improving downstream task metrics rather than directly modifying a model's acquired knowledge. Investigating how to leverage PET methods for precise and targeted knowledge updates presents an interesting direction for future work in the knowledge editing space.

另一方面,参数高效微调 (parameter-efficient fine-tuning, PET) 是另一个相关领域,它涉及修改预训练模型以创建具备所需能力的新模型。PET 旨在通过仅更新最小规模的模型参数子集(而非微调所有参数)来实现高效适配。值得注意的 PET 技术包括基于添加的方法(如适配器 [Houlsby et al. (2019)])、基于规范的方法(如 LoRA [Hu et al. (2022)])以及基于重参数化的方法。PET 的一个有趣变体是任务算术 [Ilharco et al. (2022)],它通过模型加法、减法、乘法和置换等操作来修改或重塑预训练模型中编码的知识。通过这种算术方式操控整个模型表征,任务算术提供了一种为特定任务或领域定制模型的方法(无需从头训练),这与知识编辑技术类似。然而任务算术及更广义的 PET 通常用于提升任务性能,而非专门编辑知识。现有 PET 方法在知识编辑方面的有效性仍有待探索。事实上,尽管任务算术和其他 PET 技术有望实现高效模型适配,但它们与知识编辑的核心区别在于:前者主要关注改进下游任务指标,而非直接修改模型已习得的知识。如何利用 PET 方法实现精准定向的知识更新,是知识编辑领域未来工作的一个有趣方向。
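
Task arithmetic is easy to sketch: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and adding or subtracting it (optionally scaled) reshapes the model. The models below are plain dicts of scalars purely for illustration; in practice the same operations are applied tensor-by-tensor to full state dicts.

```python
def task_vector(pretrained, finetuned):
    """Task vector = fine-tuned weights minus pre-trained weights, per tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_vector(pretrained, vector, alpha=1.0):
    """Add (alpha > 0) or subtract/negate (alpha < 0) a scaled task vector."""
    return {k: pretrained[k] + alpha * vector[k] for k in pretrained}

base = {"w1": 1.0, "w2": -2.0}
tuned = {"w1": 1.5, "w2": -1.0}      # hypothetical model fine-tuned on one task

tv = task_vector(base, tuned)
print(apply_vector(base, tv, alpha=1.0))    # recovers the fine-tuned model
print(apply_vector(base, tv, alpha=-1.0))   # "forgets" the task by subtraction
```

Vectors from different tasks can also be summed to compose capabilities, which is what makes the arithmetic view loosely analogous to knowledge editing.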

5.2 Future Directions and Risks

5.2 未来方向与风险

Building on this more formal definition, we presented a distilled summary of the most relevant works in the literature at the time of writing. We proposed to categorize these works into four families: regularization techniques, meta-learning, direct model editing, and architectural strategies. We discussed each family in turn, highlighting its intrinsic characteristics and limitations. We also summarized the most frequent fields of application, tasks, and datasets that have been tackled in each family, for quick reference. We did not specifically deep-dive into works where knowledge editing could emerge as an additional benefit of the proposed methodologies Hewitt et al. (2023), but it is worth noting that future expansions of similar survey research could encompass these aspects for a more comprehensive analysis.

基于更正式的定义,我们对撰写时文献中最相关的工作进行了提炼总结。我们建议将这些工作分为四类:正则化 (regularization) 技术、元学习 (metalearning)、直接模型编辑和架构策略。我们依次讨论了每一类方法,重点分析了其内在特性和局限性。同时汇总了各类方法最常应用的领域、任务和数据集,以便快速查阅。我们未专门深入探讨那些可能将知识编辑作为方法论附加优势的研究 [如 Hewitt et al. (2023)],但值得注意的是,未来类似的综述研究可纳入这些方面以实现更全面的分析。

Overall, we have presented a rapidly expanding field of research driven by the current trend of foundational models Zhou et al. (2023). The advancements in this area have led to a significant increase in the development of tools and methodologies to effectively harness the intrinsic knowledge of these models. As we move forward, knowledge editing is poised to become a critical factor in leveraging the power of these models for various industrial applications. However, several key challenges and future directions remain to be addressed:

总体而言,我们展示了一个由基础模型 (foundational models) 当前趋势驱动而快速扩展的研究领域 [20]。该领域的进步显著推动了工具和方法的发展,以有效利用这些模型的内在知识。随着研究的深入,知识编辑有望成为将这些模型能力应用于各类工业场景的关键因素。然而,仍有若干关键挑战和未来方向亟待解决:

• Improving the efficiency and scalability of knowledge editing techniques to handle large-scale foundational models with constrained computational resources. This challenge is particularly pressing as models continue to grow in size and complexity Dubey et al. (2024). Future research could explore techniques such as sparse editing or hierarchical knowledge representations to enable more efficient updates.

• 提升知识编辑技术的效率和可扩展性,以便在有限计算资源下处理大规模基础模型。随着模型规模和复杂度的持续增长,这一挑战尤为紧迫 [20]。未来研究可探索稀疏编辑或分层知识表征等技术,以实现更高效的更新。

Finally, it is vital to acknowledge that the power of knowledge editing also brings inherent risks that must not be overlooked. While editing models can correct their behavior and improve their utility, it can also be exploited for harmful purposes. In particular, sophisticated editing algorithms may enable malicious actors to deliberately incorporate backdoors, vulnerabilities, hidden behaviors, or harmful tendencies into the models. This concern becomes even more critical for methodologies that edit weights without providing sufficient interpretability of the applied changes. This dual use is a common risk for many machine learning technologies, and only proactive efforts to develop robust security measures and ethical guidelines can help to mitigate these potential risks. Future research should focus on developing "edit verification" techniques that can detect and prevent malicious edits, as well as establishing standardized protocols for auditing and certifying edited models.

最后,必须认识到知识编辑的力量也带来了不容忽视的内在风险。虽然编辑模型可以纠正其行为并提升实用性,但它也可能被用于有害目的。特别是复杂的编辑算法可能让恶意行为者故意在模型中植入后门、漏洞、隐藏行为或有害倾向。对于那些不提供足够可解释性就直接修改权重的方法,这一风险尤为严峻。这种双刃剑效应是许多机器学习技术共有的风险,只有通过主动制定强健的安全措施和伦理准则,才能帮助缓解这些潜在威胁。未来研究应聚焦于开发能检测和预防恶意编辑的"编辑验证"技术,并建立审核认证已编辑模型的标准协议。

In conclusion, knowledge editing in AI has emerged as a critical field, offering transformative potential for enhancing AI capabilities while simultaneously raising significant challenges and ethical considerations. As researchers and practitioners, we must strive to balance the pursuit of technological advancements with a strong awareness of their broader societal impacts, ensuring that the evolution of knowledge editing techniques contributes positively to the development of safe, reliable, and beneficial AI systems.

总之,AI知识编辑已成为一个关键领域,在提升AI能力方面展现出变革性潜力,同时也带来了重大挑战和伦理考量。作为研究者和实践者,我们必须努力在追求技术进步与深刻认知其广泛社会影响之间取得平衡,确保知识编辑技术的发展能为构建安全、可靠且有益的AI系统作出积极贡献。

References

参考文献
