[论文翻译]可编辑神经网络


原文地址:https://arxiv.org/pdf/2004.00345


EDITABLE NEURAL NETWORKS

可编辑神经网络

Anton Sinitsin1∗ ant.sinitsin@gmail.com

Vsevolod Plokhotnyuk2∗ vsevolod-pl@yandex.ru

Dmitriy Pyrkin2∗ alagaster@yandex.ru

Sergei Popov1,2 popovsergey95@gmail.com

Artem Babenko1,2 artem.babenko@phystech.edu

ABSTRACT

摘要

These days deep neural networks are ubiquitously used in a wide range of tasks, from image classification and machine translation to face identification and self-driving cars. In many applications, a single model error can lead to devastating financial, reputational and even life-threatening consequences. Therefore, it is crucially important to correct model mistakes quickly as they appear. In this work, we investigate the problem of neural network editing — how one can efficiently patch a mistake of the model on a particular sample, without influencing the model behavior on other samples. Namely, we propose Editable Training, a model-agnostic training technique that encourages fast editing of the trained model. We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks.

如今,深度神经网络广泛应用于各类任务中,从图像分类、机器翻译到人脸识别和自动驾驶汽车。在许多应用中,单个模型错误可能导致严重的财务损失、声誉损害甚至危及生命的后果。因此,在错误出现时快速修正模型至关重要。本文研究了神经网络编辑问题——如何在特定样本上高效修补模型错误,同时不影响模型在其他样本上的表现。具体而言,我们提出了可编辑训练(Editable Training),这是一种与模型无关的训练技术,旨在促进训练模型的快速编辑。我们通过大规模图像分类和机器翻译任务,实证验证了该方法的有效性。

1 INTRODUCTION

1 引言

Deep neural networks match and often surpass human performance on a wide range of tasks including visual recognition (Krizhevsky et al. (2012); D. C. Ciresan (2011)), machine translation (Hassan et al. (2018)) and others (Silver et al. (2016)). However, just like humans, artificial neural networks sometimes make mistakes. As we trust them with more and more important decisions, the cost of such mistakes grows ever higher. A single misclassified image can be negligible in academic research but can be fatal for a pedestrian in front of a self-driving vehicle. A poor automatic translation for a single sentence can get a person arrested (Hern (2018)) or ruin a company’s reputation.

深度神经网络在包括视觉识别 (Krizhevsky等人 (2012); D. C. Ciresan (2011))、机器翻译 (Hassan等人 (2018)) 等众多任务中 (Silver等人 (2016)) 达到甚至超越人类水平。然而与人脑类似,人工神经网络同样会犯错。随着我们将越来越重要的决策权交给它们,这类错误的代价正变得愈发高昂。学术研究中单张图像的误分类可能无足轻重,但对自动驾驶车辆前的行人而言却足以致命。一句糟糕的自动翻译可能让人身陷囹圄 (Hern (2018)) 或令企业声誉扫地。

Since mistakes are inevitable, deep learning practitioners should be able to adjust model behavior by correcting errors as they appear. However, this is often difficult due to the nature of deep neural networks. In most network architectures, a prediction for a single input depends on all model parameters. Therefore, updating a neural network to change its predictions on a single input can decrease performance across other inputs.

由于错误不可避免,深度学习从业者应能通过纠正出现的错误来调整模型行为。然而,由于深度神经网络的性质,这往往很困难。在大多数网络架构中,单个输入的预测依赖于所有模型参数。因此,更新神经网络以改变其对单个输入的预测可能会降低其他输入的性能。

Currently, there are two workarounds commonly used by practitioners. First, one can re-train the model on the original dataset augmented with samples that account for the mistake. However, this is computationally expensive as it requires performing the training from scratch. Another solution is to use a manual cache (e.g. a lookup table) that overrules model predictions on problematic samples. While being simple, this approach is not robust to minor changes in the input. For instance, it will not generalize to a different viewpoint of the same object or to paraphrasing in natural language processing tasks.

目前,从业者通常采用两种变通方案。首先,可以在原始数据集上加入修正错误的样本后重新训练模型。但这种方法计算成本高昂,因为需要从头开始训练。另一种解决方案是使用手动缓存(如查找表)来覆盖模型对问题样本的预测。虽然简单,但这种方法对输入的微小变化不具备鲁棒性。例如,它无法泛化到同一物体的不同视角,或自然语言处理任务中的文本改写情况。

In this work, we present an alternative approach that we call Editable Training. This approach involves training neural networks in such a way that the trained parameters can be easily edited afterwards. Editable Training employs modern meta-learning techniques (Finn et al. (2017)) to ensure that model’s mistakes can be corrected without harming its overall performance. With thorough experimental evaluation, we demonstrate that our method works on both small academical datasets and industry-scale machine learning tasks. We summarize the contributions of this study as follows:

在这项工作中,我们提出了一种称为可编辑训练 (Editable Training) 的替代方法。该方法通过特定方式训练神经网络,使得训练后的参数能够被轻松修改。可编辑训练采用现代元学习技术 (Finn et al. (2017)) 来确保模型的错误可以被修正,同时不影响其整体性能。通过全面的实验评估,我们证明了该方法在小型学术数据集和工业级机器学习任务中均有效。本研究的主要贡献总结如下:

2 RELATED WORK

2 相关工作

In this section, we aim to position our approach with respect to existing literature. Namely, we explain the connections of Editable Neural Networks with ideas from prior works.

在本节中,我们将阐明本方法与现有文献的关联定位,具体解释可编辑神经网络 (Editable Neural Networks) 与先前研究思想的联系。

Meta-learning is a family of methods that aim to produce learning algorithms appropriate for a particular machine learning setup. These methods were shown to be extremely successful in a large number of problems, such as few-shot learning (Finn et al. (2017); Nichol et al. (2018)), learnable optimization (Andrychowicz et al. (2016)) and reinforcement learning (Houthooft et al. (2018)). Indeed, Editable Neural Networks also belong to the meta-learning paradigm, as they basically ”learn to allow effective patching”. While neural network correction has significant practical importance, we are not aware of published meta-learning works addressing this problem.

元学习 (Meta-learning) 是一类旨在生成适用于特定机器学习场景的学习算法的方法。这些方法已被证明在少样本学习 (Finn et al. (2017); Nichol et al. (2018))、可学习优化 (Andrychowicz et al. (2016)) 和强化学习 (Houthooft et al. (2018)) 等大量问题上取得了巨大成功。事实上,可编辑神经网络 (Editable Neural Networks) 也属于元学习范式,因为它们本质上是在"学习如何实现有效修补"。尽管神经网络修正具有重要的实践意义,但我们尚未发现已发表的元学习工作专门解决这一问题。

Catastrophic forgetting is a well-known phenomenon arising in the problem of lifelong/continual learning (Ratcliff (1990)). For a sequence of learning tasks, it turns out that after deep neural networks learn on newer tasks, their performance on older tasks deteriorates. Several lines of research address overcoming catastrophic forgetting. The methods based on Elastic Weight Consolidation (Kirkpatrick et al. (2016)) update model parameters based on their importance to the previous learning tasks. The rehearsal-based methods (Robins (1995)) occasionally repeat learning on samples from earlier tasks to ”remind” the model about old data. Finally, a line of work (Garnelo et al. (2018); Lin et al. (2019)) develops specific neural network architectures that reduce the effect of catastrophic forgetting. The problem of efficient neural network patching differs from continual learning, as our setup is not sequential in nature. However, correction of the model for mislabeled samples must not affect its behavior on other samples, which is close to the task of overcoming catastrophic forgetting.

灾难性遗忘 (catastrophic forgetting) 是终身学习/持续学习 (lifelong/continual learning) 领域中的一个经典问题 (Ratcliff (1990))。当深度神经网络在一系列学习任务中学习新任务时,其在旧任务上的性能会出现显著下降。目前主要有三类方法应对该问题:基于弹性权重固化 (Elastic Weight Consolidation) 的方法 (Kirkpatrick et al. (2016)) 会根据参数对先前任务的重要性来调整模型参数;基于回放 (rehearsal) 的方法 (Robins (1995)) 会定期使用旧任务样本进行复习以唤醒模型记忆;另一类研究 (Garnelo et al. (2018); Lin et al. (2019)) 则致力于设计能减轻灾难性遗忘的专用神经网络架构。高效的神经网络修补 (patching) 问题与持续学习存在本质区别——我们的设定并不具有时序性。但模型在修正错误标注样本时,必须确保不影响其他样本的表现,这一要求与克服灾难性遗忘的任务高度相关。

Adversarial training. The proposed Editable Training also bears some resemblance to adversarial training (Goodfellow et al. (2015)), which is the dominant approach to defending against adversarial attacks. The important difference here is that Editable Training aims to learn models whose behavior on some samples can be efficiently corrected. Meanwhile, adversarial training produces models which are robust to certain input perturbations. However, in practice one can use Editable Training to efficiently cover model vulnerabilities against both synthetic (Szegedy et al. (2013); Yuan et al. (2017); Ebrahimi et al. (2017); Wallace et al. (2019)) and natural (Hendrycks et al. (2019)) adversarial examples.

对抗训练。提出的可编辑训练 (Editable Training) 也与对抗训练 (Goodfellow et al. (2015)) 有相似之处,后者是对抗攻击防御的主要方法。重要区别在于,可编辑训练旨在学习模型,使其在某些样本上的行为能被高效修正;而对抗训练产生的模型则对特定输入扰动具有鲁棒性。不过在实践中,可编辑训练可高效覆盖模型对合成对抗样本 (Szegedy et al. (2013); Yuan et al. (2017); Ebrahimi et al. (2017); Wallace et al. (2019)) 和自然对抗样本 (Hendrycks et al. (2019)) 的漏洞。

3 EDITING NEURAL NETWORKS

3 编辑神经网络

In order to measure and optimize the model’s ability for editing, we first formally define the operation of editing a neural network. Let $f(x,\theta)$ be a neural network, with $x$ denoting its input and $\theta$ being a set of network parameters. The parameters $\theta$ are learned by minimizing a task-specific objective function $\mathcal{L}_{b a s e}(\theta)$ , e.g. cross-entropy for multi-class classification problems.

为了衡量和优化模型的编辑能力,我们首先正式定义神经网络编辑操作。设 $f(x,\theta)$ 为一个神经网络,其中 $x$ 表示输入,$\theta$ 为网络参数集。参数 $\theta$ 通过最小化任务特定的目标函数 $\mathcal{L}_{b a s e}(\theta)$ 习得(例如多分类问题的交叉熵损失)。

Then, if we discover mistakes in the model’s behavior, we can patch the model by changing its parameters $\theta$. Here we aim to change the model’s predictions on a subset of inputs, corresponding to misclassified objects, without affecting other inputs. We formalize this goal using the editor function: $\hat{\theta}=Edit(\theta,l_{e})$. Informally, this is a function that adjusts $\theta$ to satisfy a given constraint $l_{e}(\hat{\theta})\leq0$, whose role is to enforce desired changes in the model’s behavior.

然后,如果我们发现模型行为存在错误,可以通过调整其参数 $\theta$ 来修补模型。我们的目标是在不干扰其他输入的情况下,修正模型对误分类对象对应输入子集的预测行为。这一目标通过编辑器函数形式化表示为: $\scriptstyle{\hat{\theta}}=Edit(\theta,l_{e})$ 。简而言之,该函数通过调整 $\theta$ 以满足给定约束条件 $l_{e}(\hat{\theta})\leq0$ ,从而实现模型行为的定向修正。

For instance, in the case of multi-class classification, $l_{e}$ can guarantee that the model assigns input $x$ to the desired label $y_{ref}$: $l_{e}(\hat{\theta})=\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})-\log p(y_{ref}|x,\hat{\theta})$. Under such a definition of $l_{e}$, the constraint $l_{e}(\hat{\theta})\leq0$ is satisfied iff $\arg\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})=y_{ref}$.

例如,在多类别分类的情况下,$l_{e}$可以保证模型将输入$x$分配到期望的标签$y_{ref}$:$l_{e}(\hat{\theta})=\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})-\log p(y_{ref}|x,\hat{\theta})$。在$l_{e}$的这种定义下,约束条件$l_{e}(\hat{\theta})\leq0$当且仅当$\arg\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})=y_{ref}$时成立。
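As a concrete illustration, this constraint can be computed directly from the model's log-probability vector. The sketch below is ours (not the authors' code) and checks the iff-property numerically:

```python
import numpy as np

def edit_constraint(log_p, y_ref):
    """l_e for classification: max_y log p(y|x) - log p(y_ref|x).
    Always non-negative, and equal to 0 (so l_e <= 0 holds) exactly
    when y_ref is the argmax class of the prediction."""
    return np.max(log_p) - log_p[y_ref]

# e.g. a model that predicts class 1 with probability 0.7
log_p = np.log(np.array([0.1, 0.7, 0.2]))
print(edit_constraint(log_p, 1))       # 0.0: constraint satisfied
print(edit_constraint(log_p, 0) > 0)   # True: class 0 would need an edit
```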

To be practically feasible, the editor function must meet three natural requirements: reliability (the editor must satisfy the given constraint $l_{e}(\hat{\theta})\leq0$), locality (the edit should have minimal influence on the model’s behavior on unrelated inputs) and efficiency (editing should be computationally cheap).

为了实际可行,编辑器函数必须满足三个基本要求:可靠性(编辑器必须满足给定约束 $l_{e}(\hat{\theta})\leq0$)、局部性(编辑应尽量不影响模型对无关输入的行为)和高效性(编辑的计算开销应当较低)。

Intuitively, the editor locality aims to minimize changes in the model’s predictions for inputs unrelated to $l_{e}$. For the classification problem, this requirement can be formalized as minimizing the difference between the model’s predictions over the ”control” set $X_{c}$: $\underset{x\in X_{c}}{E}\,\#[f(x,\hat{\theta})\neq f(x,\theta)]\rightarrow\operatorname*{min}$.

直观上,编辑器局部性旨在最小化模型对与$l_{e}$无关输入的预测变化。对于分类问题,该要求可形式化为最小化模型在"控制"集$X_{c}$上的预测差异:$\underset{x\in X_{c}}{E}\,\#[f(x,\hat{\theta})\neq f(x,\theta)]\rightarrow\operatorname*{min}$。

3.1 GRADIENT DESCENT EDITOR

3.1 梯度下降编辑器

A natural way to implement $Edit(\theta,l_{e})$ for deep neural networks is gradient descent: the parameters $\theta$ are repeatedly updated by steps of $-\alpha\nabla_{\theta}l_{e}(\theta)$ until the constraint $l_{e}(\theta)\leq0$ is satisfied. We formulate the SGD editor with up to $k$ steps and learning rate $\alpha$ as:

为深度神经网络实现 $Edit(\theta,l_{e})$ 的自然方法是梯度下降:参数 $\theta$ 以 $-\alpha\nabla_{\theta}l_{e}(\theta)$ 为步长沿负梯度方向多次迭代更新,直到约束条件 $l_{e}(\theta)\leq0$ 得到满足。我们将最多进行 $k$ 步、学习率为 $\alpha$ 的SGD编辑器表述为:

$$
\mathrm{Edit}_ {\alpha}^{k}(\theta, l_{e}) =
\begin{cases}
\theta, & \text{if } l_{e}(\theta) \leq 0 \text{ or } k = 0\\
\mathrm{Edit}_ {\alpha}^{k-1}(\theta - \alpha \cdot \nabla_{\theta} l_{e}(\theta),\, l_{e}), & \text{otherwise}
\end{cases}
\tag{1}
$$

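The recursion in Eq. (1) unrolls into a simple loop. A minimal sketch (our own naming, with the gradient supplied by the caller):

```python
import numpy as np

def gd_edit(theta, l_e, grad_l_e, k=10, alpha=0.1):
    """Iterative form of the editor Edit_alpha^k in Eq. (1): step against
    the gradient of l_e until the constraint l_e(theta) <= 0 is satisfied
    or the budget of k steps runs out. Returns the edited parameters and
    the number of steps actually used."""
    for step in range(k):
        if l_e(theta) <= 0:            # constraint already satisfied
            return theta, step
        theta = theta - alpha * grad_l_e(theta)
    return theta, k
```

For example, with the toy constraint $l_{e}(\theta)=\|\theta\|^{2}-0.25$ (satisfied inside a ball of radius 0.5), the editor shrinks $\theta$ until it enters the feasible region and reports how many steps that took.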

The standard gradient descent editor can be further augmented with momentum, adaptive learning rates (Duchi et al. (2010); Zeiler (2012)) and other popular deep learning tricks (Kingma & Ba (2014); Smith & Topin (2017)). One family of techniques that we found practically useful is resilient backpropagation and its relatives: RProp, SignSGD (Bernstein et al. (2018)) and RMSProp (Tieleman & Hinton (2012)). We observed that these methods produce more robust weight updates that improve locality.

标准梯度下降编辑器可以通过动量、自适应学习率 (Duchi et al. (2010); Zeiler (2012)) 以及其他流行的深度学习技巧 (Kingma & Ba (2014); Smith & Topin (2017)) 进一步增强。我们发现一种实际有效的技术是弹性反向传播:Bernstein等人 (2018) 提出的RProp、SignSGD,或Tieleman & Hinton (2012) 提出的RMSProp。我们观察到这些方法能产生更稳健的权重更新,从而改善局部性。

3.2 EDITABLE TRAINING

3.2 可编辑训练

The core idea behind Editable Training is to enforce the model parameters $\theta$ to be ”prepared” for the editor function. More formally, we want to learn such parameters $\theta$ that the editor $Edit(\theta,l_{e})$ is reliable, local and efficient, as defined above.

可编辑训练 (Editable Training) 的核心思想是强制模型参数 $\theta$ 为编辑函数做好"准备"。更正式地说,我们希望学习这样的参数 $\theta$,使得编辑器 $Edit(\theta,l_{e})$ 能够满足上文定义的可靠性、局部性和高效性。

Our training procedure employs the fact that the Gradient Descent Editor (1) is differentiable w.r.t. $\theta$. This well-known observation (Finn et al. (2017)) allows us to optimize through the editor function directly via backpropagation (see Figure 1).

我们的训练过程利用了梯度下降编辑器 (Gradient Descent Editor) (1) 对 $\theta$ 可微的特性。这一众所周知的观点 (Finn et al. (2017)) 使我们能够通过反向传播直接优化编辑器函数 (参见图 1)。


Figure 1: A high-level scheme of editable training: (left) forward pass, (right) backward pass.

图 1: 可编辑训练的高级方案:(左) 前向传播,(右) 反向传播。

Editable Training is performed on mini-batches of constraints $l_{e}\sim p(l_{e})$ (e.g. images and target labels). First, we compute the edited parameters $\hat{\theta}=Edit(\theta,l_{e})$ by applying up to $k$ steps of gradient descent (1). Second, we compute the objective that measures locality and efficiency of the editor function:

可编辑训练是在小批量约束条件 $l_{e}\sim p(l_{e})$ (例如图像和目标标签)上进行的。首先,我们通过应用最多 $k$ 步梯度下降 (1) 来计算编辑后的参数 $\hat{\theta}=Edit(\theta,l_{e})$。其次,我们计算衡量编辑器函数局部性和效率的目标函数:

$$
\begin{aligned}
Obj(\theta,l_{e}) &= \mathcal{L}_ {base}(\theta) + c_{edit}\cdot\mathcal{L}_ {edit}(\theta) + c_{loc}\cdot\mathcal{L}_ {loc}(\theta)\\
\mathcal{L}_ {edit}(\theta) &= \max\left(0,\; l_{e}(Edit_{\alpha}^{k}(\theta,l_{e}))\right)\\
\mathcal{L}_ {loc}(\theta) &= \underset{x\sim p(x)}{E}\, D_{KL}\left(p(y|x,\theta)\,\|\,p(y|x,\,Edit_{\alpha}^{k}(\theta,l_{e}))\right)
\end{aligned}
\tag{2}
$$


Intuitively, $\mathcal{L}_ {e d i t}(\theta)$ encourages reliability and efficiency of the editing procedure by making sure the constraint is satisfied in under $k$ gradient steps. The final term $\mathcal{L}_{l o c}(\theta)$ is responsible for locality by minimizing the KL divergence between the predictions of original and edited models.

直观上,$\mathcal{L}_ {e d i t}(\theta)$ 通过确保在 $k$ 次梯度步骤内满足约束条件,来提升编辑过程的可靠性和效率。最后一项 $\mathcal{L}_{l o c}(\theta)$ 则通过最小化原始模型与编辑后模型预测之间的KL散度来保证局部性。

We use hyperparameters $c_{edit}$, $c_{loc}$ to balance between the original task-specific objective, editor efficiency and locality. Setting both of them to large positive values would cause the model to sacrifice some of its performance for a better edit. On the other hand, sufficiently small $c_{edit}, c_{loc}$ did not cause any deterioration of the main training objective while still improving the editor function in all our experiments (see Section 4). We attribute this to the fact that neural networks are typically overparameterized. Most neural networks can accommodate the edit-related properties and still have enough capacity to optimize $Obj(\theta,l_{e})$. The learning rate $\alpha$ and other optimizer parameters (e.g. $\beta$ for RMSProp) are trainable parameters of Editable Training, and we optimize them explicitly via gradient descent.

我们使用超参数 $c_{e d i t}$ 和 $c_{l o c}$ 来平衡原始任务目标、编辑效率与局部性。将两者设为较大的正值会导致模型牺牲部分性能以获得更好的编辑效果。另一方面,足够小的 $c_{e d i t},c_{l o c}$ 在所有实验中都不会影响主训练目标的优化,同时仍能提升编辑功能(见第4节)。我们认为这是由于神经网络通常存在过参数化现象——大多数网络在容纳编辑相关属性的同时,仍具备足够容量来优化 $O b j(\theta,l_{e})$。学习步长 $\alpha$ 及其他优化器参数(如RMSProp中的 $\beta$)是可编辑训练(Editable Training)的可训练参数,我们通过梯度下降对其进行显式优化。
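The objective of Eq. (2) can be implemented by unrolling the inner edit with higher-order gradients. The PyTorch sketch below is ours, not the authors' code: `model_fn` with an explicit parameter list is an assumed functional interface, the inner loop always runs $k$ steps instead of stopping early as in Eq. (1), and for brevity the same batch serves both $\mathcal{L}_ {base}$ and $\mathcal{L}_{loc}$:

```python
import torch
import torch.nn.functional as F

def editable_objective(model_fn, theta, x_edit, y_ref, x_loc, y_loc,
                       k=3, alpha=0.1, c_edit=1.0, c_loc=1.0):
    """One evaluation of Obj from Eq. (2). The inner SGD edit is unrolled
    for k steps with create_graph=True, so the returned objective can be
    backpropagated into theta, as in Figure 1. model_fn(theta, x) -> logits."""
    def l_e(params):  # classification edit constraint on the edited sample
        log_p = F.log_softmax(model_fn(params, x_edit), dim=-1)
        return (log_p.max(dim=-1).values - log_p[..., y_ref]).sum()

    edited = theta
    for _ in range(k):  # differentiable editor, Eq. (1) without early exit
        grads = torch.autograd.grad(l_e(edited), edited, create_graph=True)
        edited = [p - alpha * g for p, g in zip(edited, grads)]

    l_base = F.cross_entropy(model_fn(theta, x_loc), y_loc)
    l_edit = torch.relu(l_e(edited))  # max(0, l_e(edited theta))
    log_p_orig = F.log_softmax(model_fn(theta, x_loc), dim=-1)
    log_p_edit = F.log_softmax(model_fn(edited, x_loc), dim=-1)
    # D_KL(p(y|x, theta) || p(y|x, edited theta))
    l_loc = F.kl_div(log_p_edit, log_p_orig, log_target=True,
                     reduction="batchmean")
    return l_base + c_edit * l_edit + c_loc * l_loc
```

In a real setup the early-stopped editor and a separate locality batch $x\sim p(x)$ would replace these simplifications.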

4 EXPERIMENTS

4 实验

In this section, we extensively evaluate Editable Training on several deep learning problems and compare it to existing alternatives for efficient model patching.

在本节中,我们广泛评估了可编辑训练(Editable Training)在多个深度学习问题上的表现,并将其与现有高效模型修补方案进行了对比。

4.1 TOY EXPERIMENT: CIFAR-10

4.1 玩具实验:CIFAR-10

First, we experiment on image classification with the small CIFAR-10 dataset with standard train/test splits (Krizhevsky et al.). The training set is further augmented with random crops and random horizontal flips. All models trained on this dataset follow the ResNet-18 (He et al. (2015)) architecture and use the Adam optimizer (Kingma & Ba (2014)) with default hyperparameters.

首先,我们在小型CIFAR-10数据集上进行了图像分类实验,采用标准训练/测试划分 (Krizhevsky et al.)。训练数据集通过随机裁剪和随机水平翻转进行了数据增强。所有在该数据集上训练的模型均采用ResNet-18架构 (He et al. (2015)),并使用默认超参数的Adam优化器 (Kingma & Ba (2014))。

Our baseline is a ResNet-18 (He et al. (2015)) neural network trained to minimize the standard cross-entropy loss without Editable Training. This model provides a 6.3% test error rate at convergence.

我们的基线模型是未经可编辑训练(Editable Training)、以最小化标准交叉熵损失为目标训练的ResNet-18(He et al. (2015))神经网络。该模型收敛时的测试错误率为6.3%。

Comparing editor functions. As a preliminary experiment, we compare several variations of editor functions for the baseline model without Editable Training. We evaluate each editor by applying $N{=}1000$ edits $l_{e}$ . Each edit consists of an image from the test set assigned with a random (likely incorrect) label uniformly chosen from 0 to 9. After $N$ independent edits, we compute three following metrics over the entire test set:

比较编辑功能。作为初步实验,我们比较了未使用可编辑训练(Editable Training)的基线模型中几种编辑功能的变体。通过应用 $N{=}1000$ 次编辑 $l_{e}$ 来评估每个编辑器。每次编辑包含一张测试集中的图像,并为其分配一个从0到9均匀随机选择(可能不正确)的标签。完成 $N$ 次独立编辑后,我们在整个测试集上计算以下三个指标:

• Drawdown — mean absolute difference of classification error before and after performing an edit. Smaller drawdown indicates better editor locality.
• Success Rate — the rate of edits for which the editor succeeds in under $k{=}10$ gradient steps.
• Num Steps — the average number of gradient steps needed to perform a single edit.

• 回撤 (Drawdown) — 执行编辑前后分类误差的平均绝对差值。回撤越小表示编辑器的局部性越好。
• 成功率 (Success Rate) — 编辑器在 $k{=}10$ 次梯度步内成功完成编辑的比例。
• 步数 (Num Steps) — 执行单次编辑所需的平均梯度步数。
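The evaluation protocol above reduces to a simple aggregation loop. In this sketch, `apply_edit` and `base_error` are hypothetical hooks standing in for "apply one independent edit to a fresh copy of the model" and "test error before editing":

```python
def evaluate_editor(apply_edit, base_error, edits):
    """Aggregate the three metrics over N independent edits.
    apply_edit(e) returns (success: bool, steps: int, error_after: float)
    for one edit applied to a fresh copy of the model."""
    n_ok, tot_steps, tot_drawdown = 0, 0, 0.0
    for e in edits:
        ok, steps, error_after = apply_edit(e)
        n_ok += bool(ok)
        tot_steps += steps
        tot_drawdown += abs(error_after - base_error)  # |Δ test error|
    n = len(edits)
    return {"drawdown": tot_drawdown / n,
            "success_rate": n_ok / n,
            "num_steps": tot_steps / n}
```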

| Editor Function | GD | Scaled GD | RProp | RMSProp | Momentum | Adam |
|---|---|---|---|---|---|---|
| Drawdown | 3.8% | 2.81% | 1.99% | 1.77% | 2.42% | 19.4% |
| Success Rate | 98.8% | 99.1% | 100% | 100% | 96.0% | 100% |
| Num Steps | 3.54 | 3.91 | 2.99 | 3.11 | 5.60 | 3.86 |

| 编辑器函数 | GD | Scaled GD | RProp | RMSProp | Momentum | Adam |
|---|---|---|---|---|---|---|
| 回撤 | 3.8% | 2.81% | 1.99% | 1.77% | 2.42% | 19.4% |
| 成功率 | 98.8% | 99.1% | 100% | 100% | 96.0% | 100% |
| 步数 | 3.54 | 3.91 | 2.99 | 3.11 | 5.60 | 3.86 |

Table 1: Comparison of different editor functions on the CIFAR10 dataset with the baseline ResNet18 model trained without Editable Training.

表 1: 在未使用可编辑训练 (Editable Training) 的基线 ResNet18 模型上,不同编辑方法在 CIFAR10 数据集上的功能对比

• RProp — like GD, but the algorithm only uses the sign of gradients: $\theta-\alpha\cdot sign(\nabla_{\theta}l_{e}(\theta))$.
• RMSProp — like GD, but the learning rate for each individual parameter is divided by $\sqrt{rms_{t}+\epsilon}$, where $rms_{0}=[\nabla_{\theta}l_{e}(\theta_{0})]^{2}$ and $rms_{t+1}=\beta\cdot rms_{t}+(1-\beta)\cdot[\nabla_{\theta}l_{e}(\theta_{t})]^{2}$.
• Momentum GD — like GD, but the update follows the accumulated gradient direction $\nu$: $\nu_{0}=0;\ \nu_{t+1}=\alpha\cdot\nabla_{\theta}l_{e}(\theta_{t})+\mu\cdot\nu_{t}$.
• Adam — adaptive momentum algorithm as described in Kingma & Ba (2014) with tunable $\alpha,\beta_{1},\beta_{2}$. To prevent Adam from replicating RMSProp, we restrict $\beta_{1}$ to the [0.1, 1.0] range.

• RProp —— 类似于梯度下降 (GD),但该算法仅使用梯度的符号:$\theta-\alpha\cdot sign(\nabla_{\theta}l_{e}(\theta))$。
• RMSProp —— 类似于梯度下降,但每个参数的学习率会除以 $\sqrt{rms_{t}+\epsilon}$,其中 $rms_{0}=[\nabla_{\theta}l_{e}(\theta_{0})]^{2}$,且 $rms_{t+1}=\beta\cdot rms_{t}+(1-\beta)\cdot[\nabla_{\theta}l_{e}(\theta_{t})]^{2}$。
• Momentum GD —— 类似于梯度下降,但更新遵循累积梯度方向 $\nu$:$\nu_{0}=0;\ \nu_{t+1}=\alpha\cdot\nabla_{\theta}l_{e}(\theta_{t})+\mu\cdot\nu_{t}$。
• Adam —— 自适应动量算法,如 Kingma & Ba (2014) 所述,可调节参数 $\alpha,\beta_{1},\beta_{2}$。为避免 Adam 退化为 RMSProp,我们将 $\beta_{1}$ 限制在 [0.1, 1.0] 范围内。

For each optimizer, we tune all hyperparameters (e.g. learning rate) to optimize locality while ensuring that the editor succeeds in under $k=10$ steps for at least 95% of edits. We also tune the editor function by limiting the subset of parameters it is allowed to edit. The ResNet-18 model consists of six parts: an initial convolutional layer, followed by four ”chains” of residual blocks and a final linear layer that predicts class logits. We experimented with editing the whole model as well as editing each individual ”chain”, leaving parameters from other layers fixed. For each editor, Table 1 reports the numbers obtained for the subset of editable parameters corresponding to the smallest drawdown. For completeness, we also report the drawdown of Gradient Descent and RMSProp for different subsets of editable parameters in Table 2.

对于每个优化器,我们调整所有超参数(如学习率)以优化局部性,同时确保编辑器在不超过$k=10$步的情况下对至少95%的编辑成功。我们还通过限制编辑器可修改的参数子集来调整编辑函数。ResNet-18模型包含六个部分:初始卷积层,随后是四个残差块"链"和一个预测类别logits的最终线性层。我们尝试了编辑整个模型以及单独编辑每个"链",同时固定其他层的参数。表1展示了各编辑器在对应最小回撤的可编辑参数子集上的数值。为完整起见,表2还列出了梯度下降(Gradient Descent)和RMSProp在不同可编辑参数子集上的回撤情况。

| Editable Layers | Whole Model | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|---|---|---|---|---|---|
| Gradient Descent | 3.8% | 18.3% | 7.7% | 5.3% | 4.76% |
| RMSProp | 2.29% | 22.8% | 1.85% | 1.77% | 1.99% |

| 可编辑层 | 整个模型 | Chain 1 | Chain 2 | Chain 3 | Chain 4 |
|---|---|---|---|---|---|
| 梯度下降 | 3.8% | 18.3% | 7.7% | 5.3% | 4.76% |
| RMSProp | 2.29% | 22.8% | 1.85% | 1.77% | 1.99% |

Table 2: Mean Test Error Drawdown when editing different ResNet18 layers on CIFAR10.

表 2: 在CIFAR10上编辑不同ResNet18层时的平均测试误差回撤。

Table 1 and Table 2 demonstrate that the editor function locality is heavily affected by the choice of editing function even for models trained without Editable Training. Both RProp and RMSProp significantly outperform the standard Gradient Descent while Momentum and Adam show smaller gains. In fact, without the constraint $\beta_{1}>0.1$ the tuning procedure returns $\beta_{1}=0$ , which makes Adam equivalent to RMSProp. We attribute the poor performance of Adam and Momentum to the fact that most methods only make a few gradient steps till convergence and the momentum term cannot accumulate the necessary statistics.

表1和表2表明,即使对于未经可编辑训练(Editable Training)的模型,编辑器功能局部性也深受编辑函数选择的影响。RProp和RMSProp显著优于标准梯度下降法,而Momentum和Adam的提升幅度较小。事实上,若不施加约束$\beta_{1}>0.1$,调参过程会返回$\beta_{1}=0$,这使得Adam等效于RMSProp。我们将Adam和Momentum表现不佳归因于:大多数方法只需少量梯度步就能收敛,导致动量项无法积累必要的统计量。

Editable Training. Finally, we report results obtained with Editable Training. On each training batch, we use a single constraint $l_{e}(\hat{\theta})=\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})-\log p(y_{ref}|x,\hat{\theta})$, where $x$ is sampled from the train set and $y_{ref}$ is a random class label (from 0 to 9). The model is then trained by directly minimizing objective (2) with $k{=}10$ editor steps, with all other parameters optimized by backpropagation.

可编辑训练。最后,我们报告了使用可编辑训练获得的结果。在每个训练批次上,我们使用单一约束 $l_{e}(\hat{\theta})=\operatorname*{max}_ {y_{i}}\log p(y_{i}|x,\hat{\theta})-\log p(y_{ref}|x,\hat{\theta})$,其中 $x$ 从训练集中采样,$y_{ref}$ 是一个随机类别标签(从0到9)。然后通过直接最小化目标函数 (2) 来训练模型,其中编辑步数 $k{=}10$,其余所有参数通过反向传播优化。

We compare our Editable Training against three baselines, which also allow efficient model correction. The first natural baseline is Elastic Weight Consolidation (Kirkpatrick et al. (2016)): a technique that penalizes the edited model with the squared difference in parameter space, weighted by the importance of each parameter. Our second baseline is a semi-parametric Deep k-Nearest Neighbors (DkNN) model (Papernot & McDaniel (2018)) that makes predictions by using $k$ nearest neighbors in the space of embeddings produced by different CNN layers. For this approach, we edit the model by flipping labels of nearest neighbors until the model predicts the correct class.

我们将可编辑训练 (Editable Training) 与三种同样支持高效模型修正的基线方法进行对比。第一个自然基线是弹性权重固化 (Elastic Weight Consolidation) (Kirkpatrick et al. (2016)):该技术通过参数空间的平方差(按各参数重要性加权)对编辑后的模型进行惩罚。第二个基线是半参数化深度k近邻 (Deep k-Nearest Neighbors, DkNN) 模型 (Papernot & McDaniel (2018)),该模型通过在不同CNN层生成的嵌入空间中使用k个最近邻进行预测。对于该方法,我们通过翻转最近邻的标签来编辑模型,直至模型预测出正确类别。

Finally, we compare to an alternative editor function inspired by Conditional Neural Processes (CNP) (Garnelo et al. (2018)) that we refer to as Editable+CNP. For this baseline, we train a specialized CNP model architecture that performs edits by adding a special condition vector to intermediate activations. This vector is generated by an additional ”encoder” layer. We train the CNP model to solve the original classification problem when the condition vector is zero (hence, the model behaves as a standard ResNet18) and to minimize $\mathcal{L}_ {edit}$ and $\mathcal{L}_{loc}$ when the condition vector is applied.

最后,我们对比了受条件神经过程 (Conditional Neural Processes, CNP) (Garnelo et al. (2018)) 启发的另一种编辑器函数,称为 Editable+CNP。对于该基线,我们训练了一个专用的 CNP 模型架构,通过向中间激活添加特殊条件向量来执行编辑。该向量由额外的"编码器"层生成。我们训练 CNP 模型在条件向量为零时解决原始分类问题(此时模型表现为标准 ResNet18),并在应用条件向量时最小化 $\mathcal{L}_ {edit}$ 和 $\mathcal{L}_{loc}$。

After tuning the CNP architecture, we obtained the best performance when the condition vector is computed with a single ResNet block that receives the image representation via activations from the third residual chain of the main ResNet-18 model. This ”encoder” also conditions on the target class $y_{ref}$ with an embedding layer (lookup table) that is added to the third chain activations. The resulting procedure becomes the following: first, apply the encoder to the edited sample and compute the condition vector, then add this vector to the third chain activations for all subsequent inputs.

在调整CNP架构后,我们发现当条件向量通过单个ResNet块计算时性能最佳,该块接收来自主ResNet-18模型第三条残差链的激活值作为图像表征。该"编码器"还通过嵌入层(查找表)对目标类别$y_{ref}$进行条件化处理,并将其添加到第三条链的激活值中。最终流程如下:首先对编辑样本应用编码器并计算条件向量,然后将该向量添加到所有后续输入的第三层链激活值中。

| Training Procedure | Editor Function | Editable Layers | Test Error Rate | Test Error Drawdown | Success Rate | Num Steps |
|---|---|---|---|---|---|---|
| Baseline Training | GD | All | 6.3% | 3.8% | 98.8% | 3.54 |
| Baseline Training | RMSProp | Chain 3 | 6.3% | 1.77% | 100% | 3.11 |
| Editable, $c_{loc}=0.01$ | GD | All | 6.34% | 1.42% | 100% | 3.39 |
| Editable, $c_{loc}=0.01$ | GD | Chain 3 | 6.28% | 1.44% | 100% | 2.82 |
| Editable, $c_{loc}=0.01$ | RMSProp | Chain 3 | 6.31% | 0.86% | 100% | 4.13 |
| Editable, $c_{loc}=0.1$ | RMSProp | Chain 3 | 7.19% | 0.65% | 100% | 4.76 |
| Editable+CNP (best) | Cond. vector | Chain 3 | 6.33% | 1.06% | 98.9% | n/a |
| Baseline Training | GD+EWC | Chain 3 | 6.3% | 1.92% | 100% | 3.88 |
| Baseline Training | RMSProp+EWC | Chain 3 | 6.3% | — | — | — |

Table 3: Editable Training of ResNet18 on CIFAR10 dataset with different editor functions.

| 训练流程 | 编辑函数 | 可编辑层 | 测试错误率 | 测试误差回撤 | 成功率 | 步数 |
|---|---|---|---|---|---|---|
| 基线训练 | GD | 全部 | 6.3% | 3.8% | 98.8% | 3.54 |
| 基线训练 | RMSProp | Chain 3 | 6.3% | 1.77% | 100% | 3.11 |
| 可编辑 $c_{loc}=0.01$ | GD | 全部 | 6.34% | 1.42% | 100% | 3.39 |
| 可编辑 $c_{loc}=0.01$ | GD | Chain 3 | 6.28% | 1.44% | 100% | 2.82 |
| 可编辑 $c_{loc}=0.01$ | RMSProp | Chain 3 | 6.31% | 0.86% | 100% | 4.13 |
| 可编辑 $c_{loc}=0.1$ | RMSProp | Chain 3 | 7.19% | 0.65% | 100% | 4.76 |
| 可编辑+CNP(最佳) | 条件向量 | Chain 3 | 6.33% | 1.06% | 98.9% | n/a |
| 基线训练 | GD+EWC | Chain 3 | 6.3% | 1.92% | 100% | 3.88 |
| 基线训练 | RMSProp+EWC | Chain 3 | 6.3% | — | — | — |

表 3: 在CIFAR10数据集上使用不同编辑函数对ResNet18进行可编辑训练的结果。

Table 3 demonstrates two advantages of Editable Training. First, with $c_{loc}{=}0.01$ it is able to reduce drawdown (compared to models trained without Editable Training) while having no significant effect on the test error rate. Second, editing Chain 3 alone is almost as effective as editing the whole model. This is important because it allows us to reduce training time, making Editable Training $\approx2.5$ times slower than baseline training. Note that Editable+CNP turned out to be almost as effective as models trained with gradient-based editors while being simpler to implement.

表 3 展示了可编辑训练 (Editable Training) 的两个优势。首先,在 $c_{loc}{=}0.01$ 时,它能够减少回撤 (drawdown),同时对测试错误率没有显著影响(与未使用可编辑训练的模型相比)。其次,仅编辑 Chain 3 的效果几乎与编辑整个模型相当。这一点很重要,因为它可以缩短训练时间,使可编辑训练仅比基线训练慢约 2.5 倍。值得注意的是,可编辑+CNP 的效果几乎与基于梯度的编辑器训练的模型相当,同时实现更简单。

4.2 ANALYZING EDITED MODELS

4.2 分析编辑后的模型

In this section, we aim to interpret the differences between the models learned with and without Editable Training. First, we investigate which inputs are most affected when the model is edited on a sample that belongs to each particular class. Based on Figure 2 (left), we conclude that edits of baseline model cause most drawdown on samples that belong to the same class as the edited input (prior to edit). However, this visualization loses information by reducing edits to their class labels.

在本节中,我们旨在解释使用可编辑训练(Editable Training)与未使用时学习到的模型之间的差异。首先,我们研究当模型在属于每个特定类别的样本上进行编辑时,哪些输入受到的影响最大。根据图2(左),我们得出结论:基线模型的编辑对与编辑输入(编辑前)属于同一类别的样本造成最大的回撤。然而,这种可视化通过将编辑简化为类别标签而丢失了部分信息。

In Figure 2 (middle) we apply t-SNE (van der Maaten & Hinton (2008)) to analyze the structure of the ”edit space”. Intuitively, two edited versions of the same model are considered close if they make similar predictions. We quantify this by computing the KL-divergence between the model’s predictions before and after the edit for each of 10,000 test samples. These KL divergences effectively form a 10,000-dimensional model descriptor. We compute these descriptors for 4,500 edits applied to models trained with and without Editable Training. These vectors are then embedded in two-dimensional space with the t-SNE algorithm. We plot the obtained charts in Figure 2 (middle), with point colors denoting original class labels of edited images. As expected, the baseline edits for images of the same class are mapped to close points.

在图 2 (中) 中,我们应用 t-SNE (van der Maaten & Hinton (2008)) 来分析"编辑空间"的结构。直观上,如果两个同一模型的编辑版本做出相似的预测,则认为它们很接近。我们通过计算模型在 10,000 个测试样本上编辑前后的 KL 散度 (KL-divergence) 来量化这一点。这些 KL 散度有效地形成了一个 10,000 维的模型描述符。我们为应用了可编辑训练 (Editable Training) 和未应用可编辑训练的模型所进行的 4,500 次编辑计算了这些描述符。然后使用 t-SNE 算法将这些向量嵌入二维空间。我们在图 2 (中) 中绘制了获得的图表,点颜色表示编辑图像的原始类别标签。正如预期的那样,同一类别图像的基线编辑被映射到相近的点。
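The descriptor construction can be sketched as follows: each edited model becomes a vector of per-sample KL divergences, and the resulting matrix can then be fed to t-SNE (e.g. `sklearn.manifold.TSNE`). Array shapes and names below are our assumptions:

```python
import numpy as np

def edit_descriptors(p_before, p_after_edits, eps=1e-12):
    """KL(p_before || p_after) per test sample, for every edited model.
    p_before: (n_samples, n_classes) predicted probabilities of the
    original model; p_after_edits: (n_edits, n_samples, n_classes).
    Returns an (n_edits, n_samples) matrix: one descriptor row per edit."""
    kl = p_before * (np.log(p_before + eps) - np.log(p_after_edits + eps))
    return kl.sum(axis=-1)
```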

In turn, Editable Training does not always follow this pattern: the edit clusters are formed based on both original and target labels with a highly interlinked region in the middle. Combined with the fact that Editable Training has a significantly lower drawdown, this lets us hypothesize that with Editable Training neural networks learn representations where edits affect objects of the same original class to a smaller extent.

反过来,可编辑训练(Editable Training)并不总是遵循这种模式:编辑簇是基于原始标签和目标标签形成的,中间有一个高度互连的区域。结合可编辑训练的回撤幅度明显较低这一事实,我们可以假设,通过可编辑训练,神经网络学习到的表示会使编辑对同一原始类别对象的影响程度更小。

Conversely, the t-SNE visualization lacks information about the true dimensionality of the data manifold. To capture this property, we also perform truncated SVD decomposition of the same matrix of descriptors. Our main interest is the number of SVD components required to explain a given percentage of data variance. In Figure 2 (right) we report the explained variance ratio for models obtained with and without Editable Training. These results present evidence that Editable Training learns representations that exploit the neural network capacity to a greater extent.

相反,t-SNE可视化无法反映数据流形的真实维度特性。为捕捉这一属性,我们还对相同的描述符矩阵进行了截断SVD分解。我们重点关注的是解释给定百分比数据方差所需的SVD成分数量。在图2(右)中,我们对比了使用可编辑训练(Editable Training)与未使用时模型的解释方差比率。结果表明,可编辑训练学到的表征能更充分地利用神经网络容量。
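The explained-variance analysis can be sketched with a plain SVD. `explained_variance_ratio` is a hypothetical helper, and the example runs on random data purely for illustration; the actual analysis uses the matrix of edit descriptors:

```python
import numpy as np

def explained_variance_ratio(X, k):
    # Fraction of total variance captured by the top-k singular directions
    Xc = X - X.mean(axis=0)                      # center, PCA-style
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2
    return float(var[:k].sum() / var.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
r5 = explained_variance_ratio(X, 5)
r50 = explained_variance_ratio(X, 50)  # all components -> ratio of exactly 1.0
```

A curve of this ratio versus `k` is what Figure 2 (right) plots: the faster it saturates, the lower the effective dimensionality of the edit space.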


Figure 2: Edited model visualizations. (Left) Confusion matrix of the baseline model: rows correspond to editing images belonging to each of 10 classes; columns represent drawdowns per individual class. (Middle) t-SNE visualizations. Point color represents original class labels; brightness encodes edit targets. (Right) The proportion of explained variance versus the number of components.

图 2: 编辑模型可视化 (左) 基线模型的混淆矩阵: 行对应属于10个类别中每个类别的编辑图像; 列表示每个单独类别的下降情况。(中) t-SNE可视化。点颜色表示原始类别标签; 亮度编码编辑目标。(右) 解释方差比例与组件数量的关系。

4.3 EDITABLE FINE-TUNING FOR LARGE SCALE IMAGE CLASSIFICATION

4.3 大规模图像分类的可编辑微调

Section 4.1 demonstrates the success of Editable Training on the small CIFAR-10 dataset. However, many practical applications require training for many weeks on huge datasets. Re-training such a model for the sake of better edits may be impractical. In contrast, it would be more efficient to start from a pre-trained model and fine-tune it with Editable Training.

第4.1节展示了可编辑训练(Editable Training)在小型CIFAR-10数据集上的成功。然而,许多实际应用需要在海量数据集上进行数周的训练。为了获得更好的编辑效果而重新训练此类模型可能不切实际。相比之下,从预训练模型出发并通过可编辑训练进行微调会更加高效。

Here we experiment with the ILSVRC image classification task (Deng et al. (2009)) and consider two pre-trained architectures: the smaller ResNet-18 and the deeper DenseNet-169 (Huang et al. (2016)) networks. For each architecture, we start with pre-trained model weights2 and fine-tune them on the same dataset with Editable Training. More specifically, we choose the training objective $\mathcal{L}_{b a s e}(\theta)$ to be the KL-divergence between the predictions of the original network and its fine-tuned counterpart. Intuitively, this objective encourages the network to preserve its original classification behavior, while being trained to allow local edits. Similar to Section 4.1, the editor functions are only allowed to modify a subset of neural network layers. We experiment with two choices of such subsets. First, we try to edit a pre-existing layer in the network. Namely, we select the third out of four "chains" in both architectures. In the second experiment, we augment each architecture with an extra trainable layer after the last convolutional layer. We set the extra layer to be a residual block with a 4096-unit dense layer, followed by an ELU activation (Clevert et al. (2015)) and another 1024-unit dense layer.

我们在此使用ILSVRC图像分类任务(Deng et al. (2009))进行实验,并考虑两种预训练架构:较小的ResNet-18和更深的DenseNet-169(Huang et al. (2016))网络。对于每种架构,我们从预训练模型权重2开始,通过可编辑训练(Editable Training)在同一数据集上进行微调。具体而言,我们选择训练目标$\mathcal{L}_{b a s e}(\theta)$作为原始网络与其微调对应网络预测之间的KL散度。直观上,该目标鼓励网络在训练允许局部编辑的同时保持其原始分类行为。与第4.1节类似,编辑器函数仅允许修改神经网络层的子集。我们对此类子集进行两种选择的实验。首先,我们尝试编辑网络中预先存在的层,即在两种架构中都选择四个"链"中的第三个。在第二个实验中,我们在最后一个卷积层后为每种架构添加一个额外的可训练层。我们将该额外层设置为具有4096单元密集层的残差块,后接ELU激活(Clevert et al. (2015))和另一个1024单元密集层。
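The extra residual block (dense layer, ELU, dense layer, plus a skip connection) can be sketched as follows. The dimensions here are toy values, and the zero initialization of the second dense layer, which makes the block start out as an identity map, is an illustrative choice rather than a detail stated by the authors:

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU activation: identity for positive inputs, smooth saturation below
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def extra_block(h, W1, b1, W2, b2):
    # Residual block appended after the last convolutional layer:
    # dense -> ELU -> dense, added back onto the input features
    z = elu(h @ W1 + b1)
    return h + z @ W2 + b2

rng = np.random.default_rng(0)
d, hidden = 8, 16                      # toy sizes (paper uses 1024 / 4096)
h = rng.normal(size=(2, d))            # a batch of 2 feature vectors
W1 = rng.normal(size=(d, hidden)) * 0.1
b1 = np.zeros(hidden)
W2 = np.zeros((hidden, d))             # zero-init -> block starts as identity
b2 = np.zeros(d)
out = extra_block(h, W1, b1, W2, b2)   # equals h at this initialization
```

Starting the block as an identity is a common trick so that appending it does not perturb the pre-trained network's predictions before fine-tuning begins.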

The evaluation is performed on $N{=}1000$ edits with a random target class. We measure the drawdown on the full ILSVRC validation set of 50,000 images. We use the SGD optimizer with momentum $\mu{=}0.9$. We set the learning rate to $10^{-5}$ for the pre-existing layers and $10^{-3}$ for the extra block. The ImageNet training data is augmented with random resized crops and random horizontal flips.

在$N{=}1000$次随机目标类别的编辑上进行评估。我们在包含50,000张图像的完整ILSVRC验证集上测量回撤幅度。使用带动量$\mu{=}0.9$的SGD优化器,为预训练层设置学习率为$10^{-5}$,为额外块设置学习率为$\mathrm{\dot{1}0^{-3}}$。ImageNet训练数据通过随机缩放裁剪和随机水平翻转进行增强。
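The two learning rates above can be mimicked with per-group SGD-with-momentum updates. `sgd_momentum_step` is a hypothetical helper sketching a single update, not the actual training loop:

```python
import numpy as np

def sgd_momentum_step(params, grads, velocities, lrs, mu=0.9):
    # One SGD-with-momentum update; each parameter group has its own lr,
    # mirroring the two rates used for pre-existing vs. extra layers.
    for p, g, v, lr in zip(params, grads, velocities, lrs):
        v *= mu          # decay the running velocity in place
        v += g           # accumulate the current gradient
        p -= lr * v      # apply the scaled update in place

# toy: one "pre-existing" weight (lr 1e-5) and one "extra-block" weight (lr 1e-3)
params = [np.array([1.0]), np.array([1.0])]
grads = [np.array([1.0]), np.array([1.0])]
vels = [np.zeros(1), np.zeros(1)]
sgd_momentum_step(params, grads, vels, lrs=[1e-5, 1e-3])
```

In a framework like PyTorch the same effect is achieved by passing two parameter groups with different `lr` values to the optimizer.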

Our baselines for this task are the pre-trained architectures without Editable Fine-Tuning. However, during experiments, we noticed that minimizing the KL-divergence ${\mathcal{L}}(\theta)$ has the side-effect of reducing the validation error. We attribute this improvement to the self-distillation phenomenon (Hinton et al. (2015); Furlanello et al. (2018)). To disentangle these two effects, we consider an additional baseline where the model is trained to minimize the KL-divergence without the Editable Training terms. For a fair comparison, we also include baselines that edit an extra layer. This layer is initialized at random for the pre-trained models and fine-tuned for the models trained with distillation.

我们在此任务中的基线模型是未经可编辑微调 (Editable Fine-Tuning) 的预训练架构。然而在实验中,我们发现最小化KL散度 ${\mathcal{L}}(\theta)$ 会附带降低验证误差的效果。我们将此改进归因于自蒸馏现象 (Hinton et al. (2015); Furlanello et al. (2018))。为区分这两种效应,我们增设了一个仅最小化KL散度(不含可编辑训练项)的基线模型。为公平比较,我们还引入了编辑额外层的基线方案:对于预训练模型,该层随机初始化;对于蒸馏训练模型,该层进行微调。

The results in Table 4 show that Editable Training can be effectively applied in the fine-tuning scenario, achieving the best results with an extra trainable layer. In all cases Editable Fine-Tuning took under 48 hours on a single GeForce 1080 Ti GPU while a single edit requires less than $150\mathrm{ms}$ .

表4中的结果表明,可编辑训练(Editable Training)可有效应用于微调场景,通过额外添加可训练层实现了最佳效果。在所有实验中,使用单块GeForce 1080 Ti GPU进行可编辑微调耗时均低于48小时,而单次编辑所需时间不足150毫秒。

4.3.1 REALISTIC EDIT TASKS WITH NATURAL ADVERSARIAL EXAMPLES

4.3.1 基于自然对抗样本的真实编辑任务

In all previous experiments, we considered edits with a randomly chosen target class. However, in many practical scenarios, most of these edits will never occur. For instance, it is far more likely that an image previously classified as "plane" would require editing into "bird" than into "truck"

在之前的所有实验中,我们考虑了随机选择目标类别的编辑操作。然而在实际应用场景中,大多数此类编辑根本不会发生。例如:一张原本分类为"飞机"的图片,被编辑成"鸟类"的可能性远高于被编辑成"卡车"。

Table 4: Editable Training on the ImageNet dataset with RMSProp editor function.

Model Architecture  Training Procedure  Editable Layers  Test Error Rate  Mean Drawdown  Success Rate  Num Steps
ResNet18  Pre-trained  Chain 3  30.95%  3.89%  99.8%  3.582
          Pre-trained  Extra layer  30.95%  9.18%  100%  4.272
          Distillation  Extra layer  30.75%  2.80%  100%  2.63
          Editable  Chain 3  30.53%  3.78%  99.8%  3.616
          Editable  Extra layer  30.61%  0.57%  100%  3.388
DenseNet169  Pre-trained  Chain 3  25.49%  5.20%  100%  2.551
             Pre-trained  Extra layer  25.47%  9.05%  100%  3.874
             Distillation  Extra layer  24.33%  1.67%  100%  2.822
             Editable  Chain 3  24.32%  4.47%  100%  2.556
             Editable  Extra layer  24.38%  0.96%  100%  2.970

表 4: 使用RMSProp编辑器函数在ImageNet数据集上的可编辑训练

模型架构 训练流程 可编辑层 测试错误率 平均回撤 成功率 步数
ResNet18 预训练 链式3层 30.95% 3.89% 99.8% 3.582
预训练 额外层 30.95% 9.18% 100% 4.272
蒸馏 额外层 30.75% 2.80% 100% 2.63
可编辑 链式3层 30.53% 3.78% 99.8% 3.616
可编辑 额外层 30.61% 0.57% 100% 3.388
DenseNet169 预训练 链式3层 25.49% 5.20% 100% 2.551
预训练 额外层 25.47% 9.05% 100% 3.874
蒸馏 额外层 24.33% 1.67% 100% 2.822
可编辑 链式3层 24.32% 4.47% 100% 2.556
可编辑 额外层 24.38% 0.96% 100% 2.970

or "ship". To address this consideration, we employ the Natural Adversarial Examples (NAE) dataset by Hendrycks et al. (2019). This dataset contains 7,500 natural images that are particularly hard for neural networks to classify. Without edits, a pre-trained model can correctly predict less than $1\%$ of NAEs, but the correct answer is likely to be within the top-100 classes ordered by predicted probabilities (see Table 5, top-right).

为了解决这一问题,我们采用了Hendrycks等人(2019)提出的自然对抗样本(NAE)数据集。该数据集包含7500张对神经网络分类特别困难的真实图像。未经编辑时,预训练模型对NAE的正确预测率不足1%,但正确答案很可能出现在按预测概率排序的前100个类别中(见图5左)。
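Computing the rank of the reference class under the model's predicted probabilities, which underlies the rank distribution mentioned above, can be sketched as follows; `reference_rank` is a hypothetical helper:

```python
import numpy as np

def reference_rank(probs, ref_class):
    # 1-based rank of the reference class when classes are sorted by
    # predicted probability (rank 1 = the model's top prediction)
    order = np.argsort(-probs)
    return int(np.where(order == ref_class)[0][0]) + 1

# toy 4-class prediction: class 1 is most likely, class 2 is the runner-up
probs = np.array([0.05, 0.60, 0.30, 0.05])
r = reference_rank(probs, ref_class=2)  # runner-up -> rank 2
```

A histogram of these ranks over all NAEs gives the empirical rank distribution that the "Match Ranks" variant below samples from.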

The next set of experiments quantifies Editable Training in this more realistic setting. All models are evaluated on a sample of 1,000 edits, each corresponding to one Natural Adversarial Example and its reference class. We measure the drawdown from each edit on 50,000 ILSVRC test images. We evaluate the best techniques from Section 4.3 and their modifications that account for NAEs:

下一组实验量化了在这种更现实场景下的可编辑训练 (Editable Training)。所有模型都在1000次编辑样本上进行评估,每次编辑对应一个自然对抗样本 (Natural Adversarial Example) 及其参考类别。我们在5万张ILSVRC测试图像上测量每次编辑带来的性能下降。我们评估了第4.3节中的最佳技术及其针对自然对抗样本的改进方案:

• Editable Training: Random — model trained to edit on random targets from the uniform distribution, same as in Table 4. Compared to the same pre-trained and distilled baselines.
• Editable Training: Match Ranks — model trained to edit ImageNet training images with targets sampled based on their rank under the NAE rank distribution (see Table 5, top-right).
• Editable Training: Train on NAE — model trained to edit 6,500 natural adversarial examples. These NAEs do not overlap with the 1,000 NAEs used for evaluation.

  • 可编辑训练:随机 (Editable Training: Random) —— 模型训练用于在均匀分布中随机选择的目标上进行编辑,与表 4 相同。与相同的预训练和蒸馏基线进行比较。
  • 可编辑训练:匹配排名 (Editable Training: Match Ranks) —— 模型训练用于编辑 ImageNet 训练图像,目标采样基于其在 NAE 排名分布下的排名(见 5,左)。
  • 可编辑训练:在 NAE 上训练 (Editable Training: Train on NAE) —— 模型训练用于编辑 6,500 个自然对抗样本 (NAE)。这些 NAE 与用于评估的 1,000 个 NAE 示例不重叠。
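The "Match Ranks" target sampling can be sketched as follows. `sample_target_by_rank` is a hypothetical helper, and the toy rank distribution here is degenerate (always rank 1) purely so the example is deterministic:

```python
import numpy as np

def sample_target_by_rank(probs, rank_distribution, rng):
    # Draw a rank from the empirical NAE rank distribution, then pick the
    # class sitting at that rank of the model's predicted probabilities.
    # rank_distribution[r] is the probability of drawing (0-based) rank r.
    rank = rng.choice(len(rank_distribution), p=rank_distribution)
    order = np.argsort(-probs)          # classes sorted by descending prob
    return int(order[rank])

rng = np.random.default_rng(0)
probs = np.array([0.50, 0.30, 0.15, 0.05])
# toy distribution that always draws rank 1, i.e. the runner-up class
target = sample_target_by_rank(probs, np.array([0.0, 1.0, 0.0, 0.0]), rng)
```

This makes the training-time edit targets look statistically like real NAE corrections instead of uniformly random labels.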

Table 5: Editing Natural Adversarial Examples for ResNet18: (Top-Left) Editor effectiveness when editing $N=1000$ NAEs; (Top-Right) Reference class rank distribution for baseline model, (Bottom-Right) Error rate for edit sequences, ResNet18 baseline and Match Ranks. Pale areas indicate std. deviation over 10 runs.

Training Procedure  Test Error  Drawdown  Success Rate  Num Steps
Baseline Training:
Pre-trained  30.99%  4.54%  100%  3.822
Distillation  30.75%  1.62%  100%  2.192
Editable Training:
Random edits  30.79%  0.314%  100%  2.594
Match ranks  30.76%  0.146%  100%  2.149
Train on NAE  30.86%  0.167%  100%  2.236

表 5: ResNet18自然对抗样本编辑效果: (左上) 编辑 $N=1000$ 个NAE时的编辑器有效性; (右上) 基线模型的参考类别排名分布, (右下) 编辑序列、ResNet18基线和匹配排名的错误率。浅色区域表示10次运行的标准差。

训练流程 测试错误率 回撤率 成功率 步骤数
基线训练:
预训练 30.99% 4.54% 100% 3.822
蒸馏 30.75% 1.62% 100% 2.192
可编辑训练:
随机编辑 30.79% 0.314% 100% 2.594
匹配排名 30.76% 0.146% 100% 2.149
NAE训练 30.86% 0.167% 100% 2.236


(Panel from Table 5, bottom: test error with standard deviation over a sequence of edits.)

(表 5 底部面板:编辑序列上的测试误差及标准差。)

The results in Table 5 (top-left) show that Editable Training significantly reduces the drawdown for NAEs even when trained with random targets. However, accounting for the distribution of target classes improves locality even further. Surprisingly enough, training on 6,500 actual NAEs fares no better than simply matching the distribution of target ranks.

表5(左上)结果显示,即便使用随机目标进行训练,可编辑训练(Editable Training)也能显著降低NAE的回撤幅度。但若考虑目标类别的分布情况,还能进一步提升局部性。令人惊讶的是,用6,500个真实NAE进行训练的效果,竟与简单匹配目标排名分布的效果不相上下。

For the final set of evaluations, we consider two realistic scenarios that are not covered by our previous experiments. First, we evaluate whether edits performed by our method generalize to substantially similar inputs. This behavior is highly desirable since we want to avoid the edited model repeating old mistakes in a slightly changed context. For each of 1,000 NAEs, we find the most similar image from the test set based on Inception V3 embeddings (Szegedy et al., 2015). For each such pair, we edit 5 augmentations of the first image and measure how often the model predicts the edited class on 5 augmentations of the second image. A model trained with random edits has an accuracy of $86.7\%$ while "Editable $^+$ Match Ranks" scores $85.9\%$ accuracy. Finally, we evaluate whether our technique can perform multiple edits in a sequence. Table 5 (bottom) demonstrates that our approach can cope with sequential edits without ever being trained that way.

在最终评估阶段,我们考察了先前实验未涵盖的两种现实场景。首先,我们评估本方法执行的编辑是否可泛化至高度相似的输入。这种行为极具价值,因为我们需要避免被编辑模型在轻微变化的语境中重复原有错误。针对1,000个NAE样本,我们基于Inception V3嵌入向量 (Szegedy et al., 2015) 从测试集中找到最相似的图像。对每组成对样本,我们编辑首张图像的5个增强版本,并统计模型在第二张图像的5个增强版本上预测编辑类别的频率。使用随机编辑训练的模型准确率为$86.7%$,而"可编辑$^+$匹配排序"方案的准确率达到$85.9%$。最后,我们验证了本技术能否执行连续多次编辑。图5(左下)表明,我们的方法能处理序列化编辑任务,尽管从未接受过相关训练。

4.4 EDITABLE TRAINING FOR MACHINE TRANSLATION

4.4 机器翻译的可编辑训练

The previous experiments focused on multi-class classification problems. However, Editable Training can be applied to any task where the model is trained by minimizing a differentiable objective. Our final set of experiments demonstrates the applicability of Editable Training to machine translation. We consider the IWSLT 2014 German-English translation task with the standard training/test splits (Cettolo et al. (2015)). The data is preprocessed with the Moses Tokenizer (Koehn et al. (2007)) and converted to lowercase. We further apply Byte-Pair Encoding with 10,000 BPE rules learned jointly from the German and English training data. Finally, we train the Transformer (Vaswani et al. (2017)) model similar to the transformer-base configuration, optimized for the IWSLT De-En task3.

先前实验主要针对多类别分类问题。然而可编辑训练(Editable Training)可应用于任何通过最小化可微分目标函数训练模型的任务。我们最后一系列实验展示了该方法在机器翻译任务中的适用性。我们采用IWSLT 2014德语-英语翻译任务的标准训练/测试集划分(Cettolo et al. (2015))。数据通过Moses Tokenizer(Koehn et al. (2007))进行预处理并转换为小写格式。我们进一步应用从德英双语训练数据联合学习的10,000条BPE规则进行字节对编码。最终我们训练了类似transformer-base配置的Transformer(Vaswani et al. (2017))模型,并针对IWSLT德英翻译任务进行了优化。

Typical machine translation models use beam search to find the most likely translation. Hence we consider an edit to be successful if and only if the log-probability of the target translation is greater than the log-probability of any alternative translation. So, $l_{e}(\hat{\theta})=\operatorname*{max}_{i}\log p(y_{i}\vert s,\hat{\theta})-\log p(y_{0}\vert s,\hat{\theta})$, where $s$ is a source sentence, $y_{0}$ denotes the target translation and $\{y_{i}\}_{i=1}^{k}$ are alternative translations. During training, we approximate this by finding the $k{=}32$ most likely translations with beam search using a Transformer model trained normally on the same data. The edit targets are sampled from the same model with temperature $\tau{=}1.2$. The resulting edit consists of three parts: a source sentence, a target translation and a set of alternative translations.

典型的机器翻译模型使用束搜索(beam search)来寻找最可能的翻译。因此,我们定义编辑成功当且仅当目标翻译的对数概率大于任何替代翻译的对数概率。即 $l_{e}(\hat{\theta})=\operatorname*{max}_{i}\log p(y_{i}\vert s,\hat{\theta})-\log p(y_{0}\vert s,\hat{\theta})$,其中 $s$ 是源语句,$y_{0}$ 表示目标翻译,$\{y_{i}\}_{i=1}^{k}$ 是替代翻译。训练时,我们通过在相同数据上正常训练的Transformer模型使用束搜索找到 $k{=}32$ 个最可能的翻译来近似这一过程。编辑目标以温度 $\tau{=}1.2$ 从同一模型中采样获得。最终生成的编辑包含三部分:源语句、目标翻译和一组替代翻译。
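The edit criterion $l_e$ reduces to a margin between the best alternative translation and the target translation in log-probability space; the edit succeeds once this margin is non-positive. A minimal sketch with made-up log-probabilities:

```python
def edit_loss(logp_target, logp_alternatives):
    # l_e: positive while some alternative translation outscores the target;
    # the edit counts as successful once l_e <= 0
    return float(max(logp_alternatives) - logp_target)

# the target already beats all k alternatives -> edit is successful
le = edit_loss(logp_target=-2.0, logp_alternatives=[-2.5, -3.1, -4.0])
```

Minimizing `edit_loss` with gradient steps is what the editor function does for each translation correction.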

We define $\mathcal{L}_{loc}$ as the KL-divergence between the predictions of the original and edited models, averaged over target tokens: $\mathcal{L}_{loc}=\mathbb{E}_{x,y\in D}\frac{1}{|y|}\sum_{t} D_{KL}\big(p(y_{t}\vert x,y_{0:t},\theta)\,\Vert\,p(y_{t}\vert x,y_{0:t},Edit_{\alpha}^{k}(\theta,l_{e}))\big)$, where $D$ is a data batch, $x$ and $y$ are the source and translation phrases respectively, and $y_{0:t}$ denotes a translation prefix. The $Edit$ function optimizes the final decoder layer using RMSProp with hyperparameters tuned as in Section 4.1. The results in Table 6 show that Editable Training produces a model that matches the baseline translation quality but has less than half of its drawdown.

我们将 $\mathcal{L}_{loc}$ 定义为原始模型与编辑后模型在目标token上的平均预测KL散度,即 $\mathcal{L}_{loc}=\mathbb{E}_{x,y\in D}\frac{1}{|y|}\sum_{t} D_{KL}\big(p(y_{t}\vert x,y_{0:t},\theta)\,\Vert\,p(y_{t}\vert x,y_{0:t},Edit_{\alpha}^{k}(\theta,l_{e}))\big)$,其中 $D$ 为数据批次,$x$ 和 $y$ 分别为源语句和翻译语句,$y_{0:t}$ 表示翻译前缀。Edit函数使用RMSProp优化最终解码层,其超参数设置如第4.1节所述。表6结果显示,可编辑训练生成的模型在保持基线翻译质量的同时,性能下降幅度不到基线模型的一半。
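The token-averaged KL term $\mathcal{L}_{loc}$ for a single sentence pair can be sketched as follows (toy logits rather than actual Transformer outputs; `l_loc` is a hypothetical helper):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def l_loc(logits_before, logits_after, eps=1e-12):
    # Token-level KL between the original and edited decoder distributions,
    # averaged over the |y| target positions of one sentence pair
    p = softmax(logits_before)
    q = np.clip(softmax(logits_after), eps, 1.0)
    kl_per_token = np.sum(p * np.log(np.clip(p, eps, 1.0) / q), axis=-1)
    return float(kl_per_token.mean())

# identical logits -> the edit left this sentence's predictions untouched
logits = np.array([[0.2, 1.0, -0.5],
                   [2.0, 0.1, 0.3]])        # 2 target tokens, 3-word toy vocab
loss = l_loc(logits, logits)
```

Averaging this quantity over a batch $D$ gives the locality penalty that keeps edits from degrading unrelated translations.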

Table 6: Evaluation of editable Transformer models on IWSLT14 German-English translation task.

Training Procedure  Test BLEU  BLEU Drawdown  Success Rate  Num Steps
Baseline training, $\alpha{=}10^{-3}$  34.77  0.76  100%  2.35
Editable, $c_{loc}{=}100$, $\alpha{=}10^{-3}$  34.80  0.35  100%  3.07
Editable, $c_{loc}{=}100$, $\alpha{=}3\cdot10^{-4}$  34.81  0.17  100%  5.5

表 6: 可编辑Transformer模型在IWSLT14德英翻译任务上的评估

训练流程 测试BLEU BLEU下降值 成功率 步骤数
基线训练, $\alpha{=}10^{-3}$ 34.77 0.76 100% 2.35
可编辑, $c_{loc}{=}100$, $\alpha{=}10^{-3}$ 34.80 0.35 100% 3.07
可编辑, $c_{loc}{=}100$, $\alpha{=}3\cdot10^{-4}$ 34.81 0.17 100% 5.5

5 CONCLUSION

5 结论

In this paper we have addressed the efficient correction of neural network mistakes, a highly important task for deep learning practitioners. We have proposed several evaluation measures for comparing different means of model correction. We have then introduced Editable Training, a training procedure that produces models which allow gradient-based editing to correct the model's behaviour. We demonstrate the advantage of Editable Training over reasonable baselines on large-scale image classification and machine translation tasks.

本文探讨了神经网络错误的高效修正方法,这对深度学习从业者至关重要。我们提出了多种评估指标来比较不同的模型修正手段,并介绍了可编辑训练 (Editable Training) 这一训练流程,该流程生成的模型支持基于梯度的编辑来修正模型行为。我们在大规模图像分类和机器翻译任务中验证了可编辑训练相对于合理基线的优势。

ACKNOWLEDGMENTS

致谢

We would like to thank Andrey Voynov for many useful discussions which helped inspire the idea for this study. We also wish to express our sincere appreciation to Pavel Bogomolov for constructive criticism and for his diligent proofreading of this paper.

我们要感谢Andrey Voynov富有启发性的讨论,这些讨论为本研究的构思提供了灵感。同时衷心感谢Pavel Bogomolov提出的建设性意见以及对本论文的细致校对。

