[论文翻译]深度高光谱先验:单幅图像去噪、修复与超分辨率


原文地址:https://arxiv.org/pdf/1902.00301v2


Deep Hyperspectral Prior: Single-Image Denoising, Inpainting, Super-Resolution

深度高光谱先验:单幅图像去噪、修复与超分辨率

Abstract

摘要

Deep learning algorithms have demonstrated state-of-the-art performance in various tasks of image restoration. This was made possible by the ability of CNNs to learn from large exemplar sets. However, this reliance on large datasets becomes an issue for hyperspectral image processing, where datasets commonly consist of just a few images. In this work, we propose a new approach to denoising, inpainting, and super-resolution of hyperspectral image data that uses intrinsic properties of a CNN without any training. The performance of the proposed algorithm is shown to be comparable to that of trained networks, while its application is not restricted by the availability of training data. This work extends the original “deep prior” algorithm to the hyperspectral imaging domain and to 3D-convolutional networks.

深度学习算法已在图像复原的多种任务中展现出顶尖性能。这得益于卷积神经网络 (CNN) 从大型样本集中学习的能力。然而,对于高光谱 (hyperspectral) 图像处理而言,数据集通常仅包含少量图像,这种对大型数据集的依赖便成为瓶颈。本研究提出了一种无需训练、仅利用 CNN 固有特性的新方法,用于高光谱图像数据的去噪、修复和超分辨率重建。实验表明,该算法性能可与训练后的网络相媲美,且不受训练数据可用性的限制。本工作将原始"深度先验 (deep prior)"算法扩展至高光谱成像领域和 3D 卷积网络。

1. Introduction

1. 引言

Deep Convolutional Neural Networks (CNNs) occupy more and more leading positions in benchmarks of various image processing tasks. This is commonly attributed to the excellent representational ability of hierarchical convolutional layers, which allows CNNs to learn from large amounts of visual data without any hand-crafted assumptions. Ulyanov et al. [30] were the first to show that not only the learning ability but also the inner structure of a CNN itself can be beneficial for processing image data.

深度卷积神经网络 (CNN) 在各种图像处理任务的基准测试中占据越来越重要的地位。这通常归功于其分层卷积层出色的表征能力,使得CNN无需任何人工假设就能学习大量视觉数据。Ulyanov等人 [30] 首次证明,CNN不仅具有学习能力,其内部结构本身也有利于图像数据处理。

For example, inverse image-restoration tasks such as inpainting, noise removal, or super-resolution can be formulated as an energy minimization problem as follows:

例如,图像复原的逆任务(如修复、去噪或超分辨率)可以表述为如下能量最小化问题:

$$
x^{*}=\arg\min_{x}\big(E(x,x_{0})+R(x)\big)
$$

$$
x^{*}=\arg\min_{x}\big(E(x,x_{0})+R(x)\big)
$$

where $E(x,x_{0})$ is a task-related metric, $x$ and $x_{0}$ are the original and corrupted images, and $R(x)$ is a regularization term (image prior), which can be chosen manually or learned from data (as happens in the vast majority of CNN-based methods). However, the theory of Ulyanov et al. states that the image prior can be found directly in the space of the network's parameters through the optimization process, which allows the regularization term to be removed and a solution to be sought as:

其中 $E(x,x_{0})$ 是任务相关度量,$x$ 和 $x_{0}$ 分别为原始图像与受损图像,$R(x)$ 是可手动选择或从数据中学习(如绝大多数基于CNN的方法)的正则化项(图像先验)。但Ulyanov等人的理论指出:通过优化过程可直接在网络参数空间中发现图像先验,从而省略正则化项,并将解搜索形式简化为:

$$
x^{*}=f_{\theta^{*}}(z),\quad \text{where } \theta^{*}=\arg\min_{\theta}E(f_{\theta}(z),x_{0})
$$

$$
x^{*}=f_{\theta^{*}}(z),\quad \text{其中 } \theta^{*}=\arg\min_{\theta}E(f_{\theta}(z),x_{0})
$$

Here, $f$ is a CNN with parameters $\theta$, and $z$ is a fixed input (noise). Thereby, an original image can be restored via optimization of the network's weights using only the corrupted image.

这里,$f$ 是参数为 $\theta$ 的 CNN (卷积神经网络),$z$ 是固定输入(噪声)。因此,仅利用受损图像优化网络权重即可恢复原始图像。

This approach is particularly significant in the domain of hyperspectral imaging (HSI). Currently, HSI is a powerful tool widely used in remote sensing, agriculture, cultural heritage, the food industry, pharmaceutics, etc. The complexity of hyperspectral equipment and of the data acquisition process makes corruption of image data even more likely than in RGB imaging. This increases the demand for hyperspectral image restoration algorithms. At the same time, accurate learning-based methods can hardly be used due to the lack of data. The complexity of data acquisition does not allow gathering large custom datasets for a particular task, and even openly available ones are very limited, rarely exceeding one hundred images and sometimes consisting of just one image [3][15][31].

这种方法在高光谱成像(HSI)领域具有特别重要的意义。目前,HSI是一种强大的工具,广泛应用于遥感、农业、文化遗产、食品工业、制药等领域。高光谱设备的复杂性和数据采集过程使得图像数据比RGB成像更容易出现损坏。因此,对高光谱图像恢复算法的需求日益增加。但与此同时,由于数据缺乏,很难使用基于学习的精确方法。数据采集的复杂性使得无法为特定任务收集大型定制数据集,即使是公开可用的数据集也非常有限,很少超过一百张图像,有时仅包含一张图像[3][15][31]。

Our work aims to address this problem; our contributions can be summarized as follows:

我们的工作旨在解决这一问题,总体而言,我们的贡献可归纳如下:

2. Related Works

2. 相关工作

In this section, we briefly introduce recent advances in hyper spectral denoising, inpainting, and super-resolution.

在本节中,我们简要介绍高光谱去噪、修复和超分辨率领域的最新进展。

One way to perform HSI denoising is to apply 2D algorithms to each band separately. Such an approach can utilize bilateral [29] or NL-means filtering [7], total variation [27], block-matching 3D filtering [9], or novel CNN-based techniques, e.g. DnCNN [39]. However, ignoring spectral information may cause distortions and artifacts in the spectral domain. This has given rise to a family of algorithms based on spatial-spectral features, such as spatio-spectral derivative-domain wavelet shrinkage [24], low-rank tensor approximation [26], low-rank matrix recovery [38], and most recently the FastHyDe algorithm [40], which utilizes a sparse representation of an image linked to its low-rank and self-similarity characteristics. The deep learning paradigm has been applied in the 3D modification of DnCNN and in the more advanced HSI-oriented network HSID-CNN [36].

一种进行高光谱图像(HSI)去噪的方法是对每个波段单独应用二维算法。这种方法可以使用双边滤波[29]或非局部均值滤波[7]、全变分[27]、块匹配三维滤波[9],或基于CNN的新技术(如DnCNN[39])。然而,不考虑光谱数据可能会导致光谱域的失真和伪影。这催生了一系列基于空间-光谱特征的算法,如空间-光谱导数域小波收缩[24]、低秩张量逼近[26]、低秩矩阵恢复[38],以及最近利用图像稀疏表示(与其低秩和自相似特性相关)的FastHyDe算法[40]。深度学习范式已被用于DnCNN的三维改进,以及更先进的面向HSI的网络HSID-CNN[36]。

The inpainting of grayscale and RGB images conventionally relies on patch-similarity and variational algorithms to propagate information from intact regions to holes [2][4][13], and these may be applied to HSI data in a band-wise manner. Novel inpainting approaches benefit from the realistic reconstruction ability of GANs, which allows filling even large holes with remarkable accuracy [16][21][35]; however, they rely on large training datasets. There are also a number of HSI-specific inpainting methods [6][8][10]. Similar to our approach, Addesso et al. [1] address HSI inpainting as an optimization task with a (hand-crafted) collaborative total variation regularizer, while Yao et al. [34] designed a regularizer based on Criminisi's inpainting method. Recently, the FastHyIn algorithm [40] (an extension of FastHyDe) demonstrated state-of-the-art HSI inpainting accuracy, with the caveat that, similar to [28], it utilizes information from intact bands and thus cannot be used when all bands are corrupted.

灰度图像和RGB图像的修复传统上依赖于块相似性和变分算法,将信息从完整区域传播到缺失区域 [2][4][13],并可逐波段应用于HSI数据。新型修复方法得益于GAN (Generative Adversarial Network) 的真实重建能力,能够以极高精度填补大面积缺失 [16][21][35],但这类方法依赖大规模训练数据集。此外还存在多种HSI专用修复方法 [6][8][10]。与我们的方法类似,Addesso等人 [1] 将HSI修复视为带有(人工设计)协作总变分正则项的优化任务,而Yao等人 [34] 则基于Criminisi修复方法设计了正则项。近期提出的FastHyIn算法 [40](FastHyDe的扩展)实现了最先进的HSI修复精度,但需注意的是:与 [28] 类似,该算法利用了完整波段的信息,因此不适用于全波段损坏的情况。

The majority of hyperspectral super-resolution (SR) algorithms fuse the input hyperspectral image with a high-resolution multispectral image, which is easier to obtain [11][25]. Single-image SR is a more challenging task. Attempts to solve it include spectral mixture analysis [19], low-rank tensor approximation [32], the Local-Global Combined Network [18], MRF-based energy minimization [17], transfer learning [37], the recent 3D Full Convolutional Neural Network [23], and others.

大多数高光谱超分辨率 (SR) 算法通过将输入的高光谱图像与更易获取的高分辨率多光谱图像进行融合来实现 [11][25]。单图像超分辨率是一项更为复杂的任务。现有解决方案包括光谱混合分析 [19]、低秩张量近似 [32]、局部-全局联合网络 [18]、基于 MRF 的能量最小化 [17]、迁移学习 [37]、最新的 3D 全卷积神经网络方法 [23] 等。

3. Methodology

3. 方法论

The main idea of the method is captured by Equations (1) and (2). The fully-convolutional encoder-decoder $f_{\theta}$ is designed to translate a fixed input $z$ filled with noise into the original image $x$, conditioned on the corrupted image $x_{0}$. We use the paradigm of the “deep prior” method [30], which states that the optimal weights $\theta$ of a network $f_{\theta}$ can be found from the intrinsic prior contained in the network structure instead of being learned from data. In particular, $\theta$ is approximated by the minimizer $\theta^{*}$:

该方法的核心思想体现在公式(1)和(2)中。全卷积编码器-解码器 $f_{\theta}$ 的设计目标是将填充噪声的固定输入 $z$ 转换为原始图像 $x$ ,并以受损图像 $x_{0}$ 为条件。我们采用"深度先验"方法 [30] 的范式,即网络 $f_{\theta}$ 的最优权重 $\theta$ 可以从网络结构本身包含的内在先验中寻找,而无需从数据中学习。具体而言,$\theta$ 通过最小化器 $\theta^{*}$ 来近似:

$$
\theta^{*}=\arg\min_{\theta}E(f_{\theta}(z),x_{0})
$$

$$
\theta^{*}=\arg\min_{\theta}E(f_{\theta}(z),x_{0})
$$

which can be obtained with an optimizer such as gradient descent, starting from randomly initialized parameters. It is also possible to optimize over the input $z$ (not covered in this work).

可以通过梯度下降等优化器从随机初始化的参数中获得。也可以对输入 $z$ 进行优化(本文未涉及)。
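The optimization in Eq. (3) can be sketched in NumPy as below. This is a toy under a loud assumption: a single linear layer $f_{\theta}(z)=Wz$ stands in for the CNN, and the function name `fit_deep_prior_toy`, the initialization scale, and the step size are hypothetical. Only the mechanics carry over: $z$ is drawn once from uniform noise in $[0, 0.1]$ and kept fixed, $\theta$ starts from a random initialization, and gradient descent updates $\theta$ alone. A linear layer has none of the structural prior of a convolutional encoder-decoder, so given enough iterations it reproduces $x_{0}$ exactly, noise included.

```python
import numpy as np

def fit_deep_prior_toy(x0, num_iters=200, seed=0):
    """Toy version of Eq. (3): gradient descent on theta with a fixed input z.

    Assumption: a linear layer f_theta(z) = W @ z replaces the CNN, so this
    illustrates only the optimization loop, not the deep image prior itself.
    W has size n x n, so this is only practical for tiny inputs.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    shape = x0.shape
    x0 = x0.ravel()
    z = rng.uniform(0.0, 0.1, size=x0.size)             # fixed noise input, never updated
    W = rng.normal(0.0, 0.01, size=(x0.size, z.size))   # random initialization of theta
    lr = 0.1 / (z @ z)                                  # each step shrinks the residual by 20%
    losses = []
    for _ in range(num_iters):
        residual = W @ z - x0                           # f_theta(z) - x0
        losses.append(float(residual @ residual))       # E(f_theta(z), x0) = ||f_theta(z) - x0||^2
        W -= lr * 2.0 * np.outer(residual, z)           # gradient step on theta only
    return (W @ z).reshape(shape), losses               # x* = f_{theta*}(z)

x0 = np.linspace(0.0, 1.0, 50)                          # hypothetical 1D "image"
x_hat, losses = fit_deep_prior_toy(x0, num_iters=100)
print(losses[-1] < losses[0])                           # True
```

With a real CNN, stopping this loop after a limited number of iterations is what separates the recovered signal from the noise; in the linear toy there is no such separation.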

The energy function $E(x,x_{0})$ may be chosen according to the application task. For a basic reconstruction problem, it may be formulated as the squared $L_{2}$-distance:

能量函数 $E(x,x_{0})$ 可根据应用任务选择。对于基础重建问题,可将其表述为 $L_{2}$ 距离:

$$
E(x,x_{0})=||x-x_{0}||^{2}
$$

$$
E(x,x_{0})=||x-x_{0}||^{2}
$$
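As a minimal sketch, assuming the hyperspectral image is stored as a NumPy array of shape (H, W, C) (height, width, spectral bands), the squared $L_{2}$ energy sums squared differences over every pixel of every band. The function name `energy_l2` is hypothetical.

```python
import numpy as np

def energy_l2(x, x0):
    """E(x, x0) = ||x - x0||^2, summed over all H x W x C entries of the cube."""
    diff = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    return float(np.sum(diff ** 2))

x0 = np.zeros((4, 4, 3))        # hypothetical 4 x 4 cube with 3 bands
x = np.full((4, 4, 3), 0.5)
print(energy_l2(x, x0))         # 12.0  (48 entries * 0.5**2)
```

Using the mean instead of the sum would only rescale the energy and, with plain gradient descent, the effective step size.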

It was shown [30] that the optimization converges faster for natural-looking images than for random noise, i.e. the process exhibits high impedance to noise and low impedance to signal. This property can be used to interrupt the reconstruction before the noise is recovered, which yields blind image denoising.

研究表明 [30],在自然图像而非随机噪声的情况下,优化收敛速度更快,即该过程对噪声呈现高阻抗、对信号呈现低阻抗。这一特性可用于在噪声被恢复前中断重建,从而实现盲图像去噪。

Also, the $E(x,x_{0})$ term can be modified to fill the missing regions in an inpainting problem with a binary mask $m$ whose entries are in $\{0,1\}$:

此外,可对 $E(x,x_{0})$ 项进行修改,借助元素取值为 $\{0,1\}$ 的二值掩码 $m$ 填补修复问题中的缺失区域:

$$
E(x,x_{0})=\|(x-x_{0})\circ m\|^{2}
$$

$$
E(x,x_{0})=\|(x-x_{0})\circ m\|^{2}
$$

where $\circ$ is the Hadamard product. Alternatively, a downsampling operator $d(\cdot,\alpha): x^{\alpha N\times\alpha N\times C}\to x^{N\times N\times C}$ with factor $\alpha$ can be used in $E(x,x_{0})$ to address the super-resolution task as the prediction of a high-resolution image $x$ which, when downsampled, is the same as the low-resolution image $x_{0}$:

其中 $\circ$ 表示哈达玛积 (Hadamard product)。此外,也可在 $E(x,x_{0})$ 中使用因子为 $\alpha$ 的下采样算子 $d(\cdot,\alpha): x^{\alpha N\times\alpha N\times C}\to x^{N\times N\times C}$ 来处理超分辨率任务,即预测高分辨率图像 $x$,使其下采样结果与低分辨率图像 $x_{0}$ 相同:

$$
E(x,x_{0})=||d(x)-x_{0}||^{2}
$$

$$
E(x,x_{0})=||d(x)-x_{0}||^{2}
$$
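The inpainting and super-resolution energies can be sketched in NumPy as follows, assuming (H, W, C) arrays. Here the mask multiplies the residual $(x-x_{0})$, so entries with $m=0$ contribute nothing. The text does not specify how $d(\cdot,\alpha)$ is implemented; averaging non-overlapping $\alpha\times\alpha$ blocks is an assumption, and the names `energy_inpaint`, `downsample`, and `energy_sr` are hypothetical.

```python
import numpy as np

def energy_inpaint(x, x0, m):
    """E(x, x0) = ||(x - x0) o m||^2; entries with m == 0 (holes) contribute nothing."""
    return float(np.sum(((x - x0) * m) ** 2))

def downsample(x, alpha):
    """d(x, alpha): (alpha*N, alpha*N, C) -> (N, N, C) by averaging alpha x alpha blocks.

    Assumption: block averaging is only one possible choice of d.
    """
    h, w, c = x.shape
    if h % alpha or w % alpha:
        raise ValueError("spatial size must be divisible by alpha")
    return x.reshape(h // alpha, alpha, w // alpha, alpha, c).mean(axis=(1, 3))

def energy_sr(x, x0, alpha):
    """E(x, x0) = ||d(x) - x0||^2: the high-res estimate x must downsample to x0."""
    return float(np.sum((downsample(x, alpha) - x0) ** 2))

rng = np.random.default_rng(0)
x_hr = rng.random((8, 8, 5))                 # hypothetical 8 x 8 cube with 5 bands

# Inpainting: whatever the estimate contains inside the hole is not penalized.
m = np.ones_like(x_hr)
m[2:4, 2:4, :] = 0.0                         # a hole across all bands
x_est = x_hr.copy()
x_est[2:4, 2:4, :] = 0.0
print(energy_inpaint(x_est, x_hr, m))        # 0.0

# Super-resolution: x0 is the 2x downsampled cube, so x_hr has zero energy.
x_lr = downsample(x_hr, 2)
print(x_lr.shape)                            # (4, 4, 5)
print(energy_sr(x_hr, x_lr, 2))              # 0.0
```

In both cases many different $x$ reach zero energy; choosing among them is left to the prior implied by the network $f_{\theta}$.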

3.1. Implementation details

3.1. 实现细节

It was found that different fully-convolutional encoder-decoder architectures (sometimes with skip connections) are suitable for implementing the given method. For a detailed description, please see the source code¹. Although parameters differ for each sub-task, the general framework is common and is illustrated in Fig. 1.

研究发现,不同的全卷积编码器-解码器架构(有时带有跳跃连接)适用于该方法的实现。具体细节描述请参阅源代码¹。尽管每个子任务的参数各不相同,但总体框架是通用的,如图 1 所示。

We experiment with two versions of the networks: 2D and 3D. While 2D convolutions are still able to process multi-channel input, they shrink the spectral information already at the first convolutional layer and recover it only at the last one. In this case, the filters of these layers have an “elongated” shape, with a depth equal to the depth of the hyperspectral image. A 3D convolution allows the use of smaller filters (e.g. $3\times3\times3$) throughout the whole network because its output is a 3D volume. This ability to preserve the 3D shape of the input is considered beneficial for processing hyperspectral data. It is worth mentioning that, unlike a conventional “hourglass” architecture, where data is consecutively downsampled/upsampled along two spatial dimensions, processing of 3D volumes allows doing the same along the third dimension as well (see Fig. 1).

我们实验了两种网络版本——2D和3D。虽然2D卷积仍能处理多通道输入,但它们会在第一个卷积层就导致光谱信息收缩,并在最后一层恢复。这种情况下,这些层的滤波器会呈现"拉长"形状,其深度与高光谱图像的深度相同。3D卷积由于输出是3D体积,因此允许在整个网络中使用更小的滤波器(例如 $3\times3\times3$ )。这种保持输入3D形状的能力被认为有利于高光谱数据处理。值得一提的是,与传统"沙漏"架构(数据仅沿两个空间维度连续下采样/上采样)不同,3D体积处理可以同时对第三个维度进行相同操作(见图1)。
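To make the filter-shape argument concrete, the following pure-Python sketch counts first-layer weights using the (out, in, kernel...) weight layout of PyTorch's `Conv2d`/`Conv3d`. The band count (191), image size (64 x 64), filter count (16), padding of 1, and treating the cube as a single-channel 3D volume are illustrative assumptions, not the paper's settings; bias terms are ignored, and the helper names are hypothetical.

```python
def conv_out(n, k=3, stride=1, pad=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * pad - k) // stride + 1

def conv2d_weights(bands, filters, k=3):
    """First 2D layer: every filter spans all spectral bands ("elongated" k x k x bands)."""
    shape = (filters, bands, k, k)           # (out, in, kH, kW)
    return shape, filters * bands * k * k    # weight count, bias excluded

def conv3d_weights(in_channels, filters, k=3):
    """3D layer: small k x k x k filters also slide along the spectral axis."""
    shape = (filters, in_channels, k, k, k)  # (out, in, kD, kH, kW)
    return shape, filters * in_channels * k ** 3

# Hypothetical cube: 191 bands of 64 x 64 pixels, 16 filters per layer.
bands, h, w, filters = 191, 64, 64, 16

print(conv2d_weights(bands, filters))  # ((16, 191, 3, 3), 27504)
print(conv3d_weights(1, filters))      # ((16, 1, 3, 3, 3), 432)

# A stride-2 layer halves the spatial axes in both cases, but only the 3D
# network still has a spectral axis to downsample.
out_2d = (filters, conv_out(h, stride=2), conv_out(w, stride=2))
out_3d = (filters, conv_out(bands, stride=2), conv_out(h, stride=2), conv_out(w, stride=2))
print(out_2d)  # (16, 32, 32)
print(out_3d)  # (16, 96, 32, 32)
```

The 3D layer needs far fewer weights because its filters do not span the whole spectrum; in exchange, its activations keep the spectral axis, so memory per feature map grows with the number of bands.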

The input of the network is uniform noise in the range 0-0.1, with a shape equal to that of the processed hyperspectral image. Optionally, it is additionally perturbed at each iteration with Gaussian noise of a specified $\sigma$. The activation