[Paper translation] Deep Hyperspectral Prior: Single-Image Denoising, Inpainting, Super-Resolution


Original paper: https://arxiv.org/pdf/1902.00301v2


Deep Hyperspectral Prior: Single-Image Denoising, Inpainting, Super-Resolution

Abstract

Deep learning algorithms have demonstrated state-of-the-art performance in various image restoration tasks. This was made possible by the ability of CNNs to learn from large exemplar sets. However, this dependence becomes an issue for hyperspectral image processing, where datasets commonly consist of just a few images. In this work, we propose a new approach to denoising, inpainting, and super-resolution of hyperspectral image data that uses the intrinsic properties of a CNN without any training. The performance of the algorithm is shown to be comparable to that of trained networks, while its application is not restricted by the availability of training data. This work extends the original "deep prior" algorithm to the hyperspectral imaging domain and to 3D-convolutional networks.

1. Introduction

Deep Convolutional Neural Networks (CNNs) increasingly lead the benchmarks of various image processing tasks. This is commonly attributed to the excellent representational ability of hierarchical convolutional layers, which allows CNNs to learn from large amounts of visual data without any hand-crafted assumptions. Ulyanov et al. [30] were the first to show that not only the learning ability but also the inner structure of a CNN itself can be beneficial for processing image data.

For example, inverse image restoration tasks such as inpainting, noise removal, or super-resolution can be formulated as an energy minimization problem:

$$
x^{*}=\underset{x}{\arg\min}\;E(x,x_{0})+R(x),\tag{1}
$$

where $E(x,x_{0})$ is a task-related metric, $x$ and $x_{0}$ are the original and corrupted images, and $R(x)$ is a regularization term (image prior) which can be chosen manually or learned from data (as happens in the vast majority of CNN-based methods). However, the theory of Ulyanov et al. states that the image prior can be found directly in the space of the network's parameters through the optimization process. This allows the regularization term to be dropped and the solution to be sought as:

$$
x^{*}=f_{\theta^{*}}(z),\quad\text{where}\quad\theta^{*}=\underset{\theta}{\arg\min}\;E(f_{\theta}(z),x_{0})\tag{2}
$$

Here, $f$ is a CNN with parameters $\theta$, and $z$ is a fixed input (noise). Thus, an original image can be restored by optimizing the network's weights using only a corrupted image.

This approach is particularly significant in the domain of hyperspectral imaging (HSI). HSI is currently a powerful tool that is widely used in remote sensing, agriculture, cultural heritage, the food industry, pharmaceutics, and other fields. The complexity of hyperspectral equipment and of the data acquisition process makes corruption of image data even more likely than it is for RGB imaging. This increases the demand for hyperspectral image restoration algorithms. At the same time, accurate learning-based methods can hardly be used due to the lack of data. The complexity of data acquisition does not allow large custom datasets to be gathered for a particular task, and even openly available datasets are very limited: they rarely exceed one hundred images and sometimes consist of just one image [3][15][31].

Our work aims to solve this problem and, altogether, our contributions can be formulated as follows:

2. Related Works

In this section, we briefly introduce recent advances in hyperspectral denoising, inpainting, and super-resolution.

One way to perform HSI denoising is to apply 2D algorithms to each band separately. Such an approach can use bilateral [29] or NL-means filtering [7], total variation [27], block-matching 3D filtering [9], or newer CNN-based techniques such as DnCNN [39]. However, ignoring spectral data may cause distortions and artifacts in the spectral domain. This has given rise to a family of algorithms based on spatial-spectral features, such as spatio-spectral derivative-domain wavelet shrinkage [24], low-rank tensor approximation [26], low-rank matrix recovery [38], and most recently the FastHyDe algorithm [40], which uses a sparse representation of an image linked to its low-rank and self-similarity characteristics. The deep learning paradigm has been used in the 3D modification of DnCNN and in the more advanced HSI-oriented network HSID-CNN [36].

Inpainting of grayscale and RGB images conventionally relies on patch-similarity and variational algorithms to propagate information from intact regions into holes [2][4][13], and these may be applied to HSI data band by band. Newer inpainting approaches benefit from the realistic reconstruction ability of GANs, which allows even large holes to be filled with remarkable accuracy [16][21][35]. However, they rely on large training datasets. There are also a number of HSI-specific inpainting methods [6][8][10]. Similar to our approach, Addesso et al. [1] address HSI inpainting as an optimization task with a (hand-crafted) collaborative total variation regularizer, while Yao et al. [34] designed a regularizer based on Criminisi's inpainting method. Recently, the FastHyIn algorithm [40] (an extension of FastHyDe) demonstrated state-of-the-art HSI inpainting accuracy, with the caveat that, like [28], it uses information from intact bands and thus cannot be used when all bands are corrupted.

The majority of hyperspectral super-resolution (SR) algorithms fuse the input hyperspectral image with a high-resolution multispectral image, which is easier to obtain [11][25]. Single-image SR is a more difficult task. Attempts to solve it include spectral mixture analysis [19], low-rank tensor approximation [32], the Local-Global Combined Network [18], MRF-based energy minimization [17], transfer learning [37], the recent 3D Full Convolutional Neural Network [23], and others.

3. Methodology

The main idea of the method is captured by Equations (1) and (2). The fully-convolutional encoder-decoder $f_{\theta}$ is designed to translate a fixed input $z$ filled with noise into the original image $x$, conditioned on the corrupted image $x_{0}$. We follow the paradigm of the "deep prior" method [30], which says that the optimal weights $\theta$ of a network $f_{\theta}$ can be found from the intrinsic prior contained in the network structure instead of being learned from data. In particular, $\theta$ is approximated by the minimizer $\theta^{*}$:

$$
\theta^{*}=\underset{\theta}{\arg\min}\;E(f_{\theta}(z),x_{0})
$$

which can be obtained using an optimizer such as gradient descent, starting from randomly initialized parameters. It is also possible to optimize over the input $z$ (not covered in this work).

The energy function $E(x,x_{0})$ may be chosen according to the application task. In the case of a basic reconstruction problem, it may be formulated as the $L_{2}$ distance:

$$
E(x,x_{0})=\|x-x_{0}\|^{2}
$$

It was shown in [30] that the optimization converges faster for natural-looking images than for random noise, i.e. the process demonstrates high impedance to noise and low impedance to signal. This property can be used to stop the reconstruction before the noise is recovered, which yields blind image denoising.

The $E(x,x_{0})$ term can also be modified to fill the missing regions in an inpainting problem with a mask $m\in\{0,1\}$:

$$
E(x,x_{0})=\|(x-x_{0})\circ m\|^{2}
$$

where $\circ$ is the Hadamard product. Alternatively, a downsampling operator $d(x,\alpha):x^{\alpha N\times\alpha N\times C}\to x^{N\times N\times C}$ with factor $\alpha$ can be used in $E(x,x_{0})$ to address the super-resolution task as the prediction of a high-resolution image $x$ which, when downsampled, is the same as the low-resolution image $x_{0}$:

$$
E(x,x_{0})=\|d(x,\alpha)-x_{0}\|^{2}
$$
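The three energy terms above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the mask `m` is taken to be 1 on valid pixels and 0 in holes, and average pooling stands in for the downsampling operator $d(x,\alpha)$, which the text does not specify.

```python
import numpy as np

def energy_denoise(x, x0):
    """Plain L2 energy between the network output x and the corrupted image x0."""
    return float(np.sum((x - x0) ** 2))

def energy_inpaint(x, x0, m):
    """L2 energy restricted by the mask m (assumed 1 = valid pixel, 0 = hole)."""
    return float(np.sum(((x - x0) * m) ** 2))

def downsample(x, alpha):
    """Assumed d(x, alpha): average-pool an (alpha*N, alpha*N, C) cube to (N, N, C)."""
    h, w, c = x.shape
    n_h, n_w = h // alpha, w // alpha
    x = x[: n_h * alpha, : n_w * alpha]          # drop rows/columns that do not fit
    return x.reshape(n_h, alpha, n_w, alpha, c).mean(axis=(1, 3))

def energy_superres(x, x0, alpha):
    """L2 energy between the downsampled prediction and the low-resolution image x0."""
    return float(np.sum((downsample(x, alpha) - x0) ** 2))

rng = np.random.default_rng(0)
x = rng.random((8, 8, 5))                        # toy "hyperspectral" cube, 5 bands

# Inpainting: pixels inside the hole do not contribute to the energy.
m = np.ones_like(x)
m[2:4, :, :] = 0.0                               # a corrupted stripe across all bands
x0 = x * m                                       # the stripe is zeroed in the observation
print(energy_denoise(x, x0) > 0.0, energy_inpaint(x, x0, m))

# Super-resolution: a prediction that downsamples exactly to x0 has zero energy.
print(energy_superres(x, downsample(x, 2), 2))
```

In all three cases only the data term changes; the network and the optimization stay the same.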

3.1. Implementation details

We found that different fully-convolutional encoder-decoder architectures (sometimes with skip connections) are suitable for implementing the method. For the exact details, please see the source code. Although the parameters differ for each sub-task, the general framework is common and is illustrated in Fig. 1.

We experiment with two versions of the networks: 2D and 3D. While 2D convolutions are still able to process multi-channel input, they shrink the spectral information already at the first convolutional layer and recover it at the last one. In this case, the filters of these layers have an "elongated" shape with a depth equal to the depth of the hyperspectral image. A 3D convolution allows the use of smaller filters (e.g. $3\times3\times3$) throughout the whole network because its output is a 3D volume. This ability to preserve the 3D shape of the input is considered beneficial for processing hyperspectral data. It is worth mentioning that, unlike a conventional "hourglass" architecture, where data is consecutively downsampled and upsampled along two spatial dimensions, processing 3D volumes allows the same to be done along the third dimension as well (see Fig. 1).
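The difference between the two variants can be made concrete with a little shape arithmetic for the first layer. The sketch below is illustrative only: the helper names and the number of filters `f = 64` are hypothetical, and stride 1 with "same" padding is assumed so that spatial size is unchanged.

```python
def conv2d_first_layer(n, c, k, f):
    """2D convolution: the c bands of an n x n image act as input channels."""
    filter_shape = (k, k, c)          # "elongated" filter, depth equals the band count
    output_values = n * n * f         # the spectral axis is collapsed into f feature maps
    return filter_shape, output_values

def conv3d_first_layer(n, c, k, f):
    """3D convolution: the image is a single-channel n x n x c volume."""
    filter_shape = (k, k, k)          # small cubic filter
    output_values = n * n * c * f     # each of the f feature maps is still a 3D volume
    return filter_shape, output_values

# DC Mall crop from the experiments: 200 x 200 pixels, 191 bands; f = 64 is assumed.
shape_2d, out_2d = conv2d_first_layer(200, 191, 3, 64)
shape_3d, out_3d = conv3d_first_layer(200, 191, 3, 64)
print(shape_2d, out_2d)               # (3, 3, 191) 2560000
print(shape_3d, out_3d)               # (3, 3, 3) 488960000
print(out_3d // out_2d)               # 191
```

Under these assumptions the 3D layer keeps the spectral axis, so its output holds as many times more values as there are bands, which gives a feel for the extra memory needed when working with tensors of higher dimensionality.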

The input of the network is uniform noise in the range 0 to 0.1 with the same shape as the hyperspectral image being processed. Optionally, it is additionally perturbed at each iteration with Gaussian noise of a specified $\sigma$. The activation function is LeakyReLU [33]. Downsampling is performed using strided convolutions, while upsampling is either "nearest" or bilinear (trilinear in the 3D case). Other methods can also be used, but these prevailed in our experiments. The ADAM algorithm was used for optimization.
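A minimal PyTorch sketch of the 2D variant, assuming the details listed above, might look as follows. It is not the authors' implementation: `TinyEncoderDecoder`, `restore`, the layer widths, the learning rate, and the step count are all hypothetical, and the network is far smaller than the ones in Fig. 1.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Hypothetical, very small 2D encoder-decoder (bands act as channels)."""
    def __init__(self, bands, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, width, 3, stride=2, padding=1),   # strided conv = downsampling
            nn.LeakyReLU(0.2),
            nn.Conv2d(width, width, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(width, bands, 3, padding=1),
        )

    def forward(self, z):
        return self.net(z)

def restore(x0, steps=200, lr=0.01, sigma=0.0, seed=0):
    """Fit the weights of a randomly initialized network to one corrupted image x0.

    x0 has shape (1, C, H, W) with even H and W. Returns the output and the loss history.
    """
    torch.manual_seed(seed)
    net = TinyEncoderDecoder(x0.shape[1])
    z = torch.rand_like(x0) * 0.1                 # fixed uniform noise in [0, 0.1)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        # Optional per-iteration Gaussian perturbation of the input.
        z_in = z + sigma * torch.randn_like(z) if sigma > 0 else z
        loss = ((net(z_in) - x0) ** 2).sum()      # L2 energy E(f_theta(z), x0)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    with torch.no_grad():
        return net(z), losses

torch.manual_seed(1)
x0 = torch.rand(1, 8, 16, 16)                     # toy corrupted cube with 8 bands
out, losses = restore(x0, steps=30)
print(out.shape, losses[0] > losses[-1])
```

For denoising, the number of steps acts as the stopping point discussed in Section 3: the loop is interrupted before the output starts to reproduce the noise. For inpainting or super-resolution, only the loss line would change to the corresponding energy term.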


Figure 1: 2D (top) and 3D (bottom) convolutional architectures used in our experiments. The size of the filters of the first convolutional layer is illustrated under the input image. Pooling and activation layers are omitted for simplicity.


Figure 2: HSI denoising results. HYDICE DC Mall image; false color with bands (57, 27, 17).

4. Experimental setup

Denoising. We evaluate the ability of the algorithm to remove noise using HYDICE DC Mall data [31] with synthetically added Gaussian noise of $\sigma=100$. The image consists of 191 channels and was cropped to $200\times200$ pixels. Results (Fig. 2, Table 1) are compared to the HSSNR [24], LRTA [26], BM4D [22], LRMR [38], and HSID-CNN [36] methods.

Inpainting. The Indian Pines dataset [3] $(145\times145\times200)$ from the AVIRIS sensor was used to test the proposed inpainting method. The mask of corrupted stripes was applied to all bands. Results (Fig. 4, Table 1) are compared to the Mumford-Shah [14] and fourth-order total variation (TV-$H^{-1}$) [5] 2D methods, as well as to the state-of-the-art HSI inpainting method FastHyIn [40].

Super-resolution. The experiment was conducted using the ROSIS-03 image of Pavia Center [15] (102 spectral bands). A patch of $150\times150$ pixels was cropped from the original image and downsampled by a factor of 2 along the spatial dimensions. The evaluation includes "nearest" and bilinear upsampling, the learning-based method SRCNN [12] applied band-wise (msiSRCNN) or to groups of 3 bands (3B-SRCNN [20]), and 3D-FCNN [23]. Results are presented in Fig. 3 and Table 1.

5. Results and discussion

As can be seen, the proposed method outperforms all single-image algorithms and demonstrates performance comparable to trained CNNs, despite never being trained on any dataset. Surprisingly, the 2D version of the Deep HS prior outperformed the 3D-convolutional one in all experiments. Note that the 2D implementation has nothing to do with band-wise processing; instead, it captures spectral information in the filters of the first convolutional layer and uses combinations of them in the subsequent layers. Moreover, the 3D implementation requires significantly more memory and computation time to process the same amount of data because it works with tensors of higher dimensionality (roughly speaking, a 3D version will have $C$ times more parameters, where $C$ is the number of bands). We therefore find the 2D implementation sufficient for processing hyperspectral data. Still, it is worth mentioning that the 3D version demonstrated comparable performance, which is important from a theoretical point of view because it shows that CNN architectures based on 3D convolutions also contain the image prior within their intrinsic parameters.

Table 1: Quantitative evaluation of the results.

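Denoising

| Metric | HSSNR | LRTA | BM4D | LRMR | HSID-CNN (trained) | Deep HS prior 3D | Deep HS prior 2D |
|---|---|---|---|---|---|---|---|
| MPSNR | 16.31 | 23.17 | 22.57 | 24.31 | 25.29 | 23.24 | 25.05 |
| MSSIM | 0.605 | 0.849 | 0.812 | 0.879 | 0.901 | 0.852 | 0.889 |
| SAM | 24.73 | 9.122 | 9.761 | 10.46 | 8.406 | 9.910 | 8.606 |

Inpainting

| Metric | Do Nothing | Mumford-Shah | TV-$H^{-1}$ | FastHyIn | Deep HS prior 3D | Deep HS prior 2D |
|---|---|---|---|---|---|---|
| MPSNR | 17.75 | 24.74 | 27.68 | 28.08 | 35.34 | 37.54 |
| MSSIM | 0.722 | 0.890 | 0.911 | 0.920 | 0.966 | 0.979 |
| SAM | Inf | 5.429 | 3.855 | 3.032 | 1.133 | 0.856 |

Super-resolution

| Metric | Nearest | Bicubic | msiSRCNN (trained) | 3B-SRCNN (trained) | 3D-FCNN (trained) | Deep HS prior 3D | Deep HS prior 2D |
|---|---|---|---|---|---|---|---|
| MPSNR | 29.98 | 31.10 | 32.48 | 32.69 | 33.92 | 32.31 | 33.67 |
| MSSIM | 0.921 | 0.937 | 0.957 | 0.960 | 0.969 | 0.945 | 0.967 |
| SAM | 4.786 | 4.592 | 4.617 | 4.661 | 4.140 | 4.692 | 4.211 |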
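The text does not define the metrics in Table 1. As a rough guide, the sketch below uses the common definitions of two of them: MPSNR as the PSNR averaged over bands, and SAM as the mean spectral angle per pixel. The function names, the `peak` value, the degree units, and the small `eps` stabilizer are assumptions, and the evaluation code behind the table may differ.

```python
import numpy as np

def mpsnr(ref, est, peak=1.0):
    """Mean PSNR over bands; ref and est have shape (H, W, C), values assumed in [0, peak]."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))            # one MSE per band
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle (assumed in degrees) between the per-pixel spectra."""
    dot = np.sum(ref * est, axis=2)
    norms = np.linalg.norm(ref, axis=2) * np.linalg.norm(est, axis=2)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)           # guard against rounding
    return float(np.mean(np.degrees(np.arccos(cos))))

ref = np.ones((4, 4, 3))
print(mpsnr(ref, ref + 0.1))            # an error of 0.1 in every band gives about 20 dB
print(sam_degrees(ref, 2.0 * ref))      # scaling a spectrum leaves the angle near zero
```

Higher MPSNR is better, while lower SAM is better because it measures spectral distortion independently of brightness.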


Figure 3: HSI super-resolution results. Pavia Center image; rescale factor 2; band 25 visualization.


Figure 4: HSI inpainting results. AVIRIS Indian Pines dataset; band 150. The mask of corrupted stripes (a) was applied to all bands, except in case (d), where only bands 25:175 (75%) were affected.

6. Conclusions

Starting from the paradigm that the image prior can be found within a CNN itself rather than learned from training data or designed manually, we developed an effective single-image hyperspectral restoration algorithm. Qualitative and quantitative evaluation of the results demonstrated the superior effectiveness of the proposed algorithm compared to other single-image algorithms.
