[论文翻译]0.8% 奈奎斯特计算鬼成像的非实验性深度学习实现


原文地址:https://arxiv.org/pdf/2108.07673v1


0.8% Nyquist computational ghost imaging via non-experimental deep learning

0.8% 奈奎斯特计算鬼成像的非实验性深度学习实现

We present a framework for computational ghost imaging based on deep learning and customized pink noise speckle patterns. The deep neural network in this work, which can learn the sensing model and enhance image reconstruction quality, is trained merely by simulation. To demonstrate the sub-Nyquist level of our work, the conventional computational ghost imaging results and the deep-learning reconstructions using white noise and pink noise are compared under multiple sampling rates at different noise conditions. We show that the proposed scheme can provide high-quality images with a sampling rate of $0.8\%$ even when the object is outside the training dataset, and it is robust to noisy environments. This method is excellent for various applications, particularly those that require a low sampling rate, fast reconstruction efficiency, or experience strong noise interference.

我们提出了一种基于深度学习(deep learning)和定制粉红噪声散斑图案的计算鬼成像框架。本工作中的深度神经网络仅通过模拟训练就能学习传感模型并提升图像重建质量。为验证本工作的亚奈奎斯特采样水平,我们在不同噪声条件下的多种采样率场景中,对比了传统计算鬼成像结果、使用白噪声和粉红噪声通过深度学习重建的成像结果。实验表明,即使目标物体不在训练数据集中,该方案仍能在0.8%采样率下获得高质量图像,并对噪声环境具有强鲁棒性。该方法特别适用于需要低采样率、快速重建效率或面临强噪声干扰的各类应用场景。

I. INTRODUCTION

I. 引言

Ghost imaging (GI) [1–4] is an innovative method for measuring the spatial correlations between light beams. With GI, the signal light field interacts with the object and is collected by a single-pixel detector, while the reference light field, which does not interact with the object, falls onto the imaging detector. Therefore, the image information is not present in either beam alone but is only revealed in their correlations. Computational ghost imaging (CGI) [5, 6] was proposed to further ameliorate and simplify this framework. In CGI, the reference arm that records the speckles is replaced by loading pre-generated patterns directly onto a spatial light modulator or a digital micromirror device (DMD). The unconventional image is then revealed by correlating the sequentially recorded intensities at the single-pixel detector with the corresponding patterns. CGI finds many applications such as wide spectrum imaging [7–9], remote sensing [10], and quantum-secured imaging [11].

鬼成像 (GI) [1–4] 是一种测量光束间空间相关性的创新方法。在鬼成像中,信号光场与物体相互作用后被单像素探测器收集,而未与物体相互作用的参考光场则投射到成像探测器上。因此,图像信息并不单独存在于任一光束中,而是仅体现在二者的相关性中。计算鬼成像 (CGI) [5, 6] 的提出进一步优化并简化了这一框架。在计算鬼成像中,记录散斑的参考臂被直接加载预生成图案的空间光调制器或数字微镜器件 (DMD) 所取代。随后,通过将单像素探测器依次记录的强度与对应图案相关联,即可重构出非常规图像。该技术在宽光谱成像 [7–9]、遥感 [10] 和量子安全成像 [11] 等领域具有广泛应用。
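The correlation step described above can be sketched in a few lines. The following is a minimal NumPy illustration of our own (not the authors' code; the toy object, pattern count, and sizes are made-up assumptions): the image is recovered as the covariance between the bucket signal and each pattern pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 binary transmission object (1 = light transmitted).
obj = np.zeros((8, 8))
obj[2:6, 3:5] = 1.0

n = 4000                                   # number of illumination patterns
patterns = rng.random((n, 64))             # white-noise speckle patterns (flattened)
bucket = patterns @ obj.ravel()            # single-pixel (bucket) detector signals

# Second-order correlation: G(i) = <B * P_i> - <B><P_i> for each pixel i.
G = (bucket[:, None] * patterns).mean(axis=0) - bucket.mean() * patterns.mean(axis=0)
image = G.reshape(8, 8)

# The correlation image is brighter on object pixels than on the background.
print(image[obj == 1].mean() > image[obj == 0].mean())
```

With only 4000 random patterns for 64 pixels, the recovered correlation image is noisy but already peaks on the object pixels, which is the effect the Nyquist-limit discussion below quantifies.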

However, CGI generally requires a large number of samplings to reconstruct a high-quality image; otherwise, the signal would be submerged under correlation fluctuations and environmental noise. To suppress the environmental noise and correlation fluctuations, the required minimum number of samplings is proportional to the total pixel number of the pattern applied on the DMD, i.e., the Nyquist sampling limit [12, 13]. With a limited sampling number, the image quality can be meager. This demanding requirement has hindered CGI from fully replacing conventional photography. A large number of schemes have been proposed to improve CGI's speed and decrease the sampling rate (sub-Nyquist). For instance, compressive sensing imaging can reconstruct images with a relatively low sampling rate by exploiting the sparsity of the objects [14–17]. It nevertheless largely depends on the sparsity of objects and is sensitive to noise [18]. Orthonormalized noise patterns can be used to suppress the noise and improve the image quality under a limited sampling number [19, 20]. In particular, orthonormalized colored noise patterns can break the Nyquist limit down to $\sim 5\%$ [20]. Fourier and sequency-ordered Walsh-Hadamard patterns, which are orthogonal to each other in the time or spatial domain, have also been applied to sub-Nyquist imaging [21–23]. The Russian doll [24] and cake-cutting [25] orderings of Walsh-Hadamard patterns can minimize the sampling ratio to 5%-10% of the Nyquist limit.

然而,计算鬼成像 (CGI) 通常需要大量采样才能重建高质量图像,否则信号会被相关波动和环境噪声淹没。为抑制环境噪声和相关波动,所需的最小采样数与数字微镜器件 (DMD) 上加载图案的总像素数成正比,即奈奎斯特采样极限 [12, 13]。在有限采样数下,图像质量可能极差。这一严苛要求阻碍了CGI完全取代传统摄影。目前已有大量方案被提出以提高CGI速度并降低采样率(亚奈奎斯特)。例如,压缩感知成像通过利用物体的稀疏性,能以较低采样率重建图像 [14-17],但其效果很大程度上依赖于物体的稀疏性且对噪声敏感 [18]。正交归一化噪声图案可用于抑制噪声,并在有限采样数下提升图像质量 [19, 20]。特别是正交归一化彩色噪声图案能将奈奎斯特极限降低至 $\sim 5\%$ [20]。在时域或空域相互正交的傅里叶排序和沃尔什-哈达玛序列排序图案也被应用于亚奈奎斯特成像 [21-23]。俄罗斯套娃 [24] 和蛋糕切割 [25] 排序的沃尔什-哈达玛图案能将采样率降至奈奎斯特极限的5%-10%。

Recently, the deep learning (DL) technique has been employed to identify images [26, 27] and to improve image quality with deep neural networks (DNN) [28–36]. Specifically, computational ghost imaging via deep learning (CGIDL) has shown a minimum ratio of the Nyquist limit down to $\sim 5\%$ [29, 33]. However, the DNNs in such works are trained with experimental CGI results. Only when the training environment is highly identical to the environment used for image reconstruction can the DNN be effective. This limits its universal application and prevents quick reconstructions. Usually at least thousands of inputs have to be generated for the training, which would be very time-consuming if experimental training were conducted each time. Some studies have tested the effectiveness of DNNs trained with non-experimental CGI; the minimum ratios of the Nyquist limit reached a few percent [30, 31, 35]. However, the required sampling ratio is much higher for objects outside the training dataset than for those in the training dataset [33]. Therefore, despite the proliferation of numerous algorithms, retrieving high-quality images outside of the training group at a meager ratio of the Nyquist limit by non-experimental training remains a challenge for the CGIDL system.

近年来,深度学习(DL)技术被用于图像识别[26, 27],并通过深度神经网络(DNN)提升图像质量[28–36]。其中,基于深度学习的计算鬼成像(CGIDL)已实现最低至$\sim 5\%$奈奎斯特极限采样率[29, 33]。但这类研究的DNN需通过实验CGI结果进行训练,只有当训练环境与图像重建环境高度一致时,DNN才能有效工作。这限制了其普适性应用,并阻碍了快速重建的实现。通常需要生成至少数千组训练输入,若每次进行实验训练将极其耗时。已有研究测试非实验CGI训练DNN的效果,其奈奎斯特极限最低采样率可达百分之几[30, 31, 35],但对于训练集外的物体,其采样率远高于训练集内物体[33]。因此,尽管算法层出不穷,如何通过非实验训练在极低奈奎斯特极限采样率下获取训练集外的高质量图像,仍是CGIDL系统面临的挑战。

This letter aims to further minimize the necessary sampling number and improve the imaging quality with the combination of DL and colored noise CGI. Recently, it has been shown that synthesized colored noise patterns possess unique non-zero correlations between neighborhood pixels via amplitude modulation in the spatial frequency domain [37, 38]. In particular, pink noise CGI owns positive cross-correlations in the second-order correlation [37]. It gives good image quality under a noisy environment or pattern distortion where the traditional CGI method fails. Combining DL with pink noise CGI shows that the image can be retrieved at an extremely low sampling rate ($\sim 0.8\%$). We also show that we can get training patterns from simulation without introducing environmental noise, i.e., there is no need to train the DNN with a large number of experimental training inputs. The object used in the experiment can be independent of the training dataset, which can largely benefit CGIDL in real applications.

本信旨在进一步减少必要采样数量,并通过深度学习(DL)与彩色噪声关联成像(CGI)的结合提升成像质量。最新研究表明,通过空间频域的振幅调制,合成彩色噪声模式在相邻像素间具有独特的非零相关性[37,38]。特别是粉红噪声CGI在二阶关联中呈现正交叉相关性[37],当传统CGI方法失效时,它能在嘈杂环境或模式畸变下保持良好成像质量。实验表明,深度学习与粉红噪声CGI结合可在极低采样率($\sim 0.8\%$)下重建图像。我们还证明可以从仿真中获取训练模式而无需引入环境噪声,即无需通过大量实验训练输入来训练深度神经网络(DNN)。实验中使用的物体可独立于训练数据集,这将极大促进CGIDL在实际应用中的发展。
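As a rough illustration of the colored-noise synthesis mentioned above (amplitude modulation in the spatial frequency domain [37, 38]), the sketch below reshapes the amplitude spectrum of white noise as $1/f$. This is our own simplified construction with assumed sizes, not the authors' pattern-generation code:

```python
import numpy as np

def pink_speckle(ny=54, nx=98, seed=1):
    """One pink-noise speckle pattern: white noise whose amplitude
    spectrum is re-weighted as 1/f (illustrative construction)."""
    rng = np.random.default_rng(seed)
    spec = np.fft.fft2(rng.random((ny, nx)))
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    f = np.hypot(fy, fx)                 # radial spatial frequency
    f[0, 0] = 1.0                        # avoid dividing the DC term by zero
    pattern = np.real(np.fft.ifft2(spec / f))
    pattern -= pattern.min()             # normalize to [0, 1] for a DMD display
    return pattern / pattern.max()

p = pink_speckle()
# The 1/f weighting concentrates power at low spatial frequencies, which is
# what produces the positive correlations between neighboring pixels.
neighbor_corr = np.corrcoef(p[:, :-1].ravel(), p[:, 1:].ravel())[0, 1]
print(p.shape, neighbor_corr)
```

The strong positive neighbor correlation of the resulting pattern is the property the text credits for pink noise CGI's robustness.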

II. DEEP LEARNING

II. 深度学习


FIG. 1: Architecture of DNN. It consists of four convolution layers, one image input layer, one fully connected layer (yellow), the rectified linear unit, and the batch normalization layers (red). In the upper line are CGI results (training inputs) and handwriting ground truths (training labels); in the bottom line are CGI results from the experiment (test inputs) and CGIDL results (test outputs) in block style.

图 1: DNN架构。该架构包含四个卷积层、一个图像输入层、一个全连接层(黄色)、修正线性单元以及批量归一化层(红色)。上方为CGI生成结果(训练输入)和手写真实值(训练标签);下方为实验获得的CGI结果(测试输入)和采用块状风格的CGIDL输出结果(测试输出)。

Our DNN model, as shown in Fig. 1, uses four convolution layers, one image input layer, and one fully connected layer. Small $3\times3$ receptive fields are applied throughout all convolution layers for better performance [39]. Batch normalization layers (BNL), rectified linear unit (ReLU) layers, and zero padding are added between the convolution layers. The BNL serves to avoid internal covariate shift during the training process and to speed up the training of the DNN [40]. The ReLU layer applies a threshold operation to each element of its input [41]. The zero padding is designed to maintain the characteristics of the input images' boundaries. To accommodate the size of the training pictures, both the input and output layers are set to $54\times98$. Training uses the stochastic gradient descent with momentum optimizer (SGDMO) to reduce oscillation via momentum. The parameter vector is updated via Eq. (1), which describes the updating process during the iteration:

我们的 DNN (Deep Neural Network) 模型如图 1 所示,使用了四个卷积层、一个图像输入层和一个全连接层。整个卷积层都采用小的 $3\times3$ 感受野以获得更好的性能 [39]。每个卷积层之间都添加了批量归一化层 (BNL)、修正线性单元 (ReLU) 层和零填充。BNL 的作用是避免训练过程中的内部协变量偏移,并加速 DNN 的训练 [40]。ReLU 层对输入的每个元素应用阈值操作 [41]。零填充部分旨在保持输入图像边界的特征。为了自定义训练图片的大小,输入层和输出层都设置为 $54\times98$。训练求解器采用带动量的随机梯度下降优化器 (SGDMO),通过使用动量来减少振荡。参数向量可以通过方程 Eq. (1) 进行更新,该方程展示了迭代过程中的更新过程。
$$
\theta_{\ell+1}=\theta_{\ell}-\alpha\nabla E(\theta_{\ell})+\gamma\left(\theta_{\ell}-\theta_{\ell-1}\right),
\tag{1}
$$

where $\ell$ is the iteration number, $\alpha$ is the learning rate, $\theta$ is the parameter vector, and $E(\theta)$ is the loss function, mean square error (MSE). The MSE is defined as

其中 $\ell$ 为迭代次数,$\alpha$ 为学习率,$\theta$ 为参数向量,$E(\theta)$ 为损失函数,即均方误差 (MSE)。MSE定义为

$$
\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(G_{i}-X_{i}\right)^{2},
\tag{2}
$$

Here, $G$ represents the pixel value of the resulting image. $G_{(o)}$ denotes pixels where the light ought to be transmitted, i.e., the object area, while $G_{(b)}$ denotes pixels where the light ought to be blocked, i.e., the background area. $X$ is the ground truth calculated by

此处,$G$ 表示成像结果的像素值。$G_{(o)}$ 表示光线应当透过的像素(即目标区域),而 $G_{(b)}$ 表示光线应当被阻挡的像素(即背景区域)。$X$ 是通过计算得到的地面真实值。

[Eq. (3): the definition of the ground truth $X$; the equation image could not be recovered from the source.]

The third term on the right-hand side of the equation is the feature of the SGDMO, analogous to momentum, where $\gamma$ determines the contribution of the previous gradient step to the current iteration [42]. Two strategies are applied to avoid over-fitting the training images. At the end of the DNN, a dropout layer is applied with a probability of 0.2 of dropping out input elements, which reduces the connection between the convolution layers and the fully connected layer [43]. Meanwhile, we adopted a step decay schedule for the learning rate: it drops from $10^{-3}$ to $10^{-4}$ after 75 epochs, which constrains the fitting parameters within a reasonable region. Lowering the learning rate significantly helps avoid over-fitting while the maximum number of epochs stays constant.

方程右侧的第三部分是SGDMO的特征,类似于动量,其中$\gamma$决定了前一步梯度对当前迭代的贡献[42]。为避免训练图像过拟合,采用了两种策略。在DNN末端应用了丢弃概率为0.2的dropout层,旨在减少卷积层与全连接层之间的关联[43]。同时,我们采用了学习率逐步衰减策略,学习率在75个周期后从$10^{-3}$降至$10^{-4}$,从而将拟合参数限制在合理范围内。在保持最大周期数不变的情况下,降低学习率能显著避免过拟合。
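The update rule and the step-decay schedule described above can be sketched as follows on a toy one-parameter loss $E(\theta)=\theta^{2}$. This is our own plain-Python illustration (the paper's training runs in MATLAB's DL Toolbox), with the momentum coefficient $\gamma=0.9$ assumed for the demonstration:

```python
def sgdm_step(theta, theta_prev, grad, alpha, gamma=0.9):
    """One SGDMO iteration: gradient step plus the momentum term
    gamma * (theta - theta_prev) carrying the previous update forward."""
    return theta - alpha * grad + gamma * (theta - theta_prev)

def step_decay(epoch, lr0=1e-3, lr1=1e-4, drop_at=75):
    """Learning-rate schedule from the text: 1e-3, then 1e-4 after 75 epochs."""
    return lr0 if epoch < drop_at else lr1

# Minimize the toy loss E(theta) = theta^2, whose gradient is 2 * theta.
theta_prev = theta = 5.0
for epoch in range(100):
    theta, theta_prev = sgdm_step(theta, theta_prev, 2.0 * theta,
                                  step_decay(epoch)), theta
print(theta)  # the iterate has moved toward the minimum at 0
```

With the small post-drop learning rate, later epochs fine-tune rather than overshoot, which is the over-fitting-avoidance role the text assigns to the schedule.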

B. Network training

B. 网络训练

The proposed CGIDL scheme requires a training process based on a pre-prepared dataset. After training in simulation, it is able to reconstruct the images. We use a set of 10000 digits from the MNIST handwritten digit database [44] as training images. All images are resized and normalized to $54\times98$ to test the smaller sampling ratio. These training images are reconstructed by the CGI algorithm. The CGI reconstructions and the original training images then feed the DNN model as inputs and labels, respectively, as shown in Fig. 2(a). The white noise and pink noise speckle patterns are used separately for the training process, using exactly the same protocol. The maximum number of epochs is set to 100, and the number of training iterations is 31200. The program is implemented in MATLAB R2019a Update 5 (9.6.0.1174912, 64-bit), and the DNN is implemented through the DL Toolbox. An NVIDIA GTX1050 GPU is used to accelerate the computation.

提出的CGIDL方案需要基于预先准备的数据集进行训练。在仿真环境中完成训练后,该方案将具备图像重建能力。我们使用MNIST手写数字数据库[44]中的10000个数字作为训练图像,所有图像均被调整尺寸并归一化为$54\times98$以测试更小的采样率。这些训练图像通过CGI算法进行重建,随后将重建训练图像与原始训练图像分别作为DNN模型的输入和标签,如图2(a)所示。训练过程中分别采用白噪声和粉红噪声散斑图案,且保持完全相同的协议参数。最大训练周期设为100次,训练迭代次数为31200次。程序通过MATLAB R2019a Update 5(9.6.0.1174912, 64位)实现,DNN部分采用深度学习工具箱(DL Toolbox)构建,并利用NVIDIA GTX1050显卡加速计算。

The trained DNN is then tested by simulation and used for retrieving CGI results in the experiments. In the testing part, the CGI algorithm generates reconstructed images from testing images with both the MNIST handwritten digits and block style digits, where the latter set is different from the images in the training group. As shown in Fig. 2(b), the trained DNN, fed with the reconstructed testing images, generates CGIDL results. By comparing the difference between the CGIDL results and the testing images, we can measure the quality of the trained DNN. A well-performing DNN can then be used for retrieving CGI in the experiment.

训练好的深度神经网络 (DNN) 随后通过仿真进行测试,并用于在实验中检索计算鬼成像 (CGI) 结果。在测试环节,CGI 算法分别对 MNIST 手写数字和方块风格数字的测试图像进行重建,其中后者与训练组的图像不同。如图 2(b) 所示,输入重建测试图像后,训练好的 DNN 会生成 CGIDL 结果。通过对比 CGIDL 与测试图像的差异,我们可以评估训练好的 DNN 的质量。性能良好的 DNN 可用于实验中检索 CGI。

The schematic of the experiment is shown in Fig. 2(c). A CW laser is used to illuminate the DMD, on which the noise patterns are loaded. The pattern generated by the DMD is then projected onto the object. In our experiment, the size of the noise patterns is $216\times392$ DMD pixels ($54\times98$ independent pixels), where each independently changeable mirror unit comprises $4\times4$ DMD pixels. Each DMD pixel is $16\,\mu m\times16\,\mu m$ in size.

实验示意图如图 2(c)所示。实验中采用连续波激光器照射加载了噪声图案的 DMD (数字微镜器件),随后由 DMD 生成的图案被投射到目标物体上。实验所用噪声图案尺寸为 $216\times392$ 个 DMD 像素(54 $\times$ 98 个独立像素),其中每个可独立控制的微镜对应 $4\times4$ 个像素单元。每个 DMD 像素的物理尺寸为 $16\mu m\times16\mu m$。

In the CGI process, the quality of the images is proportional to the sampling rate $\beta$, which is the ratio between the number of illumination patterns $N_{\mathrm{pattern}}$ and the number of image pixels $N_{\mathrm{pixel}}$ [45, 46]:

在CGI过程中,图像质量与采样率成正比,即照明模式数量$N_{\mathrm{pattern}}$与像素数量$N_{\mathrm{pixel}}$之比 [45, 46]:
$$
\beta=\frac{N_{\mathrm{pattern}}}{N_{\mathrm{pixel}}}.
$$
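For the $54\times98$ independent pixels used here, this ratio translates into very few patterns at the sampling rates quoted later in the text. A quick arithmetic check of our own:

```python
# beta = N_pattern / N_pixel for the 54 x 98 independent-pixel patterns.
n_pixel = 54 * 98                              # 5292 independent pixels
for beta_percent in (0.5, 0.8, 5.0):
    n_pattern = round(n_pixel * beta_percent / 100)
    print(f"beta = {beta_percent}% -> about {n_pattern} patterns")
```

So the $0.8\%$ rate corresponds to only about 42 illumination patterns out of 5292 pixels.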


FIG. 2: The flow chart of CGIDL consists of three parts: (a) training, (b) test, and (c) experiment. The DNN model is trained with CGI results from the database via simulation. The simulation testing process and experimental measurements use both the handwriting digits and block style digits. The experimental part of CGI uses pink noise and white noise speckle patterns, and their CGI results are ameliorated by the trained DNN model.

图 2: CGIDL流程图包含三部分: (a) 训练, (b) 测试, (c) 实验。DNN模型通过仿真使用数据库中的CGI结果进行训练。仿真测试过程和实验测量同时使用手写数字和方块风格数字。CGI实验部分采用粉红噪声和白噪声散斑图案, 其CGI结果通过训练后的DNN模型进行优化。

In the following, we compare the networks trained using white noise speckle patterns (DL white) and pink noise speckle patterns (DL pink), as well as the conventional CGI (CGI white), in terms of reconstruction performance with respect to the sampling ratio $\beta$.

我们比较了使用白噪声散斑图案 (DL white) 和粉红噪声散斑图案 (DL pink) 训练的网络,以及传统 CGI (CGI white) 在不同采样率 $\beta$ 下的重建性能。

III. SIMULATION

III. 仿真

To test the robustness of our method to different datasets and noise, and its performance at different sampling rates, we performed a set of simulations. Two sets of testing images are used in the simulation. One is the handwriting digits 1-9 from the training set; the other is the block style digits 1-9, which are completely independent of the training images. These images have $28\times28$ pixels and are resized into $54\times98$ by widening and amplification. We started our simulation with the comparison of CGI white, DL white, and DL pink without noise at $\beta=5\%$, as shown in Fig. 3. The upper part is with the handwriting digits 1-9, the lower part is with the block style digits 1-9. Apparently, at this low sampling rate, the traditional CGI method fails to retrieve the images in both cases. On the other hand, both DL methods work much better than the traditional CGI. For digits from the training dataset, both methods work almost equally well. For digits from outside the training dataset, DL pink already works better than DL white. For example, DL white can barely distinguish digits '3' and '8', but DL pink can retrieve all the digit images.

为验证本方法对不同数据集、噪声及采样率的鲁棒性,我们进行了一系列仿真实验。仿真采用两组测试图像:一组来自训练集的手写数字1-9,另一组是与训练图像完全独立的方块风格数字1-9。这些图像原始尺寸为$28\times28$像素,通过加宽放大调整为$54\times98$像素。我们首先在$\beta=5\%$无噪声条件下比较CGI白光、DL白光与DL粉光的性能,如图3所示。上半部分为手写数字1-9的重建结果,下半部分为方块数字1-9的重建结果。显然,在此低采样率下,传统CGI方法在两类情况下均无法重建图像;而两种DL方法均显著优于传统CGI。对于训练集内的数字,两种DL方法表现相当;对于训练集外数字,DL粉光已优于DL白光——例如DL白光难以区分"3"和"8",而DL粉光可完整重建所有数字图像。

In real applications, there always exists noise in the measurement. It is therefore worthwhile to check the performance of the different methods under the influence of noise. We then performed another set of simulations with added grayscale random noise. The signal-to-noise ratio (SNR) on the logarithmic decibel scale is defined as

在实际应用中,测量中总是存在噪声。因此,有必要检查不同方法在噪声影响下的性能。我们随后进行了一组添加灰度随机噪声的模拟实验。对数分贝尺度的信噪比 (SNR) 定义为

$$
\mathrm{SNR}=10\log{\frac{P_{\mathrm{s}}}{P_{\mathrm{b}}}},
$$

信噪比 (SNR) 计算公式:
$$
\mathrm{SNR}=10\log{\frac{P_{\mathrm{s}}}{P_{\mathrm{b}}}},
$$
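In this definition the logarithm is base 10, so the 4.77 dB level used below corresponds to a signal-to-background power ratio of about 3:1. A quick Python check of ours:

```python
import math

def snr_db(p_signal, p_background):
    """SNR in decibels: 10 * log10(Ps / Pb)."""
    return 10 * math.log10(p_signal / p_background)

print(round(snr_db(3.0, 1.0), 2))  # 4.77
```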

where $P_{\mathrm{s}}$ is the average signal and $P_{\mathrm{b}}$ is the average noise background. Here the SNR is set to 4.77 dB. As shown in Fig. 4, the upper part is the simulation with digits 2, 3, 5, and 6 from the training dataset, and the lower part is the simulation with digits 2, 3, 5, and 6 from the block style dataset. For both datasets, $\beta$ of $100\%$, $50\%$, and $10\%$ are chosen for CGI white; $50\%$, $5\%$, and $1\%$ for DL white; and $5\%$, $0.8\%$, and $0.5\%$ for DL pink. The image quality improves with increasing $\beta$ in all cases, as expected. The CGI white case can only give marginally visible images when the sampling rate is beyond $50\%$. DL white can retrieve the digits from the training dataset when $\beta=1\%$. However, for the block style digits, it fails to do so even when $\beta=5\%$. Unlike the previous case with no noise, there is a significant difference between objects from the training dataset and those outside it. Lastly, we note that the DL pink trained network can retrieve the training dataset when $\beta=0.5\%$. It can also retrieve clear images for the block style digits at $\beta=0.8\%$. If we compare the block style images at $\beta=5\%$ for both DL white and DL pink with the no-noise case in Fig. 3, it is obvious that the quality of the former is largely affected by the noise, while the latter is barely affected.

其中 $P_{\mathrm{s}}$ 为平均信号强度,$P_{\mathrm{b}}$ 为平均噪声背景。此处信噪比(SNR)设为4.77dB。如图4所示,上半部分为训练数据集中数字2、3、5、6的仿真结果,下半部分为方块风格数据集中相同数字的仿真结果。两组实验中,CGI白噪声选取 $\beta$ 值为100%、50%和10%,DL白噪声为50%、5%和1%,DL粉红噪声为5%、0.8%和0.5%。正如预期,所有情况下图像质量均随 $\beta$ 值增大而提升。对于CGI白噪声,仅当采样率超过50%时才能勉强辨识图像。DL白噪声在 $\beta=1\%$ 时可重建训练数据集中的数字,但对块状风格数字即使 $\beta=5\%$ 仍无法重建。与无噪声情况不同,此时训练集内外对象存在显著差异。值得注意的是,经DL粉红噪声训练的网络在 $\beta=0.5\%$ 时可重建训练数据,且在 $\beta=0.8\%$ 时能清晰重建块状风格数字。对比图3无噪声情况下DL白噪声与DL粉红噪声在 $\beta=5\%$ 时的块状图像,可见前者质量受噪声影响较大,后者则几乎不受影响。


FIG. 3: Simulation results without noise. The upper part used handwriting digits 1-9 from the training dataset, and the lower part used block style digits 1-9, which are outside the training dataset. All the simulations are done at $\beta=5\%$. GT: ground truth.

图 3: 无噪声模拟结果。上半部分使用训练数据集中的手写数字1-9,下半部分使用训练数据集之外的方块风格数字1-9。所有模拟均在 $\beta=5\%$ 下完成。GT: 真实值。

IV. EXPERIMENT

IV. 实验

To further demonstrate the advantage and applicability of CGIDL with pink noise, we perform experiments with the non-experimentally, one-time trained model. All the experiments are done with digits 2, 3, 5, and 6 in block style. The block style is chosen to better compare the different behaviors of all three methods. We start from a relatively low noise level of $\mathrm{SNR}=14.90$ dB. The results are shown in the upper part of Fig. 5. We can see that at this noise level, the CGI white method can barely distinguish the images from the noisy background even at $\beta=100\%$. The DL white trained network, while giving clear images at $\beta=50\%$, fails to fully image the digits at $\beta=5\%$. This is mainly because the objects are outside the training set, revealing one of the shortcomings of the standard DL network. On the other hand, our DL pink trained network can still give clear results even when $\beta=0.5\%$.

为了进一步展示CGIDL结合粉红噪声的优势和适用性,我们使用未经实验训练和一次性训练的模型进行了实验。所有实验均采用块状风格对数字2、3、5和6进行处理。选择块状风格是为了更清晰地比较三种方法的不同表现。实验从相对较低的噪声水平$\mathrm{SNR}=14.90$ dB开始,结果如图5上半部分所示。可见在该噪声水平下,CGI白噪声方法即使在$\beta=100\%$时也难以从噪声背景中区分图像;采用白噪声训练的DL网络在$\beta=50\%$时能生成清晰图像,但在$\beta=5\%$时无法完整呈现数字——这主要因为目标物体不在训练集中,暴露出标准DL网络的缺陷之一。相比之下,我们采用粉红噪声训练的DL网络在$\beta=0.5\%$时仍能输出清晰结果。

We then increase the noise level to $\mathrm{SNR}=4.77$ dB, the same as in the simulation case, so we can have a fair comparison. The experimental results are shown in the lower part of Fig. 5. CGI white completely fails to image the digits even at $\beta=100\%$. The DL white trained network is also largely affected by the noisy environment and is not able to fully retrieve the images at $\beta=50\%$. On the other hand, the DL pink method can still image all digits at the sampling rate of $0.8\%$. If we compare these results to the corresponding low noise case, we can see that the image qualities do not change much, indicating that our trained network is robust to noise. Also, the results with $\beta=0.8\%$ are better than those of the standard DL white network at $\beta=50\%$, a sampling rate about two orders of magnitude higher.

随后我们将噪声水平提高到 $\mathrm{SNR}=4.77$ dB,与仿真场景保持一致以确保公平比较。实验结果如图5下半部分所示。CGI白噪声方法在 $\beta=100\%$ 时仍完全无法成像,DL白噪声训练网络在强噪声环境下也受到显著影响,当 $\beta=50\%$ 时仍无法完整重建图像。相比之下,DL粉红噪声方法在 $0.8\%$ 采样率下仍能清晰成像所有数字。通过与对应低噪声场景的对比可见,成像质量变化幅度较小,表明我们训练的网络具有噪声鲁棒性。值得注意的是,该方法在 $\beta=0.8\%$ 时的成像效果优于标准DL白噪声网络在 $\beta=50\%$(采样率高出约两个数量级)时的表现。

To quantitatively assess the quality of the reconstructed block style images, we compare three evaluating indicators of image quality, i.e., the peak signal-to-noise ratio (PSNR), the visibility (VIS), and the correlation coefficient (CC):

为了定量评估重建块状风格图像的质量,我们比较了三种图像质量评价指标:峰值信噪比 (PSNR)、可见度 (VIS) 和相关系数 (CC):
$$
\mathrm{PSNR}=10\log_{10}\frac{\left(2^{k}-1\right)^{2}}{\mathrm{MSE}},\quad
\mathrm{VIS}=\frac{\langle G_{(o)}\rangle-\langle G_{(b)}\rangle}{\langle G_{(o)}\rangle+\langle G_{(b)}\rangle},\quad
\mathrm{CC}=\frac{\mathrm{Cov}(G,X)}{\sqrt{\mathrm{Var}(G)\,\mathrm{Var}(X)}}.
$$


FIG. 4: Simulation results of handwriting (top) and block style (bottom) digits 2, 3, 5, 6 with the SNR of 4.77 dB. The results of CGI white are done at $\beta$ of $10\%$, $50\%$, and $100\%$; DL white with $\beta$ of $1\%$, $5\%$, and $50\%$; and DL pink with $\beta$ of $0.5\%$, $0.8\%$, and $5\%$.

图 4: 信噪比(SNR)为4.77 dB时手写体(上)与方块风格(下)数字2、3、5、6的仿真结果。CGI白噪声在β为10%、50%和100%时完成,DL白噪声在β为1%、5%和50%时完成,DL粉红噪声在β为0.5%、0.8%和5%时完成。


FIG. 5: Experimental results with the SNR of 14.90 dB (upper) and 4.77 dB (lower). Objects are block style digits 2, 3, 5, 6. Different sampling rates are shown for different methods: CGI white are done at $\beta$ of $10\%$, $50\%$, and $100\%$; DL white with $\beta$ of $1\%$, $5\%$, and $50\%$; and DL pink with $\beta$ of $0.5\%$, $0.8\%$, and $5\%$.

图 5: 信噪比 (SNR) 为 14.90 dB (上) 和 4.77 dB (下) 的实验结果。物体为方块风格数字 2、3、5、6。不同方法采用不同采样率: CGI 白噪声在 $\beta$ 为 $10\%$、$50\%$ 和 $100\%$ 下完成, DL 白噪声在 $\beta$ 为 $1\%$、$5\%$ 和 $50\%$ 下完成, DL 粉红噪声在 $\beta$ 为 $0.5\%$、$0.8\%$ 和 $5\%$ 下完成。


Here the MSE is defined in Eq. (2), $\mathrm{Var}()$ is the variance of its argument, $\mathrm{Cov}()$ is the covariance of its arguments, and $k$ is the gray-level bit depth of the image; in our experiment $k=8$.

此处MSE由公式2定义,$\mathrm{Var()}$表示其参数的方差,$\mathrm{Cov}()$表示其参数的协方差,$k$为图像灰度级,本实验中设定$k\equiv8$。
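The three indicators can be computed as below. These are standard forms consistent with the quantities named in the text (the $2^{k}-1$ peak value, the object/background means, and Cov/Var); since the paper's exact equation image is not recoverable, treat this NumPy sketch of ours as an assumption rather than the authors' definitions:

```python
import numpy as np

def psnr(G, X, k=8):
    """Peak signal-to-noise ratio for a k-bit gray-level image."""
    mse = np.mean((G - X) ** 2)
    return 10 * np.log10((2 ** k - 1) ** 2 / mse)

def visibility(G, object_mask):
    """VIS from mean intensities over object and background pixels."""
    g_o = G[object_mask].mean()          # pixels where light is transmitted
    g_b = G[~object_mask].mean()         # pixels where light is blocked
    return (g_o - g_b) / (g_o + g_b)

def corr_coeff(G, X):
    """Pearson correlation coefficient between reconstruction and truth."""
    return np.cov(G.ravel(), X.ravel())[0, 1] / np.sqrt(
        np.var(G, ddof=1) * np.var(X, ddof=1))

# Toy 8-bit ground truth and a uniformly offset "reconstruction".
X = np.zeros((4, 4)); X[1:3, 1:3] = 255.0
G = X + 10.0
print(psnr(G, X), visibility(G, X > 0), corr_coeff(G, X))
```

A uniform offset leaves CC at 1 while lowering PSNR, which is why the text reports all three indicators rather than any single one.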


FIG. 6: PSNR, VIS, and CC for simulation and experiments of the block style digits 2, 3, 5, 6 with three categories: CGI white ($\beta$ of $10\%$, $50\%$, and $100\%$), DL white ($\beta$ of $1\%$, $5\%$, and $50\%$), and DL pink ($\beta$ of $0.5\%$, $0.8\%$, and $5\%$).

图 6: 块状数字2、3、5、6在仿真和实验中的PSNR、VIS和CC指标,包含三类参数:CGI白噪声(β为10%、50%和100%)、DL白噪声(β为1%、5%和50%)以及DL粉红噪声(β为0.5%、0.8%和5%)。

The results for all cases, including simulations without and with noise and experiments with high and low SNR, are shown in Fig. 6. The PSNR, VIS, and CC all indicate that the CGIDL methods are much better than the traditional CGI method. Indeed, as shown in the simulation case, the image quality of CGIDL at a $5\%$ sampling rate is already better than CGI at the full sampling rate in all situations. When we compare the two DL methods, we see that in general DL pink is much better than DL white, as also suggested by Figs. 3, 4, and 5. Since the network is trained using MSE as the loss function, the PSNR of the simulation without noise at $5\%$ is very similar for both cases. However, when the noise increases, the PSNR of DL white starts to decrease, while the PSNR of DL pink does not change much. The VIS and CC behave similarly to the PSNR. We note that all three indicators suggest DL pink works better than the other two methods; in the experimental results with low SNR, DL pink at a $5\%$ sampling rate is already better than DL white at a $50\%$ sampling rate.

所有情况下的结果,包括无噪声和有噪声的仿真、高信噪比(SNR)和低信噪比的实验,如图6所示。PSNR、VIS和CC均表明CGIDL方法明显优于传统CGI方法。事实上,如仿真案例所示,在所有情况下,CGIDL在5%采样率下的图像质量已经优于全采样率的CGI。当我们比较两种DL方法时,发现DL pink总体上远优于DL white,这也与图3、4和5的结论一致。由于网络训练时使用MSE作为损失函数,在无噪声仿真中5%采样率下两者的PSNR非常接近。但随着噪声增加,DL white的PSNR开始下降,而DL pink的PSNR变化不大。VIS和CC指标也表现出与PSNR相似的趋势。值得注意的是,所有三个指标都表明DL pink优于其他两种方法,在低SNR的实验结果中,5%采样率的DL pink甚至优于50%采样率的DL white。

V. CONCLUSION

V. 结论

In conclusion, we have demonstrated a deep-learning imaging method with pink noise patterns. The DNN is trained using only simulation data from the handwriting dataset. The trained network can then be applied to various conditions, including objects outside the training set and experiments with strong noise. We have demonstrated imaging results with extremely low sampling rates both in simulation and experiments. We have also evaluated the quality of the images outside the training dataset for both simulation and experimental results, in terms of PSNR, VIS, and CC.

总之,我们展示了一种采用粉红噪声模式的深度学习成像方法。该深度神经网络(DNN)仅使用手写数据集的模拟数据进行训练。训练后的网络可应用于多种场景,包括训练集外的物体和强噪声实验。我们通过仿真和实验证明了该方法在极低采样率下的成像效果,并从PSNR、VIS和CC三项指标评估了训练数据集外仿真与实验结果的图像质量。

All results suggest that the DL pink scheme has a great advantage, especially in the low sampling region. This one-time, noise-robust, non-experimental-training CGIDL can be implemented in various situations and has a wide range of application prospects. The pink noise speckle patterns, the DNNs trained at various sampling rates, and their raw encoding programs are encapsulated and uploaded online [48]. Users who need a quick sampling function in CGIDL can utilize this universal system to obtain ameliorated results in other CGIDL systems. Further work can extend to other imaging and spectroscopy systems through loss function adjustment and speckle pattern optimization, in order to obtain spatial, frequency, or time resolution. In addition to ameliorating results, DL may also have great potential to generate optimized speckle patterns for a variety of tasks.

所有结果表明,DL粉红方案具有显著优势,尤其在低采样区域表现突出。这种一次性、抗噪声且无需实验训练的CGIDL方法适用于多种场景,具备广阔的应用前景。粉红噪声散斑图案、不同采样率训练的DNN及其原始编码程序已封装并在线发布[48]。需要快速采样功能的用户可通过该通用系统在其他CGIDL系统中获得优化结果。后续工作可通过调整损失函数和优化散斑图案,将方法拓展至其他成像与光谱系统,以获取空间、频率或时间分辨率。除结果优化外,DL在生成针对各类任务的优化散斑图案方面也展现出巨大潜力。

Funding

资金

Air Force Office of Scientific Research (Award No. FA9550-20-1-0366 DEF), Office of Naval Research (Award No. N00014-20-1-2184), Robert A. Welch Foundation (Grant No. A-1261), National Science Foundation (Grant No. PHY-2013771).

空军科学研究办公室 (项目编号 FA9550-20-1-0366 DEF)
海军研究办公室 (项目编号 N00014-20-1-2184)
Robert A. Welch 基金会 (资助编号 A-1261)
国家科学基金会 (资助编号 PHY-2013771)

Data availability.

数据可用性

The experimental and simulation data are available upon reasonable request. The trained DL networks and raw DL training and test codes are uploaded on the website: https://github.com/XJTU-TAMU-CGI/CGIDL.

实验和模拟数据可根据合理请求提供。训练好的深度学习网络及原始深度学习训练与测试代码已上传至网站:https://github.com/XJTU-TAMU-CGI/CGIDL
