[论文翻译]填充K空间与优化图像:动态多对比MRI重建的提示方法


原文地址:https://arxiv.org/pdf/2309.13839v1


Fill the K-Space and Refine the Image: Prompting for Dynamic and Multi-Contrast MRI Reconstruction

填充K空间与优化图像:动态多对比MRI重建的提示方法

Abstract. The key to dynamic or multi-contrast magnetic resonance imaging (MRI) reconstruction lies in exploring inter-frame or inter-contrast information. Currently, the unrolled model, an approach combining iterative MRI reconstruction steps with learnable neural network layers, stands as the best-performing method for MRI reconstruction. However, there are two main limitations to overcome: firstly, the unrolled model structure and GPU memory constraints restrict the capacity of each denoising block in the network, impeding the effective extraction of detailed features for reconstruction; secondly, the existing model lacks the flexibility to adapt to variations in the input, such as different contrasts, resolutions or views, necessitating the training of separate models for each input type, which is inefficient and may lead to insufficient reconstruction. In this paper, we propose a two-stage MRI reconstruction pipeline to address these limitations. The first stage involves filling the missing k-space data, which we approach as a physics-based reconstruction problem. We first propose a simple yet efficient baseline model, which utilizes adjacent frames/contrasts and channel attention to capture the inherent inter-frame/-contrast correlation. Then, we extend the baseline model to a prompt-based learning approach, PromptMR, for all-in-one MRI reconstruction from different views, contrasts, adjacent types, and acceleration factors. The second stage is to refine the reconstruction from the first stage, which we treat as a general video restoration problem to further fuse features from neighboring frames/contrasts in the image domain. Extensive experiments show that our proposed method significantly outperforms previous state-of-the-art accelerated MRI reconstruction methods.

摘要。动态或多对比度磁共振成像(MRI)重建的关键在于探索帧间或对比度间的信息。目前,展开模型(unrolled model)这种将迭代MRI重建步骤与可学习神经网络层相结合的方法,是MRI重建中性能最佳的方法。然而该方法存在两个主要局限:首先,展开模型结构和GPU内存限制制约了网络中每个去噪模块的容量,阻碍了有效提取重建所需的细节特征;其次,现有模型缺乏适应输入变化(如不同对比度、分辨率或视角)的灵活性,需要为每种输入类型单独训练模型,效率低下且可能导致重建不足。本文提出两阶段MRI重建流程来解决这些局限。第一阶段通过基于物理的重构方法来填补缺失的k空间数据:首先提出一个利用相邻帧/对比度和通道注意力捕捉固有帧间/对比度相关性的高效基线模型;随后将该基线模型扩展为基于提示学习(prompt-based learning)的PromptMR方法,实现多视角、多对比度、多相邻类型和多加速因子的统一MRI重建。第二阶段将第一阶段的输出作为通用视频修复问题,在图像域进一步融合相邻帧/对比度特征。大量实验表明,我们提出的方法显著优于现有最先进的加速MRI重建方法。

Keywords: MRI reconstruction · Prompt-based learning · Dynamic · Multi-contrast · Two-stage approach

关键词: MRI重建 · 基于提示的学习 · 动态 · 多对比度 · 两阶段方法

1 Introduction

1 引言

Cardiovascular disease, including conditions such as coronary artery disease, heart failure, and arrhythmias, remains the leading cause of death globally. Cardiac magnetic resonance (CMR) imaging is the most accurate and reliable non-invasive technique for assessing cardiac anatomy, function, and pathology [13]. In the field of accelerated MR imaging (MRI) reconstruction, unrolled networks have achieved state-of-the-art performance. This is attributed to their ability to incorporate the known imaging degradation process, i.e., the undersampling operation in k-space, into the network and to learn image priors from large-scale data [16,2]. As transformers have become predominant in general image restoration tasks [18,9], there is a noticeable trend towards incorporating transformer-based denoising blocks into the unrolled network [2], which enhances reconstruction quality. However, the adoption of transformer blocks concurrently increases the network parameters and computational complexity. The stacking of denoising blocks, in an unrolled manner, further exacerbates this complexity, making network training challenging. Therefore, one challenging question is how to design efficient denoising blocks within an unrolled model while fully leveraging the k-space information. Another challenge arises from the versatility of MRI, which enables the acquisition of multi-view, multi-contrast, multi-slice, and dynamic image sequences, given specific clinical demands. While there is a prevailing trend towards designing all-in-one models for natural image restoration [7,12], existing MRI reconstruction models cannot offer a unified solution for diverse input types. We thus endeavor to address these challenges with the following contributions:

心血管疾病,包括冠状动脉疾病、心力衰竭和心律失常等,仍是全球主要的死亡原因。心脏磁共振 (cardiac magnetic resonance, CMR) 成像是评估心脏解剖结构、功能和病理最准确可靠的无创技术 [13]。在加速磁共振成像 (MRI) 重建领域,展开式网络 (unrolled networks) 已实现最先进的性能,这归因于其能将已知的成像退化过程 (即k空间中的欠采样操作) 融入网络,并能从大规模数据中学习图像先验 [16,2]。随着Transformer在通用图像修复任务中占据主导地位 [18,9],将基于Transformer的去噪模块融入展开式网络以提升重建质量已成为显著趋势 [2]。然而,Transformer模块的采用同时增加了网络参数和计算复杂度,而展开式堆叠去噪模块的方式进一步加剧了这一复杂性,使网络训练更具挑战性。因此,一个关键问题是如何在展开式模型中设计高效去噪模块,同时充分利用k空间信息。另一挑战源于MRI的多功能性——根据临床需求可获取多视角、多对比度、多切片及动态图像序列。尽管自然图像修复领域普遍趋向设计全能模型 [7,12],现有MRI重建模型仍无法为多样化输入类型提供统一解决方案。我们通过以下创新应对这些挑战:

2 Preliminaries

2 预备知识

Consider reconstructing a complex-valued MR image $x$ from the multi-coil undersampled measurements $y$ in k-space, such that,

考虑从k空间的多线圈欠采样测量值$y$重建复值MR图像$x$,使得

$$
y=A x+\epsilon, \tag{1}
$$

where $A$ is the linear forward operator, constructed from multiplication with the coil sensitivity maps $S$, application of the 2D Fourier transform $F$, and undersampling of the k-space data with a binary mask $M$; $\epsilon$ is the acquisition noise. According to compressed sensing theory [1], we can estimate $x$ by formulating an optimization problem:

其中 $A$ 是线性前向算子,由灵敏度图 $S$ 的乘法、二维傅里叶变换 $F$ 以及使用二元掩码 $M$ 的k空间欠采样构成;$\epsilon$ 是采集噪声。根据压缩感知理论 [1],我们可以通过构建以下优化问题来估计 $x$:

$$
\operatorname*{min}_{x}\frac{1}{2}||y-A x||_{2}^{2}+\lambda R(x), \tag{2}
$$


Fig. 1: The proposed two-stage MRI reconstruction pipeline. The first stage solves a physics-based inverse problem to fill the missing k-space data, which is then transformed to the image domain by an inverse fast Fourier transform (IFFT); a root-sum-of-squares (RSS) coil combination is applied to obtain the first-stage reconstructed image. The second stage solves a general denoising problem to further refine the image reconstruction result.

图 1: 提出的两阶段MRI重建流程。第一阶段通过求解基于物理学的逆问题来填补缺失的k空间数据,随后通过快速傅里叶逆变换 (IFFT) 转换到图像域,并应用平方和根 (RSS) 方法获得第一阶段重建图像。第二阶段通过解决通用去噪问题进一步优化图像重建结果。

where $||\boldsymbol{y}-\boldsymbol{A}\boldsymbol{x}||_{2}^{2}$ is the data consistency term, $R(x)$ is a sparsity regularization term on $x$ (e.g., total variation) and $\lambda$ is a hyper-parameter which controls the contribution weights of the two terms. E2E-VarNet [16] solves the problem in Eq. 2 by applying an iterative gradient descent method in the k-space domain. In the $t$-th step, the k-space is updated from $k^{t}$ to $k^{t+1}$ using:

其中 $||\boldsymbol{y}-\boldsymbol{A x}||_{2}^{2}$ 是数据一致性项,$R(x)$ 是关于 $x$ 的稀疏正则化项 (例如全变分),$\lambda$ 是控制两项贡献权重的超参数。E2E-VarNet [16] 通过在k空间域应用迭代梯度下降法来求解公式2中的问题。在第 $t$ 步时,k空间从 $k^{t}$ 更新至 $k^{t+1}$ 的公式为:

$$
\boldsymbol{k}^{t+1}=\boldsymbol{k}^{t}-\eta^{t}\boldsymbol{M}(\boldsymbol{k}^{t}-\boldsymbol{y})+\boldsymbol{G}(\boldsymbol{k}^{t}), \tag{3}
$$

where $\eta^{t}$ is a learned step size and $G$ is a learned function representing the gradient of the regularization term $R$. We can unroll the iterative updating algorithm into a sequence of cascaded sub-networks, where each cascade represents one unrolled iteration of Eq. 3. The regularization term is applied in the image domain:

其中 $\eta^{t}$ 是学习到的步长,$G$ 是表示正则化项 $R$ 梯度的一个学习函数。我们可以将迭代更新算法展开为一系列子网络,其中每个级联对应于式3中的一次展开迭代。正则化项在图像域中应用:

$$
G(k)=F({\mathcal{E}}(\mathbf{D}({\mathcal{R}}(F^{-1}(k))))), \tag{4}
$$

where $\mathcal{R}(x_{1},...,x_{N})=\sum_{i=1}^{N}\hat{S}_{i}^{*}x_{i}$ is the reduce operator that combines the $N$ coil images $\{x_{i}\}_{i=1}^{N}$ via the estimated sensitivity maps $\{\hat{S}_{i}\}_{i=1}^{N}$, $\hat{S}_{i}^{*}$ is the complex conjugate of $\hat{S}_{i}$, and $\mathcal{E}(x)=(\hat{S}_{1}x,...,\hat{S}_{N}x)$ is the expand operator that computes coil images from the image $x$. Therefore, the linear forward operator $A$ is computed as $A=MF\mathcal{E}$. $\mathbf{D}$ is a denoising neural network used to refine the complex image. $\hat{S}=\mathrm{SME}(y_{\mathrm{ACS}})$ is computed by a sensitivity map estimation (SME) network from the low-frequency region of k-space $y_{\mathrm{ACS}}$, called the Auto-Calibration Signal (ACS), which is typically fully sampled. The final updated multi-coil k-space is converted to the image domain by applying an inverse Fourier transform followed by a root-sum-of-squares (RSS) reduction [14] for each pixel.

其中 $\mathcal{R}(x_{1},...,x_{N})=\sum_{i=1}^{N}\hat{S}_{i}^{*}x_{i}$ 是通过估计的灵敏度图 $\{\hat{S}_{i}\}_{i=1}^{N}$ 将 $N$ 个线圈图像 $\{x_{i}\}_{i=1}^{N}$ 组合的归约算子(reduce operator),$\hat{S}_{i}^{*}$ 是 $\hat{S}_{i}$ 的复共轭,而 $\mathcal{E}(x)=(\hat{S}_{1}x,...,\hat{S}_{N}x)$ 是从图像 $x$ 计算线圈图像的扩展算子(expand operator)。因此,线性前向算子 $A$ 计算为 $A=MF\mathcal{E}$。$\mathbf{D}$ 是用于细化复图像的降噪神经网络。$\hat{S}=\mathrm{SME}(y_{\mathrm{ACS}})$ 由灵敏度图估计(SME)网络从k空间的低频区域 $y_{\mathrm{ACS}}$(称为自动校准信号(ACS))计算得出,该区域通常被完全采样。最终更新的多线圈k空间通过应用逆傅里叶变换,然后对每个像素采用平方和根(RSS)归约[14]转换到图像域。
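To make the notation above concrete, the following NumPy sketch implements the main building blocks: the expand/reduce operators, the forward operator $A=MF\mathcal{E}$, a single unrolled cascade corresponding to Eqs. 3 and 4, and the final IFFT + RSS step. It is a minimal illustration under assumed array shapes; `denoiser` and `eta` are placeholders standing in for the learned network $\mathbf{D}$ and step size $\eta^{t}$, not the paper's actual implementation.

```python
import numpy as np

def fft2c(x):
    # Centered, orthonormal 2D FFT over the last two (spatial) axes.
    return np.fft.fftshift(np.fft.fft2(
        np.fft.ifftshift(x, axes=(-2, -1)), norm="ortho"), axes=(-2, -1))

def ifft2c(k):
    # Centered, orthonormal 2D inverse FFT.
    return np.fft.fftshift(np.fft.ifft2(
        np.fft.ifftshift(k, axes=(-2, -1)), norm="ortho"), axes=(-2, -1))

def expand_op(x, sens_maps):
    # E: project a combined image onto the coils; x: (H, W), sens_maps: (N, H, W).
    return sens_maps * x[None]

def reduce_op(coil_images, sens_maps):
    # R: combine coil images with the conjugate sensitivity maps.
    return np.sum(np.conj(sens_maps) * coil_images, axis=0)

def forward_op(x, sens_maps, mask):
    # A = M F E : complex image -> undersampled multi-coil k-space (Eq. 1).
    return mask * fft2c(expand_op(x, sens_maps))

def cascade(k, y, mask, sens_maps, denoiser, eta):
    # One unrolled iteration, Eq. 3: k <- k - eta * M (k - y) + G(k).
    dc = mask * (k - y)                                     # data-consistency gradient
    x = reduce_op(ifft2c(k), sens_maps)                     # back to the image domain
    refinement = fft2c(expand_op(denoiser(x), sens_maps))   # G(k), Eq. 4
    return k - eta * dc + refinement

def rss(k):
    # Final reconstruction: per-coil IFFT, then root-sum-of-squares over coils.
    return np.sqrt(np.sum(np.abs(ifft2c(k)) ** 2, axis=0))
```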


Fig. 2: Overview of PromptMR in Stage I: an all-in-one unrolled model for MRI reconstruction. Adjacent inputs, depicted in image domain for visual clarity, provide neighboring k-space information for reconstruction. To accommodate different input varieties, the input-type adaptive visual prompt is integrated into each cascade of the unrolled architecture to guide the reconstruction process.

图 2: PromptMR第一阶段概述:用于MRI重建的一体化展开模型。为便于可视化,相邻输入在图像域中展示,为重建提供相邻k空间信息。为适应不同输入类型,输入类型自适应视觉提示被集成到展开架构的每个级联中,以指导重建过程。

3 Method

3 方法

We propose a two-stage pipeline for dynamic and multi-contrast MRI reconstruction, as shown in Fig. 1. Below, we give more details of each stage.

我们提出了一种动态多对比度MRI重建的两阶段流程,如图1所示。下面我们将详细介绍每个阶段。
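A hedged, high-level sketch of the data flow in Fig. 1 is given below. `stage1_fill_kspace` and `stage2_refine` are hypothetical callables standing in for the trained Stage-I unrolled model and the Stage-II restoration network, and the small helpers repeat those from the sketch in Section 2.

```python
import numpy as np

def ifft2c(k):
    return np.fft.fftshift(np.fft.ifft2(
        np.fft.ifftshift(k, axes=(-2, -1)), norm="ortho"), axes=(-2, -1))

def rss(coil_images):
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

def two_stage_reconstruction(y_seq, mask, stage1_fill_kspace, stage2_refine):
    """y_seq: undersampled multi-coil k-space of adjacent frames/contrasts,
    shape (T, N_coils, H, W); mask: binary sampling mask.
    Both model arguments are hypothetical placeholders for the trained networks."""
    k_filled = stage1_fill_kspace(y_seq, mask)              # Stage I: fill the k-space
    imgs = np.stack([rss(ifft2c(k)) for k in k_filled])     # IFFT + RSS per frame
    return stage2_refine(imgs)                              # Stage II: refine the image
```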

3.1 Stage I: Filling the K-Space

3.1 第一阶段:填充K空间

The center of k-space preserves image contrast, while the periphery of k-space contains edge information. In the first stage, we fill the missing k-space data, constrained by the existing k-space acquisition and learned image priors.

k空间中心保留图像对比度,而k空间外围包含边缘信息。在第一阶段,我们在已有k空间采集数据和学习到的图像先验的约束下填充缺失的k空间数据。
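For illustration, a hypothetical Cartesian undersampling mask of the kind implied above keeps a fully sampled low-frequency ACS region (preserving contrast) and subsamples the periphery (which carries edge information); the actual CMRxRecon sampling patterns and ACS sizes may differ.

```python
import numpy as np

def equispaced_mask(num_lines, acceleration, num_acs_lines):
    """Hypothetical 1D Cartesian mask over the phase-encoding lines:
    every `acceleration`-th line is kept, plus a fully sampled ACS center."""
    mask = np.zeros(num_lines, dtype=np.float32)
    mask[::acceleration] = 1.0                       # equispaced peripheral lines
    center, half = num_lines // 2, num_acs_lines // 2
    mask[center - half:center + half] = 1.0          # fully sampled ACS region
    return mask

# Example (illustrative numbers only): 4x acceleration with 24 ACS lines.
# m = equispaced_mask(num_lines=246, acceleration=4, num_acs_lines=24)
```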

Baseline Model We follow the implementation of E2E-VarNet [16] to construct the unrolled model in Stage I. Inspired by the adjacent slice reconstruction (ASR) method [2], which learns inter-slice information by jointly reconstructing a set of adjacent slices rather than reconstructing a single k-space in isolation, we devise the following new method. We generalize ASR to adjacent k-space reconstruction along any dimension, e.g., the temporal/slice/view/contrast dimension, and improve the updating formula of Eq. 3 as follows:

基线模型 我们遵循E2E-VarNet [16] 的实现方式构建了第一阶段展开式模型。受相邻切片重建 (ASR) 方法 [2] 启发 (该方法通过联合重建一组相邻切片而非依赖单一k空间数据进行重建来学习切片间信息) ,我们设计了以下新方法:将ASR推广至沿任意维度 (如时间/切片/视图/对比度维度) 的相邻k空间重建,并将公式3的更新方式改进如下:

$$
k_{adj}^{t+1}=k_{adj}^{t}-\eta^{t}M(k_{adj}^{t}-y_{adj})+G(k_{adj}^{t}), \tag{5}
$$

where $\boldsymbol{k}_{adj}^{t}=[k_{c-a}^{t},...,k_{c-1}^{t},k_{c}^{t},k_{c+1}^{t},...,k_{c+a}^{t}]$ is the concatenation of the central k-space $k_{c}^{t}$ with its $2a$ adjacent k-spaces along a specific dimension. To efficiently extract features from adjacent inputs, we design a Unet-style network [15] with channel attention [3,4], namely CAUnet, for both the denoising network $\mathbf{D}$ and the sensitivity map estimation network, as shown in Appendix A.1. The CAUnet has a 3-level encoder-decoder structure. Each level consists of a DownBlock,

其中 $\boldsymbol{k}_{adj}^{t}=[k_{c-a}^{t},...,k_{c-1}^{t},k_{c}^{t},k_{c+1}^{t},...,k_{c+a}^{t}]$ 是中心k空间 $k_{c}^{t}$ 与其沿特定维度的 $2a$ 个相邻k空间的拼接。为了高效地从相邻输入中提取特征,我们为去噪网络 $\mathbf{D}$ 和灵敏度图估计网络设计了一个带有通道注意力 [3,4] 的Unet风格网络 [15],称为CAUnet,如附录A.1所示。该CAUnet采用3级编码器-解码器结构,每级包含一个DownBlock、


Fig. 3: Overview of the PromptUnet architecture in PromptMR, featuring a 3-level encoder-decoder design. Each level comprises a DownBlock, an UpBlock and a Prompt Block. The Prompt Block in the $i$-th level encodes input-specific context into the fixed prompt $P_{i}$, producing the adaptively learned prompt $\hat{P}_{i}$. These prompts, across multiple levels, integrate with decoder features $F_{d,i}$ in the UpBlocks to allow rich hierarchical context learning.

图 3: PromptMR中PromptUnet架构概览,采用3级编码器-解码器设计。每级包含DownBlock、UpBlock和Prompt Block。第$i$级的Prompt Block将输入特定上下文编码为固定提示$P_{i}$,生成自适应学习提示$\hat{P}_{i}$。这些多级提示与UpBlock中的解码器特征$F_{d,i}$集成,实现丰富的分层上下文学习。

UpBlock, and corresponding skip connection. The architecture integrates a Bottleneck Block for high-level semantic feature capturing and employs Channel Attention Blocks (CABs) within each block. The overall unrolled architecture is shown in Appendix A.2.

UpBlock 及对应的跳跃连接。该架构集成了用于捕获高级语义特征的 Bottleneck Block,并在每个块内采用了通道注意力块 (CAB)。整体展开架构如附录 A.2 所示。
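The exact block configuration of CAUnet is given in the appendix of the original paper; as a hedged PyTorch sketch, a channel attention block (CAB) in the spirit of [3,4] can be written as follows (layer sizes and the reduction ratio are illustrative assumptions).

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average pooling
    followed by a small bottleneck MLP that rescales each channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(self.pool(x))

class CAB(nn.Module):
    """Channel attention block: two 3x3 convolutions with a residual
    connection, modulated by channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            ChannelAttention(channels, reduction),
        )

    def forward(self, x):
        return x + self.body(x)
```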

PromptMR Considering various image types (e.g., different views, different contrasts) with different adjacent types (e.g., dynamic, multi-contrast) under different undersampling rates (e.g., $\times 4$, $\times 8$, $\times 10$), instead of training separate models for each specific input, we propose to learn an all-in-one unified model for all possible adjacent inputs. The image structure remains consistent for multi-contrast adjacent input, while only the contrast varies. Conversely, the contrast remains constant for dynamic adjacent input, but the image structure shifts. To achieve effective performance on diverse input types, the unified model should be able to encode the contextual information conditioned on the input type. Inspired by the recent development of visual prompt learning [5,6] and the prompt learning-based image restoration method [12], we introduce PromptMR, an all-in-one approach for MRI reconstruction, as illustrated in Fig. 2. While PromptMR retains the same unrolled architecture as the baseline model, it extends CAUnet to PromptUnet by integrating Prompt Blocks that learn input-type adaptive prompts and interact with the decoder features in the UpBlocks at multiple levels to enrich the input-specific context, as shown in Fig. 3. The Prompt Block at the $i$-th level takes features $F_{d,i}\in\mathbb{R}^{H_{f}\times W_{f}\times C_{f}}$ from the decoder and the fixed prompt with $N_{p}$ components $P_{i}\in\mathbb{R}^{N_{p}\times H_{p}\times W_{p}\times C_{p}}$ as input. Then, $F_{d,i}$ is processed by a global average pooling (GAP) layer, followed by a linear layer and a softmax layer, to generate the normalized prompt weights $\{\omega_{ij}\}_{j=1}^{N_{p}}$. These weights linearly combine the $N_{p}$ prompt components; the combined prompt is then interpolated to the decoder feature resolution and passed through a $3\times3$ convolution to produce the adaptively learned prompt $\hat{P}_{i}$:

PromptMR 针对不同图像类型(如不同视角、不同对比度)和不同相邻类型(如动态、多对比度)在不同欠采样率(如 $\times 4$、$\times 8$、$\times 10$)下的情况,提出学习一个适用于所有可能相邻输入的一体化统一模型,而非为每种特定输入单独训练模型。多对比度相邻输入的图像结构保持一致,仅对比度变化;而动态相邻输入的对比度恒定,但图像结构发生偏移。为实现多样化输入类型的有效性能,统一模型需能根据输入类型编码上下文信息。受视觉提示学习 [5,6] 和基于提示学习的图像恢复方法 [12] 的启发,我们提出 PromptMR——一种 MRI 重建的一体化方法,如图 2 所示。PromptMR 在保留基线模型展开式架构的同时,通过集成提示块(Prompt Blocks)将 CAUnet 扩展为 PromptUnet,以学习输入类型自适应的提示,并在多层级与 UpBlocks 中的解码器特征交互,从而丰富输入特定上下文(如图 3)。第 $i$ 层提示块以解码器特征 $F_{d,i}\in\mathbb{R}^{H_{f}\times W_{f}\times C_{f}}$ 和包含 $N_{p}$ 个分量的固定提示 $P_{i}\in\mathbb{R}^{N_{p}\times H_{p}\times W_{p}\times C_{p}}$ 为输入。随后,$F_{d,i}$ 经全局平均池化(GAP)层处理,再通过线性层和 softmax 层生成归一化提示权重 $\{\omega_{ij}\}_{j=1}^{N_{p}}$。这些权重对 $N_{p}$ 个提示分量进行线性组合,组合后的提示经插值至解码器特征分辨率并通过 $3\times3$ 卷积,生成自适应学习提示 $\hat{P}_{i}$:

$$
\hat{P}_{i}=\operatorname{Conv}_{3\times3}\Big(\operatorname{Interp}\Big(\sum_{j=1}^{N_{p}}\omega_{ij}P_{ij}\Big)\Big),\quad \omega_{i}=\operatorname{Softmax}(\operatorname{Linear}(\operatorname{GAP}(F_{d,i}))) \tag{6}
$$

The prompts generated by the Prompt Blocks at multiple levels learn hierarchical, input-type contextual representations, which are integrated with the decoder features to guide the all-in-one MRI reconstruction.

通过多级提示块生成的提示可以学习分层输入类型的上下文表示,这些表示与解码器特征相结合,以指导一体化MRI重建。
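A hedged PyTorch sketch of the Prompt Block computation in Eq. 6 is given below; tensor shapes follow the notation above, while the exact layer configuration (e.g., the interpolation mode) is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptBlock(nn.Module):
    """Input-adaptive prompt generation following Eq. 6: decoder features
    weight a bank of learnable prompt components, which are interpolated to
    the feature resolution and refined by a 3x3 convolution."""
    def __init__(self, num_prompts, prompt_dim, prompt_size, feat_dim):
        super().__init__()
        # Fixed (learnable) prompt components P_i: (N_p, C_p, H_p, W_p)
        self.prompt = nn.Parameter(
            torch.randn(num_prompts, prompt_dim, prompt_size, prompt_size))
        self.linear = nn.Linear(feat_dim, num_prompts)
        self.conv = nn.Conv2d(prompt_dim, prompt_dim, kernel_size=3, padding=1)

    def forward(self, feat):
        b, _, h, w = feat.shape
        # omega_i = Softmax(Linear(GAP(F_{d,i})))
        weights = torch.softmax(self.linear(feat.mean(dim=(-2, -1))), dim=1)
        # Weighted combination of the N_p prompt components per batch element.
        prompt = torch.einsum("bn,nchw->bchw", weights, self.prompt)
        # Interpolate to the decoder feature size and refine with a 3x3 conv.
        prompt = F.interpolate(prompt, size=(h, w), mode="bilinear",
                               align_corners=False)
        return self.conv(prompt)
```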

3.2 Stage II: Refining the Image

3.2 阶段二:图像优化

After the first stage, the missing k-space data have been filled, and image aliasing artifacts have been largely removed. However, due to the unrolled structure and GPU memory limitations, the capacity of the denoising blocks we can use is constrained, which may prevent the full exploitation of dynamic and multi-contrast information. In Stage II, we further explore the inter-frame/-contrast coherence in the image domain for multi-frame/-contrast feature aggregation by using a powerful restoration model, ShiftNet [8], as the refinement network. This network employs stacked Unets and grouped spatio-temporal shift operations to expand the effective receptive field. Details of ShiftNet are not covered here, since it is not the core contribution of this paper, and it can be replaced by any state-of-the-art video restoration model.

第一阶段完成后,缺失的k空间数据已被填补,图像混叠伪影也基本消除。但由于网络展开特性和GPU内存限制,我们所能使用的去噪模块容量受到制约,这可能阻碍对动态和多对比度信息的充分利用。在第二阶段,我们采用强大的修复模型ShiftNet [8]作为优化(refinement)网络,进一步挖掘图像域中帧间/对比度间的相关性,实现多帧/多对比度特征聚合。该网络通过堆叠Unet和分组时空位移操作来扩展有效感受野。ShiftNet的具体细节在此不作赘述,因其非本文核心部分,且该网络可被任何先进的视频修复模型替代。
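Since ShiftNet itself is outside the scope of this paper, the snippet below only illustrates the general idea of a grouped temporal shift: channel groups are shifted along the frame dimension so that a subsequent 2D convolution mixes information from neighboring frames. It is a rough sketch, not ShiftNet's implementation (which, among other differences, would handle sequence boundaries more carefully than the circular roll used here).

```python
import torch

def grouped_temporal_shift(x, shifts=(-1, 0, 1)):
    """x: frame-wise features, shape (T, C, H, W). Channels are split into
    len(shifts) groups and each group is rolled along the temporal axis by a
    different offset (circularly, for simplicity)."""
    groups = torch.chunk(x, len(shifts), dim=1)
    shifted = [torch.roll(g, s, dims=0) for g, s in zip(groups, shifts)]
    return torch.cat(shifted, dim=1)

# Example: 12 cardiac phases, 48 feature channels per frame.
# feats = torch.randn(12, 48, 64, 64)
# mixed = grouped_temporal_shift(feats)  # same shape; channels now mix t-1, t, t+1
```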

4 Experiments

4 实验

In this section, we first provide the experimental details and results of our proposed method on the CMRxRecon dataset. We use SSIM, PSNR, and NMSE to compare the performance of different reconstruction methods under various acceleration factors ($\times 4$, $\times 8$, $\times 10$). Then, we conduct extensive ablation studies of our proposed method and also benchmark on another large-scale MRI dataset, the fastMRI multi-coil knee dataset. For experiments on the fastMRI dataset, we refer readers to Appendix B.

在本节中,我们首先在CMRxRecon数据集上提供所提方法的实验细节和结果。我们使用SSIM、PSNR和NMSE来比较不同重建方法在多种加速因子(×4、×8、×10)下的性能。随后,我们对所提方法进行了全面的消融研究,并在另一个大规模MRI数据集fastMRI多线圈膝关节数据集上进行了基准测试。关于fastMRI数据集的实验,请读者参阅附录B。
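For reference, the three metrics can be computed as in the sketch below, following common fastMRI-style conventions (the challenge's official evaluation scripts may normalize slightly differently).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def nmse(gt, pred):
    # Normalized mean squared error: ||gt - pred||_2^2 / ||gt||_2^2
    return np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2

def psnr(gt, pred):
    # Peak signal-to-noise ratio, using the ground-truth maximum as data range.
    return peak_signal_noise_ratio(gt, pred, data_range=gt.max())

def ssim(gt, pred):
    # Structural similarity on 2D magnitude images.
    return structural_similarity(gt, pred, data_range=gt.max())
```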

4.1 CMRxRecon Dataset

4.1 CMRxRecon 数据集

The CMRxRecon Dataset [17] includes 120 cardiac MRI cases of fully sampled dynamic cine and multi-contrast raw k-space data obtained on 3 Tesla magnets. The dynamic cine images in each case include short-axis (SAX), two-chamber (2-CH), three-chamber (3-CH), and four-chamber (4-CH) long-axis (LAX) views. Typically $5\sim10$ slices were acquired for SAX cine, while a single slice was acquired for each LAX view. The cardiac cycle was segmented into $12\sim25$ phases with a temporal resolution of 50 ms. The multi-contrast cardiac MRI in each case is in the SAX view, which contains 9 T1-weighted (T1w) images conducted using a modified look-locker inversion recovery (MOLLI) sequence and 3 T