[Paper Translation] Ensembled ResUnet for Anatomical Brain Barriers Segmentation


Download PDF: https://arxiv.org/pdf/2012.14567v2.pdf


Ensembled ResUnet for Anatomical Brain Barriers Segmentation

Abstract

Accurate segmentation of brain structures is helpful for glioma treatment and radiotherapy planning. However, due to the visual and anatomical differences between imaging modalities, accurate segmentation of brain structures remains challenging.
To address this problem, we first construct a residual-block-based U-shape network with a deep encoder and a shallow decoder, which trades off performance against efficiency. We then introduce the Tversky loss to address the class imbalance between the foreground and background classes. Finally, a model ensemble strategy is utilized to remove outliers and further boost performance.

Introduction

Gliomas are the most common type of brain tumor, originating from the glial cells of the human brain. In clinical radiation therapy, precise identification of the clinical target volume (CTV) boundary ensures the treatment effect, and the CTV boundary is determined by anatomical structures. Therefore, accurate and automatic segmentation of brain anatomical structures could improve both the efficiency and the effectiveness of glioma treatment.
Recently, deep learning approaches have proven their superiority over traditional methods on both 2D natural image and 3D medical image segmentation tasks. For brain tumor segmentation in particular, prior work proposed a U-Net with an additional VAE branch to reconstruct the input images and regularize the shared encoder, as well as a two-stage cascaded U-Net that refines the coarse prediction of the first stage and captures context information in the second stage. As for brain CTV segmentation, DenseNet has been introduced to predict precise resection cavity contours.
To raise researchers' interest in brain CTV segmentation, the Anatomical Brain Barriers to Cancer Spread (ABCs) challenge encourages participants to construct automatic brain structure segmentation methods, covering critical structures (e.g., structures that serve as barriers to the spread of brain cancers or that are spared from irradiation) in two tasks.
The dataset provides 45 multi-modal images with ground-truth annotations for training and validation, 15 images for the online test, and 15 images for the final test. Each case comprises a CT scan and two diagnostic MRI scans, contrast-enhanced T1-weighted and T2-weighted FLAIR, of the post-operative brain. All images are co-registered and re-sampled to a size of 164x194x142 voxels with an isotropic resolution of 1.2x1.2x1.2 mm. In Task 1, participants are asked to segment brain structures for the automatic identification of the CTV. In Task 2, participants are required to segment structures used in radiotherapy treatment plan optimization. The evaluation metrics are the Dice score and the surface Dice score.
In this report, we propose a residual-block-based U-Net that is composed of a deep encoder and a shallow decoder and implemented within the nn-UNet framework. To provide multi-scale guidance that boosts performance and accelerates training convergence, we employ a deep supervision strategy. To suppress false-negative results and retrieve missing small targets, we utilize the Tversky loss together with the cross-entropy loss as our criterion. Experiments show that the proposed framework outperforms the compared state-of-the-art methods.

Method

Based on the U-Net architecture and the nn-UNet framework, we propose a variant built on residual blocks. The details are illustrated as follows.

[Fig. 1: The proposed residual-block-based U-shape network.]

Encoder design

The proposed encoder is composed of residual blocks as shown in Fig. 1, each consisting of two 3x3x3 3D convolution layers, each followed by a normalization layer and an activation layer. An identity skip connection then links the shallow features to the corresponding deep-level features. Specifically, we use Instance Normalization (IN) as the normalization layer, since it performs better than BatchNorm when the batch size is small (ours is 2) and requires less computation than Group Normalization (GN). In addition, we utilize LeakyReLU as the activation layer to retain more information than ReLU. In total, there are 5 spatial levels, with 1, 2, 3, 4, and 4 residual blocks per level, respectively. Downsampling is implemented by a 3D strided convolution layer with a 3x3x3 kernel and a stride of 2; the number of filters starts at 32 and doubles after each downsampling operation, except that the number of filters in the last level is capped at 320 for computational efficiency. The last downsampling convolution skips the z-axis to retain the slice-wise resolution. We randomly crop patches of size 3x128x160x112 from the 3-modality images as the framework input; after the encoder, the bottleneck output has a size of 320x8x10x7.
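
As a concrete illustration, here is a minimal PyTorch sketch of one such residual block for the same-channel case (the class name `ResBlock` and the exact layer ordering are our assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3x3 convs, each followed by
    InstanceNorm and LeakyReLU, plus an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm1 = nn.InstanceNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm2 = nn.InstanceNorm3d(channels)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)  # identity skip connection
```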

Decoder design

The decoder mirrors the encoder and is composed of an upsample block and a standard convolution block, where the upsample block is implemented with a 1x1x1 3D convolution layer and a 3D transposed convolution layer. The 1x1x1 convolution reduces the number of feature channels by a factor of 2, and the transposed convolution doubles the spatial dimensions. The standard convolution block consists of a single 3x3x3 3D convolution layer followed by an Instance Normalization and a LeakyReLU layer. At the end of the decoder, a 1x1x1 convolution layer with as many output channels as target classes, followed by a softmax activation, produces the final prediction. The shallow decoder not only speeds up the training and inference stages but also helps avoid overfitting. To accelerate model convergence and provide multi-scale guidance, we also introduce a deep supervision strategy, in which the features of three additional levels are processed by extra output blocks to obtain auxiliary predictions supervised by the same loss function. The final loss weights each level's output according to its importance: we set the weights to 8/15, 4/15, 2/15, and 1/15 for the outputs from levels 1, 2, 3, and 4, respectively.
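
The weighted deep-supervision loss can be sketched as follows (a minimal sketch, assuming a `criterion` callable and ground truth resampled to each supervised level's resolution):

```python
# Deep-supervision weights for decoder levels 1..4 (level 1 = full resolution).
DS_WEIGHTS = [8 / 15, 4 / 15, 2 / 15, 1 / 15]

def deep_supervision_loss(outputs, targets, criterion):
    """outputs: predictions from the four supervised decoder levels;
    targets: ground truth resampled to each level's resolution."""
    return sum(w * criterion(out, tgt)
               for w, out, tgt in zip(DS_WEIGHTS, outputs, targets))
```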

Loss function

We utilize the DC-CE loss as our criterion, which is composed of the original linear Dice loss and the cross-entropy loss:

$$ L_{DCCE} = -\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} \left(p_{i, c}+y_{i, c}\right)+\epsilon} - \frac{1}{NC}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i, c}\log\left(p_{i, c}\right) $$

where $ N $ is the total number of pixels and $ C $ denotes the number of classes. $ p_{i, c} $ and $ y_{i, c} $ represent the prediction and the annotation of pixel $ i $ for class $ c $, respectively. The Tversky loss is also introduced to train our proposed framework, which can be formulated as:

$$ L_{Tversky}=\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\alpha \sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, \bar{y}_{i, c}+\beta \sum_{i=1}^{N}\sum_{c=1}^{C} \bar{p}_{i, c}\, y_{i, c}+\epsilon} $$

where $ p_{i, c}\,\bar{y}_{i, c} $ counts the false positive (FP) pixels and $ \bar{p}_{i, c}\, y_{i, c} $ counts the false negative (FN) pixels, with $ \bar{y}=1-y $ and $ \bar{p}=1-p $. The importance of FP and FN is weighted by $ \alpha $ and $ \beta $, which are set to 0.3 and 0.7, respectively, so that false negatives are penalized more heavily.
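
A minimal PyTorch sketch of this loss under the definitions above (`pred` is assumed to hold softmax probabilities and `target` a one-hot encoding; returning one minus the Tversky index is a common loss convention, our choice here):

```python
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """pred: (N, C, D, H, W) softmax probabilities;
    target: (N, C, D, H, W) one-hot ground truth."""
    dims = (0, 2, 3, 4)                    # sum over batch and spatial dims
    tp = (pred * target).sum(dims)         # true positives per class
    fp = (pred * (1 - target)).sum(dims)   # false positives
    fn = ((1 - pred) * target).sum(dims)   # false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1 - index.mean()
```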

Pseudo training with model ensemble

To further improve segmentation accuracy, we employ a model ensemble strategy followed by a pseudo training strategy. At first, we average the predictions of all selected models to obtain the prediction for the test volumes:

$$ P^{*}=\frac{1}{N}\sum_{i=1}^{N} P_{i} $$

$ P_{i} $ denotes the prediction of a specific model, while $ P^{*} $ denotes the final ensembled prediction. We find it more stable than any single prediction, since error-prone outliers are removed in the ensembling process. Given these reliable predictions, we treat them as pseudo labels for the test volumes. The final loss function for model training can be described as follows:

$$ L_{hybrid}(V, L)=L_{DCCE}(V, L)+L_{Tversky}(V, L) $$

$$ L_{final}=L_{hybrid}\left(V_{s}, L_{s}\right)+L_{hybrid}\left(V_{t}, P_{t}^{*}\right) $$

We define the sum of $ L_{DCCE} $ and $ L_{Tversky} $ as $ L_{hybrid} $, and utilize the ensembled prediction $ P^{*} $ as the pseudo label for the test volumes, where $ (V_{s}, L_{s}) $ denote the labeled training volumes with their annotations, and $ (V_{t}, P_{t}^{*}) $ the test volumes with pseudo labels. The pseudo training strategy provides the model with more training pairs, from which it can learn reliable information. The results in Table 1 prove the effectiveness of the methods in this section.
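
A minimal sketch of the ensembling and pseudo-label step (the model list and tensor shapes are assumptions; in practice each $ P_{i} $ would come from sliding-window inference):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, volume):
    """Average the softmax predictions of several trained models,
    then take the argmax as the pseudo label for a test volume."""
    probs = torch.stack([m(volume).softmax(dim=1) for m in models])
    p_star = probs.mean(dim=0)           # P* = (1/N) * sum_i P_i
    pseudo_label = p_star.argmax(dim=1)  # hard pseudo label per voxel
    return p_star, pseudo_label
```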

Optimization

We utilize SGD with an initial learning rate of $ \alpha_{0}=1e-4 $ and a momentum of 0.99 to optimize our network. A poly strategy is employed to progressively decrease the learning rate according to:

$$ \alpha=\alpha_{0} *\left(1-\frac{e}{N_{e}}\right)^{0.9} $$

where $ e $ is the index of the current epoch and $ N_e $ is the total number of training epochs. We set $ N_e=1000 $ with a batch size of 2 in the training stage. L2-norm regularization with a weight of $ 1e-5 $ is used to prevent overfitting.
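
This schedule maps directly onto PyTorch's `LambdaLR` (a minimal sketch; the stand-in model and the use of `weight_decay` for the L2 penalty are our assumptions):

```python
import torch

N_EPOCHS = 1000
model = torch.nn.Conv3d(3, 16, kernel_size=3)  # stand-in for the ResUnet

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.99, weight_decay=1e-5)
# Poly decay: alpha = alpha_0 * (1 - e / N_e) ** 0.9, stepped once per epoch.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda e: (1 - e / N_EPOCHS) ** 0.9)

for epoch in range(N_EPOCHS):
    # ... run one training epoch here ...
    scheduler.step()
```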

Experiments

Implementation details and data processing

In this section, we describe the experimental details of our method. First, we normalize the three modality images by subtracting the mean value and dividing by the variance. Specifically, to reduce the negative effect of extreme values in CT images, the gray intensity of the CT modality is clipped to [0.5, 0.995]. Afterwards, we concatenate the 3 modalities to obtain a 3-channel volume as the framework input. A series of data augmentation strategies is employed, including rotation, scaling, mirroring, gamma correction, and additive brightness, to improve the robustness of the framework. In the inference phase, test-time augmentation (e.g., sliding-window inference and flipping across the three axes) is introduced to improve performance. It is worth noting that adopting a horizontal flip along the x-axis greatly reduced accuracy in our experiments. The reason might be that such an operation misleads the framework when it learns paired tissues such as the eyes and cochleae. Therefore, we avoid the horizontal flip along the x-axis in Task 2. To evaluate the effectiveness of the proposed framework, we employ 5-fold cross-validation and choose the proper models to generate the final submission. The proposed framework is implemented in PyTorch and runs on an NVIDIA Tesla V100 GPU.
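
A minimal sketch of this preprocessing (reading the CT clipping range [0.5, 0.995] as the 0.5th and 99.5th intensity percentiles, nnU-Net-style, and the normalization as a standard z-score are our assumptions):

```python
import numpy as np

def preprocess(ct, t1ce, flair):
    """Clip CT extremes, normalize each modality,
    and stack them into a 3-channel input volume."""
    # Assumption: [0.5, 0.995] denotes the 0.5th and 99.5th intensity
    # percentiles (quantiles 0.005 and 0.995), as in nnU-Net-style clipping.
    lo, hi = np.quantile(ct, [0.005, 0.995])
    ct = np.clip(ct, lo, hi)
    channels = []
    for vol in (ct, t1ce, flair):
        # z-score per modality (dividing by the standard deviation is the
        # usual reading of the normalization described above)
        channels.append((vol - vol.mean()) / (vol.std() + 1e-8))
    return np.stack(channels, axis=0)  # shape: (3, D, H, W)
```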

Quantitative and Qualitative Analysis

We evaluate the performance of the proposed frameworks on the online test dataset, using the Dice score (DSC) and the surface Dice score (SDSC) as evaluation metrics for Task 1 and Task 2. As shown in Table 1, we choose nnU-Net as the state-of-the-art baseline; it achieves 87.2% DSC and 97.4% SDSC on Task 1, and 77.7% DSC and 92.6% SDSC on Task 2. Compared with nn-UNet, our ResUnet improves Task 2 by 0.3% in DSC and 0.3% in SDSC, and ResUnet with the Tversky loss adds a further 0.1% DSC and 0.1% SDSC on Task 2. To further boost performance, we ensemble the predictions of the above methods to obtain better results. We also visualize the predictions for a qualitative analysis: as shown in the result figures, the predictions of our proposed framework are very close to the corresponding labels, and the 3D reconstruction of the prediction clearly identifies all target tissues.

[Figures: qualitative comparison between the predictions and the corresponding labels, and a 3D reconstruction of the prediction.]

Conclusions

In this study, we proposed an effective framework for the Anatomical Brain Barriers to Cancer Spread challenge. Specifically, we used a residual-block-based U-shape network as the architecture and the Tversky loss as the criterion to strengthen the feature extraction ability. The ensemble strategy was adopted to refine the predictions and obtain better results. Experiments on the online test set verified the efficacy of the proposed framework.
