[Paper Translation] Ensembled ResUnet for Anatomical Brain Barriers Segmentation


Original paper: https://arxiv.org/pdf/2012.14567v2.pdf


Ensembled ResUnet for Anatomical Brain Barriers Segmentation


Abstract

Accurate segmentation of brain structures could be helpful for glioma treatment and radiotherapy planning. However, due to the visual and anatomical differences between modalities, accurate segmentation of brain structures remains challenging.
To address this problem, we first construct a residual-block-based U-shape network with a deep encoder and a shallow decoder, which trades off framework performance against efficiency. Then, we introduce the Tversky loss to address the class imbalance between the foreground classes and the background class. Finally, a model ensemble strategy is utilized to remove outliers and further boost performance.


Introduction

Gliomas are the most common type of brain tumor, originating from glial cells in the human brain. In clinical radiation therapy, precise identification of the clinical target volume (CTV) boundary helps ensure the treatment effect, and the CTV boundary is determined by anatomical structures. Therefore, accurate and automatic segmentation of brain anatomical structures could improve both the efficiency and the effectiveness of glioma treatment.
Recently, deep learning approaches have proven their superiority over traditional methods on both 2D natural image and 3D medical image segmentation tasks. For brain tumor segmentation in particular, one approach proposed a U-Net with an additional VAE branch to reconstruct input images and regularize the shared encoder, while another established a two-stage cascaded U-Net to refine the coarse prediction from the first stage and capture context information in the second stage. For brain CTV segmentation, prior work introduced DenseNet to predict precise resection cavity contours.
To raise researchers' interest in the study of brain CTV segmentation, the Anatomical Brain Barriers to Cancer Spread (ABCs) challenge encourages participants to construct automatic brain structure segmentation methods, where several critical structures (e.g., structures that serve as barriers to the spread of brain cancers or that should be spared from irradiation) are covered by two tasks.
The dataset provides 45 multi-modal images with ground-truth annotations for training and validation, 15 images for the online test, and 15 images for the final test. Each case comes with a CT scan and two diagnostic MRI scans, contrast-enhanced T1-weighted and T2-weighted FLAIR, of the post-operative brain. All images are co-registered and re-sampled to a size of 164x194x142 voxels with an isotropic resolution of 1.2x1.2x1.2 mm. For Task 1, participants are asked to segment brain structures for the automatic identification of the CTV. For Task 2, participants are required to segment structures used in radiotherapy treatment plan optimization. The evaluation metrics are the Dice score and the surface Dice score.
In this report, we propose a residual-block-based U-Net that is composed of a deep encoder and a shallow decoder and implemented with the nnU-Net framework. To provide multi-scale guidance that boosts performance and accelerates training convergence, we employ a deep supervision strategy. To suppress false-negative results and retrieve missed small targets, we use the Tversky loss together with the cross-entropy loss as our criterion. Experiments show that the proposed framework outperforms the compared state-of-the-art methods.


Method

Based on the U-Net architecture and the nnU-Net framework, we propose a variant built from residual blocks. The details are described as follows.

(Fig. 1: the proposed residual-block-based encoder-decoder architecture.)


Encoder design

The proposed encoder is composed of residual blocks as shown in Fig. 1, each of which consists of two 3x3x3 3D convolution layers, each followed by a normalization layer and an activation layer. An identity skip connection then links the shallow features to the corresponding deep-level features. Specifically, we use Instance Normalization (IN) as the normalization layer, since it performs better than BatchNorm when the batch size is small (ours is 2) and costs less computation than Group Normalization (GN). In addition, we use LeakyReLU as the activation layer to retain more information than ReLU. In total, the encoder has 5 spatial levels, with 1, 2, 3, 4, and 4 residual blocks per level, respectively. Downsampling is implemented by a strided 3D convolution with a 3x3x3 kernel and a stride of 2; the number of filters starts at 32 and doubles after each downsampling operation. Notably, the number of filters at the last level is capped at 320 for computational efficiency, and the last downsampling convolution skips the z-axis to retain the slice-wise resolution. We randomly crop patches of size 3x128x160x112 from the 3-modality images as the framework input; after the encoder, the deepest feature map has a size of 320x8x10x7.
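
As a concrete illustration, the following is a minimal PyTorch sketch of one such residual block. The module name and the 1x1x1 projection on the skip path when shapes change are our assumptions; the paper does not publish this code.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3x3 convs, each followed by InstanceNorm and LeakyReLU,
    plus an identity skip connection (a sketch of the described block)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
        self.norm1 = nn.InstanceNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.norm2 = nn.InstanceNorm3d(out_ch)
        self.act = nn.LeakyReLU(inplace=True)
        # Project the identity when the shape changes (our assumption).
        self.skip = (nn.Identity() if in_ch == out_ch and stride == 1
                     else nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + self.skip(x))
```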


Decoder design

The decoder mirrors the encoder and is composed of an upsample block and a standard convolution block, where the upsample block is implemented with a 1x1x1 3D convolution layer and a 3D transposed convolution layer. The 1x1x1 convolution reduces the number of feature channels by a factor of 2, and the transposed convolution doubles the spatial dimensions. The standard convolution block consists of a simple 3x3x3 3D convolution layer followed by an Instance Normalization layer and a LeakyReLU layer. At the end of the decoder, a 1x1x1 convolution layer with as many output channels as target classes, followed by a softmax activation, produces the final prediction. The shallow decoder not only speeds up the training and inference stages but also helps avoid overfitting. To accelerate model convergence and provide multi-scale guidance, we also introduce a deep supervision strategy, where the features of three additional levels are processed by extra output blocks to obtain auxiliary predictions supervised by the same loss function; a sketch follows below. The final loss weights each level's output according to its importance: we set weights of 8/15, 4/15, 2/15, and 1/15 for the outputs from levels 1, 2, 3, and 4, respectively.
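
Below is a hedged PyTorch sketch of the upsample block and the deep supervision weighting described above; the module names and the list-based interface are our assumptions, not the authors' code.

```python
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """1x1x1 conv halves the channels, then a transposed conv doubles
    the spatial size (a sketch of the described decoder block)."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.reduce = nn.Conv3d(in_ch, in_ch // 2, kernel_size=1)
        self.up = nn.ConvTranspose3d(in_ch // 2, in_ch // 2, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(self.reduce(x))

# Deep supervision: auxiliary outputs from levels 1-4 share one criterion,
# weighted 8/15, 4/15, 2/15, 1/15 as stated in the text.
DS_WEIGHTS = [8 / 15, 4 / 15, 2 / 15, 1 / 15]

def deep_supervision_loss(outputs, targets, criterion):
    # `outputs`/`targets` are lists ordered from level 1 (full resolution)
    # downwards; targets are assumed to be downsampled to match each level.
    return sum(w * criterion(o, t) for w, o, t in zip(DS_WEIGHTS, outputs, targets))
```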


Loss function

We utilize the DC-CE loss as our criterion, which is composed of the original linear Dice loss and the cross-entropy loss:

$$ L_{DCCE} = -\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i,c}\, y_{i,c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} \left(p_{i,c}+y_{i,c}\right)+\epsilon} -\frac{1}{NC}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log\left(p_{i,c}\right) $$

where $ N $ is the total number of pixels and $ C $ denotes the number of classes. $ p_{i, c} $ and $ y_{i, c} $ represent the prediction and the annotation of pixel $ i $ for class $ c $, respectively. The Tversky loss is also introduced to train the proposed framework, formulated as:

$$ L_{Tversky}=\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c} y_{i, c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c} y_{i, c}+\alpha \sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c} \bar{y}_{i, c}+\beta \sum_{i=1}^{N}\sum_{c=1}^{C} \bar{p}_{i, c} y_{i, c}+\epsilon} $$

where $ p_{i,c}\bar{y}_{i,c} $ counts the false positive (FP) pixels and $ \bar{p}_{i,c}\, y_{i,c} $ counts the false negative (FN) pixels, with $ \bar{y}_{i,c}=1-y_{i,c} $ and $ \bar{p}_{i,c}=1-p_{i,c} $. The importance of FP and FN is weighted by $ \alpha $ and $ \beta $, which are set to 0.3 and 0.7, respectively, so that false negatives are penalized more heavily.
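
A minimal PyTorch sketch of this loss follows, negated so that minimizing it maximizes the Tversky index (mirroring the negative Dice term above); the flattened tensor layout is an assumption.

```python
import torch

def tversky_loss(pred: torch.Tensor, target: torch.Tensor,
                 alpha: float = 0.3, beta: float = 0.7,
                 eps: float = 1e-5) -> torch.Tensor:
    """Negative Tversky index over softmax probabilities `pred` and one-hot
    `target`, both flattened to shape (num_voxels, num_classes)."""
    tp = (pred * target).sum()            # true positives
    fp = (pred * (1.0 - target)).sum()    # p * (1 - y): false positives
    fn = ((1.0 - pred) * target).sum()    # (1 - p) * y: false negatives
    # beta > alpha penalizes false negatives more, suppressing missed targets.
    return -(tp + eps) / (tp + alpha * fp + beta * fn + eps)
```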


Pseudo training with model ensemble

To further improve segmentation accuracy, we successively employ a model ensemble strategy and a pseudo-training strategy. First, we average the predictions of all selected models to obtain the prediction for the test volumes:

$$ P^{*}=\frac{1}{N}\sum_{i=1}^{N} P_{i} $$

Here $ P_{i} $ denotes the prediction of the $ i $-th of the $ N $ selected models, while $ P^{*} $ denotes the final ensembled prediction. We find it more stable than any single prediction, since error-prone outliers are removed in the ensemble process. Given these reliable predictions, we treat them as pseudo labels for the test volumes. The final loss function for model training can be described as follows:

$$ L_{hybrid}(V, L)=L_{DCCE}(V, L)+L_{Tversky}(V, L) $$

$$ L_{final}=L_{hybrid}\left(V_{s}, L_{s}\right)+L_{hybrid}\left(V_{t}, P_{t}^{*}\right) $$

We define the sum of $ L_{DCCE} $ and $ L_{Tversky} $ as $ L_{hybrid} $, and utilize the ensembled prediction $ P^{*} $ as the pseudo label for the test volumes, where $ V_{s} $ and $ L_{s} $ denote the training volumes and their ground-truth labels and $ V_{t} $ denotes the test volumes. The pseudo-training strategy provides the model with more training pairs, from which it can learn reliable information. The results in Table 1 demonstrate the effectiveness of these methods.
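
The ensembling step can be sketched as follows in PyTorch, assuming each model outputs per-class logits with the channel on dimension 1; the function name is hypothetical.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, volume: torch.Tensor):
    """Compute P* as the mean of the per-model softmax predictions P_i,
    and derive a pseudo label from it by a per-voxel argmax."""
    probs = [model(volume).softmax(dim=1) for model in models]  # each (B, C, D, H, W)
    p_star = torch.stack(probs).mean(dim=0)
    pseudo_label = p_star.argmax(dim=1)  # (B, D, H, W) class indices
    return p_star, pseudo_label
```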


Optimization

We optimize our network with SGD, using an initial learning rate of $ \alpha_{0}=1e-4 $ and a momentum of 0.99. A poly schedule progressively decreases the learning rate according to:

$$ \alpha=\alpha_{0} *\left(1-\frac{e}{N_{e}}\right)^{0.9} $$

where $ e $ is the index of the current epoch and $ N_e $ is the total number of training epochs. We set $ N_e=1000 $ with a batch size of 2 in the training stage. L2-norm regularization with a weight of $ 1e-5 $ is used to prevent overfitting.
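
The poly schedule is straightforward to implement; a minimal sketch, with the epoch-wise optimizer update shown as a usage comment:

```python
def poly_lr(epoch: int, total_epochs: int = 1000, initial_lr: float = 1e-4) -> float:
    """lr = lr0 * (1 - e / N_e) ** 0.9, matching the formula above."""
    return initial_lr * (1.0 - epoch / total_epochs) ** 0.9

# Usage with the SGD settings stated above (momentum 0.99, L2 weight 1e-5):
# optimizer = torch.optim.SGD(net.parameters(), lr=poly_lr(0),
#                             momentum=0.99, weight_decay=1e-5)
# for group in optimizer.param_groups:
#     group["lr"] = poly_lr(epoch)  # called once per epoch
```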


Experiments


Implementation details and data processing

In this section, we describe the experimental details of our method. First, we normalize the images of the three modalities by subtracting the mean values and dividing by the variances. Specifically, to reduce the negative effect of extreme values in CT images, the gray intensity of the CT modality is clipped into [0.5, 0.995]. Afterwards, we concatenate the 3 modalities to obtain a 3-channel volume as the framework input; a sketch follows below. A series of data augmentation strategies, including rotation, scaling, mirroring, gamma correction, and additive brightness, is employed to improve the robustness of the framework. In the inference phase, test-time augmentation strategies (e.g., sliding-window inference and flipping across the three axes) are introduced to improve performance. It is worth noting that adopting the horizontal flip along the x-axis greatly reduced accuracy in our experiments; the reason might be that such an operation misleads the framework when it learns paired tissues such as the eyes and cochleae. Therefore, we avoid the horizontal flip along the x-axis in Task 2. To evaluate the effectiveness of the proposed framework, we employ 5-fold cross-validation and choose the proper models to generate the final submission. The whole framework is implemented in PyTorch and trained on an NVIDIA Tesla V100 GPU.
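
For illustration, a minimal NumPy sketch of the described preprocessing; we assume standard z-score normalization (division by the standard deviation) and copy the clipping bounds verbatim from the text.

```python
import numpy as np

def preprocess(ct: np.ndarray, t1ce: np.ndarray, flair: np.ndarray) -> np.ndarray:
    """Clip CT intensities, normalize each modality, and stack the three
    co-registered volumes into one 3-channel input."""
    ct = np.clip(ct, 0.5, 0.995)  # clipping bounds copied verbatim from the text
    channels = []
    for vol in (ct, t1ce, flair):
        vol = (vol - vol.mean()) / (vol.std() + 1e-8)  # per-volume normalization
        channels.append(vol.astype(np.float32))
    return np.stack(channels, axis=0)  # shape (3, D, H, W)
```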


Quantitative and Qualitative Analysis

We evaluate the performance of the proposed frameworks on the online test dataset, using the Dice score (DSC) and the surface Dice score (SDSC) as the evaluation metrics for Task 1 and Task 2. As shown in Table 1, we choose nnU-Net as the state-of-the-art baseline for comparison; it achieves 87.2% DSC and 97.4% SDSC on Task 1, and 77.7% DSC and 92.6% SDSC on Task 2. Compared with nnU-Net, our ResUnet improves DSC by 0.3% and SDSC by 0.3% on Task 2, and ResUnet with the Tversky loss achieves improvements of 0.1% in DSC and 0.1% in SDSC on Task 2. To further boost performance, we ensemble the predictions of the above methods to obtain better results. We also visualize the predictions for a qualitative analysis: as shown in the results figure, the predictions of our proposed framework are very close to the corresponding labels. We additionally reconstruct the prediction in 3D in Fig. 3d, where all target tissues are clearly identified.

(Figure: qualitative comparison between predictions and ground-truth labels.)

(Figure: 3D reconstruction of the predictions, referred to as Fig. 3d in the text.)


Conclusions

In this study, we proposed an effective framework for the Anatomical Brain Barriers to Cancer Spread challenge. Specifically, we used a residual-block-based U-shape network as the architecture and the Tversky loss as part of the criterion to strengthen the feature extraction ability. A model ensemble strategy was adopted to refine the prediction and obtain better results. Experiments on the online test set verified the efficacy of the proposed framework.
