Ensembled ResUnet for Anatomical Brain Barriers Segmentation
Abstract
Accurate segmentation of brain structures can assist glioma treatment and radiotherapy planning. However, due to the visual and anatomical differences between modalities, accurate segmentation of brain structures remains challenging.
To address this problem, we first construct a residual-block-based U-shape network with a deep encoder and a shallow decoder, which trades off performance against efficiency. We then introduce the Tversky loss to address the class imbalance between the foreground classes and the background. Finally, a model ensemble strategy is used to remove outliers and further boost performance.
Introduction
Gliomas are the most common type of brain tumor, originating from glial cells in the human brain. In clinical radiation therapy, precise identification of the clinical target volume (CTV) boundary ensures the treatment effect, and the CTV boundary is determined by anatomical structures. Therefore, accurate and automatic segmentation of brain anatomical structures could improve both the efficiency and the effectiveness of glioma treatment.
Recently, deep learning approaches have proven their superiority over traditional methods on both 2D natural image and 3D medical image segmentation tasks. For brain tumor segmentation in particular, one approach proposed a U-Net with an additional VAE branch to reconstruct the input images and regularize the shared encoder, while another established a two-stage cascaded U-Net that refines the coarse prediction of the first stage and captures context information in the second stage. For brain CTV segmentation, prior work introduced DenseNet to predict precise resection cavity contours.
To raise researchers' interest in brain CTV segmentation, the Anatomical Brain Barriers to Cancer Spread (ABCs) challenge encourages participants to construct automatic brain structure segmentation methods, where critical structures (e.g., structures that serve as barriers to the spread of brain cancers or that must be spared from irradiation) are covered by two tasks.
The dataset provides 45 multi-modal images with ground-truth annotations for training and validation, 15 images for the online test, and 15 images for the final test. Each case comprises a CT scan and two diagnostic MRI scans, a contrast-enhanced T1-weighted scan and a T2-weighted FLAIR scan of the post-operative brain. All images are co-registered and re-sampled to a size of 164x194x142 voxels with an isotropic resolution of 1.2x1.2x1.2 mm. In task 1, challengers are asked to segment the brain structures needed for automatic identification of the CTV. In task 2, participants are required to segment structures used in radiotherapy treatment plan optimization. The evaluation metrics are the Dice score and the surface Dice score.
In this report, we propose a residual-block-based U-Net composed of a deep encoder and a shallow decoder, implemented with the nn-UNet framework. To provide multi-scale guidance that boosts performance and accelerates training convergence, we employ a deep supervision strategy. To suppress false-negative results and retrieve missing small targets, we use the Tversky loss together with the cross-entropy loss as our criterion. Experiments show that the proposed framework outperforms the compared state-of-the-art methods.
Method
Building upon the U-Net architecture and the nn-UNet framework, we propose a variant based on residual blocks. The details are described below.
Encoder design
The proposed encoder is composed of residual blocks, as shown in Fig. 1, each of which consists of two 3x3x3 3D convolution layers, each followed by a normalization layer and an activation layer. An identity skip connection then links the shallow features to the corresponding deep-level features. Specifically, we use Instance Normalization (IN) as the normalization layer, since it performs better than BatchNorm when the batch size is small (ours is 2) and costs less computation than Group Normalization (GN). In addition, we use LeakyReLU as the activation layer to retain more information than ReLU. In total, the encoder has 5 spatial levels, with 1, 2, 3, 4, and 4 residual blocks at each level, respectively. Downsampling is implemented by a 3D strided convolution layer with 3x3x3 filters and a stride of 2. The number of filters starts at 32 and doubles after each downsampling operation; notably, the number of filters at the last level is capped at 320 for computational efficiency. The last downsampling convolution does not stride along the z axis, retaining the slice-wise resolution. We randomly crop patches with a size of 3x128x160x112 from the 3-modality images as the framework input; after encoding, the final output has a size of 320x8x10x7.
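Below is a minimal PyTorch sketch of the residual block and downsampling step described above; the class and helper names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3x3 convolutions, each followed by InstanceNorm and
    LeakyReLU, wrapped with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm1 = nn.InstanceNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm2 = nn.InstanceNorm3d(channels)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)  # identity skip connection

def downsample(in_ch, out_ch, stride=(2, 2, 2)):
    """Strided 3x3x3 convolution between spatial levels; the last level
    can pass stride 1 on the z axis to keep the slice-wise resolution."""
    return nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

# Encoder configuration from the text: 5 levels with 1, 2, 3, 4, 4
# residual blocks and 32, 64, 128, 256, 320 filters.
blocks_per_level = [1, 2, 3, 4, 4]
filters_per_level = [32, 64, 128, 256, 320]
```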
Decoder design
The decoder mirrors the encoder and is composed of an upsample block and a standard convolution block, where the upsample block is implemented with a 1x1x1 3D convolution layer and a 3D transposed convolution layer. The 1x1x1 convolution reduces the number of feature channels by a factor of 2, and the transposed convolution doubles the spatial dimensions. The standard convolution block consists of a single 3x3x3 3D convolution layer followed by an Instance Normalization layer and a LeakyReLU layer. At the end of the decoder, a 1x1x1 convolution layer with as many output channels as target classes, followed by a softmax activation, yields the final prediction. The shallow decoder not only speeds up the training and inference stages but also helps avoid overfitting. To accelerate model convergence and provide multi-scale guidance, we also introduce a deep supervision strategy, in which the features of three additional decoder levels are processed by extra output blocks to obtain auxiliary predictions supervised by the same loss function. The final loss weights each level's output according to its importance: we set the weights to 8/15, 4/15, 2/15, and 1/15 for the outputs of levels 1, 2, 3, and 4, respectively.
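As a sketch, one decoder level and the deep supervision weighting could look like the following PyTorch code; encoder skip connections are omitted for brevity, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Upsample block: a 1x1x1 convolution halves the channel count,
    a transposed convolution doubles the spatial dimensions, and a
    standard 3x3x3 convolution block follows."""
    def __init__(self, in_ch):
        super().__init__()
        out_ch = in_ch // 2
        self.reduce = nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.up = nn.ConvTranspose3d(out_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(self.up(self.reduce(x)))

# Deep supervision: auxiliary outputs from four decoder levels are
# supervised by the same criterion and combined with fixed weights.
ds_weights = [8 / 15, 4 / 15, 2 / 15, 1 / 15]  # levels 1 to 4

def deep_supervision_loss(criterion, outputs, targets):
    """outputs/targets: per-level predictions and matching labels."""
    return sum(w * criterion(o, t)
               for w, o, t in zip(ds_weights, outputs, targets))
```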
Loss function
We use the DC-CE loss as our criterion, which combines the original linear Dice loss with the cross-entropy loss:
$$ L_{DCCE} = -\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} \left(p_{i, c}+y_{i, c}\right)+\epsilon} -\frac{1}{N}\frac{1}{C}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i, c}\log\left(p_{i, c}\right) $$
where $ N $ is the total number of pixels and $ C $ denotes the number of classes; $ p_{i, c} $ and $ y_{i, c} $ represent the prediction and the annotation of pixel $ i $ for class $ c $, respectively.
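A minimal PyTorch sketch of this criterion, directly following the formula above (the one-hot target layout and the value of $ \epsilon $ are assumptions):

```python
import torch

def dc_ce_loss(logits, target_onehot, eps=1e-5):
    """logits and target_onehot: shape (B, C, D, H, W)."""
    probs = torch.softmax(logits, dim=1)
    # Linear Dice term: negative ratio of overlap to total mass.
    dice = -((probs * target_onehot).sum() + eps) / \
           ((probs + target_onehot).sum() + eps)
    # Cross-entropy term, averaged over pixels and classes.
    ce = -(target_onehot * torch.log(probs.clamp_min(1e-8))).mean()
    return dice + ce
```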
We also introduce the Tversky loss to train the proposed framework, which (negated for minimization, consistent with the Dice term above) can be formulated as:
$$ L_{Tversky}=-\frac{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\epsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, y_{i, c}+\alpha \sum_{i=1}^{N}\sum_{c=1}^{C} p_{i, c}\, \bar{y}_{i, c}+\beta \sum_{i=1}^{N}\sum_{c=1}^{C} \bar{p}_{i, c}\, y_{i, c}+\epsilon} $$
where $ p_{i, c}\,\bar{y}_{i, c} $ counts the false positive (FP) pixels and $ \bar{p}_{i, c}\, y_{i, c} $ counts the false negative (FN) pixels, with $ \bar{y}_{i, c} = 1 - y_{i, c} $ and $ \bar{p}_{i, c} = 1 - p_{i, c} $. The relative importance of FP and FN is weighted by $ \alpha $ and $ \beta $, which are set to 0.3 and 0.7, respectively.
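A hedged PyTorch sketch of this loss follows; since $ \beta > \alpha $, false negatives are penalized more heavily, which suppresses missed small targets. The smoothing constant is an assumption.

```python
import torch

def tversky_loss(logits, target_onehot, alpha=0.3, beta=0.7, eps=1e-5):
    probs = torch.softmax(logits, dim=1)
    tp = (probs * target_onehot).sum()        # true positives
    fp = (probs * (1 - target_onehot)).sum()  # false positives
    fn = ((1 - probs) * target_onehot).sum()  # false negatives
    # Negated Tversky index: minimizing the loss maximizes overlap.
    return -(tp + eps) / (tp + alpha * fp + beta * fn + eps)
```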