Transfer Learning in Polyp and Endoscopic Tool Segmentation from Colonoscopy Images
Abstract
Colorectal cancer is one of the deadliest and most widespread types of cancer in the world. Colonoscopy is the procedure used to detect and diagnose polyps in the colon, but today's detection rate shows a significant error rate that affects diagnosis and treatment. An automatic image segmentation algorithm may help doctors improve the detection rate of pathological polyps in the colon. Furthermore, segmenting endoscopic tools in images taken during colonoscopy may contribute towards robotic-assisted surgery. In this study, we used both pre-trained and non-pre-trained segmentation models. We trained and validated both on two different data sets, containing images of polyps and endoscopic tools, and finally applied the models to two separate test sets. On the test sets, the best polyp model achieved a Dice score of 0.857 and the best instrument model a Dice score of 0.948. Moreover, we found that pre-training the models increased performance when segmenting polyps and endoscopic tools.
Keywords: Polyp segmentation, Transfer learning, Hyperkvasir, Convolutional neural networks, MedAI challenge
Introduction
Colorectal cancer (CRC) was the third most common and second most deadly cancer type worldwide in 2020 [1]. CRC is strongly associated with colorectal polyps, and colonoscopy is considered the best method for the detection of colorectal polyps [2, 3]. Studies have shown that between 6% and 27% of colorectal polyps are missed by clinicians during the colonoscopic examination [4]. On the other hand, artificial intelligence (AI) and image segmentation have proven useful in segmenting colorectal polyps [2, 3], and this may help endoscopists detect polyps that would otherwise be overlooked. Detection of colorectal polyps and endoscopic tools may also play a role in the development of robotic-assisted surgical systems [5]. A recent study showed that pre-trained Convolutional Neural Networks (CNNs) improved the performance of classifying colorectal polyps from colonoscopy images [6], but it has not yet been explored whether pre-trained segmentation models will improve the performance of colorectal polyp segmentation. In this study, which is part of a machine learning challenge [7], we aim to assess pre-trained and non-pre-trained CNNs for detecting polyps and endoscopic tools in colonoscopic images.
Methods
Two models were developed as part of the challenge: one to segment polyps in images and another to segment endoscopic tools. A CNN is a data-driven type of model, and thus we had to train the models on relevant data. The polyp model was trained on Kvasir-SEG, an open data set consisting of 1000 images containing one or more polyps [8], whereas the instrument model was trained on Kvasir-Instrument, another open data set consisting of 590 images containing different endoscopic tools [5]. Both data sets also contained a corresponding annotated mask for each image, highlighting the polyps or endoscopic tools.
Data preprocessing: The images and masks in the data sets vary in resolution and thus had to be resized before being fed to the CNN models. We selected $256\times256$ pixels as the size of both the input image and the predicted mask.
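The resizing step can be sketched as follows. The paper does not state which interpolation method was used; nearest-neighbour is shown here because it keeps binary masks binary (bilinear interpolation would introduce intermediate gray values in a mask). The function name `resize_nearest` is our own.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int = 256, out_w: int = 256) -> np.ndarray:
    """Nearest-neighbour resize: map each output pixel back to its
    nearest source pixel via integer index arithmetic."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]
```

In practice a smoother interpolation (e.g. bilinear) would likely be used for the RGB images, with nearest-neighbour reserved for the masks.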
Model architectures: The model architectures were retrieved from the Python library "Segmentation Models" [9], which contains different CNN architectures. The library provides models with both untrained and pre-trained weights, where the pre-trained weights were obtained by training on ImageNet [10]. To find the best fit for our data sets, we tested the following architectures provided by the library: EfficientNet, MobileNet, SE-ResNet, Inception, ResNet and VGG. The results of these experiments are publicly available.1
Augmentations: Augmentations were applied to the training data in order to create a more versatile data set and achieve better generalization. We used nine different augmentation techniques: random noise, Gaussian blur, random rotation, image brightness, horizontal flip, vertical flip, random horizontal shift, random vertical shift and random zoom, each assigned a unique integer from 1 to 9. For each epoch, every image–mask pair used to train the models was given a random integer between zero and nine, and the augmentation technique with the corresponding integer was applied to that image and mask. If the random integer was zero, no augmentation was applied.
Model selection: 10-fold cross-validation on the development set was used to find the best model architecture and model parameters. The performance was measured using the Dice similarity coefficient (DSC) and Intersection over Union (IoU) on the validation folds.
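Both evaluation metrics are standard and can be written out directly on binary masks; the small epsilon below is our addition to guard against empty masks.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks A, B."""
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """IoU = |A ∩ B| / |A ∪ B| for binary masks A, B."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((inter + eps) / (union + eps))
```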
In the model selection phase, the learning rate was reduced during training using a learning rate scheduler, which was set to lower the learning rate by a factor of ten when the IoU score did not improve over three consecutive epochs.
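The schedule described above matches the behaviour of a reduce-on-plateau callback (e.g. `keras.callbacks.ReduceLROnPlateau` with `factor=0.1` and `patience=3`). Its core logic can be sketched in a few lines; the class below is our own illustration, not the code used in the study.

```python
class PlateauScheduler:
    """Divide the learning rate by 10 when the monitored IoU score
    fails to improve for three consecutive epochs."""

    def __init__(self, lr: float = 1e-3, factor: float = 0.1, patience: int = 3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.wait = -float('inf'), 0

    def step(self, iou_score: float) -> float:
        """Call once per epoch with the validation IoU; returns the lr to use."""
        if iou_score > self.best:
            self.best, self.wait = iou_score, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr
```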
Clinical relevance and model transparency
A polyp segmentation algorithm like the one presented in this study could probably be used as a decision-support tool for endoscopists. To make the segmentation tool more clinically relevant, and to streamline the work of the endoscopists, we developed a polyp counter algorithm. This algorithm detects the contours of the segmented polyps in the masks and counts the objects. Its purpose is to tell whether, and how many, polyps there are in each image, so that doctors only need to look at the images with detected polyps and can ignore the images without them. Moreover, the masks provided will highlight the polyps and help focus the endoscopists' attention on the abnormalities in the colonoscopy images. The polyp counter algorithm and the rest of the code developed in this project are publicly available on GitHub.2
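A sketch of the counting step. The study detects contours in the predicted mask (typically done with `cv2.findContours`); the pure-NumPy connected-component count below is an equivalent stand-in for counting the segmented objects, written by us for illustration.

```python
from collections import deque

import numpy as np

def count_polyps(mask: np.ndarray) -> int:
    """Count connected foreground blobs (4-connectivity) in a binary mask.
    Each blob corresponds to one segmented polyp or tool."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                count += 1                      # new blob found
                q = deque([(r, c)])
                seen[r, c] = True
                while q:                        # flood-fill the whole blob
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return count
```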
Results
Model deployment: From our experiments, we found that efficientnetb1 outperformed the other model architectures tested. Furthermore, we did experimental fine-tuning of the hyperparameters, and the settings which gave the highest mean DSC are shown in Table 1. Efficientnetb1 with the settings shown in Table 1 was finally used to train the models on the whole development set and applied to the test data, which consisted of 300 images with endoscopic tools and 300 images with colorectal polyps. The predicted masks were submitted to the MedAI challenge. In the final training procedure, the learning rate schedule was programmed to imitate the best learning rate schedule found during model selection.
Table 1: Model architectures, parameters and hyperparameters used to train the final Polyp and Instrument model.
| Parameter | Polyp | Instrument |
|---|---|---|
| Model architecture | efficientnetb1 | efficientnetb1 |
| Pre-trained | Yes | Yes |
| Batch size | 30 | 30 |
| Epochs | 20 | 35 |
| Initial learning rate | 0.001 | 0.001 |
| Optimizer | Adam | Adam |
| Loss function | IoU | IoU |
The segmentation performance of the best instrument and polyp models are summarized in Table 2.
| Data set | Metric | Development set | Test set |
|---|---|---|---|
| Polyp | DSC | 0.874 ± 0.011 | 0.857 |
| Polyp | IoU | 0.804 ± 0.013 | 0.800 |
| Instrument | DSC | 0.937 ± 0.015 | 0.948 |
| Instrument | IoU | 0.893 ± 0.020 | 0.911 |
Table 2: Dice similarity coefficient (DSC) and Intersection over Union (IoU) score achieved on both the polyp and instrument development sets and test sets. The scores on the development sets are achieved using 10-fold cross-validation.
The same models described in Table 1 and scored in Table 2, but without pre-training on ImageNet, were evaluated on the development sets using cross-validation. The model applied to the polyp data set achieved a DSC of $0.653\pm0.072$ and an IoU of $0.541\pm0.084$. The model applied to the instrument data set achieved a DSC of $0.888\pm0.028$ and an IoU of $0.822\pm0.036$.
Conclusion
The results of this study show that the model which performed best on the development sets, according to our experiments, also generalized well to the MedAI test sets. Second, we found that pre-training the model on ImageNet substantially increased performance on both the polyp and instrument development sets. These results may have implications for further work within the field of polyp segmentation, as well as for other image segmentation tasks.
