Transfer Learning in Polyp and Endoscopic Tool Segmentation from Colonoscopy Images
Abstract
Colorectal cancer is one of the deadliest and most widespread types of cancer in the world. Colonoscopy is the procedure used to detect and diagnose polyps in the colon, but today's detection rate shows a significant miss rate that affects diagnosis and treatment. An automatic image segmentation algorithm may help doctors improve the detection rate of pathological polyps in the colon. Furthermore, segmenting endoscopic tools in images taken during colonoscopy may contribute towards robot-assisted surgery. In this study, we used both pre-trained and non-pre-trained segmentation models. We trained and validated both on two different data sets, containing images of polyps and endoscopic tools, and finally applied the models to two separate test sets. The best polyp model achieved a Dice score of 0.857 and the best instrument model a Dice score of 0.948. Moreover, we found that pre-training the models increased the performance when segmenting polyps and endoscopic tools.
Keywords: Polyp segmentation, Transfer learning, Hyperkvasir, Convolutional neural networks, MedAI challenge
Introduction
Colorectal cancer (CRC) was the third most common and second most deadly cancer type worldwide in 2020 [1]. CRC is strongly associated with colorectal polyps, and colonoscopy is considered to be the best method for the detection of colorectal polyps [2, 3]. Studies have shown that between 6% and 27% of colorectal polyps are missed by clinicians during the colonoscopic examination [4]. On the other hand, artificial intelligence (AI) and image segmentation have been shown to be useful in segmenting colorectal polyps [2, 3], which may help endoscopists detect polyps that would otherwise be overlooked. Detection of colorectal polyps and endoscopic tools may also play a role in the development of robot-assisted surgical systems [5]. A recent study showed that pre-trained Convolutional Neural Networks (CNNs) improved the performance of classifying colorectal polyps in colonoscopy images [6], but it remains unexplored whether pre-trained segmentation models improve the performance of colorectal polyp segmentation. In this study, which is part of a machine learning challenge [7], we aim to assess pre-trained and non-pre-trained CNNs for detecting polyps and endoscopic tools in colonoscopic images.
Methods
Two models were developed as part of the challenge: one model to segment polyps in images and another to segment endoscopic tools. A CNN is a data-driven type of model, so we had to train each model on relevant data. The polyp model was trained on Kvasir-SEG, an open data set of 1000 images containing one or more polyps [8], whereas the instrument model was trained on Kvasir-Instrument, another open data set of 590 images containing different endoscopic tools [5]. Both data sets also contain a corresponding annotated mask for each image, highlighting the polyps or endoscopic tools in the images.
Data preprocessing: The images and masks in the data sets vary in resolution and thus had to be resized before being fed to the CNN models. We selected $256 \times 256$ pixels as the size of the input image and the predicted mask.
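The paper does not publish its preprocessing code, but the resizing step could be sketched as follows, assuming PIL and NumPy; the helper name `preprocess_pair` and the choice of interpolation modes are illustrative assumptions (nearest-neighbour for masks keeps labels binary):

```python
from PIL import Image
import numpy as np

TARGET_SIZE = (256, 256)  # input size stated in the paper

def preprocess_pair(image: Image.Image, mask: Image.Image):
    """Resize an image/mask pair to the network input size.

    Bilinear interpolation for the image; nearest-neighbour for the
    mask so that label values stay binary after resizing.
    """
    img = image.resize(TARGET_SIZE, Image.BILINEAR)
    msk = mask.resize(TARGET_SIZE, Image.NEAREST)
    # Scale pixel values to [0, 1] and binarize the mask.
    img_arr = np.asarray(img, dtype=np.float32) / 255.0
    msk_arr = (np.asarray(msk, dtype=np.float32) > 127).astype(np.float32)
    return img_arr, msk_arr
```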
Model architectures: The model architectures were retrieved from the Python library "Segmentation Models" [9], which contains different CNN architectures and provides models with both untrained and pre-trained weights. The pre-trained weights are obtained by training on ImageNet [10]. To find the best fit for our data sets, we tested the following architectures provided by the library: EfficientNet, MobileNet, SE-ResNet, Inception, ResNet and VGG. The results of these experiments are publicly available.1
Augmentations: Augmentations were applied to the training data in order to create a more versatile data set and achieve better generalization. We used nine different augmentation techniques: random noise, Gaussian blur, random rotation, image brightness, horizontal flip, vertical flip, random horizontal shift, random vertical shift and random zoom, each assigned a unique integer from 1 to 9. For each epoch, every image-mask pair used to train the models was given a random integer between zero and nine, and the augmentation technique with the corresponding integer was applied to that image and mask. If the random integer was zero, no augmentation was applied.
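The integer-dispatch scheme described above can be sketched in a few lines. This is an assumption-laden illustration, not the study's actual code: most techniques are stubbed out as identity placeholders, and only the two flips are implemented, to keep the dispatch logic visible without an image-processing library.

```python
import random
import numpy as np

def identity(img, msk):
    # Placeholder standing in for an augmentation not implemented here.
    return img, msk

# One technique per integer 1-9, mirroring the list in the text.
AUGMENTATIONS = {
    1: identity,                                      # random noise (placeholder)
    2: identity,                                      # Gaussian blur (placeholder)
    3: identity,                                      # random rotation (placeholder)
    4: identity,                                      # image brightness (placeholder)
    5: lambda i, m: (np.fliplr(i), np.fliplr(m)),     # horizontal flip
    6: lambda i, m: (np.flipud(i), np.flipud(m)),     # vertical flip
    7: identity,                                      # random horizontal shift (placeholder)
    8: identity,                                      # random vertical shift (placeholder)
    9: identity,                                      # random zoom (placeholder)
}

def augment(image, mask, rng=random):
    """Draw an integer in [0, 9]; 0 means no augmentation, 1-9 select a technique.

    The same transform is applied to image and mask so they stay aligned.
    """
    k = rng.randint(0, 9)
    if k == 0:
        return image, mask
    return AUGMENTATIONS[k](image, mask)
```

Applying the identical geometric transform to both image and mask is essential: a flipped image with an unflipped mask would teach the network wrong polyp locations.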
Model selection: 10-fold cross-validation on the development set was used to find the best model architecture and model parameters. The performance was measured using the Dice similarity coefficient (DSC) and Intersection over Union (IoU) on the validation folds.
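For reference, the two metrics reduce to simple set overlaps on binary masks: $\mathrm{DSC} = 2|A \cap B| / (|A| + |B|)$ and $\mathrm{IoU} = |A \cap B| / |A \cup B|$. A minimal NumPy sketch (the epsilon guard against empty masks is our assumption, not a detail from the paper):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

def iou_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over Union for binary masks: |A∩B| / |A∪B|."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((inter + eps) / (union + eps))
```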
In the model selection phase, the learning rate was reduced during training using a learning rate scheduler, which was set to lower the learning rate by a factor of ten when the IoU score did not improve over three consecutive epochs.
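This reduce-on-plateau schedule can be written out explicitly. The class below is a minimal sketch of the behaviour described (divide the learning rate by 10 after three epochs without IoU improvement); the class and attribute names are our own, not the callback the study actually used:

```python
class PlateauLRScheduler:
    """Lower the learning rate by `factor` when the monitored score
    has not improved for `patience` consecutive epochs."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0  # epochs since the last improvement

    def step(self, iou: float) -> float:
        """Call once per epoch with the validation IoU; returns the current lr."""
        if iou > self.best:
            self.best = iou
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0  # restart the patience window after a cut
        return self.lr
```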
Clinical relevance and model transparency
A polyp segmentation algorithm, like the one presented in this study, could probably be used as a decision tool for endoscopists. To make the segmentation tool more clinically relevant, and to streamline the work of the endoscopists, we developed a p