[论文翻译]提升GPT-4V在医疗任务中的表现:关于提示工程策略的综合研究


原文地址:https://arxiv.org/pdf/2312.04344


ENHANCING MEDICAL TASK PERFORMANCE IN GPT-4V: A COMPREHENSIVE STUDY ON PROMPT ENGINEERING STRATEGIES

提升GPT-4V在医疗任务中的表现:关于提示工程策略的综合研究

Pengcheng Chen Shanghai AI Laboratory chen peng cheng@pjlab.org.cn

彭城 陈
上海人工智能实验室
chenpengcheng@pjlab.org.cn

Ziyan Huang Shanghai AI Laboratory huangziyan@pjlab.org.cn

Ziyan Huang 上海人工智能实验室 huangziyan@pjlab.org.cn

Zhongying Deng Shanghai AI Laboratory deng zhong ying@pjlab.org.cn

邓中英 上海人工智能实验室 dengzhongying@pjlab.org.cn

Tianbin Li Shanghai AI Laboratory litianbin@pjlab.org.cn

李添彬 上海人工智能实验室 litianbin@pjlab.org.cn

Yanzhou Su Shanghai AI Laboratory suyanzhou@pjlab.org.cn

Yanzhou Su 上海人工智能实验室 suyanzhou@pjlab.org.cn

Haoyu Wang Shanghai AI Laboratory wanghaoyu@pjlab.org.cn

王浩宇 上海人工智能实验室 wanghaoyu@pjlab.org.cn

Jin Ye Shanghai AI Laboratory yejin@pjlab.org.cn

金烨 上海人工智能实验室 yejin@pjlab.org.cn

Yu Qiao Shanghai AI Laboratory qiaoyu@pjlab.org.cn

Yu Qiao 上海人工智能实验室 qiaoyu@pjlab.org.cn

Junjun He Shanghai AI Laboratory hejunjun@pjlab.org.cn

贺俊俊 上海人工智能实验室 hejunjun@pjlab.org.cn

December 13, 2023

2023年12月13日

ABSTRACT

摘要

OpenAI’s latest large vision-language model (LVLM), GPT-4V(ision), has piqued considerable interest for its potential in medical applications. Despite its promise, recent studies and internal reviews highlight its under performance in specialized medical tasks. This paper explores the boundary of GPT-4V’s capabilities in medicine, particularly in processing complex imaging data from endo s copies, CT scans, and MRIs etc. Leveraging open-source datasets, we assessed its foundational competencies, identifying substantial areas for enhancement. Our research emphasizes prompt engineering, an often-under utilized strategy for improving AI responsiveness. Through iterative testing, we refined the model’s prompts, significantly improving its interpretative accuracy and relevance in medical imaging. From our comprehensive evaluations, we distilled 10 effective prompt engineering techniques, each fortifying GPT-4V’s medical acumen. These methodical enhancements facilitate more reliable, precise, and clinically valuable insights from GPT-4V, advancing its oper ability in critical healthcare environments. Our findings are pivotal for those employing AI in medicine, providing clear, actionable guidance on harnessing GPT-4V’s full diagnostic potential.

OpenAI 最新推出的大型视觉语言模型 (LVLM) GPT-4V(ision) 因其在医疗应用中的潜力而备受关注。尽管前景广阔,但近期研究和内部评估表明,该模型在专业医疗任务中表现欠佳。本文探讨了 GPT-4V 在医学领域的能力边界,特别是在处理内窥镜、CT 扫描和 MRI 等复杂影像数据时的表现。基于开源数据集,我们评估了其基础能力,发现了大量待改进领域。研究重点聚焦提示工程 (prompt engineering) 这一常被低估的 AI 响应优化策略。通过迭代测试,我们优化了模型的提示词,显著提升了其在医学影像解读中的准确性和相关性。经过系统评估,我们提炼出 10 项有效的提示工程技术,每项都能强化 GPT-4V 的医疗判断力。这些方法论的改进使 GPT-4V 能输出更可靠、精准且具临床价值的洞察,提升了其在关键医疗场景中的适用性。本研究成果为医疗领域 AI 应用者提供了关键指导,清晰阐述了如何充分释放 GPT-4V 的诊断潜力。

1 Introduction

1 引言

The burgeoning field of multimodal medical large language models offers promising applications for the future of healthcare and artificial intelligence (AI) research. One of the leading AI models in this domain is OpenAI’s GPT-4[1], which has recently been enhanced with a multimodal variant, GPT-4V[2], incorporating image input capabilities. This model extension draws from the formidable linguistic foundation of GPT-4 and has garnered significant attention due to its potential in processing and interpreting medical imagery.

蓬勃发展的多模态医疗大语言模型领域为未来医疗保健和人工智能 (AI) 研究提供了广阔的应用前景。该领域的领先AI模型之一是OpenAI的GPT-4[1],近期其升级版GPT-4V[2]通过引入图像输入功能实现了多模态扩展。这一模型升级基于GPT-4强大的语言基础,因其在医学影像处理与解析方面的潜力而备受关注。

Prior evaluations, such as those conducted by Chaoyi Wu et al, have provided a detailed examination of GPT-4V’s medical performance, probing the limits of its image processing abilities[3], also the performance of GPT-4V in medical examination has been tested[4, 5, 6] and reveals the potential of the usage of GPT-4V in the medical area. In the realm of MLLMs, performance is contingent on both image comprehension and textual understanding capabilities. Merely inputting an image along with basic, unrefined prompts does not fully utilize the model’s language understanding potential and only partially engages its image interpretation capabilities. To delineate the boundaries of GPT-4V’s abilities, research focused on prompt optimization is crucial. Building upon previous studies, this work concentrates on exploring the optimal combination of prompts to enable GPT-4V to operate at its peak efficiency, thereby offering novel insights for subsequent medical research utilizing GPT-4V.

先前由Chaoyi Wu等人开展的评估已对GPT-4V的医学性能进行了详细考察,探索了其图像处理能力的边界[3],同时针对GPT-4V在医学检查中的表现也进行了测试[4,5,6],揭示了GPT-4V在医疗领域的应用潜力。在多模态大语言模型(MLLM)领域,模型性能取决于图像理解与文本理解能力的双重作用。仅输入图像及基础未优化的提示词,既无法充分发挥模型的语言理解潜力,也只能部分调动其图像解析能力。为界定GPT-4V的能力边界,提示词优化的相关研究至关重要。本研究在前人工作基础上,着重探索提示词的最优组合方案,使GPT-4V能以最高效能运作,从而为后续基于GPT-4V的医学研究提供新思路。

In our investigation, we selected diverse modalities of data and experimented with various combinations of textual and visual prompts. Ultimately, we synthesized ten tips that significantly enhance GPT-4V’s performance in medical applications. Given that the focus of this study is on prompt engineering rather than an exhaustive test of medical capabilities, we did not examine an extensive array of cases. Instead, this paper will showcase results from a selected set of tests, emphasizing the impact of strategic prompt design on enhancing the model’s effectiveness in medical image analysis and interpretation.

在我们的调查中,我们选择了多种数据模态,并尝试了文本与视觉提示的不同组合。最终,我们总结出十条能显著提升GPT-4V在医疗应用中性能的技巧。鉴于本研究聚焦于提示工程而非全面的医疗能力测试,我们并未考察大量案例。相反,本文将展示一组精选测试的结果,重点阐述策略性提示设计对提升模型在医学图像分析与解读中效力的影响。

2 Methods

2 方法

2.1 Overview

2.1 概述

In this study, we utilize an image and text prompt as the input, followed by a manual assessment of the output’s quality. Our methodology involves presenting the same image with a variety of textual prompts. Each prompt’s output is then subject to a thorough human evaluation. When a particular textual prompt yields exceptionally positive results, it undergoes more in-depth testing and further evaluation in our test sets. If this prompt strategy consistently demonstrates superior performance across all tests, it is then incorporated into our list of effective prompt tips.

在本研究中,我们采用图像和文本提示作为输入,随后对输出质量进行人工评估。我们的方法包括用多种文本提示呈现同一张图像,并对每个提示的输出结果进行详尽的人工评估。当某个特定文本提示产生异常积极的结果时,它将在我们的测试集中接受更深入的测试和进一步评估。若该提示策略在所有测试中持续表现出优越性能,则会被纳入我们的有效提示技巧列表。

o optimize the testing process and ensure rigorous and systematic evaluation, we have established the following 8 ules during the test part:

为优化测试流程并确保严谨系统的评估,我们在测试环节制定了以下8项规则:

test is evaluated independently, allowing for a more accurate assessment of the model’s performance in isolated interactions.

测试是独立评估的,能够更准确地衡量模型在孤立交互中的表现。

• Rule 8 - Version Specification of GPT-4V: The version of GPT4-V used in our research spans from September 27 to October 18, 2023. It is important to note that subsequent updates to the model post this period may impact the effectiveness of certain prompt tips, potentially rendering some obsolete. This time frame is critical for contextual i zing our findings and understanding the specific capabilities of the model version under study.

• 规则8 - GPT-4V版本说明:本研究所用GPT4-V版本覆盖2023年9月27日至10月18日期间。需注意该时间节点后的模型更新可能影响部分提示技巧的有效性,甚至导致某些方法失效。此时间段对定位研究结论及理解特定版本模型能力具有关键意义。

In the manual evaluation part, we compared the outputs of GPT-4V against a set of reference answers. This process involved a scoring system wherein a match with the reference answer earned one point. Responses that were not incorrect but also not completely aligned with the reference answer neither gained nor lost points. Incorrect answers resulted in a deduction of one point. Through this scoring mechanism, we calculated the scores for each distinct prompt strategy employed during the study. The 10 prompt tips with the highest scores were then extracted. These top-scoring prompt tips form the crux of our study’s conclusions, offering valuable insights into the most effective ways of interacting with the GPT-4V model. This approach not only underscores the model’s capabilities but also highlights the nuances of optimizing AI interactions in complex tasks.

在人工评估部分,我们将GPT-4V的输出与一组参考答案进行了对比。该流程采用计分制:与参考答案完全匹配得1分;答案未出错但与参考答案不完全一致时不得分;错误答案扣1分。通过该评分机制,我们计算了研究中每种不同提示策略的得分,并提取出得分最高的10条提示技巧。这些高分提示技巧构成了本研究结论的核心,为优化GPT-4V模型交互方式提供了关键洞见。该方法不仅验证了模型能力,更揭示了复杂任务中优化AI交互的细微差异。

2.2 Testing procedure

2.2 测试流程

In the specific context of prompt testing, the experimental process of our study typically divides into two or three distinct parts:

在提示测试的具体背景下,我们的研究实验过程通常分为两到三个独立部分:

• Control Group: Under the guidelines mentioned in Section 2.1, within this group, we generally employ the simplest and most direct prompts, or sometimes only input the image without any textual prompt. Upon obtaining the corresponding outputs, these are then compared with the reference answers for scoring. • Experiment Group: Also following the rules outlined in Section 2.1, this group uses the same images as the control group for input. However, the textual prompts are optimized for this set. The outputs generated from these optimized prompts are then compared with the reference answers and scored accordingly. • Ablation Study (if necessary): In accordance with the rules specified in Section 2.1, this part of the experiment involves inputting only textual prompts without any images. The outputs are then compared with the reference answers for evaluation and scoring.

  • 对照组:按照第2.1节提到的指导原则,该组通常采用最简单直接的提示词,有时仅输入图像而不带任何文本提示。获得相应输出后,将其与参考答案进行比对评分。
  • 实验组:同样遵循第2.1节规定的规则,该组使用与对照组相同的图像输入,但文本提示词经过针对性优化。基于优化提示生成的输出会与参考答案比对并评分。
  • 消融实验(如需要):根据第2.1节的具体规则,该部分实验仅输入文本提示而不使用图像,随后将输出结果与参考答案比对评估并打分。

A typical demonstration template is shown in Figure2.2. This structured approach enables a comprehensive assessment of the model’s performance under different conditions, allowing for a clear understanding of the impact of various prompt strategies, Among all the trials, we will list out 10 tips for effective prompts.

图 2.2 展示了一个典型的演示模板。这种结构化方法能够全面评估模型在不同条件下的性能,从而清晰理解各种提示策略的影响。在所有试验中,我们将列出10条有效提示的技巧。

Figure 1: A typical example of demonstration, illustrating the basic information of a case along with the results from both control and experimental groups

图 1: 典型示例演示,展示案例基本信息及对照组与实验组的结果

2.3 Testing dataset

2.3 测试数据集

In our study, the endoscopy group in the test set utilized data from the Kvasir_ SEG [8]and M2caiseg datasets[9], alongside data from the hy st eros copy procedures we collected. For CT imaging, the AutoPET[10], Total Segment at or[11], AbdomenCT-1K[12] datasets were used, supplemented with data from relevant literature. MRI data incorporated the $B r a T S 202I[13],A T L A S V2.O$ and AMOS 2021[14] datasets. It is crucial to acknowledge that this dataset compilation is not comprehensive and does not represent all medical scenarios. Nonetheless, it provides an adequate basis for testing our prompt tips. Future endeavors will aim to expand this dataset to further enrich the study.

在我们的研究中,测试集的内窥镜组使用了来自Kvasir_ SEG [8]和M2caiseg数据集[9]的数据,以及我们收集的宫腔镜手术数据。CT成像方面采用了AutoPET[10]、TotalSegmentator[11]和AbdomenCT-1K[12]数据集,并补充了相关文献数据。MRI数据整合了$Br a T S 202I[13]$、$A T L A S V2.O$以及AMOS 2021[14]数据集。必须指出的是,该数据集汇编并不全面,也不能代表所有医疗场景。尽管如此,它为我们测试提示技巧提供了充分的基础。未来工作将致力于扩展该数据集以进一步丰富研究。

3 Prompt Tips

3 个提示词技巧

Utilizing the aforementioned testing methodology and adhering to the established rules, we have successfully identified ten valuable prompt tips:

利用上述测试方法并遵循既定规则,我们成功总结出十条实用提示技巧:

4 Discussion

4 讨论

4.1 Tip 1: Concise language is more effective than complex descriptions, emphasizing succinct, task-relevant details in image analysis.

4.1 技巧1:简洁语言比复杂描述更有效,在图像分析中应强调与任务相关的简明细节

4.1.1 Demonstration Info

4.1.1 演示信息

In this demonstration, we conducted a comparative analysis of prompt outcomes using concise language versus complex descriptions within endoscopic, CT, and MRI imaging contexts, all under the condition of providing equivalent information. The study meticulously compared the effectiveness of simplified and elaborate prompt styles on GPT-4V’s performance. The final results clearly indicated that, when the same amount of information is provided, prompts articulated in a concise manner significantly enhance GPT-4V’s functional capabilities. This finding underscores the importance of prompt language efficiency in optimizing the performance of AI models in medical imaging scenarios, suggesting that brevity and clarity in prompts can lead to more effective AI analysis and interpretation within various medical imaging modalities.

在本演示中,我们在内窥镜、CT和MRI成像场景下,以提供同等信息量为前提,对简洁语言与复杂描述的提示效果进行了对比分析。研究细致比较了简化版与详尽版提示风格对GPT-4V性能的影响。最终结果明确显示:当提供相同信息量时,采用简洁表述的提示能显著提升GPT-4V的功能表现。这一发现凸显了提示语言效率对于优化AI模型在医学影像场景性能的重要性,表明跨不同医学影像模态时,提示的简洁性与清晰度可带来更有效的AI分析与解读。

4.1.2 Analysis

4.1.2 分析

Based on the tests conducted, we observed that within endoscopic scenarios, simplified descriptions yielded better outcomes compared to complex descriptions. In contrast, within CT and MRI contexts, the performance of simplified descriptions was comparable to that of complex descriptions, both showing significant improvements over control groups with no or weak descriptions.

根据测试结果,我们观察到在内窥镜场景中,简化描述比复杂描述效果更佳。相比之下,在CT和MRI场景中,简化描述与复杂描述的表现相当,两者均比无描述或弱描述的对照组有显著提升。

In this regard, we posit that the superior performance of simplified descriptions can be attributed to their provision of only the most critical information. Complex descriptions, conversely, offer an abundance of auxiliary information. During the generation of textual content, the weight of these auxiliary descriptive elements is also accounted for, which consequently diminishes the relative importance of core information. This can lead to slight deviations in results, potentially resulting in inaccuracies. Therefore, when inputting prompts into GPT-4V, it is advisable to include only the most relevant core information pertaining to the query at hand, minimizing the incorporation of extensive, indirectly related descriptive language.

在这方面,我们认为简化描述之所以表现更优,是因为它们仅提供了最核心的信息。相反,复杂描述会附带大量辅助信息。在生成文本内容时,这些辅助性描述元素的权重也会被纳入考量,从而相对降低了核心信息的重要性。这可能导致结果出现细微偏差,甚至造成不准确的情况。因此,在向GPT-4V输入提示时,建议仅包含与当前查询最相关的核心信息,尽量减少引入大量间接相关的描述性语言。

4.2 Tip 2: Providing the tasks can better help analyze the images

4.2 技巧2:提供任务能更好地辅助分析图像

4.2.1 Demonstration Info

4.2.1 演示信息

In this demonstration, we illustrate how the task of providing medical images enhances the model’s ability to analyze input medical images. We focus particularly on the identification of surgical instruments in single endoscopic images and the analysis of endoscopic sequences. An ablation study was incorporated into this demonstration to ensure that the textual prompts provided did not leak the answers. This approach underscores the effectiveness of combining visual and textual inputs in improving the model’s performance in medical image analysis, while also ensuring the integrity of the results through rigorous testing methods.

在本演示中,我们展示了提供医学图像的任务如何增强模型分析输入医学图像的能力。我们特别关注单张内窥镜图像中手术器械的识别和内窥镜序列的分析。本次演示引入了消融研究 (ablation study) ,以确保提供的文本提示不会泄露答案。该方法通过结合视觉和文本输入,有效提升了模型在医学图像分析中的性能,同时通过严格的测试方法确保结果的可靠性。

4.2.2 Analysis

4.2.2 分析

In this test, we found that specifying the type of task significantly enhances the recognition capabilities of GPT-4V. We believe that medical images, particularly endoscopic images, are challenging to differentiate based on a single image due to the limitations of the endoscope’s field of view (FOV) and the similarity of most endoscopic scenes. This makes it difficult to identify the specific task of the endoscope, thereby complicating the analysis of the problem. After specifying the particular task of the endoscope, the probability distribution of the output becomes more focused on the answer, thereby increasing the accuracy of the response.

在本测试中,我们发现明确任务类型能显著提升 GPT-4V 的识别能力。我们认为由于内窥镜视野 (FOV) 的局限性和大多数内窥镜场景的相似性,基于单张图像区分医学图像(尤其是内窥镜图像)具有挑战性。这使得识别内窥镜的具体任务变得困难,从而增加了问题分析的复杂度。在明确内窥镜的具体任务后,输出的概率分布会更集中于答案,从而提高了响应的准确性。

To further validate the effectiveness of our prompt strategy, we conducted an ablation study. In the ablation study, we did not input images and merely inputted prompts telling GPT-4V to pretend there was an image, then asked GPT-4V the same questions. If GPT-4V could still provide the correct answers under these circumstances, it would indicate that we had leaked answer-related information in our prompts. If it could not respond correctly, it would suggest that we did not disclose any answer-related information in our prompts.

为了进一步验证我们的提示策略的有效性,我们进行了消融实验。在消融实验中,我们没有输入图像,仅输入提示让GPT-4V假装有一张图像,然后向GPT-4V提出相同的问题。如果GPT-4V在这种情况下仍能提供正确答案,则表明我们在提示中泄露了与答案相关的信息。如果它无法正确响应,则表明我们在提示中未透露任何与答案相关的信息。

We discovered that with pure text input, GPT-4V could not correctly answer our questions or provide analysis, indicating that our prompts did not leak additional information. This means that the correct answers were due to our prompt guiding the image analysis rather than the prompt leaking key answer-related information.

我们发现,在纯文本输入的情况下,GPT-4V无法正确回答问题或提供分析,这表明我们的提示词没有泄露额外信息。这意味着正确答案是由于提示词引导了图像分析,而非提示词泄露了与答案相关的关键信息。

In contrast, we presented Case 4, a prompt we believe leaked some answer-related information. Compared to Case 3, we included a key piece of information, "ring-shape," which could lead GPT-4V to produce the correct answer without an image.

相比之下,我们展示了案例4,这是一个我们认为泄露了部分答案相关信息的提示。与案例3相比,我们加入了一个关键信息"环形",这可能导致GPT-4V在没有图像的情况下生成正确答案。

In summary, we believe that the conclusion that GPT-4V’s performance can be further enhanced by providing the task is solid and valid.

总之,我们相信通过明确任务要求可以进一步提升GPT-4V性能的结论是坚实有效的。

4.3 Tip 3: Implementing step-by-step guidance in multi-round dialogue allows GPT-4v to handle complex tasks more efficiently by breaking them down into simpler operations.

4.3 技巧3:在多轮对话中实施分步指导,通过将复杂任务拆解为简单操作,使GPT-4v更高效地处理复杂任务。

4.3.1 Demonstration Info

4.3.1 演示信息

In this demonstration, we showcased how, through step-by-step prompt tips, efficient matching of medical images with their corresponding masks can be achieved. It is important to emphasize that while this task may not hold substantial clinical significance, we selected it for demonstration due to its high level of difficulty. This task demands the utmost utilization of the multimodal model’s image understanding capabilities, thereby exemplifying the value of prompt tips.

在本演示中,我们展示了如何通过逐步提示技巧,实现医学图像与对应掩膜的高效匹配。需要特别强调的是,虽然该任务可能不具备重大临床意义,但我们选择它进行演示是因为其极高的难度水平。该任务需要最大限度发挥多模态模型的图像理解能力,从而彰显提示技巧的价值。

We also presented comparisons across three different modalities: endoscopy, CT, and MRI. These comparisons were based on scenarios where prompt tips were not used, where step-by-step guidance was provided in a single-round dialogue, and where step-by-step guidance was facilitated through multi-round dialogues. Our findings indicate that guiding GPT through multi-round dialogues with step-by-step instructions yielded the most effective results. This highlights the potential of structured, iterative interactions in harnessing the full capabilities of advanced AI models in complex image analysis tasks.

我们还对三种不同模态(内窥镜、CT和MRI)进行了比较。这些比较基于以下场景:未使用提示技巧、在单轮对话中提供逐步指导,以及通过多轮对话实现逐步引导。研究发现,通过多轮对话结合分步指导来引导GPT能产生最佳效果。这凸显了结构化迭代交互在复杂图像分析任务中充分发挥先进AI模型潜力的优势。

4.3.2 Analysis

4.3.2 分析

In our study, the tests conducted on endoscopic imagery revealed certain limitations in GPT-4V’s processing capabilities. Specifically, when tasked with identifying which mask correlates with a polyp in a colon image, GPT-4V consistently errs towards selecting the second mask. This error appears to stem from its misinterpretation of the vertical alignment of the real colon os copy images and their corresponding masks, leading to a mistaken assumption of correlation where none exists.

在我们的研究中,针对内窥镜图像的测试揭示了GPT-4V处理能力的某些局限性。具体而言,当要求识别结肠图像中哪个掩膜与息肉相关时,GPT-4V始终倾向于选择第二个掩膜。这一错误似乎源于其对真实结肠开口副本图像及其对应掩膜垂直排列的误判,导致在不存在关联的情况下错误假设了相关性。

Nevertheless, we observed that a gradual, step-by-step guidance approach significantly enhances GPT-4V’s performance in handling complex tasks. However, this enhancement is not as pronounced when all step-by-step instructions are consolidated into a single-round dialogue. We believe this finding aligns with the conclusion drawn in Tip 1, which emphasizes that prompts should not be excessively lengthy or complex. In the process of guiding GPT-4V, the inclusion of an abundance of descriptive and medically irrelevant terminology adversely affected its performance, likely due to the undue weight these words carried in the model’s processing.

然而,我们观察到渐进式分步引导能显著提升GPT-4V处理复杂任务的表现。但当所有分步指令被整合至单轮对话时,这种提升效果并不明显。该发现与技巧1的结论相吻合,即提示词(prompt)不应过度冗长或复杂。在引导GPT-4V过程中,引入大量描述性且与医学无关的术语会对其表现产生负面影响,这可能是由于这些词汇在模型处理过程中被赋予了不当权重。

Conversely, when we divided the step-by-step guidance across multiple rounds of dialogue, we noticed a marked improvement in GPT-4V’s performance. This approach enabled it to accomplish challenging tasks, like accurate mask matching, with a high degree of precision—a feat not yet achievable by other large multimodal models. Such a prompting strategy effectively leverages GPT-4V’s powerful capabilities.

相反,当我们将分步指导分散到多轮对话中时,我们注意到 GPT-4V 的表现有显著提升。这种方法使其能够以极高的精度完成具有挑战性的任务,例如精确的掩码匹配 (mask matching) ,这是其他大型多模态模型尚未实现的壮举。这种提示策略有效地利用了 GPT-4V 的强大能力。

Similarly, we observed a comparable effect in both the CT and MRI groups. Without any specifically designed prompts, GPT-4V failed to correctly identify the tasks at hand. However, with the incorporation of step-by-step guidance, there was a significant enhancement in performance, with multi-round dialogues demonstrating even more pronounced improvements. This evidence suggests that a structured, interactive approach is crucial for maximizing the potential of advanced multimodal AI models in complex medical imaging tasks.

同样地,我们在CT和MRI组中都观察到了类似的效果。在没有专门设计提示词的情况下,GPT-4V未能正确识别当前任务。但通过引入分步指导后,模型性能得到显著提升,其中多轮对话的改进尤为明显。这一证据表明,结构化交互方式对于发挥先进多模态AI模型在复杂医学影像任务中的潜力至关重要。

4.4 Tip 4: Don’t expose your target at the very beginning when you start multi-round dialog

4.4 技巧4: 开始多轮对话时不要过早暴露目标

4.4.1 Demonstration Info

4.4.1 演示信息

In the previous demonstration, we showcased how step-by-step guidance within a multi-turn dialogue can effectively accomplish challenging tasks. In this demonstration, we aim to present a strategy that may potentially weaken the effectiveness of prompts: prematurely revealing the objective. We will use the same three cases and almost identical prompts as in the previous demonstration. The key difference, however, lies in our approach: we will disclose our objective – matching masks with medical images to GPT-4V right from the start.

在之前的演示中,我们展示了多轮对话中逐步引导如何有效完成复杂任务。本次演示将展示一种可能弱化提示效果的策略:过早暴露目标。我们将沿用之前演示的三个案例和几乎相同的提示语,关键区别在于操作方式:一开始就向GPT-4V明示目标——将掩膜与医学图像进行匹配。

This setup allows us to critically evaluate the impact of early goal exposure on the performance of GPT-4V in completing the task. By comparing the outcomes of this approach with those from the previous demonstration, where the objective was gradually introduced, we aim to discern the influence of prompt structure and information sequencing on the model’s efficiency and accuracy in task execution. This investigation is pivotal in understanding the nuances of effective prompt design, particularly in complex problem-solving scenarios involving AI models.

此设置使我们能够批判性地评估早期目标暴露对GPT-4V任务完成表现的影响。通过将该方法与先前逐步引入目标的演示结果进行对比,我们旨在辨别提示结构和信息排序对模型任务执行效率与准确性的影响。这项研究对于理解有效提示设计的细微差别至关重要,尤其是在涉及AI模型的复杂问题解决场景中。

4.4.2 Analysis

4.4.2 分析

In this demonstration, a notable difference from the previous one was observed when the objective was prematurely revealed. Despite employing our previously high-accuracy prompt strategies, the accuracy remained markedly low, comparable to the performance achieved with unrefined prompts.

本次演示中,当目标被提前揭示时,观察到了与前次实验的显著差异。尽管采用了先前高准确率的提示策略,准确率仍明显偏低,与使用未优化提示时的性能相当。

Our observation led to the discovery that this decrease in performance is attributable to the "short-term memory" of GPT-4V. To elaborate, if the objective is revealed too early, GPT-4V tends to immediately generate an answer based on the exposed information. The quality of this answer is similar to that produced by unrefined prompts, which is to say, it is of lower quality. Further, if prompts are inputted after receiving this initial answer, in an attempt to guide GPT-4V, the subsequent outputs tend to be erroneous, aligning with the previously derived incorrect answer. The strong short-term memory of GPT-4V means that if the initial output is not manually identified as incorrect, it carries this erroneous prior into subsequent responses, thus affecting the quality of the answers. Therefore, when using GPT-4V with the intention of achieving a specific outcome through multi-turn dialogue, it is advisable not to disclose the task objective at the onset.

我们的观察发现,这种性能下降可归因于GPT-4V的"短期记忆"。具体而言,若过早揭示任务目标,GPT-4V会立即基于暴露信息生成答案,其质量与未经优化的提示所产生的答案相似,即质量较低。更重要的是,若在获得初始答案后继续输入提示以引导GPT-4V,后续输出往往会与先前得出的错误答案保持一致。GPT-4V强大的短期记忆意味着,若未手动识别初始输出的错误,系统会将这种错误先验带入后续响应,从而影响答案质量。因此,当希望通过多轮对话实现特定目标时,建议不要在一开始就透露任务目标。

4.5 Tip 5: Describing appearances or characteristics will greatly enhance the performance.

4.5 技巧5: 描述外观或特征将显著提升性能

4.5.1 Demonstration Info

4.5.1 演示信息

In this demonstration, we illustrated that providing descriptions of appearance or characteristics significantly enhances the model’s performance. For the descriptions of appearance or characteristics, we adopted definitions from Wikipedia in this showcase. Given that this prompt tip involves detailed descriptions of appearance and characteristics, it is most effectively applied in scenarios where there is a need for batch processing of data that is familiar to the user.

在本演示中,我们展示了提供外观或特征描述能显著提升模型性能。对于外观或特征的描述,本案例采用了维基百科的定义。由于这一提示技巧涉及外观与特征的详细描述,它最适用于用户熟悉且需要批量处理数据的场景。

4.5.2 Analysis

4.5.2 分析

In this study, we discovered that providing specific descriptions of shapes and appearances significantly enhances GPT-4V’s ability to recognize certain structures. In the control group, where no descriptions were provided, the performance was suboptimal. Specifically, within the endoscopy group, organs like the liver, fat, and intestines were difficult to identify without prompts, with only the gallbladder being relatively easier to recognize. In CT and MRI imaging, GPT-4V was able to identify easily recognizable organs like the liver without additional prompts, whereas identifying the gallbladder and pancreas proved to be more challenging.

在本研究中,我们发现提供形状和外观的具体描述能显著提升GPT-4V识别某些结构的能力。对照组未提供任何描述时,其表现欠佳。具体而言,在内窥镜组中,肝脏、脂肪和肠道等器官若无提示则难以识别,仅胆囊相对较易辨认。在CT和MRI成像中,GPT-4V无需额外提示即可识别肝脏等易辨器官,而胆囊和胰腺的识别则更具挑战性。

Subsequently, we utilized organ-specific prompts based on appearance descriptions from Wikipedia and Mayo Clinic, among others. It is crucial to note that GPT-4V may output descriptions of the images in the order of the input prompts. If the order of descriptions inputted matches the order of annotations in the image, this might result in correct outputs. However, these outputs could be derived from the order of the input descriptions rather than an understanding of the image. To mitigate this, we intentionally scrambled the order of our descriptions to prevent answer leakage due to input sequencing. We observed a substantial increase in recognition accuracy upon providing appearance descriptions. In the endoscopy group, apart from the intestines which remained challenging, other structures were identified with high accuracy. In both the CT and MRI groups, complete and accurate identification was achieved, including small structures like the gallbladder in MRI images.

随后,我们基于维基百科和梅奥诊所等来源的外观描述,使用了器官特定的提示词。需要特别注意的是,GPT-4V可能会按照输入提示的顺序输出图像描述。如果输入的描述顺序与图像中标注的顺序一致,可能会导致看似正确的输出结果。但这些输出可能仅源于输入描述的顺序,而非对图像的真实理解。为消除这种干扰,我们有意打乱了描述顺序,以避免因输入序列导致的答案泄露。实验表明,提供外观描述后识别准确率显著提升:在内窥镜组中,除肠道仍具挑战性外,其他结构均能高精度识别;在CT和MRI组中,包括MRI图像中胆囊等微小结构在内的所有器官均实现了完整准确的识别。

Furthermore, we also observed that when dealing with markers of different colors, it is advisable to avoid using color-related descriptions. Such descriptions can lead to confusion in GPT-4V’s processing. We will delve into this phenomenon in greater detail in the following chapter, examining the implications of color descriptions on the model’s interpretive accuracy and how they might inadvertently mislead the AI’s analysis.

此外,我们还观察到,在处理不同颜色的标记时,应避免使用与颜色相关的描述。这类描述可能导致 GPT-4V 处理时产生混淆。我们将在下一章更详细地探讨这一现象,分析颜色描述对模型解释准确性的影响,以及它们如何可能无意中误导 AI 的分析。

4.6 Tip 6: Appearance descriptors should be non-conflicting with any image annotations to avoid misinterpretations.

4.6 技巧6: 外观描述符不应与任何图像标注冲突,以避免误解。

4.6.1 Demonstration Info

4.6.1 演示信息

In the previous chapter, we discussed the impact of appearance descriptions on the performance of GPT-4V, noting a significant enhancement in its capabilities. Given that appearance descriptions often involve color, especially in endoscopic scenarios, it is pertinent to explore the outcomes when markers are also denoted by colors. In this round of demonstrations, we excluded data from CT and MRI modalities, as these primarily involve grayscale imaging and seldom incorporate colors beyond black and white. This focus allows us to specifically assess the influence of color descriptions in contexts where color plays a critical role, providing insights into the nuances of prompt design for color-rich medical imaging scenarios

在上一章中,我们讨论了外观描述对GPT-4V性能的影响,注意到其能力有显著提升。鉴于外观描述通常涉及颜色(尤其在 endoscopic 场景中),探索用颜色标注标记时的效果显得尤为重要。本轮演示中,我们排除了CT和MRI模态的数据,因为这些成像方式主要采用灰度图像,极少使用黑白以外的颜色。通过聚焦于色彩起关键作用的场景,我们能更精准地评估颜色描述的影响,从而为色彩丰富的医学影像场景提供提示设计方面的细微洞察。

4.6.2 Analysis

4.6.2 分析

In this chapter, we discovered that using color descriptors in scenarios where markers are also color-coded can significantly impair performance. In this showcase, the results were even worse than those in the control group (Section 4.5.1). Notably, all instances of yellow dots were incorrectly identified as fat. This misidentification likely stems from GPT-4V interpreting the yellow dots as part of the scene rather than as markers. When we described fat as "generally pale yellow," GPT-4V mistakenly classified the yellow dots as fat. This observation underscores the importance of careful consideration in using color descriptors, particularly in contexts where color plays a dual role in both the scene and the markers.

在本章中,我们发现当标记物也采用颜色编码时,使用颜色描述符会显著影响性能。本次展示的结果甚至比对照组更差(见第4.5.1节)。值得注意的是,所有黄色圆点都被错误识别为脂肪。这种误判可能源于GPT-4V将黄色圆点视为场景的一部分而非标记物。当我们描述脂肪为"通常呈淡黄色"时,GPT-4V错误地将黄色圆点归类为脂肪。这一发现凸显了在使用颜色描述符时需要格外谨慎,特别是在颜色同时承担场景元素和标记物双重功能的场景中。

4.7 Tip 7: Clarifying the contextual relationships between sequential images enhances GPT-4v’s analytical accuracy and detail recognition.

4.7 技巧7: 明确连续图像间的上下文关系可提升GPT-4v的分析准确性与细节识别能力

4.7.1 Demonstration Info

4.7.1 演示信息

This demonstration primarily focused on the processing of endoscopic sequences. We composed a single image by combining four pictures from a hy st eros copy to assess GPT-4V’s capability in handling endoscopic sequences. The test was divided into two groups: one without inputting the image sequence numbers6.7.1 and another with the sequence numbers included 6.7.1. Our findings indicate that image analysis is more effective when the sequence order of the images is provided. This result highlights the importance of sequential context in enhancing the model’s comprehension and accuracy in processing related images in a series.

本次演示主要针对内窥镜序列的处理。我们通过组合宫腔镜检查中的四张图片合成单幅图像,以评估GPT-4V处理内窥镜序列的能力。测试分为两组:一组未输入图像序列编号6.7.1,另一组则包含序列编号6.7.1。研究结果表明,当提供图像顺序信息时,分析效果更佳。这一发现凸显了序列上下文对于提升模型理解能力和处理系列相关图像准确性的重要性。

4.7.2 Analysis

4.7.2 分析

Our findings in this chapter reveal that providing the sequential order of images significantly enhances their analysis. This approach is beneficial for processing video files as well. The underlying effectiveness of this strategy appears to stem from how GPT-4V perceives each image. Without sequence information, GPT-4V tends to treat each image as a discrete entity, leading to independent analysis with limited contextual understanding.

本章研究发现,提供图像序列顺序能显著提升其分析效果。这一方法同样适用于视频文件处理。该策略的有效性根源在于GPT-4V对每帧图像的感知机制:缺失序列信息时,GPT-4V倾向于将每幅图像视为独立实体,导致分析过程缺乏上下文关联。

However, when the sequential order is specified, or when the interrelationship among the images is clarified, GPT-4V is more inclined to view the images as interconnected elements of a larger narrative. Consequently, it synthesizes a more comprehensive workflow, encompassing the entire sequence from the first to the last image. This methodology proves especially useful in the analysis of surgical procedures in videos or image sequences, allowing for a more integrated and coherent interpretation of the visual data.

然而,当指定了顺序或明确了图像间的关联性时,GPT-4V更倾向于将这些图像视为一个更大叙事中相互关联的元素。因此,它会综合出一个更全面的工作流程,涵盖从第一张到最后一张图像的整个序列。这种方法在分析视频或图像序列中的手术步骤时尤为有效,能够对视觉数据进行更整合、连贯的解读。

4.8 Tip 8: Splicing multiple images into one and providing a sequence order enhances task processing effectiveness compared to multiple simultaneous inputs.

4.8 技巧8:将多张图像拼接为单图并标注顺序,相比同时输入多张图像更能提升任务处理效果

4.8.1 Demonstration Info

4.8.1 演示信息

In the previous demonstration, we presented the results of splicing multiple images into one(6.7.1 and 6.7.1). GPT-4V, however, possesses a built-in upload feature that can be used for analyzing image sequences. In this demonstration, we will compare the outcomes of using GPT-4V’s native upload feature versus splicing multiple images together. It is important to note that, as of the September 27 to December 12, 2023 version, this upload system can handle a maximum of four images at a time. Additionally, images uploaded through this system are automatically displayed in varying sizes on the webpage, although their actual upload dimensions remain unchanged, as shown in Figure 6.8.1. This comparison aims to highlight the differences in results obtained through these two methods of image presentation and processing.

在之前的演示中,我们展示了将多张图像拼接成一张的结果(6.7.1和6.7.1)。然而,GPT-4V具备内置的上传功能,可用于分析图像序列。本演示将对比使用GPT-4V原生上传功能与拼接多张图像的效果差异。需要注意的是,截至2023年9月27日至12月12日的版本,该上传系统单次最多只能处理四张图像。此外,通过该系统上传的图像会在网页上自动以不同尺寸显示,但其实际上传尺寸保持不变,如图6.8.1所示。本次对比旨在凸显这两种图像呈现和处理方式所获结果的差异。

4.8.2 Analysis

4.8.2 分析

In our prior showcase, we created a composite image by splicing together multiple individual images. However, in this subsequent test, we employed GPT4-v’s inherent image uploading feature to upload four images simultaneously, which is the system’s maximum capacity, to analyze a surgical procedure depicted across these images. The results of this test indicate that the direct upload of multiple images through GPT4-v’s system tends to be less effective in the analysis compared to utilizing a single, spliced composite image. This reduced effectiveness might be due to the current version of GPT4-v not being fully optimized for multi-image analysis. While GPT4-v shows remarkable proficiency in processing single images, it seems there is a need for additional development to enhance its capability to handle and interpreting multiple images concurrently.

在我们之前的展示中,通过拼接多张独立图像创建了一张合成图。但在后续测试中,我们使用GPT4-v内置的多图上传功能同时上传了四张图片(这是系统上限),以分析这些图片中描绘的手术流程。测试结果表明,与使用单张合成图相比,通过GPT4-v系统直接上传多张图片的分析效果较差。这种效能下降可能是由于当前版本的GPT4-v尚未针对多图分析进行充分优化。尽管GPT4-v在处理单张图像时表现出色,但仍需进一步开发来提升其同步处理和解析多张图像的能力。

4.9 Tip 9: Providing comparative analysis opportunities, especially with temporal patient data, deepens lesion or condition assessments.

4.9 技巧9:提供对比分析机会(尤其是时序患者数据)可深化病灶或病情评估。

4.9.1 Demonstration Info

4.9.1 演示信息

In this demonstration, we highlighted the significant enhancement in GPT-4V’s performance facilitated by the provision of comparative information. This comparison could involve contrasting healthy and abnormal states, or it might pertain to changes in pathological lesions. Such juxtapositions provide GPT-4V with a richer context, enabling a more accurate and nuanced understanding of the medical imagery. By presenting contrasting scenarios or changes over time, GPT-4V can better discern subtle differences, leading to improved diagnostic accuracy. This approach underscores the importance of contextual information in medical image analysis, particularly in complex cases where changes are subtle or involve gradual progression.

在本演示中,我们重点展示了通过提供对比信息对GPT-4V性能的显著提升。这种对比可能涉及健康与异常状态的对照,也可能与病理病变的变化相关。此类并置为GPT-4V提供了更丰富的上下文,使其能更准确细致地理解医学影像。通过呈现对比场景或时间维度的变化,GPT-4V能更好地区分细微差异,从而提高诊断准确性。该方法凸显了上下文信息在医学影像分析中的重要性,尤其适用于变化细微或呈渐进发展的复杂病例。

4.9.2 Analysis

4.9.2 分析

In this test, we observed that providing GPT-4v with comparative images from before and after significantly improved the accuracy of diagnoses. This is intuitively reasonable since the nature of many conditions, such as the growth of pulmonary nodules, can be challenging to assess from a single, final-stage image. However, when presented with a series of images depicting the nodule over several years, GPT-4v is able to analyze the growth pattern of the nodule. Consequently, this enables a more accurate assessment of the patient’s condition.

在此测试中,我们观察到向GPT-4v提供前后对比图像能显著提升诊断准确率。这从直觉上是合理的,因为许多病症(如肺结节生长)的性质仅通过单张终期影像难以评估。但当输入展现结节多年变化的系列图像时,GPT-4v能分析结节的生长模式,从而更精准地评估患者病情。

Additionally, in diagnosing lung nodules, we opted not to mark incorrect responses, focusing instead on correctly answered parts. This decision was based on the complexity inherent in diagnosing diseases solely from the growth patterns of nodules. We acknowledge that all responses provided by the model possess a certain degree of validity under these circumstances. However, it was observed that when images showing comparative growth were presented, all responses accurately identified a de no carcinoma as the correct diagnosis in the first answer. This finding suggests that providing contextual visual information significantly enhances the model’s diagnostic accuracy, particularly in complex cases like lung nodule analysis.

此外,在诊断肺结节时,我们选择不标注错误回答,而是聚焦于正确应答部分。这一决定基于仅凭结节生长模式诊断疾病所固有的复杂性。我们承认在此情境下,模型提供的所有回答均具有一定有效性。但研究发现,当展示对比生长图像时,所有回答均在首次诊断中准确识别出新生癌变。这表明提供上下文视觉信息能显著提升模型诊断准确率,尤其在肺结节分析等复杂病例中。

4.10 Tip 10: Directing GPT-4V’s focus to the interested areas will facilitate more targeted and relevant output.

4.10 技巧 10: 将 GPT-4V 的注意力引导至目标区域可生成更具针对性的输出

4.10.1 Demonstration Info

4.10.1 演示信息

In this demonstration, we examined the impact of providing spatial orientation information on GPT-4V’s recognition capabilities. We observed that when directing GPT-4V’s focus to the interested areas, GPT-4V’s responses were notably more coherent and accurate. This finding suggests that spatial information significantly aids the AI in contextual i zing and interpreting medical images. This prompt tip is particularly useful in cases where the approximate location of a lesion is known, but a detailed analysis and diagnosis are required using GPT-4V. By integrating location-based cues into the prompts, GPT-4V can focus its analysis more effectively, leading to more precise diagnostic outcomes. This approach highlights the importance of spatial context in enhancing AI-assisted medical image diagnosis.

在此演示中,我们研究了提供空间方位信息对GPT-4V识别能力的影响。我们发现,当引导GPT-4V关注目标区域时,其响应明显更具连贯性和准确性。这一发现表明,空间信息能有效帮助AI对医学图像进行情境化解读。该提示技巧特别适用于已知病灶大致位置、但需通过GPT-4V进行详细分析和诊断的场景。通过将基于位置的线索整合到提示中,GPT-4V能更高效地聚焦分析,从而获得更精确的诊断结果。该方法凸显了空间上下文在增强AI辅助医学图像诊断中的重要性。

4.10.2 Analysis

4.10.2 分析

In the experiments conducted in this chapter, we found that providing specific spatial orientation significantly enhances the accuracy of GPT-4V’s responses. In the MRI experiments, without specified location guidance, GPT-4V tended to diagnose images as normal, even when there were clear abnormalities present. Once directed to focus on the "upper left quadrant," GPT-4V promptly identified the anomalies in that region.

在本章的实验中,我们发现提供具体的空间方位能显著提升GPT-4V的回答准确率。在MRI实验中,若未指定位置引导,GPT-4V倾向于将图像诊断为正常,即便存在明显异常。一旦指示其关注"左上象限",GPT-4V便能迅速识别该区域的异常。

In the PET scan tests, without a defined area of observation, GPT-4V erroneously identified regions with naturally high metabolic activity, such as the brain, heart, and bladder, as potential pathological sites, leading to an incorrect diagnosis of possible metastatic cancer. However, when GPT-4V was instructed to concentrate on the thoracic region, it successfully differentiated between cardiac activity and lung cancer, providing a more accurate diagnosis. These findings highlight the importance of directional cues in assisting GPT-4V to accurately interpret medical imaging, emphasizing the need for precise location information in AI-assisted diagnostic processes.

在PET扫描测试中,由于未定义观察区域,GPT-4V错误地将大脑、心脏和膀胱等天然高代谢活动区域识别为潜在病理部位,导致误诊为可能存在的转移性癌症。但当指示GPT-4V专注于胸部区域时,它成功区分了心脏活动与肺癌,提供了更准确的诊断。这些发现凸显了方向性提示对协助GPT-4V准确解读医学影像的重要性,同时强调了在AI辅助诊断过程中提供精确位置信息的必要性。

5 Summary

5 总结

In this work, we identified ten valuable prompt tips that significantly enhance GPT-4V’s utility in medical contexts through extensive testing across various datasets. These prompts led to marked performance improvements in all tasks, demonstrating the power of well-designed prompts in enhancing AI analysis in medical imaging. Future research will broaden the case spectrum and employ these prompt techniques for bulk data processing using APIs. Additionally, we plan to explore the effectiveness of these prompts in non-medical, natural image settings. This exploration aims to understand the adaptability of these strategies across different image analysis scenarios. Our findings not only contribute to medical image analysis using AI models but also open new avenues for applying AI-driven strategies in a wider range of imaging applications.

在这项工作中,我们通过跨数据集的广泛测试,确定了十条能显著提升GPT-4V在医疗场景效用的提示词技巧。这些提示词使所有任务性能均获得显著提升,证明了精心设计的提示词对增强医学影像AI分析能力的作用。未来研究将扩大病例范围,并运用这些提示技术通过API进行批量数据处理。此外,我们计划探究这些提示词在非医疗自然图像场景中的有效性,以评估这些策略在不同图像分析情境中的适应能力。本研究成果不仅推动了AI模型在医学影像分析中的应用,更为AI驱动策略在更广泛成像领域的应用开辟了新路径。

References

参考文献

6 Case demonstration

6 案例演示

In our demonstration, each scenario is illustrated through comprehensive visuals. "Prompt" refers to specific textual inputs presented to the users. The "GPT-4V" section captures the responses elicited by the GPT-4V model. It is important to note that, due to predetermined safeguard measures, GPT-4V consistently asserts its non-expertise in specialized domains such as radiology. For clarity and succinctness, we have omitted these routine disclaimers from our presentation of results. "Reference answer" represents the standard responses or interpretations, as informed by expert sources like Radio pae dia.

在我们的演示中,每个场景都通过全面的视觉材料进行展示。"Prompt"指呈现给用户的具体文本输入。"GPT-4V"部分记录了GPT-4V模型生成的响应。需要注意的是,由于预设的安全防护措施,GPT-4V在放射学等专业领域始终声明自身不具备专业知识。为保持清晰简洁,我们在结果展示中省略了这些常规免责声明。"Reference answer"代表基于Radio pae dia等专业信息来源的标准答案或解读。

For a clearer interpretation of our findings, we implemented a color-coding strategy:

为了更清晰地解读我们的发现,我们采用了颜色编码策略:

6.1 Tip 1: Concise language is more effective than complex descriptions, emphasizing succinct, task-relevant details in image analysis.

6.1 技巧1:简洁语言比复杂描述更有效,在图像分析中应强调与任务相关的简明细节。

6.1.1 Case 1: Endoscopic image annotation recognition

6.1.1 案例1: 内窥镜图像标注识别

CASE 1

案例 1

• Picture modal: Endoscopic picture • Tasks: La paros co pic c hole cyst ec to my • Reference Answer: – Yellow Dot: Liver. – Red Dot: Surgical grasper. – Blue Dot: Gallbladder. – Green Dot: Fats. – Purple Dot: Intestine.

• 图片模态:内窥镜图像
• 任务:腹腔镜胆囊切除术
• 参考答案:

  • 黄点:肝脏。
  • 红点:手术抓钳。
  • 蓝点:胆囊。
  • 绿点:脂肪组织。
  • 紫点:肠道。

CASE 1 Control Group

案例1 控制组

Prompt: Tell me what instrument or tissue each color represents.

提示:告诉我每种颜色代表什么乐器或组织。

GPT-4V Response:

GPT-4V 响应:

• response 1

• response 1

– Yellow Dot: This looks like it could be the liver tissue. The liver is a large organ that sits right above the gallbladder, and its edge can be seen during gallbladder surgeries.

  • 黄点: 这看起来可能是肝组织。肝脏是位于胆囊正上方的大型器官,在胆囊手术中可以看到其边缘。

– Red Dot: This instrument is a clip applicator or a clip holder. It’s used to apply clips to blood vessels or ducts to ensure there is no bleeding or leakage after they are cut.

  • 红点: 该器械是夹子施放器或夹子固定器,用于将夹子施加在血管或导管上,确保切断后不会出血或渗漏。

– Green Dot: This structure seems to be the cystic duct, which connects the gallbladder to the common bile duct. It’s a key structure that surgeons need to identify and clip or ligate during the procedure.

  • 绿点: 该结构似乎是胆囊管 (cystic duct) ,它将胆囊与胆总管相连。这是外科医生在手术过程中需要识别并夹闭或结扎的关键结构。

– Purple Dot: The structure at the location of the purple dot is not entirely clear from the perspective of the image, but it might be a part of the omentum or another adipose tissue. The omentum is a fatty apron-like structure that hangs down from the stomach and covers some of the organs in the abdomen.

紫点:从图像视角来看,紫色标记点的结构并不完全清晰,可能是网膜或其他脂肪组织的一部分。网膜是悬挂于胃部下方的脂肪围裙状结构,覆盖着腹腔内的部分器官。

response 2

响应2

response 3

响应3

CASE 1 Complex Descriptions

案例1:复杂描述

Prompt: This is a photo of a la paros co pic gallbladder removal surgery. The gallbladder is connected to the liver, and the liver is usually around with fat. In the picture, there are five different colors marking different tissues or instruments. Based on what can be seen under the la paros co pic view and the appearance of the organs, please tell me what each color represents.

提示:这是一张腹腔镜胆囊切除手术的照片。胆囊与肝脏相连,而肝脏周围通常有脂肪组织。图中用五种不同颜色标记了不同的组织或器械。根据腹腔镜视图下可见的器官外观,请告诉我每种颜色代表什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

– Blue Dot: This appears to represent the cystic duct. The cystic duct connects the gallbladder to the common bile duct and is a pathway for bile to flow in and out of the gallbladder.

  • 蓝点: 这似乎代表胆囊管。胆囊管连接胆囊与胆总管,是胆汁进出胆囊的通道。

Green dot: This likely represents the gallbladder. The gallbladder is a small, pearshaped organ that stores bile produced by the liver.

绿点: 这很可能代表胆囊。胆囊是一个储存肝脏产生胆汁的小型梨形器官。

– Red dot: This seems to indicate a la paros co pic instrument, likely being used for dissection or manipulation of the nearby structures.

  • 红点: 这似乎表示一种类似解剖器械的工具,可能用于解剖或操作附近结构。

– Yellow dot: This might denote the liver’s surface. The liver is responsible for producing bile, among its many other functions.

  • 黄点:这可能表示肝脏表面。肝脏的功能众多,其中包括生成胆汁。

– Purple dot: Given its position and appearance, this could either represent a portion of the liver or possibly associated fatty tissue.

  • 紫点:根据其位置和外观,这可能代表部分肝脏或可能相关的脂肪组织。

response 2

响应 2

response 3

响应3

CASE 1 Concise Description

案例1:简明描述

Prompt: This is a photo of a la paros co pic c hole cyst ec to my. The gallbladder is connected to the liver which is usually enveloped in fat. Tell me what instrument or tissue each color represents.

提示:这是一张关于胆囊切除术 (cholecystectomy) 的照片。胆囊通常被脂肪包裹,与肝脏相连。请说明每种颜色代表什么器械或组织。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

• response 2

  • response 2

response 3

响应3

6.1.2 Case 2: CT image annotation recognition

6.1.2 案例2: CT图像标注识别

CASE 2

案例 2

• Picture modal: CT section • Tasks: Abdominal CT section • Reference Answer: – Red Dot: Liver. – Yellow Dot: Pancrea. – Green Dot: Left kidney. – Blue Dot: Spleen.

• 图像模态:CT 切片
• 任务:腹部 CT 切片
• 参考答案:

  • 红点:肝脏。
  • 黄点:胰腺。
  • 绿点:左肾。
  • 蓝点:脾脏。

CASE 2 Control Group

案例2 对照组

Prompt: This is an abdominal CT scan. Please tell me what organs are marked by the four different colors.

提示:这是一张腹部CT扫描图。请告诉我四种不同颜色标记的器官分别是什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

• response 2

• response 2

response 3

响应3

CASE 2 Complex Descriptions

案例 2 复杂描述

Prompt: This is a cross-sectional image of an abdominal CT scan. The central white highlighted area is the spine, and the outer white highlighted areas are the ribs. The top of the image is the patient’s front, the bottom of the image is the patient’s back, the right side of the image is the patient’s left side, and the left side is the patient’s right side. Within this image, there are four color annotations, with the 4 colors representing 4 different organs. What are these four organs

提示:这是一张腹部CT扫描的横截面图像。中央高亮的白色区域是脊柱,外围高亮的白色区域是肋骨。图像顶部为患者前侧,底部为患者后侧,右侧为患者左侧,左侧为患者右侧。图像内有四个颜色标注,四种颜色代表四个不同器官。这些器官分别是什么

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

• response 2

  • response 2

response 3

响应3

CASE 2

案例 2

Concise Description

简洁描述

Prompt: This is a cross-sectional image of an abdominal CT scan. In the image, the locations of the organs are mirrored from their actual locations, and the central white-highlighted area is the spine. What are these four organs marked by color

提示:这是一幅腹部CT扫描的横断面图像。图中器官位置与实际呈镜像关系,中央白色高亮区域为脊柱。这些用颜色标记的四个器官分别是什么

GPT-4V Response:

GPT-4V 响应:

• response 1

  • response 1

• response 2

  • response 2

The area marked with a red circle might be the liver or a part of the liver. The area marked with a yellow dot seems closer to the central structures, possibly indicating a part of the pancreas or nearby structures. – The green dot seems to be highlighting a portion of the intestine or a related structure. The blue dot might be indicating a section of the spleen or a structure near the kidney.

红色圆圈标记的区域可能是肝脏或肝脏的一部分。黄色点标记的区域似乎更靠近中央结构,可能表示胰腺或其附近结构的一部分。绿色点似乎突出了肠道或相关结构的一部分。蓝色点可能指示脾脏的一部分或肾脏附近的结构。

response 3

响应3

6.1.3 Case 3: MRI image annotation recognition

6.1.3 案例3: MRI图像标注识别

CASE 3

案例 3

• Picture modal: MRI section • Tasks: Abdominal MRI section • Reference Answer: – Red Dot: Liver. – Yellow Dot: Pancrea. – Green Dot: Left kidney. – Blue Dot: Spleen.

  • 图片模态:MRI切面
  • 任务:腹部MRI切面
  • 参考答案:
    • 红点:肝脏。
    • 黄点:胰腺。
    • 绿点:左肾。
    • 蓝点:脾脏。

CASE 3 Control Group

案例3 控制组

Prompt: This is an abdominal MRI scan. Please tell me what organs are marked by the four different colors.

提示:这是一张腹部 MRI 扫描图。请告诉我四种不同颜色分别标记了哪些器官。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

– Red dot: This is likely over the liver, which is a large organ on the right side of the abdominal cavity that is involved in many metabolic processes. – Yellow dot: This could represent the gallbladder if it is under the liver, or part of the gastrointestinal tract, such as the duodenum, which is the first part of the small intestine immediately following the stomach. – Green dot: This is probably over the spleen, an organ involved in filtering blood and immune system function, located in the left upper quadrant of the abdomen. – Blue dot: This could represent a portion of the gastrointestinal tract, possibly the stomach or a section of the large intestine, depending on its precise location.

  • 红点:可能位于肝脏上方,这是腹腔右侧参与多种代谢过程的大型器官。
  • 黄点:若在肝脏下方可能代表胆囊,或胃肠道的一部分,如紧接胃部的小肠起始段十二指肠。
  • 绿点:可能对应脾脏,这是位于左上腹负责血液过滤和免疫功能的器官。
  • 蓝点:根据具体位置,可能代表胃肠道的一部分,如胃部或大肠的某段。

response 2

响应 2

response 3

响应 3

CASE 3 Complex Descriptions

案例 3:复杂描述

Prompt: This is a cross-sectional image of an abdominal MRI scan. The central white highlights area is the aorta. The top of the image is the patient’s front, the bottom of the image is the patient’s back, the right side of the image is the patient’s left side, and the left side is the patient’s right side. Within this image, there are four color annotations, with the 4 colors representing 4 different organs. What are these four organs?

提示:这是一张腹部MRI扫描的横断面图像。中央高亮的白色区域是主动脉。图像顶部为患者前方,底部为患者后方,图像右侧为患者左侧,左侧为患者右侧。该图像中有四种颜色标注,四种颜色代表四个不同器官。这些器官分别是什么?

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

response 2

响应2

• response 3

  • response 3

CASE 3 Concise Description

案例3:简明描述

Prompt: This is a cross-sectional image of an abdominal MRI scan. In the image, the locations of the organs in MRI will be mirrored from their actual human anatomy, what are these four organs marked by color.

提示:这是一张腹部MRI扫描的横断面图像。图中MRI显示的器官位置将与实际人体解剖结构呈镜像关系,请问用颜色标记的这四个器官分别是什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

response 2

响应 2

response 3

响应3

6.2 Tip 2: Providing the tasks can better help analyze the images

6.2 技巧2:提供任务能更好地帮助分析图像

6.2.1 Case 1: Endoscopic operation analysis

6.2.1 案例1: 内窥镜手术分析

CASE 1

案例1

CASE 1 Control group

案例1 对照组

Prompt: Please tell me what the silver surgical instrument in the picture is and its purpose.

提示:请告诉我图片中的银色手术器械是什么及其用途。

GPT-4V Response:

GPT-4V 响应:

• response 1: The image shows a certain surgery being performed under arth ros copy. From the image, surgical instruments can be seen, one of which looks like a clamp or manipulator. Regarding the specific type of surgical instrument, I cannot be completely sure, as this depends on the specific purpose and context of the surgery. Art hr osco pic surgeries are generally used for diagnosing and treating joint problems, especially in the knee, shoulder, and other major joints. The purpose of this instrument may include manipulating tissues, excising damaged tissues, clamping certain objects, etc., depending on the purpose of the surgery.

• 回应1: 图像显示在关节镜 (arthroscopy) 下进行的某种手术。从图像中可以看到手术器械,其中一件看起来像钳子或操作器。关于具体的手术器械类型,我无法完全确定,因为这取决于手术的具体目的和背景。关节镜手术通常用于诊断和治疗关节问题,尤其是膝关节、肩关节等主要关节。该器械的用途可能包括操作组织、切除受损组织、夹持特定物体等,具体取决于手术目的。

response 2: The image shows a scene from an endoscopic surgery. The surgical instrument looks like a ring-shaped grasper or clamp. This type of instrument is typically used to grasp, excise, or manipulate tissue in art hr osco pic or other endoscopic surgeries. The red area in the picture might be tissue inside the joint cavity, and the white structures might be bone or cartilage, but this requires more detailed background information to determine. The specific purpose of this instrument depends on the type and purpose of the surgery, such as in knee arth ros copy, it might be used to excise damaged cartilage or repair injured ligaments.

回复2:图像显示的是内窥镜手术场景。手术器械呈环形抓钳或夹钳状,这类器械通常用于关节镜(arthroscopic)或其他内窥镜手术中抓取、切除或操作组织。图中红色区域可能是关节腔内组织,白色结构可能是骨骼或软骨,但需更详细的背景信息才能确认。该器械的具体用途取决于手术类型和目的,例如在膝关节镜手术中,可能用于切除损伤软骨或修复受损韧带。

response 3: The image shows an instrument used in an art hr osco pic surgery. Art hr osco pic surgery is a method that enters the joint cavity through small incisions and uses an art hro scope for diagnosis and treatment. The instrument in the picture might be used to clamp, cut, or refine tissues inside the joint. Specifically, the middle ring-shaped structure might be a protective sleeve, used to protect the soft tissues inside the joint from damage. The clamping instruments on both sides might be used to manipulate tissues inside the joint or perform surgical operations.

回复3:图中展示的是用于关节镜手术的一种器械。关节镜手术是通过小切口进入关节腔,利用关节镜进行诊断和治疗的方法。图中的器械可能用于钳夹、切割或修整关节内部组织。具体而言,中间的环形结构可能是保护套,用于保护关节内部软组织免受损伤。两侧的钳夹器械可能用于操作关节内部组织或进行手术操作。

CASE 1

案例 1

Providing the tasks

提供任务

Prompt: This image is from a hy ster osco pic myomectomy surgery. Please tell me what the silver surgical instrument in the picture is and its purpose.

提示:这张图片来自宫腔镜子宫肌瘤切除术。请告诉我图中银色手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应:

response 1: The image appears to be from a hy ster osco pic procedure, specifically a hy ster osco pic myomectomy, as you mentioned. The ring-shaped surgical instrument visible in the image is known as a "loop electrode" or "re sec to scope loop." The purpose of this loop electrode in a hy ster osco pic myomectomy is to resect or cut away fibroids (myomas) from the inner lining of the uterus. The loop electrode is electrified, and when it comes in contact with the fibroid tissue, it can cut and coagulate (stop bleeding) simultaneously. This allows the surgeon to carefully and selectively remove the fibroid tissue without damaging the surrounding healthy tissue of the uterus. response 2: The image appears to be from a hy ster osco pic procedure, specifically a myomectomy, which involves the removal of fibroids from the uterus. The ring-shaped surgical instrument you’re referring to is likely a "loop electrode" or "re sec to scope loop". This instrument is used in hy ster osco pic surgeries to cut tissue and/or coagulate bleeding vessels. When electrical current is passed through the loop, it can effectively cut tissue or coagulate blood vessels to control bleeding. In the context of a hy ster osco pic myomectomy, the loop electrode is used to shave off or resect pieces of the fibroid until it has been completely removed or reduced in size. response 3: The image appears to show a surgical procedure inside a uterus using hysteroscopy. The ring-shaped instrument you’re referring to is likely a myoma loop or a re sec to scope loop. Its primary purpose is to resect (or cut away) tissue, such as fibroids (myomas) from the inner wall of the uterus. During a hy ster osco pic myomectomy, the surgeon uses this loop to shave off the fibroid tissue in small portions until the fibroid is entirely removed. The loop typically carries electrical current, which helps in cutting and coagulating the tissue, minimizing bleeding during the procedure.

回复1: 该图像似乎来自宫腔镜手术,具体而言是宫腔镜子宫肌瘤切除术。图中可见的环形手术器械被称为"环形电极"或"切除镜环"。在宫腔镜子宫肌瘤切除术中,该环形电极的作用是切除或切割子宫内壁的肌瘤组织。环形电极通电后,当接触肌瘤组织时可同时实现切割和凝血(止血)功能,使外科医生能精准选择性地切除肌瘤组织而不损伤子宫周围健康组织。

回复2: 图像显示的是宫腔镜手术场景,具体为子宫肌瘤切除术。您所指的环形手术器械应为"环形电极"或"切除镜环"。该器械在宫腔镜手术中用于切割组织和/或凝固出血血管。当电流通过环形电极时,可有效切割组织或凝固血管止血。在宫腔镜子宫肌瘤切除术中,外科医生使用环形电极逐层削除或切除肌瘤组织,直至完全去除或缩小肌瘤体积。

回复3: 图像显示的是宫腔镜下子宫内手术操作。您提及的环形器械应为子宫肌瘤切割环或切除镜环,其主要功能是切除(或切割)子宫内壁的肌瘤等组织。在宫腔镜子宫肌瘤切除术中,外科医生使用该切割环分小块削除肌瘤组织,直至完全切除。切割环通常带有电流,既可辅助切割又能实现组织凝固,有效减少术中出血。

6.2.2 Case 2: Endoscopic series operation analysis

6.2.2 案例2: 内窥镜序列操作分析

CASE 2

案例 2

CASE 2 Control Group

案例2 控制组

Prompt: This is a sequence of pictures from an endoscopic surgery. Please analyze each step of the sequence.

提示:这是一组内窥镜手术的连续图像。请分析该序列中的每个步骤。

GPT-4V Response:

GPT-4V 响应:

CASE 2 Providing the tasks

案例 2: 任务分配

Prompt: This is a sequence of pictures showing the localization of polyps during a hy ster osco pic polyp ec to my procedure. Please analyze each step of the sequence.

提示:这是一组展示宫腔镜息肉切除术中息肉定位过程的图片序列。请分析该序列的每个步骤。

GPT-4V Response:

GPT-4V 响应:

CASE 3

案例 3

Ablation study (No input image, text only)

消融研究(无输入图像,仅文本)

Prompt: Assume there is an image from a hy ster osco pic myomectomy surgery. Please tell me what the silver surgical instrument in the picture is and its purpose.

提示:假设有一张来自宫腔镜子宫肌瘤切除术的图像。请告诉我图中银色手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应:

• response 1:

  • response 1:

In a hy ster osco pic myomectomy surgery image, the silver surgical instrument you’re most likely to see is a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted into the uterus through the cervix. It allows the surgeon to see the inside of the uterus and is used to diagnose and treat causes of abnormal bleeding. During a myomectomy, the hy st eros cope enables the surgeon to locate and remove fibroids or polyps from within the uterine cavity. It’s a key instrument in such procedures, providing a minimally invasive option for surgery.

在宫腔镜子宫肌瘤切除术的手术图像中,你最可能看到的银色手术器械是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备,通过宫颈插入子宫。它使外科医生能够观察子宫内部,用于诊断和治疗异常出血的原因。在子宫肌瘤切除术中,宫腔镜帮助外科医生定位并切除子宫腔内的肌瘤或息肉。这是此类手术中的关键器械,为手术提供了微创选择。

• response 2:

• response 2:

In a hy ster osco pic myomectomy surgery image, the silver surgical instrument you are referring to is most likely a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted through the vagina and cervix into the uterus. It allows the surgeon to view the inside of the uterus and operate with minimal invasion. During a myomectomy, the hy st eros cope is used to locate and remove fibroids or polyps from the uterine wall.

在一张宫腔镜子宫肌瘤切除手术图像中,您所指的银色手术器械很可能是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备,通过阴道和宫颈插入子宫。它使外科医生能够观察子宫内部并进行微创手术。在子宫肌瘤切除术中,宫腔镜用于定位并切除子宫壁上的肌瘤或息肉。

• response 3:

• 响应3:

In a hy ster osco pic myomectomy surgery, the most likely silver surgical instrument you might see in an image is a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted into the uterus through the cervix. It allows the surgeon to view the inside of the uterus and operate with precision. This instrument is crucial for hy ster osco pic myomectomy, a procedure used to remove fibroids (non cancerous growths) from the uterus. The hy st eros cope is used both to visualize the fibroids and to guide other instruments that are used to excise or destroy the fibroids.

在宫腔镜子宫肌瘤切除术中,图像中最可能出现的银色手术器械是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备,通过宫颈置入子宫。它使外科医生能够观察子宫内部并进行精确操作。该器械对于宫腔镜子宫肌瘤切除术至关重要,该手术用于切除子宫内的肌瘤(非癌性增生组织)。宫腔镜既用于观察肌瘤,也用于引导其他切除或破坏肌瘤的手术器械。

CASE 4

案例 4

Ablation study (No input image, text only)

消融研究 (无输入图像,仅文本)

Prompt: Assume there is an image from a myomectomy surgery. Please tell me what the ringshaped surgical instrument in the picture is and its purpose.

提示:假设有一张子宫肌瘤切除术的图像。请告诉我图中环形手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

In a hypothetical scenario where there’s an image from a hy ster osco pic myomectomy, a ring-shaped surgical instrument could likely be a "loop re sec to scope." The loop resectoscope is equipped with an electric wire loop. Its purpose is to cut and coagulate tissue during the procedure.

在一个假设场景中,若出现来自宫腔镜子宫肌瘤切除术的图像,环形手术器械很可能是"环形电切镜 (loop resectoscope)"。该器械配有电切环,用于在手术过程中切割和凝固组织。

During a hy ster osco pic myomectomy, this instrument is specifically used to shave off the fibroid tissue from the uterus. The electric loop cuts the fibroid, and the electric current ensures that there’s minimal bleeding by coagulating the blood vessels as it cuts.

在宫腔镜子宫肌瘤切除术中,该器械专门用于削除子宫内的肌瘤组织。电切环切除肌瘤的同时,电流通过凝固血管确保术中出血量最小。

• response 2:

• 响应 2:

A ring-shaped instrument in the context of hy ster osco pic myomectomy could possibly be a tenaculum or a loop electrode. The loop electrode is often part of a re sec to scope, which is commonly used in hy ster osco pic myomectomy to remove fibroids from the uterine wall. The loop can be used to cut and remove the fibroid tissue with the use of electric current, while the hy st eros cope allows the surgeon to see inside the uterus during the procedure.

在宫腔镜子宫肌瘤切除术中,环形器械可能是子宫钳或环形电极。环形电极通常是宫腔电切镜的组成部分,常用于宫腔镜子宫肌瘤切除术中从子宫壁切除肌瘤。该环形器械可通过电流切割并移除肌瘤组织,同时宫腔镜能让外科医生在手术过程中观察子宫内部。

• response 3:

• response 3:

In the context of a hy ster osco pic myomectomy, a ring-shaped surgical instrument is typically referred to as a "loop re sec to scope." The loop re sec to scope is an instrument used in conjunction with a hy st eros cope.

在宫腔镜子宫肌瘤切除术中,环形手术器械通常称为"环形电切镜 (loop resectoscope)"。环形电切镜是与宫腔镜配合使用的器械。

Purpose:

目的:

The loop re sec to scope has an electric wire loop at its end. It is used to cut and coagulate tissue during the procedure. During a hy ster osco pic myomectomy, the loop re sec to scope is utilized to shave off or remove fibroid tissue from the uterus. The electric current that passes through the wire loop allows for both cutting and coagulation, making the removal of fibroids efficient and minimizing bleeding.

环状电切镜末端带有金属线圈,用于手术中切割和凝固组织。在宫腔镜子宫肌瘤切除术中,环状电切镜用于削除或切除子宫内的肌瘤组织。通过金属线圈的电流既能实现切割又能止血,使得肌瘤切除高效且出血量最小化。

6.3 Tip 3: Implementing step-by-step guidance in multi-round dialogue allows GPT-4v to handle complex tasks more efficiently by breaking them down into simpler operations.

6.3 技巧3: 在多轮对话中实施分步引导,通过将复杂任务拆解为简单操作,使GPT-4v更高效地处理复杂任务。

6.3.1 Case 1: Endoscopic mask-polyp matching challenge

6.3.1 案例1: 内窥镜掩模-息肉匹配挑战

CASE 1 Control group

案例1 对照组

Prompt: Please tell me which mask at the top corresponds to the polyp below.

提示:请告诉我顶部的哪个遮罩对应下面的息肉。

GPT-4V Response:

GPT-4V 响应:

CASE 1

案例 1

Step-by-step guidance (single round)

分步指导 (单轮)

GPT-4V Response:

GPT-4V 响应:

CASE 1

案例 1

Step-by-step guidance (multi-round)

分步指导(多轮)

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

– Answering prompt 1: To identify the location of the polyp in Picture 4 based on the mask template, let’s divide each image into nine equal parts (3x3 grid) and see where the corresponding pattern from the template appears in Picture 4.

回答提示1:根据掩膜模板识别图4中息肉的位置,我们将图像均分为九等份(3x3网格),观察模板对应的模式在图4中的出现区域。

Upon examining the images:

检查图像时:

Pictures 1-3 serve as a mask template for polyps. Picture 4 is an image from a colon os copy. Based on the mask template provided in Pictures 1-3 and the colon os copy image in Picture 4, the polyp in Picture 4 would be located in the bottom center block (or the 8th block if numbering from top left to bottom right in a 3x3 grid).

图片1-3作为息肉的掩模模板。图片4是结肠镜图像。根据图片1-3提供的掩模模板和图片4的结肠镜图像,图片4中的息肉应位于底部中心区块(若按3x3网格从左上方至右下方编号则为第8区块)。

• response 2:

• 响应 2:

• response 3:

  • response 3:

– Answering prompt 1: The polyp in Picture 4 is located in the center block. – Answering prompt 2: For Pictures 1, 2, and 3: Picture 1: The polyp is located in the center block. Picture 2: The polyp is located in the top-center block. Picture 3: The polyp is located in the top-right block. respectively. – Answering prompt 3: The mask in Picture 1 would correspond to Picture 4.

  • 回答提示1:图4中的息肉位于中央区块。
  • 回答提示2:对于图1、图2和图3:
    图1:息肉位于中央区块。
    图2:息肉位于中上区块。
    图3:息肉位于右上区块。
  • 回答提示3:图1中的掩膜对应图4。

6.3.2 Case 2: CT mask-organs matching challenge

6.3.2 案例2: CT器官掩膜匹配挑战

CASE 2 Control Group

案例2 对照组

Prompt: Please tell me which mask at the top corresponds to the liver below, consider the red part only.

提示:请告诉我上方哪个遮罩对应下方的肝脏,仅考虑红色部分。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

Upon examining the CT images provided, it is evident that the red mask in Image 1 corresponds to the liver’s anatomical location. This mask covers the right upper quadrant of the abdomen, which aligns with the liver’s predominant placement in the human body. The red part follows the expected shape and region of the liver, extending slightly towards the left upper quadrant, which is consistent with the liver’s anatomy. The other masks either cover additional areas not occupied by the liver (as seen in Image 2) or highlight regions to the left (as seen in Image 3), which do not correspond to the primary location of the liver. Therefore, the correct liver mask is the one shown in Image 1.

在检查提供的CT图像时,可以明显看出图像1中的红色掩膜对应于肝脏的解剖位置。该掩膜覆盖了腹部的右上象限,这与人体中肝脏的主要位置一致。红色部分遵循了肝脏预期的形状和区域,略微向左上象限延伸,这与肝脏的解剖结构相符。其他掩膜要么覆盖了肝脏未占据的额外区域(如图像2所示),要么突出了左侧的区域(如图像3所示),这些都不对应于肝脏的主要位置。因此,正确的肝脏掩膜是图像1中显示的那个。

• response 2:

• 响应2:

Based on the anatomy visible in the CT images, the red part of the mask that corresponds to the liver is found in Image 1 at the top. The liver is situated predominantly in the right upper quadrant of the abdominal cavity, and the red mask in Image 1 accurately overlays this area, conforming to the expected anatomical location and shape of the liver. The other images either highlight areas that extend beyond the typical boundaries of the liver or, in the case of Image 3, cover a section on the left side where the liver is not present. Image 4 serves as a reference as it does not have any highlighted area, confirming that the highlighted region in Image 1 is indeed representative of the liver’s location.

根据CT影像中可见的解剖结构,与肝脏对应的红色掩膜部分位于顶部的图1中。肝脏主要位于腹腔右上部,图1中的红色掩膜精确覆盖该区域,符合肝脏预期的解剖位置和形状。其他图像要么突出显示超出肝脏典型边界的区域,要么如图3所示覆盖了左侧不存在肝脏的部分。图4作为参考未标注任何高亮区域,进一步证实图1中的高亮区域确实代表了肝脏的位置。

• response 3: Upon examining the provided images, it is clear that the mask in Image 1 corresponds to the liver in the images below, considering only the red part. This conclusion is reached by observing the anatomical position and shape of the red highlighted area, which closely matches the typical location and size of the liver in the right upper quadrant of the abdomen on a transverse CT scan. The liver is the largest organ in the abdominal cavity and has a characteristic shape that is well represented by the mask in Image 1, which covers the right lobe of the liver and extends slightly to the left, consistent with the liver’s anatomical position. Images 2 and 3 highlight areas that either extend beyond the typical boundaries of the liver or are situated in an atypical location for liver tissue. Image 4 does not have any highlighted areas to compare.

• 响应3: 通过检查所提供的图像,可以明确图像1中的掩膜(仅考虑红色部分)对应于下方图像中的肝脏。这一结论是通过观察红色高亮区域的解剖位置和形状得出的,其与横断面CT扫描中腹部右上象限肝脏的典型位置和大小高度吻合。作为腹腔内最大的器官,肝脏具有特征性形态,而图像1中的掩膜恰当地覆盖了肝右叶并向左略微延伸,这与肝脏的解剖位置一致。图像2和3所标注的区域要么超出了肝脏的常规边界,要么位于肝脏组织的非典型位置。图像4则没有任何可供对比的高亮区域。

CASE 2

案例 2

Step-by-step guidance (single round)

分步指南(单轮)

GPT-4V Response:

GPT-4V 响应:

response 2: In the provided CT images, the liver is represented by the red-shaded area in pictures 1, 2, and 3. The liver is located in the upper right portion of the abdominal cavity, just below the diaphragm and occupies a large area of the upper quadrant of the abdomen.

回复2:在提供的CT图像中,肝脏在图1、图2和图3中以红色阴影区域表示。肝脏位于腹腔右上部,紧贴膈肌下方,占据了腹部右上象限的大部分区域。

Picture 1: In picture 1, the liver is the large red area taking up most of the right side of the image (the patient’s left side, as medical images are typically viewed from the perspective of facing the patient). – Picture 2: In picture 2, again, the liver is the large red area, which is on the right side of the image Picture 3: In picture 3, the liver is shown with two separate red areas, indicating the larger right lobe of the liver and the smaller left lobe.

图 1: 在图1中,肝脏是占据图像右侧大部分区域的红色大范围区域 (患者左侧,因为医学图像通常以面对患者的视角呈现)。
图 2: 在图2中,肝脏同样是图像右侧的红色大范围区域。
图 3: 在图3中,肝脏显示为两个独立的红色区域,分别代表较大的右叶和较小的左叶。

Based on the position and shape of the liver in pictures 1, 2, and 3, the mask that would correspond to picture 4 would likely be similar to the ones in pictures 1 and 2, as these show a complete liver, which is how it would normally appear in an un segmented CT image.

根据图片1、2和3中肝脏的位置和形状,图片4对应的掩膜可能与图片1和2中的类似,因为这些图片显示了完整的肝脏,这也是未分割CT图像中肝脏的正常外观。

response 3: In the provided images, the red-shaded areas are indicative of the liver’s location in the axial (horizontal) cross-section of the abdominal CT scans. Here’s how the liver is positioned in each of the images:

回复3: 在提供的图像中,红色阴影区域表示腹部CT扫描轴向(水平)切面中肝脏的位置。以下是各图像中肝脏的定位情况:

For picture 4, which is an unmarked CT scan, the liver would typically be located in the same general area as in pictures 1, 2, and 3, meaning it should be in the upper right quadrant of the abdomen, extending across the midline towards the left upper quadrant beneath the diaphragm. Based on the location in the marked images, the corresponding mask for picture 4 would likely be most similar to the mask in picture 1 or 2, since these show a large portion of the liver in its typical position.

在图4这张未标记的CT扫描图像中,肝脏通常位于与图1、2和3相同的常规区域,即应处于腹部右上象限,横跨中线向膈肌下方的左上象限延伸。根据标记图像中的位置判断,图4对应的掩膜很可能与图1或图2中的掩膜最为相似,因为这两张图显示了处于典型位置的大部分肝脏组织。

CASE 2

案例 2

Step-by-step guidance (multi-round)

分步指导(多轮)

Prompt 1: Picture 123 is the mask template for the liver(the red part), and Picture 4 is an image of an abdominal CT section, please tell me the location and size of the liver in picture 4.(answer in 50 words)

提示1:图123是肝脏的掩模模板(红色部分),图4是一张腹部CT切片的图像,请告诉我图4中肝脏的位置和大小。(回答限50字)

Prompt 2:Please tell me the location and size of the livers in picture 1 2 3 (answer in 50 words) Prompt 3:Which mask would correspond to Picture 4 (answer in 50 words)

提示2:请告诉我图片1、2、3中肝脏的位置和大小 (回答限50字)
提示3:哪张遮罩(mask)对应图片4 (回答限50字)

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

– Answering prompt 1: In Picture 4, the liver is located in the upper right quadrant of the abdominal cavity, extending across the midline to the upper left quadrant. It is the large, dark gray area beneath the diaphragm. The size cannot be accurately determined without specific measurements or scale, but it occupies a significant portion of the upper abdominal area.

  • 回答提示1: 在图4中,肝脏位于腹腔右上象限,延伸跨过中线至左上象限。它是横膈膜下方的大片深灰色区域。由于缺乏具体测量数据或比例尺,无法准确判断其大小,但可见其占据了上腹部相当大区域。

– Answering prompt 2: In Pictures 1, 2, and 3, the liver (highlighted in red) is primarily located in the right upper quadrant of the abdominal cavity. The size appears large, spanning across to the left upper quadrant, below the diaphragm. Without a scale, precise size cannot be determined, but it occupies a substantial portion of the abdominal cavity in each image.

回答提示2:在图1、图2和图3中,肝脏(用红色高亮显示)主要位于腹腔的右上象限。其体积显得较大,横跨至左上象限,位于横膈膜下方。由于缺乏比例尺,无法确定精确尺寸,但在每张图像中它都占据了腹腔的相当大部分。

– Answering prompt 3: Picture 1’s mask seems to cover the largest area, extending quite far to the left, which might be too large for Picture 4. Picture 2’s mask has a more confined area on the right side, which may not cover all of the liver in Picture 4. Picture 3’s mask is in between the two in terms of coverage, possibly being a closer match to the liver in Picture 4. Based on this visual assessment, Picture 3’s mask might correspond most closely to the liver in Picture 4.

  • 回答提示3:图1的遮罩覆盖区域似乎最大,向左延伸较远,可能对图4来说过大。图2的遮罩在右侧区域较为局限,可能无法覆盖图4的全部肝脏。图3的遮罩在覆盖范围上介于两者之间,可能更接近图4的肝