OpenAI’s latest large vision-language model (LVLM), GPT-4V(ision), has piqued considerable interest for its potential in medical applications. Despite its promise, recent studies and internal reviews highlight its under performance in specialized medical tasks. This paper explores the boundary of GPT-4V’s capabilities in medicine, particularly in processing complex imaging data from endo s copies, CT scans, and MRIs etc. Leveraging open-source datasets, we assessed its foundational competencies, identifying substantial areas for enhancement. Our research emphasizes prompt engineering, an often-under utilized strategy for improving AI responsiveness. Through iterative testing, we refined the model’s prompts, significantly improving its interpretative accuracy and relevance in medical imaging. From our comprehensive evaluations, we distilled 10 effective prompt engineering techniques, each fortifying GPT-4V’s medical acumen. These methodical enhancements facilitate more reliable, precise, and clinically valuable insights from GPT-4V, advancing its oper ability in critical healthcare environments. Our findings are pivotal for those employing AI in medicine, providing clear, actionable guidance on harnessing GPT-4V’s full diagnostic potential.

OpenAI 最新推出的大型视觉语言模型 (LVLM) GPT-4V(ision) 因其在医疗应用中的潜力而备受关注。尽管前景广阔，但近期研究和内部评估表明，该模型在专业医疗任务中表现欠佳。本文探讨了 GPT-4V 在医学领域的能力边界，特别是在处理内窥镜、CT 扫描和 MRI 等复杂影像数据时的表现。基于开源数据集，我们评估了其基础能力，发现了大量待改进领域。研究重点聚焦提示工程 (prompt engineering) 这一常被低估的 AI 响应优化策略。通过迭代测试，我们优化了模型的提示词，显著提升了其在医学影像解读中的准确性和相关性。经过系统评估，我们提炼出 10 项有效的提示工程技术，每项都能强化 GPT-4V 的医疗判断力。这些方法论的改进使 GPT-4V 能输出更可靠、精准且具临床价值的洞察，提升了其在关键医疗场景中的适用性。本研究成果为医疗领域 AI 应用者提供了关键指导，清晰阐述了如何充分释放 GPT-4V 的诊断潜力。

1 Introduction

1 引言

The burgeoning field of multimodal medical large language models offers promising applications for the future of healthcare and artificial intelligence (AI) research. One of the leading AI models in this domain is OpenAI’s GPT-4[1], which has recently been enhanced with a multimodal variant, GPT-4V[2], incorporating image input capabilities. This model extension draws from the formidable linguistic foundation of GPT-4 and has garnered significant attention due to its potential in processing and interpreting medical imagery.

蓬勃发展的多模态医疗大语言模型领域为未来医疗保健和人工智能 (AI) 研究提供了广阔的应用前景。该领域的领先AI模型之一是OpenAI的GPT-4[1]，近期其升级版GPT-4V[2]通过引入图像输入功能实现了多模态扩展。这一模型升级基于GPT-4强大的语言基础，因其在医学影像处理与解析方面的潜力而备受关注。

Prior evaluations, such as those conducted by Chaoyi Wu et al, have provided a detailed examination of GPT-4V’s medical performance, probing the limits of its image processing abilities[3], also the performance of GPT-4V in medical examination has been tested[4, 5, 6] and reveals the potential of the usage of GPT-4V in the medical area. In the realm of MLLMs, performance is contingent on both image comprehension and textual understanding capabilities. Merely inputting an image along with basic, unrefined prompts does not fully utilize the model’s language understanding potential and only partially engages its image interpretation capabilities. To delineate the boundaries of GPT-4V’s abilities, research focused on prompt optimization is crucial. Building upon previous studies, this work concentrates on exploring the optimal combination of prompts to enable GPT-4V to operate at its peak efficiency, thereby offering novel insights for subsequent medical research utilizing GPT-4V.

先前由Chaoyi Wu等人开展的评估已对GPT-4V的医学性能进行了详细考察，探索了其图像处理能力的边界[3]，同时针对GPT-4V在医学检查中的表现也进行了测试[4,5,6]，揭示了GPT-4V在医疗领域的应用潜力。在多模态大语言模型(MLLM)领域，模型性能取决于图像理解与文本理解能力的双重作用。仅输入图像及基础未优化的提示词，既无法充分发挥模型的语言理解潜力，也只能部分调动其图像解析能力。为界定GPT-4V的能力边界，提示词优化的相关研究至关重要。本研究在前人工作基础上，着重探索提示词的最优组合方案，使GPT-4V能以最高效能运作，从而为后续基于GPT-4V的医学研究提供新思路。

In our investigation, we selected diverse modalities of data and experimented with various combinations of textual and visual prompts. Ultimately, we synthesized ten tips that significantly enhance GPT-4V’s performance in medical applications. Given that the focus of this study is on prompt engineering rather than an exhaustive test of medical capabilities, we did not examine an extensive array of cases. Instead, this paper will showcase results from a selected set of tests, emphasizing the impact of strategic prompt design on enhancing the model’s effectiveness in medical image analysis and interpretation.

在我们的调查中，我们选择了多种数据模态，并尝试了文本与视觉提示的不同组合。最终，我们总结出十条能显著提升GPT-4V在医疗应用中性能的技巧。鉴于本研究聚焦于提示工程而非全面的医疗能力测试，我们并未考察大量案例。相反，本文将展示一组精选测试的结果，重点阐述策略性提示设计对提升模型在医学图像分析与解读中效力的影响。

2 Methods

2 方法

2.1 Overview

2.1 概述

In this study, we utilize an image and text prompt as the input, followed by a manual assessment of the output’s quality. Our methodology involves presenting the same image with a variety of textual prompts. Each prompt’s output is then subject to a thorough human evaluation. When a particular textual prompt yields exceptionally positive results, it undergoes more in-depth testing and further evaluation in our test sets. If this prompt strategy consistently demonstrates superior performance across all tests, it is then incorporated into our list of effective prompt tips.

在本研究中，我们采用图像和文本提示作为输入，随后对输出质量进行人工评估。我们的方法包括用多种文本提示呈现同一张图像，并对每个提示的输出结果进行详尽的人工评估。当某个特定文本提示产生异常积极的结果时，它将在我们的测试集中接受更深入的测试和进一步评估。若该提示策略在所有测试中持续表现出优越性能，则会被纳入我们的有效提示技巧列表。

o optimize the testing process and ensure rigorous and systematic evaluation, we have established the following 8 ules during the test part:

为优化测试流程并确保严谨系统的评估，我们在测试环节制定了以下8项规则：

test is evaluated independently, allowing for a more accurate assessment of the model’s performance in isolated interactions.

测试是独立评估的，能够更准确地衡量模型在孤立交互中的表现。

• Rule 8 - Version Specification of GPT-4V: The version of GPT4-V used in our research spans from September 27 to October 18, 2023. It is important to note that subsequent updates to the model post this period may impact the effectiveness of certain prompt tips, potentially rendering some obsolete. This time frame is critical for contextual i zing our findings and understanding the specific capabilities of the model version under study.

• 规则8 - GPT-4V版本说明：本研究所用GPT4-V版本覆盖2023年9月27日至10月18日期间。需注意该时间节点后的模型更新可能影响部分提示技巧的有效性，甚至导致某些方法失效。此时间段对定位研究结论及理解特定版本模型能力具有关键意义。

In the manual evaluation part, we compared the outputs of GPT-4V against a set of reference answers. This process involved a scoring system wherein a match with the reference answer earned one point. Responses that were not incorrect but also not completely aligned with the reference answer neither gained nor lost points. Incorrect answers resulted in a deduction of one point. Through this scoring mechanism, we calculated the scores for each distinct prompt strategy employed during the study. The 10 prompt tips with the highest scores were then extracted. These top-scoring prompt tips form the crux of our study’s conclusions, offering valuable insights into the most effective ways of interacting with the GPT-4V model. This approach not only underscores the model’s capabilities but also highlights the nuances of optimizing AI interactions in complex tasks.

在人工评估部分，我们将GPT-4V的输出与一组参考答案进行了对比。该流程采用计分制：与参考答案完全匹配得1分；答案未出错但与参考答案不完全一致时不得分；错误答案扣1分。通过该评分机制，我们计算了研究中每种不同提示策略的得分，并提取出得分最高的10条提示技巧。这些高分提示技巧构成了本研究结论的核心，为优化GPT-4V模型交互方式提供了关键洞见。该方法不仅验证了模型能力，更揭示了复杂任务中优化AI交互的细微差异。

2.2 Testing procedure

2.2 测试流程

In the specific context of prompt testing, the experimental process of our study typically divides into two or three distinct parts:

在提示测试的具体背景下，我们的研究实验过程通常分为两到三个独立部分：

• Control Group: Under the guidelines mentioned in Section 2.1, within this group, we generally employ the simplest and most direct prompts, or sometimes only input the image without any textual prompt. Upon obtaining the corresponding outputs, these are then compared with the reference answers for scoring. • Experiment Group: Also following the rules outlined in Section 2.1, this group uses the same images as the control group for input. However, the textual prompts are optimized for this set. The outputs generated from these optimized prompts are then compared with the reference answers and scored accordingly. • Ablation Study (if necessary): In accordance with the rules specified in Section 2.1, this part of the experiment involves inputting only textual prompts without any images. The outputs are then compared with the reference answers for evaluation and scoring.

对照组：按照第2.1节提到的指导原则，该组通常采用最简单直接的提示词，有时仅输入图像而不带任何文本提示。获得相应输出后，将其与参考答案进行比对评分。
实验组：同样遵循第2.1节规定的规则，该组使用与对照组相同的图像输入，但文本提示词经过针对性优化。基于优化提示生成的输出会与参考答案比对并评分。
消融实验（如需要）：根据第2.1节的具体规则，该部分实验仅输入文本提示而不使用图像，随后将输出结果与参考答案比对评估并打分。

A typical demonstration template is shown in Figure2.2. This structured approach enables a comprehensive assessment of the model’s performance under different conditions, allowing for a clear understanding of the impact of various prompt strategies, Among all the trials, we will list out 10 tips for effective prompts.

图 2.2 展示了一个典型的演示模板。这种结构化方法能够全面评估模型在不同条件下的性能，从而清晰理解各种提示策略的影响。在所有试验中，我们将列出10条有效提示的技巧。

Figure 1: A typical example of demonstration, illustrating the basic information of a case along with the results from both control and experimental groups

图 1: 典型示例演示，展示案例基本信息及对照组与实验组的结果

2.3 Testing dataset

2.3 测试数据集

In our study, the endoscopy group in the test set utilized data from the Kvasir_ SEG [8]and M2caiseg datasets[9], alongside data from the hy st eros copy procedures we collected. For CT imaging, the AutoPET[10], Total Segment at or[11], AbdomenCT-1K[12] datasets were used, supplemented with data from relevant literature. MRI data incorporated the $B r a T S 202I[13],A T L A S V2.O$ and AMOS 2021[14] datasets. It is crucial to acknowledge that this dataset compilation is not comprehensive and does not represent all medical scenarios. Nonetheless, it provides an adequate basis for testing our prompt tips. Future endeavors will aim to expand this dataset to further enrich the study.

在我们的研究中，测试集的内窥镜组使用了来自Kvasir_ SEG [8]和M2caiseg数据集[9]的数据，以及我们收集的宫腔镜手术数据。CT成像方面采用了AutoPET[10]、TotalSegmentator[11]和AbdomenCT-1K[12]数据集，并补充了相关文献数据。MRI数据整合了$Br a T S 202I[13]$、$A T L A S V2.O$以及AMOS 2021[14]数据集。必须指出的是，该数据集汇编并不全面，也不能代表所有医疗场景。尽管如此，它为我们测试提示技巧提供了充分的基础。未来工作将致力于扩展该数据集以进一步丰富研究。

3 Prompt Tips

3 个提示词技巧

Utilizing the aforementioned testing methodology and adhering to the established rules, we have successfully identified ten valuable prompt tips:

利用上述测试方法并遵循既定规则，我们成功总结出十条实用提示技巧：

4 Discussion

4 讨论

4.1 Tip 1: Concise language is more effective than complex descriptions, emphasizing succinct, task-relevant details in image analysis.

4.1 技巧1：简洁语言比复杂描述更有效，在图像分析中应强调与任务相关的简明细节

4.1.1 Demonstration Info

4.1.1 演示信息

In this demonstration, we conducted a comparative analysis of prompt outcomes using concise language versus complex descriptions within endoscopic, CT, and MRI imaging contexts, all under the condition of providing equivalent information. The study meticulously compared the effectiveness of simplified and elaborate prompt styles on GPT-4V’s performance. The final results clearly indicated that, when the same amount of information is provided, prompts articulated in a concise manner significantly enhance GPT-4V’s functional capabilities. This finding underscores the importance of prompt language efficiency in optimizing the performance of AI models in medical imaging scenarios, suggesting that brevity and clarity in prompts can lead to more effective AI analysis and interpretation within various medical imaging modalities.

在本演示中，我们在内窥镜、CT和MRI成像场景下，以提供同等信息量为前提，对简洁语言与复杂描述的提示效果进行了对比分析。研究细致比较了简化版与详尽版提示风格对GPT-4V性能的影响。最终结果明确显示：当提供相同信息量时，采用简洁表述的提示能显著提升GPT-4V的功能表现。这一发现凸显了提示语言效率对于优化AI模型在医学影像场景性能的重要性，表明跨不同医学影像模态时，提示的简洁性与清晰度可带来更有效的AI分析与解读。

4.1.2 Analysis

4.1.2 分析

Based on the tests conducted, we observed that within endoscopic scenarios, simplified descriptions yielded better outcomes compared to complex descriptions. In contrast, within CT and MRI contexts, the performance of simplified descriptions was comparable to that of complex descriptions, both showing significant improvements over control groups with no or weak descriptions.

根据测试结果，我们观察到在内窥镜场景中，简化描述比复杂描述效果更佳。相比之下，在CT和MRI场景中，简化描述与复杂描述的表现相当，两者均比无描述或弱描述的对照组有显著提升。

In this regard, we posit that the superior performance of simplified descriptions can be attributed to their provision of only the most critical information. Complex descriptions, conversely, offer an abundance of auxiliary information. During the generation of textual content, the weight of these auxiliary descriptive elements is also accounted for, which consequently diminishes the relative importance of core information. This can lead to slight deviations in results, potentially resulting in inaccuracies. Therefore, when inputting prompts into GPT-4V, it is advisable to include only the most relevant core information pertaining to the query at hand, minimizing the incorporation of extensive, indirectly related descriptive language.

在这方面，我们认为简化描述之所以表现更优，是因为它们仅提供了最核心的信息。相反，复杂描述会附带大量辅助信息。在生成文本内容时，这些辅助性描述元素的权重也会被纳入考量，从而相对降低了核心信息的重要性。这可能导致结果出现细微偏差，甚至造成不准确的情况。因此，在向GPT-4V输入提示时，建议仅包含与当前查询最相关的核心信息，尽量减少引入大量间接相关的描述性语言。

4.2 Tip 2: Providing the tasks can better help analyze the images

4.2 技巧2：提供任务能更好地辅助分析图像

4.2.1 Demonstration Info

4.2.1 演示信息

In this demonstration, we illustrate how the task of providing medical images enhances the model’s ability to analyze input medical images. We focus particularly on the identification of surgical instruments in single endoscopic images and the analysis of endoscopic sequences. An ablation study was incorporated into this demonstration to ensure that the textual prompts provided did not leak the answers. This approach underscores the effectiveness of combining visual and textual inputs in improving the model’s performance in medical image analysis, while also ensuring the integrity of the results through rigorous testing methods.

在本演示中，我们展示了提供医学图像的任务如何增强模型分析输入医学图像的能力。我们特别关注单张内窥镜图像中手术器械的识别和内窥镜序列的分析。本次演示引入了消融研究 (ablation study) ，以确保提供的文本提示不会泄露答案。该方法通过结合视觉和文本输入，有效提升了模型在医学图像分析中的性能，同时通过严格的测试方法确保结果的可靠性。

4.2.2 Analysis

4.2.2 分析

In this test, we found that specifying the type of task significantly enhances the recognition capabilities of GPT-4V. We believe that medical images, particularly endoscopic images, are challenging to differentiate based on a single image due to the limitations of the endoscope’s field of view (FOV) and the similarity of most endoscopic scenes. This makes it difficult to identify the specific task of the endoscope, thereby complicating the analysis of the problem. After specifying the particular task of the endoscope, the probability distribution of the output becomes more focused on the answer, thereby increasing the accuracy of the response.

在本测试中，我们发现明确任务类型能显著提升 GPT-4V 的识别能力。我们认为由于内窥镜视野 (FOV) 的局限性和大多数内窥镜场景的相似性，基于单张图像区分医学图像（尤其是内窥镜图像）具有挑战性。这使得识别内窥镜的具体任务变得困难，从而增加了问题分析的复杂度。在明确内窥镜的具体任务后，输出的概率分布会更集中于答案，从而提高了响应的准确性。

To further validate the effectiveness of our prompt strategy, we conducted an ablation study. In the ablation study, we did not input images and merely inputted prompts telling GPT-4V to pretend there was an image, then asked GPT-4V the same questions. If GPT-4V could still provide the correct answers under these circumstances, it would indicate that we had leaked answer-related information in our prompts. If it could not respond correctly, it would suggest that we did not disclose any answer-related information in our prompts.

为了进一步验证我们的提示策略的有效性，我们进行了消融实验。在消融实验中，我们没有输入图像，仅输入提示让GPT-4V假装有一张图像，然后向GPT-4V提出相同的问题。如果GPT-4V在这种情况下仍能提供正确答案，则表明我们在提示中泄露了与答案相关的信息。如果它无法正确响应，则表明我们在提示中未透露任何与答案相关的信息。

We discovered that with pure text input, GPT-4V could not correctly answer our questions or provide analysis, indicating that our prompts did not leak additional information. This means that the correct answers were due to our prompt guiding the image analysis rather than the prompt leaking key answer-related information.

我们发现，在纯文本输入的情况下，GPT-4V无法正确回答问题或提供分析，这表明我们的提示词没有泄露额外信息。这意味着正确答案是由于提示词引导了图像分析，而非提示词泄露了与答案相关的关键信息。

In contrast, we presented Case 4, a prompt we believe leaked some answer-related information. Compared to Case 3, we included a key piece of information, "ring-shape," which could lead GPT-4V to produce the correct answer without an image.

相比之下，我们展示了案例4，这是一个我们认为泄露了部分答案相关信息的提示。与案例3相比，我们加入了一个关键信息"环形"，这可能导致GPT-4V在没有图像的情况下生成正确答案。

In summary, we believe that the conclusion that GPT-4V’s performance can be further enhanced by providing the task is solid and valid.

总之，我们相信通过明确任务要求可以进一步提升GPT-4V性能的结论是坚实有效的。

4.3 Tip 3: Implementing step-by-step guidance in multi-round dialogue allows GPT-4v to handle complex tasks more efficiently by breaking them down into simpler operations.

4.3 技巧3：在多轮对话中实施分步指导，通过将复杂任务拆解为简单操作，使GPT-4v更高效地处理复杂任务。

4.3.1 Demonstration Info

4.3.1 演示信息

In this demonstration, we showcased how, through step-by-step prompt tips, efficient matching of medical images with their corresponding masks can be achieved. It is important to emphasize that while this task may not hold substantial clinical significance, we selected it for demonstration due to its high level of difficulty. This task demands the utmost utilization of the multimodal model’s image understanding capabilities, thereby exemplifying the value of prompt tips.

在本演示中，我们展示了如何通过逐步提示技巧，实现医学图像与对应掩膜的高效匹配。需要特别强调的是，虽然该任务可能不具备重大临床意义，但我们选择它进行演示是因为其极高的难度水平。该任务需要最大限度发挥多模态模型的图像理解能力，从而彰显提示技巧的价值。

We also presented comparisons across three different modalities: endoscopy, CT, and MRI. These comparisons were based on scenarios where prompt tips were not used, where step-by-step guidance was provided in a single-round dialogue, and where step-by-step guidance was facilitated through multi-round dialogues. Our findings indicate that guiding GPT through multi-round dialogues with step-by-step instructions yielded the most effective results. This highlights the potential of structured, iterative interactions in harnessing the full capabilities of advanced AI models in complex image analysis tasks.

我们还对三种不同模态（内窥镜、CT和MRI）进行了比较。这些比较基于以下场景：未使用提示技巧、在单轮对话中提供逐步指导，以及通过多轮对话实现逐步引导。研究发现，通过多轮对话结合分步指导来引导GPT能产生最佳效果。这凸显了结构化迭代交互在复杂图像分析任务中充分发挥先进AI模型潜力的优势。

4.3.2 Analysis

4.3.2 分析

In our study, the tests conducted on endoscopic imagery revealed certain limitations in GPT-4V’s processing capabilities. Specifically, when tasked with identifying which mask correlates with a polyp in a colon image, GPT-4V consistently errs towards selecting the second mask. This error appears to stem from its misinterpretation of the vertical alignment of the real colon os copy images and their corresponding masks, leading to a mistaken assumption of correlation where none exists.

在我们的研究中，针对内窥镜图像的测试揭示了GPT-4V处理能力的某些局限性。具体而言，当要求识别结肠图像中哪个掩膜与息肉相关时，GPT-4V始终倾向于选择第二个掩膜。这一错误似乎源于其对真实结肠开口副本图像及其对应掩膜垂直排列的误判，导致在不存在关联的情况下错误假设了相关性。

Nevertheless, we observed that a gradual, step-by-step guidance approach significantly enhances GPT-4V’s performance in handling complex tasks. However, this enhancement is not as pronounced when all step-by-step instructions are consolidated into a single-round dialogue. We believe this finding aligns with the conclusion drawn in Tip 1, which emphasizes that prompts should not be excessively lengthy or complex. In the process of guiding GPT-4V, the inclusion of an abundance of descriptive and medically irrelevant terminology adversely affected its performance, likely due to the undue weight these words carried in the model’s processing.

然而，我们观察到渐进式分步引导能显著提升GPT-4V处理复杂任务的表现。但当所有分步指令被整合至单轮对话时，这种提升效果并不明显。该发现与技巧1的结论相吻合，即提示词(prompt)不应过度冗长或复杂。在引导GPT-4V过程中，引入大量描述性且与医学无关的术语会对其表现产生负面影响，这可能是由于这些词汇在模型处理过程中被赋予了不当权重。

Conversely, when we divided the step-by-step guidance across multiple rounds of dialogue, we noticed a marked improvement in GPT-4V’s performance. This approach enabled it to accomplish challenging tasks, like accurate mask matching, with a high degree of precision—a feat not yet achievable by other large multimodal models. Such a prompting strategy effectively leverages GPT-4V’s powerful capabilities.

相反，当我们将分步指导分散到多轮对话中时，我们注意到 GPT-4V 的表现有显著提升。这种方法使其能够以极高的精度完成具有挑战性的任务，例如精确的掩码匹配 (mask matching) ，这是其他大型多模态模型尚未实现的壮举。这种提示策略有效地利用了 GPT-4V 的强大能力。

Similarly, we observed a comparable effect in both the CT and MRI groups. Without any specifically designed prompts, GPT-4V failed to correctly identify the tasks at hand. However, with the incorporation of step-by-step guidance, there was a significant enhancement in performance, with multi-round dialogues demonstrating even more pronounced improvements. This evidence suggests that a structured, interactive approach is crucial for maximizing the potential of advanced multimodal AI models in complex medical imaging tasks.

同样地，我们在CT和MRI组中都观察到了类似的效果。在没有专门设计提示词的情况下，GPT-4V未能正确识别当前任务。但通过引入分步指导后，模型性能得到显著提升，其中多轮对话的改进尤为明显。这一证据表明，结构化交互方式对于发挥先进多模态AI模型在复杂医学影像任务中的潜力至关重要。

4.4 Tip 4: Don’t expose your target at the very beginning when you start multi-round dialog

4.4 技巧4: 开始多轮对话时不要过早暴露目标

4.4.1 Demonstration Info

4.4.1 演示信息

In the previous demonstration, we showcased how step-by-step guidance within a multi-turn dialogue can effectively accomplish challenging tasks. In this demonstration, we aim to present a strategy that may potentially weaken the effectiveness of prompts: prematurely revealing the objective. We will use the same three cases and almost identical prompts as in the previous demonstration. The key difference, however, lies in our approach: we will disclose our objective – matching masks with medical images to GPT-4V right from the start.

在之前的演示中，我们展示了多轮对话中逐步引导如何有效完成复杂任务。本次演示将展示一种可能弱化提示效果的策略：过早暴露目标。我们将沿用之前演示的三个案例和几乎相同的提示语，关键区别在于操作方式：一开始就向GPT-4V明示目标——将掩膜与医学图像进行匹配。

This setup allows us to critically evaluate the impact of early goal exposure on the performance of GPT-4V in completing the task. By comparing the outcomes of this approach with those from the previous demonstration, where the objective was gradually introduced, we aim to discern the influence of prompt structure and information sequencing on the model’s efficiency and accuracy in task execution. This investigation is pivotal in understanding the nuances of effective prompt design, particularly in complex problem-solving scenarios involving AI models.

此设置使我们能够批判性地评估早期目标暴露对GPT-4V任务完成表现的影响。通过将该方法与先前逐步引入目标的演示结果进行对比，我们旨在辨别提示结构和信息排序对模型任务执行效率与准确性的影响。这项研究对于理解有效提示设计的细微差别至关重要，尤其是在涉及AI模型的复杂问题解决场景中。

4.4.2 Analysis

4.4.2 分析

In this demonstration, a notable difference from the previous one was observed when the objective was prematurely revealed. Despite employing our previously high-accuracy prompt strategies, the accuracy remained markedly low, comparable to the performance achieved with unrefined prompts.

本次演示中，当目标被提前揭示时，观察到了与前次实验的显著差异。尽管采用了先前高准确率的提示策略，准确率仍明显偏低，与使用未优化提示时的性能相当。

Our observation led to the discovery that this decrease in performance is attributable to the "short-term memory" of GPT-4V. To elaborate, if the objective is revealed too early, GPT-4V tends to immediately generate an answer based on the exposed information. The quality of this answer is similar to that produced by unrefined prompts, which is to say, it is of lower quality. Further, if prompts are inputted after receiving this initial answer, in an attempt to guide GPT-4V, the subsequent outputs tend to be erroneous, aligning with the previously derived incorrect answer. The strong short-term memory of GPT-4V means that if the initial output is not manually identified as incorrect, it carries this erroneous prior into subsequent responses, thus affecting the quality of the answers. Therefore, when using GPT-4V with the intention of achieving a specific outcome through multi-turn dialogue, it is advisable not to disclose the task objective at the onset.

我们的观察发现，这种性能下降可归因于GPT-4V的"短期记忆"。具体而言，若过早揭示任务目标，GPT-4V会立即基于暴露信息生成答案，其质量与未经优化的提示所产生的答案相似，即质量较低。更重要的是，若在获得初始答案后继续输入提示以引导GPT-4V，后续输出往往会与先前得出的错误答案保持一致。GPT-4V强大的短期记忆意味着，若未手动识别初始输出的错误，系统会将这种错误先验带入后续响应，从而影响答案质量。因此，当希望通过多轮对话实现特定目标时，建议不要在一开始就透露任务目标。

4.5 Tip 5: Describing appearances or characteristics will greatly enhance the performance.

4.5 技巧5: 描述外观或特征将显著提升性能

4.5.1 Demonstration Info

4.5.1 演示信息

In this demonstration, we illustrated that providing descriptions of appearance or characteristics significantly enhances the model’s performance. For the descriptions of appearance or characteristics, we adopted definitions from Wikipedia in this showcase. Given that this prompt tip involves detailed descriptions of appearance and characteristics, it is most effectively applied in scenarios where there is a need for batch processing of data that is familiar to the user.

在本演示中，我们展示了提供外观或特征描述能显著提升模型性能。对于外观或特征的描述，本案例采用了维基百科的定义。由于这一提示技巧涉及外观与特征的详细描述，它最适用于用户熟悉且需要批量处理数据的场景。

4.5.2 Analysis

4.5.2 分析

In this study, we discovered that providing specific descriptions of shapes and appearances significantly enhances GPT-4V’s ability to recognize certain structures. In the control group, where no descriptions were provided, the performance was suboptimal. Specifically, within the endoscopy group, organs like the liver, fat, and intestines were difficult to identify without prompts, with only the gallbladder being relatively easier to recognize. In CT and MRI imaging, GPT-4V was able to identify easily recognizable organs like the liver without additional prompts, whereas identifying the gallbladder and pancreas proved to be more challenging.

在本研究中，我们发现提供形状和外观的具体描述能显著提升GPT-4V识别某些结构的能力。对照组未提供任何描述时，其表现欠佳。具体而言，在内窥镜组中，肝脏、脂肪和肠道等器官若无提示则难以识别，仅胆囊相对较易辨认。在CT和MRI成像中，GPT-4V无需额外提示即可识别肝脏等易辨器官，而胆囊和胰腺的识别则更具挑战性。

Subsequently, we utilized organ-specific prompts based on appearance descriptions from Wikipedia and Mayo Clinic, among others. It is crucial to note that GPT-4V may output descriptions of the images in the order of the input prompts. If the order of descriptions inputted matches the order of annotations in the image, this might result in correct outputs. However, these outputs could be derived from the order of the input descriptions rather than an understanding of the image. To mitigate this, we intentionally scrambled the order of our descriptions to prevent answer leakage due to input sequencing. We observed a substantial increase in recognition accuracy upon providing appearance descriptions. In the endoscopy group, apart from the intestines which remained challenging, other structures were identified with high accuracy. In both the CT and MRI groups, complete and accurate identification was achieved, including small structures like the gallbladder in MRI images.

随后，我们基于维基百科和梅奥诊所等来源的外观描述，使用了器官特定的提示词。需要特别注意的是，GPT-4V可能会按照输入提示的顺序输出图像描述。如果输入的描述顺序与图像中标注的顺序一致，可能会导致看似正确的输出结果。但这些输出可能仅源于输入描述的顺序，而非对图像的真实理解。为消除这种干扰，我们有意打乱了描述顺序，以避免因输入序列导致的答案泄露。实验表明，提供外观描述后识别准确率显著提升：在内窥镜组中，除肠道仍具挑战性外，其他结构均能高精度识别；在CT和MRI组中，包括MRI图像中胆囊等微小结构在内的所有器官均实现了完整准确的识别。

Furthermore, we also observed that when dealing with markers of different colors, it is advisable to avoid using color-related descriptions. Such descriptions can lead to confusion in GPT-4V’s processing. We will delve into this phenomenon in greater detail in the following chapter, examining the implications of color descriptions on the model’s interpretive accuracy and how they might inadvertently mislead the AI’s analysis.

此外，我们还观察到，在处理不同颜色的标记时，应避免使用与颜色相关的描述。这类描述可能导致 GPT-4V 处理时产生混淆。我们将在下一章更详细地探讨这一现象，分析颜色描述对模型解释准确性的影响，以及它们如何可能无意中误导 AI 的分析。

4.6 Tip 6: Appearance descriptors should be non-conflicting with any image annotations to avoid misinterpretations.

4.6 技巧6: 外观描述符不应与任何图像标注冲突，以避免误解。

4.6.1 Demonstration Info

4.6.1 演示信息

In the previous chapter, we discussed the impact of appearance descriptions on the performance of GPT-4V, noting a significant enhancement in its capabilities. Given that appearance descriptions often involve color, especially in endoscopic scenarios, it is pertinent to explore the outcomes when markers are also denoted by colors. In this round of demonstrations, we excluded data from CT and MRI modalities, as these primarily involve grayscale imaging and seldom incorporate colors beyond black and white. This focus allows us to specifically assess the influence of color descriptions in contexts where color plays a critical role, providing insights into the nuances of prompt design for color-rich medical imaging scenarios

在上一章中，我们讨论了外观描述对GPT-4V性能的影响，注意到其能力有显著提升。鉴于外观描述通常涉及颜色（尤其在 endoscopic 场景中），探索用颜色标注标记时的效果显得尤为重要。本轮演示中，我们排除了CT和MRI模态的数据，因为这些成像方式主要采用灰度图像，极少使用黑白以外的颜色。通过聚焦于色彩起关键作用的场景，我们能更精准地评估颜色描述的影响，从而为色彩丰富的医学影像场景提供提示设计方面的细微洞察。

4.6.2 Analysis

4.6.2 分析

In this chapter, we discovered that using color descriptors in scenarios where markers are also color-coded can significantly impair performance. In this showcase, the results were even worse than those in the control group (Section 4.5.1). Notably, all instances of yellow dots were incorrectly identified as fat. This misidentification likely stems from GPT-4V interpreting the yellow dots as part of the scene rather than as markers. When we described fat as "generally pale yellow," GPT-4V mistakenly classified the yellow dots as fat. This observation underscores the importance of careful consideration in using color descriptors, particularly in contexts where color plays a dual role in both the scene and the markers.

在本章中，我们发现当标记物也采用颜色编码时，使用颜色描述符会显著影响性能。本次展示的结果甚至比对照组更差（见第4.5.1节）。值得注意的是，所有黄色圆点都被错误识别为脂肪。这种误判可能源于GPT-4V将黄色圆点视为场景的一部分而非标记物。当我们描述脂肪为"通常呈淡黄色"时，GPT-4V错误地将黄色圆点归类为脂肪。这一发现凸显了在使用颜色描述符时需要格外谨慎，特别是在颜色同时承担场景元素和标记物双重功能的场景中。

4.7 Tip 7: Clarifying the contextual relationships between sequential images enhances GPT-4v’s analytical accuracy and detail recognition.

4.7 技巧7: 明确连续图像间的上下文关系可提升GPT-4v的分析准确性与细节识别能力

4.7.1 Demonstration Info

4.7.1 演示信息

This demonstration primarily focused on the processing of endoscopic sequences. We composed a single image by combining four pictures from a hy st eros copy to assess GPT-4V’s capability in handling endoscopic sequences. The test was divided into two groups: one without inputting the image sequence numbers6.7.1 and another with the sequence numbers included 6.7.1. Our findings indicate that image analysis is more effective when the sequence order of the images is provided. This result highlights the importance of sequential context in enhancing the model’s comprehension and accuracy in processing related images in a series.

本次演示主要针对内窥镜序列的处理。我们通过组合宫腔镜检查中的四张图片合成单幅图像，以评估GPT-4V处理内窥镜序列的能力。测试分为两组：一组未输入图像序列编号6.7.1，另一组则包含序列编号6.7.1。研究结果表明，当提供图像顺序信息时，分析效果更佳。这一发现凸显了序列上下文对于提升模型理解能力和处理系列相关图像准确性的重要性。

4.7.2 Analysis

4.7.2 分析

Our findings in this chapter reveal that providing the sequential order of images significantly enhances their analysis. This approach is beneficial for processing video files as well. The underlying effectiveness of this strategy appears to stem from how GPT-4V perceives each image. Without sequence information, GPT-4V tends to treat each image as a discrete entity, leading to independent analysis with limited contextual understanding.

本章研究发现，提供图像序列顺序能显著提升其分析效果。这一方法同样适用于视频文件处理。该策略的有效性根源在于GPT-4V对每帧图像的感知机制：缺失序列信息时，GPT-4V倾向于将每幅图像视为独立实体，导致分析过程缺乏上下文关联。

However, when the sequential order is specified, or when the interrelationship among the images is clarified, GPT-4V is more inclined to view the images as interconnected elements of a larger narrative. Consequently, it synthesizes a more comprehensive workflow, encompassing the entire sequence from the first to the last image. This methodology proves especially useful in the analysis of surgical procedures in videos or image sequences, allowing for a more integrated and coherent interpretation of the visual data.

然而，当指定了顺序或明确了图像间的关联性时，GPT-4V更倾向于将这些图像视为一个更大叙事中相互关联的元素。因此，它会综合出一个更全面的工作流程，涵盖从第一张到最后一张图像的整个序列。这种方法在分析视频或图像序列中的手术步骤时尤为有效，能够对视觉数据进行更整合、连贯的解读。

4.8 Tip 8: Splicing multiple images into one and providing a sequence order enhances task processing effectiveness compared to multiple simultaneous inputs.

4.8 技巧8：将多张图像拼接为单图并标注顺序，相比同时输入多张图像更能提升任务处理效果

4.8.1 Demonstration Info

4.8.1 演示信息

In the previous demonstration, we presented the results of splicing multiple images into one(6.7.1 and 6.7.1). GPT-4V, however, possesses a built-in upload feature that can be used for analyzing image sequences. In this demonstration, we will compare the outcomes of using GPT-4V’s native upload feature versus splicing multiple images together. It is important to note that, as of the September 27 to December 12, 2023 version, this upload system can handle a maximum of four images at a time. Additionally, images uploaded through this system are automatically displayed in varying sizes on the webpage, although their actual upload dimensions remain unchanged, as shown in Figure 6.8.1. This comparison aims to highlight the differences in results obtained through these two methods of image presentation and processing.

在之前的演示中，我们展示了将多张图像拼接成一张的结果(6.7.1和6.7.1)。然而，GPT-4V具备内置的上传功能，可用于分析图像序列。本演示将对比使用GPT-4V原生上传功能与拼接多张图像的效果差异。需要注意的是，截至2023年9月27日至12月12日的版本，该上传系统单次最多只能处理四张图像。此外，通过该系统上传的图像会在网页上自动以不同尺寸显示，但其实际上传尺寸保持不变，如图6.8.1所示。本次对比旨在凸显这两种图像呈现和处理方式所获结果的差异。

4.8.2 Analysis

4.8.2 分析

In our prior showcase, we created a composite image by splicing together multiple individual images. However, in this subsequent test, we employed GPT4-v’s inherent image uploading feature to upload four images simultaneously, which is the system’s maximum capacity, to analyze a surgical procedure depicted across these images. The results of this test indicate that the direct upload of multiple images through GPT4-v’s system tends to be less effective in the analysis compared to utilizing a single, spliced composite image. This reduced effectiveness might be due to the current version of GPT4-v not being fully optimized for multi-image analysis. While GPT4-v shows remarkable proficiency in processing single images, it seems there is a need for additional development to enhance its capability to handle and interpreting multiple images concurrently.

在我们之前的展示中，通过拼接多张独立图像创建了一张合成图。但在后续测试中，我们使用GPT4-v内置的多图上传功能同时上传了四张图片（这是系统上限），以分析这些图片中描绘的手术流程。测试结果表明，与使用单张合成图相比，通过GPT4-v系统直接上传多张图片的分析效果较差。这种效能下降可能是由于当前版本的GPT4-v尚未针对多图分析进行充分优化。尽管GPT4-v在处理单张图像时表现出色，但仍需进一步开发来提升其同步处理和解析多张图像的能力。

4.9 Tip 9: Providing comparative analysis opportunities, especially with temporal patient data, deepens lesion or condition assessments.

4.9 技巧9：提供对比分析机会（尤其是时序患者数据）可深化病灶或病情评估。

4.9.1 Demonstration Info

4.9.1 演示信息

In this demonstration, we highlighted the significant enhancement in GPT-4V’s performance facilitated by the provision of comparative information. This comparison could involve contrasting healthy and abnormal states, or it might pertain to changes in pathological lesions. Such juxtapositions provide GPT-4V with a richer context, enabling a more accurate and nuanced understanding of the medical imagery. By presenting contrasting scenarios or changes over time, GPT-4V can better discern subtle differences, leading to improved diagnostic accuracy. This approach underscores the importance of contextual information in medical image analysis, particularly in complex cases where changes are subtle or involve gradual progression.

在本演示中，我们重点展示了通过提供对比信息对GPT-4V性能的显著提升。这种对比可能涉及健康与异常状态的对照，也可能与病理病变的变化相关。此类并置为GPT-4V提供了更丰富的上下文，使其能更准确细致地理解医学影像。通过呈现对比场景或时间维度的变化，GPT-4V能更好地区分细微差异，从而提高诊断准确性。该方法凸显了上下文信息在医学影像分析中的重要性，尤其适用于变化细微或呈渐进发展的复杂病例。

4.9.2 Analysis

4.9.2 分析

In this test, we observed that providing GPT-4v with comparative images from before and after significantly improved the accuracy of diagnoses. This is intuitively reasonable since the nature of many conditions, such as the growth of pulmonary nodules, can be challenging to assess from a single, final-stage image. However, when presented with a series of images depicting the nodule over several years, GPT-4v is able to analyze the growth pattern of the nodule. Consequently, this enables a more accurate assessment of the patient’s condition.

在此测试中，我们观察到向GPT-4v提供前后对比图像能显著提升诊断准确率。这从直觉上是合理的，因为许多病症（如肺结节生长）的性质仅通过单张终期影像难以评估。但当输入展现结节多年变化的系列图像时，GPT-4v能分析结节的生长模式，从而更精准地评估患者病情。

Additionally, in diagnosing lung nodules, we opted not to mark incorrect responses, focusing instead on correctly answered parts. This decision was based on the complexity inherent in diagnosing diseases solely from the growth patterns of nodules. We acknowledge that all responses provided by the model possess a certain degree of validity under these circumstances. However, it was observed that when images showing comparative growth were presented, all responses accurately identified a de no carcinoma as the correct diagnosis in the first answer. This finding suggests that providing contextual visual information significantly enhances the model’s diagnostic accuracy, particularly in complex cases like lung nodule analysis.

此外，在诊断肺结节时，我们选择不标注错误回答，而是聚焦于正确应答部分。这一决定基于仅凭结节生长模式诊断疾病所固有的复杂性。我们承认在此情境下，模型提供的所有回答均具有一定有效性。但研究发现，当展示对比生长图像时，所有回答均在首次诊断中准确识别出新生癌变。这表明提供上下文视觉信息能显著提升模型诊断准确率，尤其在肺结节分析等复杂病例中。

4.10 Tip 10: Directing GPT-4V’s focus to the interested areas will facilitate more targeted and relevant output.

4.10 技巧 10: 将 GPT-4V 的注意力引导至目标区域可生成更具针对性的输出

4.10.1 Demonstration Info

4.10.1 演示信息

In this demonstration, we examined the impact of providing spatial orientation information on GPT-4V’s recognition capabilities. We observed that when directing GPT-4V’s focus to the interested areas, GPT-4V’s responses were notably more coherent and accurate. This finding suggests that spatial information significantly aids the AI in contextual i zing and interpreting medical images. This prompt tip is particularly useful in cases where the approximate location of a lesion is known, but a detailed analysis and diagnosis are required using GPT-4V. By integrating location-based cues into the prompts, GPT-4V can focus its analysis more effectively, leading to more precise diagnostic outcomes. This approach highlights the importance of spatial context in enhancing AI-assisted medical image diagnosis.

在此演示中，我们研究了提供空间方位信息对GPT-4V识别能力的影响。我们发现，当引导GPT-4V关注目标区域时，其响应明显更具连贯性和准确性。这一发现表明，空间信息能有效帮助AI对医学图像进行情境化解读。该提示技巧特别适用于已知病灶大致位置、但需通过GPT-4V进行详细分析和诊断的场景。通过将基于位置的线索整合到提示中，GPT-4V能更高效地聚焦分析，从而获得更精确的诊断结果。该方法凸显了空间上下文在增强AI辅助医学图像诊断中的重要性。

4.10.2 Analysis

4.10.2 分析

In the experiments conducted in this chapter, we found that providing specific spatial orientation significantly enhances the accuracy of GPT-4V’s responses. In the MRI experiments, without specified location guidance, GPT-4V tended to diagnose images as normal, even when there were clear abnormalities present. Once directed to focus on the "upper left quadrant," GPT-4V promptly identified the anomalies in that region.

在本章的实验中，我们发现提供具体的空间方位能显著提升GPT-4V的回答准确率。在MRI实验中，若未指定位置引导，GPT-4V倾向于将图像诊断为正常，即便存在明显异常。一旦指示其关注"左上象限"，GPT-4V便能迅速识别该区域的异常。

In the PET scan tests, without a defined area of observation, GPT-4V erroneously identified regions with naturally high metabolic activity, such as the brain, heart, and bladder, as potential pathological sites, leading to an incorrect diagnosis of possible metastatic cancer. However, when GPT-4V was instructed to concentrate on the thoracic region, it successfully differentiated between cardiac activity and lung cancer, providing a more accurate diagnosis. These findings highlight the importance of directional cues in assisting GPT-4V to accurately interpret medical imaging, emphasizing the need for precise location information in AI-assisted diagnostic processes.

在PET扫描测试中，由于未定义观察区域，GPT-4V错误地将大脑、心脏和膀胱等天然高代谢活动区域识别为潜在病理部位，导致误诊为可能存在的转移性癌症。但当指示GPT-4V专注于胸部区域时，它成功区分了心脏活动与肺癌，提供了更准确的诊断。这些发现凸显了方向性提示对协助GPT-4V准确解读医学影像的重要性，同时强调了在AI辅助诊断过程中提供精确位置信息的必要性。

5 Summary

5 总结

In this work, we identified ten valuable prompt tips that significantly enhance GPT-4V’s utility in medical contexts through extensive testing across various datasets. These prompts led to marked performance improvements in all tasks, demonstrating the power of well-designed prompts in enhancing AI analysis in medical imaging. Future research will broaden the case spectrum and employ these prompt techniques for bulk data processing using APIs. Additionally, we plan to explore the effectiveness of these prompts in non-medical, natural image settings. This exploration aims to understand the adaptability of these strategies across different image analysis scenarios. Our findings not only contribute to medical image analysis using AI models but also open new avenues for applying AI-driven strategies in a wider range of imaging applications.

在这项工作中，我们通过跨数据集的广泛测试，确定了十条能显著提升GPT-4V在医疗场景效用的提示词技巧。这些提示词使所有任务性能均获得显著提升，证明了精心设计的提示词对增强医学影像AI分析能力的作用。未来研究将扩大病例范围，并运用这些提示技术通过API进行批量数据处理。此外，我们计划探究这些提示词在非医疗自然图像场景中的有效性，以评估这些策略在不同图像分析情境中的适应能力。本研究成果不仅推动了AI模型在医学影像分析中的应用，更为AI驱动策略在更广泛成像领域的应用开辟了新路径。

References

参考文献

6 Case demonstration

6 案例演示

In our demonstration, each scenario is illustrated through comprehensive visuals. "Prompt" refers to specific textual inputs presented to the users. The "GPT-4V" section captures the responses elicited by the GPT-4V model. It is important to note that, due to predetermined safeguard measures, GPT-4V consistently asserts its non-expertise in specialized domains such as radiology. For clarity and succinctness, we have omitted these routine disclaimers from our presentation of results. "Reference answer" represents the standard responses or interpretations, as informed by expert sources like Radio pae dia.

在我们的演示中，每个场景都通过全面的视觉材料进行展示。"Prompt"指呈现给用户的具体文本输入。"GPT-4V"部分记录了GPT-4V模型生成的响应。需要注意的是，由于预设的安全防护措施，GPT-4V在放射学等专业领域始终声明自身不具备专业知识。为保持清晰简洁，我们在结果展示中省略了这些常规免责声明。"Reference answer"代表基于Radio pae dia等专业信息来源的标准答案或解读。

For a clearer interpretation of our findings, we implemented a color-coding strategy:

为了更清晰地解读我们的发现，我们采用了颜色编码策略：

6.1 Tip 1: Concise language is more effective than complex descriptions, emphasizing succinct, task-relevant details in image analysis.

6.1 技巧1：简洁语言比复杂描述更有效，在图像分析中应强调与任务相关的简明细节。

6.1.1 Case 1: Endoscopic image annotation recognition

6.1.1 案例1: 内窥镜图像标注识别

CASE 1

案例 1

• Picture modal: Endoscopic picture • Tasks: La paros co pic c hole cyst ec to my • Reference Answer: – Yellow Dot: Liver. – Red Dot: Surgical grasper. – Blue Dot: Gallbladder. – Green Dot: Fats. – Purple Dot: Intestine.

• 图片模态：内窥镜图像
• 任务：腹腔镜胆囊切除术
• 参考答案：

黄点：肝脏。
红点：手术抓钳。
蓝点：胆囊。
绿点：脂肪组织。
紫点：肠道。

CASE 1 Control Group

案例1 控制组

Prompt: Tell me what instrument or tissue each color represents.

提示：告诉我每种颜色代表什么乐器或组织。

GPT-4V Response:

GPT-4V 响应：

• response 1

– Yellow Dot: This looks like it could be the liver tissue. The liver is a large organ that sits right above the gallbladder, and its edge can be seen during gallbladder surgeries.

黄点: 这看起来可能是肝组织。肝脏是位于胆囊正上方的大型器官，在胆囊手术中可以看到其边缘。

– Red Dot: This instrument is a clip applicator or a clip holder. It’s used to apply clips to blood vessels or ducts to ensure there is no bleeding or leakage after they are cut.

红点: 该器械是夹子施放器或夹子固定器，用于将夹子施加在血管或导管上，确保切断后不会出血或渗漏。

– Green Dot: This structure seems to be the cystic duct, which connects the gallbladder to the common bile duct. It’s a key structure that surgeons need to identify and clip or ligate during the procedure.

绿点: 该结构似乎是胆囊管 (cystic duct) ，它将胆囊与胆总管相连。这是外科医生在手术过程中需要识别并夹闭或结扎的关键结构。

– Purple Dot: The structure at the location of the purple dot is not entirely clear from the perspective of the image, but it might be a part of the omentum or another adipose tissue. The omentum is a fatty apron-like structure that hangs down from the stomach and covers some of the organs in the abdomen.

紫点：从图像视角来看，紫色标记点的结构并不完全清晰，可能是网膜或其他脂肪组织的一部分。网膜是悬挂于胃部下方的脂肪围裙状结构，覆盖着腹腔内的部分器官。

response 2

响应2

response 3

响应3

CASE 1 Complex Descriptions

案例1：复杂描述

Prompt: This is a photo of a la paros co pic gallbladder removal surgery. The gallbladder is connected to the liver, and the liver is usually around with fat. In the picture, there are five different colors marking different tissues or instruments. Based on what can be seen under the la paros co pic view and the appearance of the organs, please tell me what each color represents.

提示：这是一张腹腔镜胆囊切除手术的照片。胆囊与肝脏相连，而肝脏周围通常有脂肪组织。图中用五种不同颜色标记了不同的组织或器械。根据腹腔镜视图下可见的器官外观，请告诉我每种颜色代表什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

– Blue Dot: This appears to represent the cystic duct. The cystic duct connects the gallbladder to the common bile duct and is a pathway for bile to flow in and out of the gallbladder.

蓝点: 这似乎代表胆囊管。胆囊管连接胆囊与胆总管，是胆汁进出胆囊的通道。

Green dot: This likely represents the gallbladder. The gallbladder is a small, pearshaped organ that stores bile produced by the liver.

绿点: 这很可能代表胆囊。胆囊是一个储存肝脏产生胆汁的小型梨形器官。

– Red dot: This seems to indicate a la paros co pic instrument, likely being used for dissection or manipulation of the nearby structures.

红点: 这似乎表示一种类似解剖器械的工具，可能用于解剖或操作附近结构。

– Yellow dot: This might denote the liver’s surface. The liver is responsible for producing bile, among its many other functions.

黄点：这可能表示肝脏表面。肝脏的功能众多，其中包括生成胆汁。

– Purple dot: Given its position and appearance, this could either represent a portion of the liver or possibly associated fatty tissue.

紫点：根据其位置和外观，这可能代表部分肝脏或可能相关的脂肪组织。

response 2

响应 2

response 3

响应3

CASE 1 Concise Description

案例1：简明描述

Prompt: This is a photo of a la paros co pic c hole cyst ec to my. The gallbladder is connected to the liver which is usually enveloped in fat. Tell me what instrument or tissue each color represents.

提示：这是一张关于胆囊切除术 (cholecystectomy) 的照片。胆囊通常被脂肪包裹，与肝脏相连。请说明每种颜色代表什么器械或组织。

GPT-4V Response:

GPT-4V 响应：

response 1

响应1

• response 2

response 2

response 3

响应3

6.1.2 Case 2: CT image annotation recognition

6.1.2 案例2: CT图像标注识别

CASE 2

案例 2

• Picture modal: CT section • Tasks: Abdominal CT section • Reference Answer: – Red Dot: Liver. – Yellow Dot: Pancrea. – Green Dot: Left kidney. – Blue Dot: Spleen.

• 图像模态：CT 切片
• 任务：腹部 CT 切片
• 参考答案：

红点：肝脏。
黄点：胰腺。
绿点：左肾。
蓝点：脾脏。

CASE 2 Control Group

案例2 对照组

Prompt: This is an abdominal CT scan. Please tell me what organs are marked by the four different colors.

提示：这是一张腹部CT扫描图。请告诉我四种不同颜色标记的器官分别是什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

• response 2

response 3

响应3

CASE 2 Complex Descriptions

案例 2 复杂描述

Prompt: This is a cross-sectional image of an abdominal CT scan. The central white highlighted area is the spine, and the outer white highlighted areas are the ribs. The top of the image is the patient’s front, the bottom of the image is the patient’s back, the right side of the image is the patient’s left side, and the left side is the patient’s right side. Within this image, there are four color annotations, with the 4 colors representing 4 different organs. What are these four organs

提示：这是一张腹部CT扫描的横截面图像。中央高亮的白色区域是脊柱，外围高亮的白色区域是肋骨。图像顶部为患者前侧，底部为患者后侧，右侧为患者左侧，左侧为患者右侧。图像内有四个颜色标注，四种颜色代表四个不同器官。这些器官分别是什么

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

• response 2

response 2

response 3

响应3

CASE 2

案例 2

Concise Description

简洁描述

Prompt: This is a cross-sectional image of an abdominal CT scan. In the image, the locations of the organs are mirrored from their actual locations, and the central white-highlighted area is the spine. What are these four organs marked by color

提示：这是一幅腹部CT扫描的横断面图像。图中器官位置与实际呈镜像关系，中央白色高亮区域为脊柱。这些用颜色标记的四个器官分别是什么

GPT-4V Response:

GPT-4V 响应:

• response 1

response 1

• response 2

response 2

The area marked with a red circle might be the liver or a part of the liver. The area marked with a yellow dot seems closer to the central structures, possibly indicating a part of the pancreas or nearby structures. – The green dot seems to be highlighting a portion of the intestine or a related structure. The blue dot might be indicating a section of the spleen or a structure near the kidney.

红色圆圈标记的区域可能是肝脏或肝脏的一部分。黄色点标记的区域似乎更靠近中央结构，可能表示胰腺或其附近结构的一部分。绿色点似乎突出了肠道或相关结构的一部分。蓝色点可能指示脾脏的一部分或肾脏附近的结构。

response 3

响应3

6.1.3 Case 3: MRI image annotation recognition

6.1.3 案例3: MRI图像标注识别

CASE 3

案例 3

• Picture modal: MRI section • Tasks: Abdominal MRI section • Reference Answer: – Red Dot: Liver. – Yellow Dot: Pancrea. – Green Dot: Left kidney. – Blue Dot: Spleen.

图片模态：MRI切面
任务：腹部MRI切面
参考答案：
- 红点：肝脏。
- 黄点：胰腺。
- 绿点：左肾。
- 蓝点：脾脏。

CASE 3 Control Group

案例3 控制组

Prompt: This is an abdominal MRI scan. Please tell me what organs are marked by the four different colors.

提示：这是一张腹部 MRI 扫描图。请告诉我四种不同颜色分别标记了哪些器官。

GPT-4V Response:

GPT-4V 响应：

response 1

响应1

– Red dot: This is likely over the liver, which is a large organ on the right side of the abdominal cavity that is involved in many metabolic processes. – Yellow dot: This could represent the gallbladder if it is under the liver, or part of the gastrointestinal tract, such as the duodenum, which is the first part of the small intestine immediately following the stomach. – Green dot: This is probably over the spleen, an organ involved in filtering blood and immune system function, located in the left upper quadrant of the abdomen. – Blue dot: This could represent a portion of the gastrointestinal tract, possibly the stomach or a section of the large intestine, depending on its precise location.

红点：可能位于肝脏上方，这是腹腔右侧参与多种代谢过程的大型器官。
黄点：若在肝脏下方可能代表胆囊，或胃肠道的一部分，如紧接胃部的小肠起始段十二指肠。
绿点：可能对应脾脏，这是位于左上腹负责血液过滤和免疫功能的器官。
蓝点：根据具体位置，可能代表胃肠道的一部分，如胃部或大肠的某段。

response 2

响应 2

response 3

响应 3

CASE 3 Complex Descriptions

案例 3：复杂描述

Prompt: This is a cross-sectional image of an abdominal MRI scan. The central white highlights area is the aorta. The top of the image is the patient’s front, the bottom of the image is the patient’s back, the right side of the image is the patient’s left side, and the left side is the patient’s right side. Within this image, there are four color annotations, with the 4 colors representing 4 different organs. What are these four organs?

提示：这是一张腹部MRI扫描的横断面图像。中央高亮的白色区域是主动脉。图像顶部为患者前方，底部为患者后方，图像右侧为患者左侧，左侧为患者右侧。该图像中有四种颜色标注，四种颜色代表四个不同器官。这些器官分别是什么？

GPT-4V Response:

GPT-4V 响应:

response 1

响应 1

response 2

响应2

• response 3

response 3

CASE 3 Concise Description

案例3：简明描述

Prompt: This is a cross-sectional image of an abdominal MRI scan. In the image, the locations of the organs in MRI will be mirrored from their actual human anatomy, what are these four organs marked by color.

提示：这是一张腹部MRI扫描的横断面图像。图中MRI显示的器官位置将与实际人体解剖结构呈镜像关系，请问用颜色标记的这四个器官分别是什么。

GPT-4V Response:

GPT-4V 响应:

response 1

响应1

response 2

响应 2

response 3

响应3

6.2 Tip 2: Providing the tasks can better help analyze the images

6.2 技巧2：提供任务能更好地帮助分析图像

6.2.1 Case 1: Endoscopic operation analysis

6.2.1 案例1: 内窥镜手术分析

CASE 1

案例1

CASE 1 Control group

案例1 对照组

Prompt: Please tell me what the silver surgical instrument in the picture is and its purpose.

提示：请告诉我图片中的银色手术器械是什么及其用途。

GPT-4V Response:

GPT-4V 响应：

• response 1: The image shows a certain surgery being performed under arth ros copy. From the image, surgical instruments can be seen, one of which looks like a clamp or manipulator. Regarding the specific type of surgical instrument, I cannot be completely sure, as this depends on the specific purpose and context of the surgery. Art hr osco pic surgeries are generally used for diagnosing and treating joint problems, especially in the knee, shoulder, and other major joints. The purpose of this instrument may include manipulating tissues, excising damaged tissues, clamping certain objects, etc., depending on the purpose of the surgery.

• 回应1: 图像显示在关节镜 (arthroscopy) 下进行的某种手术。从图像中可以看到手术器械，其中一件看起来像钳子或操作器。关于具体的手术器械类型，我无法完全确定，因为这取决于手术的具体目的和背景。关节镜手术通常用于诊断和治疗关节问题，尤其是膝关节、肩关节等主要关节。该器械的用途可能包括操作组织、切除受损组织、夹持特定物体等，具体取决于手术目的。

response 2: The image shows a scene from an endoscopic surgery. The surgical instrument looks like a ring-shaped grasper or clamp. This type of instrument is typically used to grasp, excise, or manipulate tissue in art hr osco pic or other endoscopic surgeries. The red area in the picture might be tissue inside the joint cavity, and the white structures might be bone or cartilage, but this requires more detailed background information to determine. The specific purpose of this instrument depends on the type and purpose of the surgery, such as in knee arth ros copy, it might be used to excise damaged cartilage or repair injured ligaments.

回复2：图像显示的是内窥镜手术场景。手术器械呈环形抓钳或夹钳状，这类器械通常用于关节镜(arthroscopic)或其他内窥镜手术中抓取、切除或操作组织。图中红色区域可能是关节腔内组织，白色结构可能是骨骼或软骨，但需更详细的背景信息才能确认。该器械的具体用途取决于手术类型和目的，例如在膝关节镜手术中，可能用于切除损伤软骨或修复受损韧带。

response 3: The image shows an instrument used in an art hr osco pic surgery. Art hr osco pic surgery is a method that enters the joint cavity through small incisions and uses an art hro scope for diagnosis and treatment. The instrument in the picture might be used to clamp, cut, or refine tissues inside the joint. Specifically, the middle ring-shaped structure might be a protective sleeve, used to protect the soft tissues inside the joint from damage. The clamping instruments on both sides might be used to manipulate tissues inside the joint or perform surgical operations.

回复3：图中展示的是用于关节镜手术的一种器械。关节镜手术是通过小切口进入关节腔，利用关节镜进行诊断和治疗的方法。图中的器械可能用于钳夹、切割或修整关节内部组织。具体而言，中间的环形结构可能是保护套，用于保护关节内部软组织免受损伤。两侧的钳夹器械可能用于操作关节内部组织或进行手术操作。

CASE 1

案例 1

Providing the tasks

提供任务

Prompt: This image is from a hy ster osco pic myomectomy surgery. Please tell me what the silver surgical instrument in the picture is and its purpose.

提示：这张图片来自宫腔镜子宫肌瘤切除术。请告诉我图中银色手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应:

response 1: The image appears to be from a hy ster osco pic procedure, specifically a hy ster osco pic myomectomy, as you mentioned. The ring-shaped surgical instrument visible in the image is known as a "loop electrode" or "re sec to scope loop." The purpose of this loop electrode in a hy ster osco pic myomectomy is to resect or cut away fibroids (myomas) from the inner lining of the uterus. The loop electrode is electrified, and when it comes in contact with the fibroid tissue, it can cut and coagulate (stop bleeding) simultaneously. This allows the surgeon to carefully and selectively remove the fibroid tissue without damaging the surrounding healthy tissue of the uterus. response 2: The image appears to be from a hy ster osco pic procedure, specifically a myomectomy, which involves the removal of fibroids from the uterus. The ring-shaped surgical instrument you’re referring to is likely a "loop electrode" or "re sec to scope loop". This instrument is used in hy ster osco pic surgeries to cut tissue and/or coagulate bleeding vessels. When electrical current is passed through the loop, it can effectively cut tissue or coagulate blood vessels to control bleeding. In the context of a hy ster osco pic myomectomy, the loop electrode is used to shave off or resect pieces of the fibroid until it has been completely removed or reduced in size. response 3: The image appears to show a surgical procedure inside a uterus using hysteroscopy. The ring-shaped instrument you’re referring to is likely a myoma loop or a re sec to scope loop. Its primary purpose is to resect (or cut away) tissue, such as fibroids (myomas) from the inner wall of the uterus. During a hy ster osco pic myomectomy, the surgeon uses this loop to shave off the fibroid tissue in small portions until the fibroid is entirely removed. The loop typically carries electrical current, which helps in cutting and coagulating the tissue, minimizing bleeding during the procedure.

回复1: 该图像似乎来自宫腔镜手术，具体而言是宫腔镜子宫肌瘤切除术。图中可见的环形手术器械被称为"环形电极"或"切除镜环"。在宫腔镜子宫肌瘤切除术中，该环形电极的作用是切除或切割子宫内壁的肌瘤组织。环形电极通电后，当接触肌瘤组织时可同时实现切割和凝血(止血)功能，使外科医生能精准选择性地切除肌瘤组织而不损伤子宫周围健康组织。

回复2: 图像显示的是宫腔镜手术场景，具体为子宫肌瘤切除术。您所指的环形手术器械应为"环形电极"或"切除镜环"。该器械在宫腔镜手术中用于切割组织和/或凝固出血血管。当电流通过环形电极时，可有效切割组织或凝固血管止血。在宫腔镜子宫肌瘤切除术中，外科医生使用环形电极逐层削除或切除肌瘤组织，直至完全去除或缩小肌瘤体积。

回复3: 图像显示的是宫腔镜下子宫内手术操作。您提及的环形器械应为子宫肌瘤切割环或切除镜环，其主要功能是切除(或切割)子宫内壁的肌瘤等组织。在宫腔镜子宫肌瘤切除术中，外科医生使用该切割环分小块削除肌瘤组织，直至完全切除。切割环通常带有电流，既可辅助切割又能实现组织凝固，有效减少术中出血。

6.2.2 Case 2: Endoscopic series operation analysis

6.2.2 案例2: 内窥镜序列操作分析

CASE 2

案例 2

CASE 2 Control Group

案例2 控制组

Prompt: This is a sequence of pictures from an endoscopic surgery. Please analyze each step of the sequence.

提示：这是一组内窥镜手术的连续图像。请分析该序列中的每个步骤。

GPT-4V Response:

GPT-4V 响应:

CASE 2 Providing the tasks

案例 2: 任务分配

Prompt: This is a sequence of pictures showing the localization of polyps during a hy ster osco pic polyp ec to my procedure. Please analyze each step of the sequence.

提示：这是一组展示宫腔镜息肉切除术中息肉定位过程的图片序列。请分析该序列的每个步骤。

GPT-4V Response:

GPT-4V 响应:

CASE 3

案例 3

Ablation study (No input image, text only)

消融研究（无输入图像，仅文本）

Prompt: Assume there is an image from a hy ster osco pic myomectomy surgery. Please tell me what the silver surgical instrument in the picture is and its purpose.

提示：假设有一张来自宫腔镜子宫肌瘤切除术的图像。请告诉我图中银色手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应：

• response 1:

response 1:

In a hy ster osco pic myomectomy surgery image, the silver surgical instrument you’re most likely to see is a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted into the uterus through the cervix. It allows the surgeon to see the inside of the uterus and is used to diagnose and treat causes of abnormal bleeding. During a myomectomy, the hy st eros cope enables the surgeon to locate and remove fibroids or polyps from within the uterine cavity. It’s a key instrument in such procedures, providing a minimally invasive option for surgery.

在宫腔镜子宫肌瘤切除术的手术图像中，你最可能看到的银色手术器械是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备，通过宫颈插入子宫。它使外科医生能够观察子宫内部，用于诊断和治疗异常出血的原因。在子宫肌瘤切除术中，宫腔镜帮助外科医生定位并切除子宫腔内的肌瘤或息肉。这是此类手术中的关键器械，为手术提供了微创选择。

• response 2:

In a hy ster osco pic myomectomy surgery image, the silver surgical instrument you are referring to is most likely a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted through the vagina and cervix into the uterus. It allows the surgeon to view the inside of the uterus and operate with minimal invasion. During a myomectomy, the hy st eros cope is used to locate and remove fibroids or polyps from the uterine wall.

在一张宫腔镜子宫肌瘤切除手术图像中，您所指的银色手术器械很可能是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备，通过阴道和宫颈插入子宫。它使外科医生能够观察子宫内部并进行微创手术。在子宫肌瘤切除术中，宫腔镜用于定位并切除子宫壁上的肌瘤或息肉。

• response 3:

• 响应3:

In a hy ster osco pic myomectomy surgery, the most likely silver surgical instrument you might see in an image is a hy st eros cope. A hy st eros cope is a thin, lighted telescope-like device that is inserted into the uterus through the cervix. It allows the surgeon to view the inside of the uterus and operate with precision. This instrument is crucial for hy ster osco pic myomectomy, a procedure used to remove fibroids (non cancerous growths) from the uterus. The hy st eros cope is used both to visualize the fibroids and to guide other instruments that are used to excise or destroy the fibroids.

在宫腔镜子宫肌瘤切除术中，图像中最可能出现的银色手术器械是宫腔镜。宫腔镜是一种细长、带光源的望远镜状设备，通过宫颈置入子宫。它使外科医生能够观察子宫内部并进行精确操作。该器械对于宫腔镜子宫肌瘤切除术至关重要，该手术用于切除子宫内的肌瘤(非癌性增生组织)。宫腔镜既用于观察肌瘤，也用于引导其他切除或破坏肌瘤的手术器械。

CASE 4

案例 4

Ablation study (No input image, text only)

消融研究 (无输入图像，仅文本)

Prompt: Assume there is an image from a myomectomy surgery. Please tell me what the ringshaped surgical instrument in the picture is and its purpose.

提示：假设有一张子宫肌瘤切除术的图像。请告诉我图中环形手术器械的名称及其用途。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

In a hypothetical scenario where there’s an image from a hy ster osco pic myomectomy, a ring-shaped surgical instrument could likely be a "loop re sec to scope." The loop resectoscope is equipped with an electric wire loop. Its purpose is to cut and coagulate tissue during the procedure.

在一个假设场景中，若出现来自宫腔镜子宫肌瘤切除术的图像，环形手术器械很可能是"环形电切镜 (loop resectoscope)"。该器械配有电切环，用于在手术过程中切割和凝固组织。

During a hy ster osco pic myomectomy, this instrument is specifically used to shave off the fibroid tissue from the uterus. The electric loop cuts the fibroid, and the electric current ensures that there’s minimal bleeding by coagulating the blood vessels as it cuts.

在宫腔镜子宫肌瘤切除术中，该器械专门用于削除子宫内的肌瘤组织。电切环切除肌瘤的同时，电流通过凝固血管确保术中出血量最小。

• response 2:

• 响应 2:

A ring-shaped instrument in the context of hy ster osco pic myomectomy could possibly be a tenaculum or a loop electrode. The loop electrode is often part of a re sec to scope, which is commonly used in hy ster osco pic myomectomy to remove fibroids from the uterine wall. The loop can be used to cut and remove the fibroid tissue with the use of electric current, while the hy st eros cope allows the surgeon to see inside the uterus during the procedure.

在宫腔镜子宫肌瘤切除术中，环形器械可能是子宫钳或环形电极。环形电极通常是宫腔电切镜的组成部分，常用于宫腔镜子宫肌瘤切除术中从子宫壁切除肌瘤。该环形器械可通过电流切割并移除肌瘤组织，同时宫腔镜能让外科医生在手术过程中观察子宫内部。

• response 3:

In the context of a hy ster osco pic myomectomy, a ring-shaped surgical instrument is typically referred to as a "loop re sec to scope." The loop re sec to scope is an instrument used in conjunction with a hy st eros cope.

在宫腔镜子宫肌瘤切除术中，环形手术器械通常称为"环形电切镜 (loop resectoscope)"。环形电切镜是与宫腔镜配合使用的器械。

Purpose:

目的：

The loop re sec to scope has an electric wire loop at its end. It is used to cut and coagulate tissue during the procedure. During a hy ster osco pic myomectomy, the loop re sec to scope is utilized to shave off or remove fibroid tissue from the uterus. The electric current that passes through the wire loop allows for both cutting and coagulation, making the removal of fibroids efficient and minimizing bleeding.

环状电切镜末端带有金属线圈，用于手术中切割和凝固组织。在宫腔镜子宫肌瘤切除术中，环状电切镜用于削除或切除子宫内的肌瘤组织。通过金属线圈的电流既能实现切割又能止血，使得肌瘤切除高效且出血量最小化。

6.3 Tip 3: Implementing step-by-step guidance in multi-round dialogue allows GPT-4v to handle complex tasks more efficiently by breaking them down into simpler operations.

6.3 技巧3: 在多轮对话中实施分步引导，通过将复杂任务拆解为简单操作，使GPT-4v更高效地处理复杂任务。

6.3.1 Case 1: Endoscopic mask-polyp matching challenge

6.3.1 案例1: 内窥镜掩模-息肉匹配挑战

CASE 1 Control group

案例1 对照组

Prompt: Please tell me which mask at the top corresponds to the polyp below.

提示：请告诉我顶部的哪个遮罩对应下面的息肉。

GPT-4V Response:

GPT-4V 响应:

CASE 1

案例 1

Step-by-step guidance (single round)

分步指导 (单轮)

GPT-4V Response:

GPT-4V 响应:

CASE 1

案例 1

Step-by-step guidance (multi-round)

分步指导（多轮）

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

– Answering prompt 1: To identify the location of the polyp in Picture 4 based on the mask template, let’s divide each image into nine equal parts (3x3 grid) and see where the corresponding pattern from the template appears in Picture 4.

回答提示1：根据掩膜模板识别图4中息肉的位置，我们将图像均分为九等份（3x3网格），观察模板对应的模式在图4中的出现区域。

Upon examining the images:

检查图像时：

Pictures 1-3 serve as a mask template for polyps. Picture 4 is an image from a colon os copy. Based on the mask template provided in Pictures 1-3 and the colon os copy image in Picture 4, the polyp in Picture 4 would be located in the bottom center block (or the 8th block if numbering from top left to bottom right in a 3x3 grid).

图片1-3作为息肉的掩模模板。图片4是结肠镜图像。根据图片1-3提供的掩模模板和图片4的结肠镜图像，图片4中的息肉应位于底部中心区块(若按3x3网格从左上方至右下方编号则为第8区块)。

• response 2:

• 响应 2:

• response 3:

response 3:

– Answering prompt 1: The polyp in Picture 4 is located in the center block. – Answering prompt 2: For Pictures 1, 2, and 3: Picture 1: The polyp is located in the center block. Picture 2: The polyp is located in the top-center block. Picture 3: The polyp is located in the top-right block. respectively. – Answering prompt 3: The mask in Picture 1 would correspond to Picture 4.

回答提示1：图4中的息肉位于中央区块。
回答提示2：对于图1、图2和图3：
图1：息肉位于中央区块。
图2：息肉位于中上区块。
图3：息肉位于右上区块。
回答提示3：图1中的掩膜对应图4。

6.3.2 Case 2: CT mask-organs matching challenge

6.3.2 案例2: CT器官掩膜匹配挑战

CASE 2 Control Group

案例2 对照组

Prompt: Please tell me which mask at the top corresponds to the liver below, consider the red part only.

提示：请告诉我上方哪个遮罩对应下方的肝脏，仅考虑红色部分。

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应1:

Upon examining the CT images provided, it is evident that the red mask in Image 1 corresponds to the liver’s anatomical location. This mask covers the right upper quadrant of the abdomen, which aligns with the liver’s predominant placement in the human body. The red part follows the expected shape and region of the liver, extending slightly towards the left upper quadrant, which is consistent with the liver’s anatomy. The other masks either cover additional areas not occupied by the liver (as seen in Image 2) or highlight regions to the left (as seen in Image 3), which do not correspond to the primary location of the liver. Therefore, the correct liver mask is the one shown in Image 1.

在检查提供的CT图像时，可以明显看出图像1中的红色掩膜对应于肝脏的解剖位置。该掩膜覆盖了腹部的右上象限，这与人体中肝脏的主要位置一致。红色部分遵循了肝脏预期的形状和区域，略微向左上象限延伸，这与肝脏的解剖结构相符。其他掩膜要么覆盖了肝脏未占据的额外区域（如图像2所示），要么突出了左侧的区域（如图像3所示），这些都不对应于肝脏的主要位置。因此，正确的肝脏掩膜是图像1中显示的那个。

• response 2:

• 响应2:

Based on the anatomy visible in the CT images, the red part of the mask that corresponds to the liver is found in Image 1 at the top. The liver is situated predominantly in the right upper quadrant of the abdominal cavity, and the red mask in Image 1 accurately overlays this area, conforming to the expected anatomical location and shape of the liver. The other images either highlight areas that extend beyond the typical boundaries of the liver or, in the case of Image 3, cover a section on the left side where the liver is not present. Image 4 serves as a reference as it does not have any highlighted area, confirming that the highlighted region in Image 1 is indeed representative of the liver’s location.

根据CT影像中可见的解剖结构，与肝脏对应的红色掩膜部分位于顶部的图1中。肝脏主要位于腹腔右上部，图1中的红色掩膜精确覆盖该区域，符合肝脏预期的解剖位置和形状。其他图像要么突出显示超出肝脏典型边界的区域，要么如图3所示覆盖了左侧不存在肝脏的部分。图4作为参考未标注任何高亮区域，进一步证实图1中的高亮区域确实代表了肝脏的位置。

• response 3: Upon examining the provided images, it is clear that the mask in Image 1 corresponds to the liver in the images below, considering only the red part. This conclusion is reached by observing the anatomical position and shape of the red highlighted area, which closely matches the typical location and size of the liver in the right upper quadrant of the abdomen on a transverse CT scan. The liver is the largest organ in the abdominal cavity and has a characteristic shape that is well represented by the mask in Image 1, which covers the right lobe of the liver and extends slightly to the left, consistent with the liver’s anatomical position. Images 2 and 3 highlight areas that either extend beyond the typical boundaries of the liver or are situated in an atypical location for liver tissue. Image 4 does not have any highlighted areas to compare.

• 响应3: 通过检查所提供的图像，可以明确图像1中的掩膜（仅考虑红色部分）对应于下方图像中的肝脏。这一结论是通过观察红色高亮区域的解剖位置和形状得出的，其与横断面CT扫描中腹部右上象限肝脏的典型位置和大小高度吻合。作为腹腔内最大的器官，肝脏具有特征性形态，而图像1中的掩膜恰当地覆盖了肝右叶并向左略微延伸，这与肝脏的解剖位置一致。图像2和3所标注的区域要么超出了肝脏的常规边界，要么位于肝脏组织的非典型位置。图像4则没有任何可供对比的高亮区域。

CASE 2

案例 2

Step-by-step guidance (single round)

分步指南（单轮）

GPT-4V Response:

GPT-4V 响应:

response 2: In the provided CT images, the liver is represented by the red-shaded area in pictures 1, 2, and 3. The liver is located in the upper right portion of the abdominal cavity, just below the diaphragm and occupies a large area of the upper quadrant of the abdomen.

回复2：在提供的CT图像中，肝脏在图1、图2和图3中以红色阴影区域表示。肝脏位于腹腔右上部，紧贴膈肌下方，占据了腹部右上象限的大部分区域。

Picture 1: In picture 1, the liver is the large red area taking up most of the right side of the image (the patient’s left side, as medical images are typically viewed from the perspective of facing the patient). – Picture 2: In picture 2, again, the liver is the large red area, which is on the right side of the image Picture 3: In picture 3, the liver is shown with two separate red areas, indicating the larger right lobe of the liver and the smaller left lobe.

图 1: 在图1中，肝脏是占据图像右侧大部分区域的红色大范围区域 (患者左侧，因为医学图像通常以面对患者的视角呈现)。
图 2: 在图2中，肝脏同样是图像右侧的红色大范围区域。
图 3: 在图3中，肝脏显示为两个独立的红色区域，分别代表较大的右叶和较小的左叶。

Based on the position and shape of the liver in pictures 1, 2, and 3, the mask that would correspond to picture 4 would likely be similar to the ones in pictures 1 and 2, as these show a complete liver, which is how it would normally appear in an un segmented CT image.

根据图片1、2和3中肝脏的位置和形状，图片4对应的掩膜可能与图片1和2中的类似，因为这些图片显示了完整的肝脏，这也是未分割CT图像中肝脏的正常外观。

response 3: In the provided images, the red-shaded areas are indicative of the liver’s location in the axial (horizontal) cross-section of the abdominal CT scans. Here’s how the liver is positioned in each of the images:

回复3: 在提供的图像中，红色阴影区域表示腹部CT扫描轴向(水平)切面中肝脏的位置。以下是各图像中肝脏的定位情况:

For picture 4, which is an unmarked CT scan, the liver would typically be located in the same general area as in pictures 1, 2, and 3, meaning it should be in the upper right quadrant of the abdomen, extending across the midline towards the left upper quadrant beneath the diaphragm. Based on the location in the marked images, the corresponding mask for picture 4 would likely be most similar to the mask in picture 1 or 2, since these show a large portion of the liver in its typical position.

在图4这张未标记的CT扫描图像中，肝脏通常位于与图1、2和3相同的常规区域，即应处于腹部右上象限，横跨中线向膈肌下方的左上象限延伸。根据标记图像中的位置判断，图4对应的掩膜很可能与图1或图2中的掩膜最为相似，因为这两张图显示了处于典型位置的大部分肝脏组织。

CASE 2

案例 2

Step-by-step guidance (multi-round)

分步指导（多轮）

Prompt 1: Picture 123 is the mask template for the liver(the red part), and Picture 4 is an image of an abdominal CT section, please tell me the location and size of the liver in picture 4.(answer in 50 words)

提示1：图123是肝脏的掩模模板（红色部分），图4是一张腹部CT切片的图像，请告诉我图4中肝脏的位置和大小。（回答限50字）

Prompt 2:Please tell me the location and size of the livers in picture 1 2 3 (answer in 50 words) Prompt 3:Which mask would correspond to Picture 4 (answer in 50 words)

提示2:请告诉我图片1、2、3中肝脏的位置和大小 (回答限50字)
提示3:哪张遮罩(mask)对应图片4 (回答限50字)

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

– Answering prompt 1: In Picture 4, the liver is located in the upper right quadrant of the abdominal cavity, extending across the midline to the upper left quadrant. It is the large, dark gray area beneath the diaphragm. The size cannot be accurately determined without specific measurements or scale, but it occupies a significant portion of the upper abdominal area.

回答提示1: 在图4中，肝脏位于腹腔右上象限，延伸跨过中线至左上象限。它是横膈膜下方的大片深灰色区域。由于缺乏具体测量数据或比例尺，无法准确判断其大小，但可见其占据了上腹部相当大区域。

– Answering prompt 2: In Pictures 1, 2, and 3, the liver (highlighted in red) is primarily located in the right upper quadrant of the abdominal cavity. The size appears large, spanning across to the left upper quadrant, below the diaphragm. Without a scale, precise size cannot be determined, but it occupies a substantial portion of the abdominal cavity in each image.

回答提示2：在图1、图2和图3中，肝脏(用红色高亮显示)主要位于腹腔的右上象限。其体积显得较大，横跨至左上象限，位于横膈膜下方。由于缺乏比例尺，无法确定精确尺寸，但在每张图像中它都占据了腹腔的相当大部分。

– Answering prompt 3: Picture 1’s mask seems to cover the largest area, extending quite far to the left, which might be too large for Picture 4. Picture 2’s mask has a more confined area on the right side, which may not cover all of the liver in Picture 4. Picture 3’s mask is in between the two in terms of coverage, possibly being a closer match to the liver in Picture 4. Based on this visual assessment, Picture 3’s mask might correspond most closely to the liver in Picture 4.

回答提示3：图1的遮罩覆盖区域似乎最大，向左延伸较远，可能对图4来说过大。图2的遮罩在右侧区域较为局限，可能无法覆盖图4的全部肝脏。图3的遮罩在覆盖范围上介于两者之间，可能更接近图4的肝脏。基于此视觉评估，图3的遮罩可能与图4的肝脏匹配度最高。

• response 2:

• 响应 2:

– Answering prompt 1: The liver in Picture 4, based on the mask template provided in Pictures 1-3, is not delineated with a red mask. However, typically, the liver is located in the upper right quadrant of the abdomen in a CT image, extending from just under the diaphragm down towards the right kidney. It is not possible to accurately determine the size of the liver from Picture 4 without the mask, but the liver is usually the largest solid organ in the abdominal cavity

回答提示1：图4中的肝脏，基于图1-3提供的掩膜模板，并未用红色掩膜勾勒出来。然而通常情况下，在CT图像中肝脏位于腹部右上方，从横膈膜下方延伸至右肾附近。由于缺乏掩膜，无法从图4中准确判断肝脏大小，但肝脏通常是腹腔内最大的实体器官

Answering prompt 2: In pictures 1, 2, and 3, the liver (highlighted in red) occupies the upper right quadrant of the abdomen. It extends across the midline to the left upper quadrant. The size appears substantial, filling much of the space below the diaphragm and above the stomach and intestines.

回答提示2：在图1、图2和图3中，肝脏（红色高亮部分）位于腹部右上象限，并向左上象限延伸跨过中线。其体积较大，占据了膈肌下方与胃、肠上方的大部分空间。

– Answering prompt 3: Based on typical anatomical position and the visible structures in Picture 4, the liver mask that would correspond most closely appears to be the mask in Picture 1, considering the liver’s location and the visible vas cula ture branching within the liver tissue.

回答提示3：根据标准解剖位置和图4中可见结构，考虑到肝脏的位置及肝组织内可见的血管分支，与之最匹配的肝脏掩膜应为图1中的掩膜。

• response 3:

response 3:

Answering prompt 1: In Picture 4, the liver is located in the upper right quadrant of the abdomen, just below the diaphragm. It spans from the right side across the midline towards the left, extending slightly past the stomach. The size of the liver appears to be within normal limits, occupying a significant portion of the upper abdominal cavity.

回答提示1：在图4中，肝脏位于腹部右上象限，紧贴横膈膜下方。它从右侧跨越中线向左延伸，略微超过胃部。肝脏大小显示在正常范围内，占据了上腹腔的显著部分。

Answering prompt 2: In Pictures 1-3, the liver, indicated by the red mask, occupies the upper right quadrant of the abdomen, extending across the midline to the upper left quadrant. It varies in size across the images, with Picture 2 showing the largest coverage and Picture 3 showing the least.

回答提示2：在图1-3中，红色遮罩标示的肝脏位于腹部右上象限，并跨越中线延伸至左上象限。各图中肝脏大小不一，图2显示覆盖范围最大，图3显示覆盖范围最小。

– Answering prompt 3: Without medical imaging software to analyze the images in detail, it’s challenging to accurately determine which mask corresponds to Picture 4. However, based on visual approximation, Picture 1 appears to have a liver mask that most closely matches the visible liver contours in Picture 4.

回答提示3：在没有医学影像软件详细分析图像的情况下，很难准确判断哪张掩膜对应图4。但通过视觉近似对比，图1的肝脏掩膜与图4中可见的肝脏轮廓最为接近。

CASE 3 Control Group

案例3 控制组

Prompt: Please tell me which mask at the top corresponds to the liver below, consider the red part only.

提示：请告诉我上方的哪个遮罩对应下方的肝脏，仅考虑红色部分。

GPT-4V Response:

GPT-4V 响应：

• response 1:

response 1:

The images you’ve provided appear to be masks generated from an MRI scan, with the bottom image showing an actual MRI scan of the abdomen. To determine which mask at the top corresponds to the liver in the actual MRI scan, we need to match the shape and position of the liver in the MRI scan with the red masks above.

您提供的图像似乎是基于MRI扫描生成的掩膜，底部图像显示的是腹部的实际MRI扫描结果。要确定顶部哪个掩膜对应实际MRI扫描中的肝脏，我们需要将MRI扫描中肝脏的形状和位置与上方的红色掩膜进行匹配。

The liver is typically located in the upper right quadrant of the abdomen, just under the diaphragm. It has a characteristic shape that is usually wider at the bottom and tapers off near the top.

肝脏通常位于腹部的右上方，紧贴横膈膜下方。其形态特征为底部较宽，顶部逐渐变窄。

• response 2:

• 响应 2:

The images you’ve provided appear to be from a medical imaging study, where the top three images are masks representing different anatomical structures in the liver, and the bottom image seems to be a cross-sectional MRI scan of the abdomen, including the liver. To determine which mask corresponds to the liver on the MRI scan, one needs to consider the anatomical position, shape, and relative size of the liver in the MRI scan and match it to the provided masks. The liver is typically located in the right upper quadrant of the abdomen.

您提供的图像似乎来自一项医学影像研究，其中前三张图像是代表肝脏不同解剖结构的掩模，底部图像则似乎是腹部的横断面 MRI 扫描图，包含肝脏部分。要确定哪张掩模对应 MRI 扫描中的肝脏，需结合 MRI 中肝脏的解剖位置、形状和相对大小，与提供的掩模进行匹配。肝脏通常位于腹部的右上方象限。

Based on visual inspection, it seems that mask 2 at the top most closely corresponds to the shape and position of the liver in the MRI scan below. This mask appears to cover the area of the liver without including other adjacent organs or structures, which aligns with the typical anatomical location of the liver as shown in the MRI image.

通过目视检查，最上方的 mask 2 似乎与下方 MRI 扫描中肝脏的形状和位置最为吻合。该掩膜覆盖了肝脏区域而未包含其他相邻器官或结构，这与 MRI 图像显示的肝脏典型解剖位置一致。

• response 3: The images provided show a series of masks corresponding to segments in an MRI scan. To identify which mask corresponds to the liver in the MRI scan, one would look for the shape and location that matches the liver’s appearance and anatomical position in the lower image.

• 回应3: 提供的图像展示了一系列与MRI扫描中各分段对应的掩模。要识别哪一掩模对应MRI扫描中的肝脏，需观察其形状和位置是否与下方图像中肝脏的外观和解剖位置相匹配。

In the MRI scan, the liver is typically located in the upper right quadrant of the abdomen, just under the diaphragm. It appears as a large, dark mass on T2-weighted images like the one shown, due to its relatively homogeneous texture compared to other organs and tissues which may appear brighter due to higher water content.

在MRI扫描中，肝脏通常位于腹部右上象限，紧贴横膈膜下方。由于肝脏质地相对均匀，与其他因含水量较高而显得更亮的器官和组织相比，它在T2加权图像中呈现为一大片暗色区域，如图所示。

Comparing the shapes and positions in the masks to the liver in the MRI:

将掩模中的形状和位置与MRI中的肝脏进行比较：

Mask 1 shows a large area that extends across the midline, which is not characteristic of the liver’s location.

掩膜1显示了一个横跨中线的大面积区域，这与肝脏的位置特征不符。

Mask 2 shows a large area that is positioned to the right but extends too far to the left, which is not typical for the liver.

掩码2显示了一个位于右侧但向左延伸过大的区域，这并非肝脏的典型特征。

Mask 3 shows a smaller area positioned on the right side only, which could be a closer match to the typical location of the liver.

掩膜3显示仅右侧较小的区域，可能更接近肝脏的典型位置。

Given this information and without more specific details about the orientation and type of MRI sequence, Mask 3 seems to be the most likely candidate that corresponds to the liver, considering only the red part of the mask. However, such assessments are best confirmed by a medical professional with expertise in radiology.

鉴于这些信息，且在没有更多关于MRI序列方向和类型的具体细节的情况下，仅考虑掩膜的红色部分，Mask 3似乎是最可能对应肝脏的候选区域。然而，此类评估最好由具备放射学专业知识的医疗专家确认。

CASE 3

案例 3

Step-by-step guidance (single round)

逐步指导（单轮）

GPT-4V Response:

GPT-4V 响应:

response 1: In picture 4, the liver is the large, darker area that takes up the upper part of the abdomen on the right side (from the perspective of the image, which would be the patient’s left side since MRI images are typically mirrored). Regarding pictures 1, 2, and 3, each of them shows a red area that represents the liver’s location as a mask on a black background. Here’s how they would match:

图4中，肝脏是占据右侧腹部上方(从图像视角看是患者左侧，因为MRI图像通常为镜像)的大片深色区域。关于图1、2和3，每张图上的红色区域代表肝脏位置，是在黑色背景上叠加的掩膜。对应关系如下:

For picture 4, the mask that would correspond to the MRI image would be the one that shows the liver in the anatomical position that matches the dark area on the MRI. This would most likely be the mask from picture 1, as it covers the area where the liver is typically located and appears to be the correct size and shape to match the liver’s appearance in an axial MRI section.

对于图4，与MRI图像对应的掩模将是显示肝脏解剖位置的掩模，该位置与MRI上的暗区相匹配。这很可能是图1中的掩模，因为它覆盖了肝脏通常所在的区域，并且其大小和形状似乎与轴向MRI切片中肝脏的外观相符。

response 2: In the provided MRI image, the liver is located in the upper right quadrant of the abdomen. It appears as a prominent, dark area in the scan, situated just below the diaphragm and to the right of where the stomach would be, which is usually less distinct in an MRI. The masks in Pictures 1, 2, and 3 offer abstract representations of the liver’s location and shape within the human body, following a color-coded template.

在提供的MRI图像中，肝脏位于腹部右上方。扫描中显示为一个显著的暗区，紧贴横膈膜下方、胃部通常不太明显的右侧位置。图1、图2和图3中的掩膜采用颜色编码模板，抽象呈现了肝脏在人体内的位置和形状。

The liver’s relative position is consistent across the masks, although the shapes differ due to individual anatomical variations or the specific liver section shown.

肝脏的相对位置在各掩模中保持一致，尽管由于个体解剖差异或显示的特定肝脏切面导致形状有所不同。

Determining the most accurate mask for Picture 4 depends on the MRI’s orientation and section. The liver is the largest solid organ with a distinctive shape, so the correct mask would closely resemble the large, dark area on the right side of the MRI image. Picture 1 is likely the most accurate representation, with its extensive red area aligning with the liver’s visible section in the MRI.

确定图片4最准确的掩膜取决于MRI的方位和截面。肝脏是最大的实体器官，具有独特的形状，因此正确的掩膜应与MRI图像右侧的大片暗区高度吻合。图片1可能是最准确的呈现，其大面积的红色区域与MRI中可见的肝脏截面一致。

CASE 3

案例 3

Step-by-step guidance (multi-round)

分步指导（多轮）

Prompt 1: Picture 123 is the mask template for the liver(the red part), and Picture 4 is an image of an abdominal MRI section, please tell me the location of the liver in picture 4.(answer in 50 words) Prompt 2:Please tell me the location of the livers in picture 1 2 3 (answer in 50 words) Prompt 3:Which mask would correspond to Picture 4 (answer in 50 words)

提示1: 图123是肝脏的掩膜模板(红色部分)，图4是一张腹部MRI切面图像，请指出图4中肝脏的位置。(50字内回答)
提示2: 请指出图1、2、3中肝脏的位置。(50字内回答)
提示3: 哪个掩膜会与图4对应？(50字内回答)

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

– Answering prompt 1: In picture 4, the liver is located in the upper right quadrant of the abdomen, occupying most of the space under the diaphragm. It appears as the large, dark area on the right side of the image (from the viewer’s perspective), extending towards the left side across the midline.

回答提示1：在图4中，肝脏位于腹部右上象限，占据膈肌下大部分空间。从观察者视角看，图像右侧的大片暗色区域即为肝脏，并向左侧延伸跨越中线。

– Answering prompt 2: In pictures 1, 2, and 3, the liver is represented by the redcolored mask. It is positioned on the right side of the body (left side from the viewer’s perspective) and extends toward the center, varying slightly in each image due to the perspective and section of the scan

回答提示2：在图1、图2和图3中，肝脏由红色掩膜表示。它位于身体右侧（从观察者视角看为左侧）并向中心延伸，由于扫描视角和切面的差异，每张图像中的位置略有变化

– Answering prompt 3: The mask that corresponds to Picture 4 would be Picture 1. This mask shows the liver covering a large area across the upper right quadrant, which matches the liver’s position and size in the MRI image of Picture 4, extending towards the central and left portions of the abdominal cavity.

回答提示3：与图4对应的掩码是图1。该掩码显示肝脏覆盖右上象限的大片区域，这与图4 MRI图像中肝脏的位置和大小相匹配，并向腹腔的中部和左侧延伸。

• response 2:

• 响应 2:

– Answering prompt 1: In Picture 4, the liver is located in the upper right quadrant of the abdominal cavity, directly under the diaphragm. In the MRI image, it is the large, dark area at the top, extending from the right side of the image towards the center, and is the brightest and most voluminous organ visible.

回答提示1：在图4中，肝脏位于腹腔右上象限，紧贴横膈膜下方。在MRI图像中，它是顶部的大片暗色区域，从图像右侧向中心延伸，是可见器官中最亮且体积最大的部分。

– Answering prompt 2: In pictures 1, 2, and 3, the liver is represented by the red area. It is the largest shape in each picture, typically located towards the upper part of the image, indicating its position in the upper right quadrant of the body’s abdominal cavity in a real-life orientation.

回答提示2：在图1、图2和图3中，红色区域代表肝脏。它是每张图片中最大的形状，通常位于图像的上部，表明其在真实人体方位中位于腹腔右上象限的位置。

– Answering prompt 3: The mask that corresponds to Picture 4 is the one in Picture 1. This mask closely matches the shape and position of the liver in the MRI image, with the liver’s broad, right-sided location in the abdominal cavity being accurately represented by the large red area in the mask.

回答提示3：与图4对应的掩膜是图1中的那个。该掩膜与MRI图像中肝脏的形状和位置高度吻合，肝脏在腹腔右侧的宽阔区域通过掩膜中的大面积红色区域得到了准确呈现。

• response 3:

response 3:

– Answering prompt 1: In picture 4, the liver is the large organ located in the upper right quadrant of the abdominal cavity. It is the dark grayish-brown structure in the MRI image, extending across the midline to the left upper quadrant but predominantly situated on the right side of the image.

回答提示1：在图4中，肝脏是位于腹腔右上象限的大型器官。它是MRI图像中呈深灰棕色的结构，横跨中线延伸至左上象限，但主要位于图像的右侧。

– Answering prompt 2: In pictures 1, 2, and 3, the liver is represented by the reddishbrown area. In all images, it is located toward the top portion of the mask template, with the position slightly varying in each: centrally to the left in picture 1, more to the left in picture 2, and predominantly on the left in picture 3.

回答提示2：在图1、图2和图3中，肝脏区域呈现红棕色。所有图像中，肝脏均位于掩模模板的上部，具体位置略有差异：图1中偏左居中，图2中更靠左侧，图3中主要分布在左半侧。

– Answering prompt 3: The mask that corresponds to Picture 4 is the one in Picture 1. This mask shows a liver representation on the right side of the body (left side of the image from our perspective), which matches the typical location of the liver as seen in the MRI image in Picture 4.

回答提示3：与图4对应的掩码是图1中的那个。该掩码在身体右侧（从我们的视角看是图像的左侧）显示了肝脏的表示，这与图4中MRI图像所见的肝脏典型位置相符。

6.4 Tip 4: Don’t expose your target at the very beginning when you start multi-round dialog

6.4 技巧4：开启多轮对话时不要过早暴露目标

6.4.1 Case 1: Endoscopic mask-polyp matching challenge

6.4.1 案例1: 内窥镜掩模-息肉匹配挑战

CASE 1

案例 1

Target exposed (marked by orange)

目标暴露 (用橙色标记)

GPT-4V Response:

GPT-4V 响应:

• response 1:

响应1:

• response 2:

• 响应 2:

pears to be the mask corresponding to the polyp’s location in Picture 4.

图 4 中似乎是息肉位置对应的掩膜。

• response 3:

response 3:

– Answering prompt 1: Based on the provided images, Picture 2 is the mask template that matches the polyp location in Picture 4. The polyp in Picture 4 is located in the bottom middle block when divided into nine equal parts.

回答提示1：根据提供的图像，图片2是与图片4中息肉位置相匹配的掩膜模板。当将图片4划分为九等份时，息肉位于中下区块。

– Answering prompt 2: When dividing each image into nine equal parts (3x3 grid): Picture 1: The polyp mask is in the bottom middle block. Picture 2: The polyp mask is in the top middle block. Picture 3: The polyp mask is in the top right block.

回答提示2：当将每张图像划分为九个等份（3x3网格）时：
图片1：息肉掩膜位于底部中间区块。
图片2：息肉掩膜位于顶部中间区块。
图片3：息肉掩膜位于右上区块。

– Answering prompt 3: Based on the provided images, Picture 1 is the mask template that corresponds to the polyp location in Picture 4.

回答提示3：根据提供的图像，图1是与图4中息肉位置对应的掩模模板。

6.4.2 Case 2: CT mask-organs matching challenge

6.4.2 案例2: CT器官掩模匹配挑战

CASE 2

案例 2

Target exposed (marked by orange)

目标暴露（用橙色标记）

Prompt 1: Following the procedure to find which Picture in 1-3 is the mask of 4. Picture 123 is the mask template for livers, and Picture 4 is an image of a CT section, please tell me the location and size of the liver in Picture 4 (answer in 50 words) Prompt 2: Please tell me the location and size of the livers in pictures 1 2 3. (answer in 50 words) Prompt 3:Which mask would correspond to the Picture 4 (answer in 50 words)

提示1: 按照流程找出1-3图中哪张是图4的遮罩。图123是肝脏的遮罩模板，图4是一张CT切片图像，请告诉我图4中肝脏的位置和大小 (50字内回答)

提示2: 请告诉我图1、2、3中肝脏的位置和大小 (50字内回答)

提示3: 哪张遮罩对应图4 (50字内回答)

GPT-4V Response:

GPT-4V 响应:

• response 1:

response 1:

– Answering prompt 1: The liver in Picture 4 is likely to correspond with the mask shown in Picture 1. The liver is positioned predominantly on the right side of the abdomen, extends across the midline towards the left, and occupies a large space in the upper abdomen, as indicated by the shape and location of the red area in Picture 1.

回答提示1：图4中的肝脏很可能与图1所示的遮罩区域对应。该肝脏主要位于腹部右侧，向中线左侧延伸，并在上腹部占据较大空间，如图1中红色区域的形状和位置所示。

– Answering prompt 2: In pictures 1-3, the red areas representing the liver are located in the upper abdomen, with the largest coverage in picture 3, suggesting a lower axial slice. Picture 2 shows a smaller section, likely higher near the liver’s dome, and picture 1 indicates a mid-level section with substantial bilateral extension.

回答提示2：在图1-3中，代表肝脏的红色区域位于上腹部，其中图3覆盖范围最大，提示为较低轴向切面。图2显示较小截面，可能靠近肝脏穹窿较高位置，而图1显示具有显著双侧延伸的中等水平切面。

– Answering prompt 3: Based on the typical anatomical position of the liver in axial CT images and without the ability to overlay the images directly, Picture 1 appears to be the corresponding mask for Picture 4 due to the large size and the right-sided dominance of the red area, which matches the usual liver location.

回答提示3：根据轴向CT图像中肝脏的典型解剖位置，且无法直接叠加图像的情况下，由于红色区域尺寸较大且呈现右侧优势分布，与肝脏常规位置相符，因此图1似乎是图4对应的掩膜。

• response 2:

response 2:

– Answering prompt 1: The liver in Picture 4 appears to be most closely matching the mask in Picture 1, suggesting it occupies the upper right quadrant of the abdomen. It is not possible to provide an accurate size measurement without specific tools; however, visually, it takes up a significant portion of the upper abdominal area.

回答提示1：图4中的肝脏似乎与图1中的掩膜最为匹配，表明其位于腹部的右上方象限。由于缺乏具体工具，无法提供精确的尺寸测量；但从视觉上看，它占据了上腹部区域的很大一部分。

– Answering prompt 2: In Pictures 1, 2, and 3, the liver (red part) is predominantly located in the right upper quadrant of the abdomen, extending slightly to the left upper quadrant. The size varies in each, covering a large portion of the upper abdomen, with Picture 2 showing the largest coverage.

回答提示2：在图1、图2和图3中，肝脏(红色部分)主要位于腹部右上方，并略微延伸至左上方。各图中肝脏大小不一，覆盖了上腹部的大部分区域，其中图2显示的覆盖范围最大。

– Answering prompt 3: The mask in Picture 1 appears to correspond to Picture 4, as the shape and position of the red area most closely match the anatomical location and size of the liver as visualized on the CT image in Picture 4.

回答提示3：图1中的掩膜似乎对应图4，因为红色区域的形状和位置最接近图4 CT图像中显示的肝脏解剖位置和大小。

• response 3:

response 3:

Answering prompt 1: The liver in Picture 4 is located in the upper right quadrant of the abdominal cavity, extending across the midline towards the left side. It appears to cover a large portion of the upper abdomen, under the diaphragm, similar to the mask shown in Picture 2. The liver’s size looks to be consistent with the standard anatomical proportions in comparison to the rest of the organs present in the CT image.

回答提示1：图4中的肝脏位于腹腔右上象限，跨越中线向左延伸。其似乎覆盖了膈肌下方上腹部的大部分区域，与图2所示遮罩类似。相较于CT图像中其他器官，该肝脏的大小符合标准解剖比例。

– Answering prompt 2: In Pictures 1, 2, and 3, the liver is highlighted by a red mask. In Picture 1, the liver is large, extending across the right upper quadrant to the midline. Picture 2 shows a large liver, dominating the right side. Picture 3’s liver is also large, crossing the midline slightly

回答提示2：在图1、图2和图3中，肝脏被红色遮罩高亮显示。图1中的肝脏较大，从右上象限延伸至中线。图2显示肝脏较大，占据右侧主要区域。图3的肝脏也较大，略微越过中线

– Answering prompt 3: The mask that corresponds to Picture 4 appears to be the one in Picture 2. This is because the highlighted area in Picture 2 closely matches the size and shape of the liver as seen in Picture 4, covering the right upper quadrant and extending towards the midline.

回答提示3：与图4对应的掩膜似乎是图2中的那个。这是因为图2中高亮区域的尺寸和形状与图4中肝脏的尺寸和形状非常吻合，覆盖了右上腹并向中线延伸。

CASE 3

案例 3

Target exposed (marked by orange)

目标暴露（以橙色标记）

Prompt 1: Following the procedure to find which Picture in 1-3 is the mask of 4. Picture 123 is the mask template for livers, and Picture 4 is an image of an MRI section, please tell me the location and size of the liver in Picture 4 (answer in 50 words) Prompt 2: Please tell me the location and size of the livers in pictures 1 2 3. (answer in 50 words) Prompt 3:Which mask would correspond to the Picture 4 (answer in 50 words)

提示1: 按照流程找出1-3图中哪张是图4的掩膜。图片123是肝脏的掩膜模板，图4是一张MRI切片图像，请告诉我图4中肝脏的位置和大小 (50字内回答)
提示2: 请告诉我图片1、2、3中肝脏的位置和大小 (50字内回答)
提示3: 哪张掩膜对应图4 (50字内回答)

GPT-4V Response:

GPT-4V 响应:

• response 1:

response 1:

– Answering prompt 1: In Picture 4, the liver is the large organ located in the upper right quadrant of the abdomen, spanning almost the entire width of the image. It appears to be the largest organ visible. The mask corresponding to the liver in Pictures 1-3 is likely to be Picture 2, due to its size and shape which closely matches the liver’s position and form in Picture 4.

回答提示1：在图4中，肝脏是位于腹部右上象限的大型器官，几乎横跨整个图像宽度，显然是图中可见的最大器官。由于大小和形状与图4中肝脏的位置形态高度吻合，图1-3中对应的肝脏掩膜很可能是图2。

– Answering prompt 2: In pictures 1, 2, and 3, the liver representations are abstracted as shaded areas against a black background. Picture 1 shows a large liver mask on the left side, Picture 2 shows a slightly smaller mask towards the center, and Picture 3 displays the smallest mask on the right side.

回答提示2：在图1、图2和图3中，肝脏以黑色背景上的阴影区域抽象呈现。图1左侧显示较大的肝脏遮罩，图2中央显示稍小的遮罩，图3右侧显示最小的遮罩。

Answering prompt 3: The mask that corresponds to Picture 4 is the one in Picture 2. It most accurately represents the size and shape of the liver in the MRI image, with its larger, more central position compared to the others, and with the characteristic lobe structure on the right side.

回答提示3：与图4对应的掩码是图2中的那个。它最准确地代表了MRI图像中肝脏的大小和形状，相比其他掩码位置更居中且体积更大，右侧还具有特征性的肝叶结构。

• response 2:

response 2:

– Answering prompt 1: The liver in Picture 4 is centrally located, extending across the midline of the body and predominantly on the right side. It is the large, dark area on the MRI, taking up a significant portion of the upper abdomen, corresponding most closely to the mask in Picture 2, indicating a typical size and location for a liver.

回答提示1: 图4中的肝脏位于中央位置，横跨身体中线且主要位于右侧。在MRI图像上呈现为占据上腹部大部分区域的深色大面积阴影，与图2中的掩膜最为吻合，表明其大小和位置均符合典型肝脏特征。

– Answering prompt 2: In the mask templates (Pictures 1-3), the liver representations are as follows:

回答提示2：在掩膜模板（图1-3）中，肝脏的表示如下：

Large, right-sided, extending from the upper to the lower mid abdomen.

大体积，右侧，从上腹部延伸至中下腹部。

Large, predominantly right-sided but with a lobe crossing midline to the left.

体积较大，主要位于右侧，但有一叶跨过中线延伸至左侧。

Smaller, strictly right-sided, located in the upper abdomen.

较小，严格位于右侧，位于上腹部。

Answering prompt 3: The mask in Picture 2 corresponds to Picture 4. It shows a large liver extending across the midline with a realistic representation of both the left and right lobes, matching the anatomical position and size of the liver as seen in the MRI image.

回答提示3：图2中的遮罩对应图4。图中显示一个跨越中线的大型肝脏，左右叶均呈现真实形态，与MRI图像中观察到的肝脏解剖位置和大小相符。

• response 3:

response 3:

Answering prompt 1: In Picture 4, which is an MRI cross-sectional image, the liver is represented by the large, dark area on the right side of the image (patient’s left side). It occupies a significant portion of the abdominal cavity, extending from the upper right quadrant towards the central upper abdomen. The corresponding mask that best matches the liver’s shape and position appears to be Picture 2.

回答提示1：在图4中，这是一幅MRI横截面图像，肝脏表现为图像右侧(患者左侧)的大片暗色区域。它占据了腹腔的很大部分，从右上象限延伸至中上腹部。与肝脏形状和位置最匹配的对应掩模似乎是图2。

– Answering prompt 2: In pictures 1, 2, and 3, the liver masks are depicted in different shades of brown. Picture 1 shows a large liver mask occupying the majority of the image space, extending centrally. Picture 2 shows a more compact liver shape towards the top. Picture 3 displays the smallest liver mask, positioned towards the upper right – Answering prompt 3: The mask that corresponds to Picture 4 is the one in Picture 2. It most closely matches the size, shape, and orientation of the liver as seen in the MRI image, covering the right section of the abdomen and wrapping around the central axis of the body.

回答提示2：在图1、图2和图3中，肝脏掩模以不同深浅的棕色显示。图1展示了一个占据图像大部分空间、向中心延伸的大型肝脏掩模。图2显示了一个更紧凑、位于顶部的肝脏形状。图3呈现了最小的肝脏掩模，位于右上角。
回答提示3：与图4对应的掩模是图2中的那个。它在大小、形状和方向上与MRI图像中看到的肝脏最为接近，覆盖了腹部的右侧并环绕着身体的中心轴。

6.5 Tip 5: Describing appearances or characteristics will greatly enhance the performance.

6.5 技巧5：描述外观或特征将大幅提升性能

6.5.1 Case 1: Endoscopic image annotation recognition

6.5.1 案例1: 内窥镜图像标注识别

CASE 1

案例 1

• Picture modal: Endoscopic picture • Tasks: La paros co pic c hole cyst ec to my • Reference Answer: – 1: Grasper. – 2: Gallbladder. – 3: Liver. – 4: Fats. – 5: Intestine.

• 图片模态：内窥镜图像
• 任务：腹腔镜下胆囊切除术
• 参考答案：
– 1: 抓钳。
– 2: 胆囊。
– 3: 肝脏。
– 4: 脂肪组织。
– 5: 肠管。

CASE 1 Control group

案例1 对照组

Prompt: This is a picture of c hole cyst ec to my In the picture, there are 5 marked numbers. Please tell me which tissues or instruments correspond to the marked numbers in the picture.

提示：这是一张关于我的囊肿切除手术的图片。图片中有5个标记数字。请告诉我图中标记数字对应的组织或器械。

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应1:

• response 2:

• 响应 2:

• response 3:

response 3:

CASE 1

案例 1

Appearances/characteristics described

外观/特征描述

Prompt: This is a picture of c hole cyst ec to my In the picture, there are 5 marked numbers. Please tell me which tissues or instruments correspond to the marked numbers in the picture. The liver is a dark reddish-brown organ, the gallbladder is a pear-shaped organ, both of them are under the abdomen and covered by fatty omentum.

提示：这是一张关于胆囊囊肿切除术的图片。图中标注了5个数字，请告诉我这些数字对应的组织或器械。肝脏是一个暗红棕色的器官，胆囊呈梨形，两者都位于腹部下方并被脂肪网膜覆盖。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应 1:

• response 2:

• 响应 2:

• response 3:

response 3:

6.5.2 Case 2: CT image annotation recognition

6.5.2 案例2: CT图像标注识别

CASE 2

案例 2

CASE 2 Control Group

案例2 对照组

Prompt: This is a picture of an axial CT scan. In the picture, there are 4 marked numbers. Please tell me which tissues correspond to the marked numbers in the picture.

提示：这是一张轴向CT扫描图像。图中标有4个数字，请告诉我图中标记数字对应的组织分别是什么。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应1:

• response 2:

• response 3:

response 3:

CASE 2

案例 2

Appearances/Characteristics Described

外观/特征描述

Prompt: This is a picture of an axial CT scan. In the picture, there are 4 marked numbers. Please tell me which tissues correspond to the marked numbers in the picture. The stomach typically appears as a hollow structure located near the midline in the upper part of the abdomen. The liver is a wedge-shaped organ with a structure that can be divided into two main lobes. The spleen has a shape similar to a thick oval or convex lens. The pancreas presents horizontally, resembling an axe lying on the spine.

提示：这是一张轴向CT扫描图像。图中标有4个数字，请告诉我图中标记数字对应的组织。胃通常表现为位于腹部上部分中线附近的空腔结构。肝脏呈楔形，其结构可分为两个主要叶。脾脏形状类似厚椭圆形或凸透镜。胰腺呈水平走向，形似横卧在脊柱上的斧头。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应 1:

• response 2:

response 2:

• response 3:

response 3:

6.5.3 Case 3: MRI image annotation recognition

6.5.3 案例3: MRI图像标注识别

CASE 3

案例 3

CASE 3 Control Group

案例3 控制组

Prompt: This is a picture of an axial MRI scan. In the picture, there are 4 marked numbers. Please tell me which tissues correspond to the marked numbers in the picture.

提示：这是一张轴向 MRI 扫描图像。图片中有 4 个标记数字，请告诉我图中标记数字对应的组织分别是什么。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应 1:

– 1: This number is pointing to the region that is likely the liver. It is a large organ located in the upper right quadrant of the abdomen. 2: The area labeled here is central and could represent the abdominal aorta or the spine, depending on the exact location and orientation of the scan. 3:This is likely pointing towards the stomach or pancreas, which are located in the upper left quadrant of the abdomen. 4:The location of this number seems to be over the spleen or the left kidney, depending on the exact axial level of the MRI scan.

1: 该数字指向的区域很可能是肝脏。这是一个位于腹部右上象限的大型器官。
2: 此处标记的区域位于中央，可能代表腹主动脉或脊柱，具体取决于扫描的精确位置和方向。
3: 这可能指向胃或胰腺，它们位于腹部的左上象限。
4: 该数字的位置似乎覆盖脾脏或左肾，具体取决于MRI扫描的轴向层面。

• response 2:

response 2:

• response 3:

• 响应 3:

– 1: This number is pointing to an area that is part of the liver, which is a large organ in the upper right part of the abdomen, involved in many metabolic processes. – 2: This could be pointing to the aorta or the inferior vena cava, depending on its exact anatomical location as these are the major blood vessels that run anterior to the spine through the abdomen. 3: This number seems to be pointing to the spine or vertebrae, which would be the bony structure in the center of the body at this level of an axial slice. – 4: This appears to be pointing to the subcutaneous tissue and skin at the back of the body, which is the lower-density area surrounding the body on an MRI scan.

1: 该数字指向肝脏的一部分，肝脏是位于右上腹的大型器官，参与多种代谢过程。
2: 根据其具体解剖位置，可能指向主动脉或下腔静脉，这些是沿脊柱前方向腹部延伸的主要血管。
3: 该数字似乎指向脊柱或椎骨，即在此水平轴位切片中位于身体中央的骨性结构。
4: 该标记指向身体背部的皮下组织和皮肤，在MRI扫描中表现为环绕身体的低密度区域。

CASE 3

案例3

Appearances/Characteristics Described

外观/特征描述

Prompt: This is a picture of an axial MRI scan. In the picture, there are 4 marked numbers. Please tell me which tissues correspond to the marked numbers in the picture. The spleen typically presents as an oval or crescent-like structure, with its signal intensity usually slightly higher than that of the liver, exhibiting a fairly uniform texture. The liver typically displays a uniform texture with moderate to slightly high signal intensity. Its shape is similar to a large, somewhat irregular wedge. The pancreas appears as a slightly flattened, elongated structure. The gallbladder is presented as a small, streamlined or pear-shaped structure, located beneath the liver.

提示：这是一张轴向MRI扫描图像。图中标记了4个数字，请告诉我图中标记的数字对应哪些组织。脾脏通常呈现为椭圆形或新月形结构，其信号强度通常略高于肝脏，质地相当均匀。肝脏通常显示均匀质地，信号强度中等至略高，形状类似于一个大而不规则的楔形。胰腺呈现为略扁平的细长结构。胆囊表现为一个小型流线型或梨形结构，位于肝脏下方。

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应 1:

• response 2:

response 2:

• response 3:

• 响应3:

6.6 Tip 6: Appearance descriptors should be non-conflicting with any image annotations to avoid misinterpretations.

6.6 技巧6: 外观描述符不应与任何图像标注冲突，以避免误解。

6.6.1 Case 1: Endoscopic image annotation recognition

6.6.1 案例1: 内窥镜图像标注识别

CASE 1

案例 1

• 图片模态：内窥镜图像
• 任务：腹腔镜胆囊切除术
• 参考答案：

黄点：肝脏。
红点：手术抓钳。
蓝点：胆囊。
绿点：脂肪组织。
紫点：肠道。

CASE 1

案例 1

Conflicted with annotation

与注解冲突

Prompt: This is a picture of a c hole cyst ec to my. In this picture, there are markers of different colors. Please tell me which tissues or instruments correspond to the annotations in the image. Intestinal tissue is typically pale pink, the liver is usually reddish-brown, the gallbladder is commonly grayishgreen, and fat is generally pale yellow.

提示：这是一张关于胆囊切除术 (cholecystectomy) 的图片。图中标有不同颜色的标记。请说明图像中各标注对应的组织或器械。肠组织通常呈淡粉色，肝脏一般为红棕色，胆囊常呈灰绿色，脂肪通常为淡黄色。

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应1:

• response 2:

• response 3:

• 响应 3:

6.7 Tip 7: Clarifying the contextual relationships between sequential images enhances GPT-4v’s analytical accuracy and detail recognition.

6.7 技巧7: 明确连续图像间的上下文关系可提升GPT-4v的分析准确性与细节识别能力

6.7.1 Case 1: Endoscopic video sequence analysis

6.7.1 案例1: 内窥镜视频序列分析

CASE 1

案例 1

Figure 2: Without series number

图 2: 无序列号

Figure 3: With series number

图 3: 带序列号

• Picture modal: Endoscopy • Tasks: Hy ster osco pic polyp ec to my • Reference Answer:

• 图片模态：内窥镜
• 任务：宫腔镜下息肉切除术
• 参考答案：

CASE 1

案例 1

Control Group (without contextual relationships)

对照组 (无上下文关联)

Prompt: These four photos are from a hy ster osco pic polyp ec to my. Please analyze the procedures in these four images.

提示：这四张照片来自一次宫腔镜下息肉切除术。请分析这四张图像中的手术步骤。

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应 1:

• response 2:

response 2:

– Photo 1: Assisted by hy st eros copy, the physician is examining the endometrial lining. A distinct polyp is visible in this image, surrounded by some mild bleeding. Additionally, a surgical instrument (resembling forceps) is nearing the operational site.

图 1: 在宫腔镜辅助下，医生正在检查子宫内膜。图像中可见一个明显的息肉，周围伴有轻微出血。此外，一把手术器械(类似钳子)正接近操作部位。

– Photo 2: The surgical instrument approaches the polyp, preparing for excision. There’s a slight increase in the bleeding surrounding the polyp.

照片 2: 手术器械接近息肉，准备切除。息肉周围出血量略有增加。

– Photo 3: This is a pivotal moment in the excision procedure. The polyp, as evident from the image, has been grasped and is commencing detachment.

照片 3: 这是切除手术中的关键一刻。如图所示，息肉已被夹住并开始剥离。

– Photo 4: The polyp has been successfully excised, and the surgical instrument is extracting it from the uterine cavity. Hemorrhage within the cavity has been managed.

图 4: 息肉已成功切除，手术器械正将其从宫腔中取出。宫腔内出血已得到控制。

• response 3:

response 3:

– Photo 1: This image offers a perspective from the hy st eros cope, showcasing the endometrial lining and a polyp. Hy ster osco pic instruments are being employed to position and stabilize the polyp in preparation for its excision. – Photo 2: In this image, the tool has approached the polyp and commenced its excision. There’s a reddened region above the excision site, indicative of minor bleeding due to the surgical intervention. – Photo 3: The image depicts a partially excised polyp. A section of the polyp has been cut off, with the remaining still attached to the uterine wall. – Photo 4: This image presents the area post-polyp ec to my. There’s a whitened zone on the endometrial lining, possibly a trace left after the polyp’s removal.

照片 1: 该图像展示了宫腔镜视角下的子宫内膜和息肉。此时正使用宫腔镜器械对息肉进行定位固定，为切除做准备。
照片 2: 图中器械已接近息肉并开始切除。切除部位上方可见发红区域，表明手术操作引发了轻微出血。
照片 3: 图像显示部分切除的息肉。息肉部分已被切除，剩余部分仍附着在子宫壁上。
照片 4: 本图呈现息肉切除术后的区域。子宫内膜上可见泛白区域，可能是息肉切除后留下的痕迹。

CASE 1

案例 1

With contextual relationships

基于上下文关系

Prompt: These four photos are from a hy ster osco pic polyp ec to my. Please analyze the procedures in these four images.

提示：这四张照片来自一次宫腔镜下息肉切除术。请分析这四张图中的手术步骤。

GPT-4V Response:

GPT-4V 响应:

• response 1:

响应 1:

• response 2:

response 2:

• response 3:

响应3:

6.8 Tip 8: Splicing multiple images into one and providing a sequence order enhances task processing effectiveness compared to multiple simultaneous inputs.

6.8 技巧8：将多张图像拼接成一张并按顺序排列，相比同时输入多张图像更能提升任务处理效果。

6.8.1 Case 1: Endoscopic video sequence analysis

6.8.1 案例1: 内窥镜视频序列分析

CASE 1

案例 1

Figure 4: Uploaded one by one via GPT-4V website. The upload system of GPT-4V may cause different sizes to be displayed, but the actual sizes are not affected

图 4: 通过 GPT-4V 网站逐一上传。GPT-4V 的上传系统可能导致显示尺寸不同，但实际尺寸不受影响

Figure 5: Uploaded after splicing

图 5: 拼接后上传

• Picture modal: Endoscopy • Tasks: Hy ster osco pic polyp ec to my • Reference Answer:

• 图片模态：内窥镜
• 任务：宫腔镜息肉切除术
• 参考答案：

CASE 1

案例 1

Uploaded one by one via GPT4-v website

通过GPT4-v网站逐一上传

Prompt: These four photos are from a hy ster osco pic polyp ec to my. Please analyze the procedures in these four images.

提示：这四张照片来自一次宫腔镜息肉切除术。请分析这四张图像中的手术步骤。

GPT-4V Response:

GPT-4V 响应:

• response 1:

• 响应 1:

• response 2:

response 2:

• response 3:

response 3:

6.9 Tip 9: Providing comparative analysis opportunities, especially with temporal patient data, deepens lesion or condition assessments.

6.9 技巧9：提供对比分析机会（尤其是时间序列患者数据）可深化病灶或病情评估。

6.9.1 Case 1: CT scan tracking of pulmonary nodule • Picture modal: CT scan • Tasks: Pulmonary nodule growth tracking • Reference Answer: a de no carcinoma

6.9.1 案例1: 肺结节CT扫描追踪
• 图像模态: CT扫描
• 任务: 肺结节生长追踪
• 参考答案: 原位腺癌

Figure 6: Without comparison

图 6: 无对比

Figure 7: With comparison

图 7: 对比效果

CASE 1

案例 1

Singe picture(without comparison) 6.9.1

单张图片（无对比）6.9.1

Prompt: It took 6 years for the patient’s nodule to grow to the size indicated by the arrow in the image. What disease is the patient most likely to have

提示：患者的结节在6年内生长至图中箭头所示大小。该患者最可能患有何种疾病

GPT-4V Response:

GPT-4V 响应:

CASE 1

案例 1

With comparison 6.9.1

对比 6.9.1

Prompt: The patient’s nodule has increased in size from left to right over 6 years, what kind of disease is the patient most likely to have

提示：患者结节在6年内从左到右逐渐增大，最可能患何种疾病

GPT-4V Response:

GPT-4V 响应：

• response 1:

• 响应1:

– Primary lung cancer: The slow growth might suggest a subtype like adenocarcinoma, which can sometimes grow at a slower rate compared to other forms of lung cancer like small cell carcinoma. However, the speed of growth varies and cannot definitively diagnose the type.

原发性肺癌：生长缓慢可能提示为腺癌等亚型，这类肿瘤的生长速度有时可能比小细胞癌等其他类型肺癌更慢。但生长速度存在个体差异，不能作为确诊依据。

– Infectious causes: Some infections can cause nodules that grow or change over time. Examples include fungal infections such as asperg il los is or cocci dio ido my cos is.

– 感染性原因: 某些感染会导致结节随时间增长或变化。例如曲霉病 (aspergillosis) 或球孢子菌病 (coccidioidomycosis) 等真菌感染。

Benign tumors: While benign tumors such as hamartomas or fibromas are typically slow-growing, they can still increase in size over time.

良性肿瘤：虽然错构瘤或纤维瘤等良性肿瘤通常生长缓慢，但仍可能随时间推移增大。

– Inflammatory conditions: Conditions such as rheumatoid nodules or gran u loma to us diseases can also present as lung nodules. The growth pattern might help differentiate between the different possibilities.

炎症性疾病：类风湿结节或肉芽肿性疾病等也可能表现为肺结节。生长模式可能有助于区分不同可能性。

Metastatic cancers: A lung nodule can be a metastasis from a primary tumor elsewhere in the body.

转移性癌症：肺部结节可能是身体其他部位原发肿瘤的转移灶。

• response 2:

response 2:

– Lung Cancer: Primary lung cancer is the most concerning possibility for a progressively enlarging nodule in the lung. It could be non-small cell lung cancer (e.g., a de no carcinoma, squamous cell carcinoma) or small cell lung cancer.

肺癌：原发性肺癌是肺部结节逐渐增大最令人担忧的可能性。可能是非小细胞肺癌(例如原位癌、鳞状细胞癌)或小细胞肺癌。

– Infectious Causes: Some infections can cause nodules in the lung that can grow over time. This includes bacterial infections like tuberculosis, fungal infections like asperg il los is or his to plasm os is, or parasitic infections like e chino co c cos is.

– 感染性病因：某些感染可能导致肺部结节随时间推移增大，包括结核病等细菌感染、曲霉病或组织胞浆菌病等真菌感染，以及棘球蚴病等寄生虫感染。

– Benign Tumors: Some benign lung tumors, such as hamartomas or fibromas, can also grow, although at a slower rate than malignant ones.

良性肿瘤：某些良性肺部肿瘤（如错构瘤或纤维瘤）也可能生长，尽管速度比恶性肿瘤慢。

– Inflammatory Diseases: Conditions like gran u loma to s is with poly an gi it is, rheumatoid nodules in the context of rheumatoid arthritis, and sarcoidosis can cause lung nodules.

– 炎症性疾病：如肉芽肿性多血管炎、类风湿关节炎背景下的类风湿结节以及结节病等病症均可引发肺结节。

– Metastases: If the patient has a history of cancer elsewhere in the body, it’s possible that the nodule represents metastatic disease.

转移性病变：若患者有其他部位癌症病史，该结节可能为转移性疾病。

• response 3:

response 3:

6.10 Tip 10: Directing GPT-4v’s focus to the interested areas will facilitate more targeted and relevant output.

6.10 技巧10: 将GPT-4v的注意力引导至目标区域可生成更具针对性的输出

6.10.1 Case 1: MRI cerebral scan

6.10.1 案例1: MRI脑部扫描

CASE 1

案例 1

• Picture modal: MRI • Tasks: Cerebral lesions detection (after ischemic stroke) • Reference Answer: A lesion is on the right frontal lobe (left part in image)

图像模态：MRI
任务：脑部病灶检测（缺血性中风后）
参考答案：病灶位于右侧额叶（图像左侧部分）

CASE 1 Control Group

案例1 对照组

Prompt: Please examine this head MRI image and determine if it is normal (the information in this image is sufficient to make a judgment).

提示：请检查这张头部MRI图像并判断是否正常（该图像中的信息足以做出判断）。

GPT-4V Response:

GPT-4V 响应:

• response 1:

response 1:

The image displays a T1-weighted magnetic resonance imaging (MRI) scan of the head. It’s imperative to note that I can only provide general observations based on this image, which should not be construed as a substitute for a professional medical opinion. From the given image:

该图像显示了一幅头部的T1加权磁共振成像(MRI)扫描图。必须强调的是，我只能基于此图像提供一般性观察，这些内容不应被视为专业医疗意见的替代。从给定图像来看:

• response 2:

• 响应2:

This represents a T1-weighted axial slice of a cranial MRI. From the scan: The majority of the cerebral structures can be discerned, including the cerebral cortex, the ventricular system, and the butterfly-shaped white matter region in the brain’s center. At the current resolution, there are no evident neoplasms, hemorrhagic regions, or other manifest path o logie s. The size and morphology of the ventricles also appear to be within standard limits.

这是一幅颅脑MRI的T1加权轴位切片图像。扫描显示：大部分脑部结构清晰可辨，包括大脑皮层、脑室系统以及大脑中央呈蝶形的白质区域。在当前分辨率下，未见明显肿瘤、出血区域或其他显著病理改变。脑室的大小和形态也处于标准范围内。

• response 3:

CASE 1

案例 1

Directed to the interested area

定向到感兴趣区域

Prompt: Please examine the upper left portion of this image and determine if it is normal (the information in this image is sufficient to make a judgment).

提示：请检查此图像的左上部分并判断是否正常（该图像中的信息足以做出判断）。

GPT-4V Response:

GPT-4V 响应：

response 1: Based on the provided MRI image, a hyper intense region is evident in the upper left quadrant, distinctly contrasting with the surrounding cerebral tissue. This region may signify a potential anomaly or lesion. However, the precise nature and clinical significance necessitate a more comprehensive medical evaluation for determination. response 2: In the MRI image you’ve presented, there appears to be a hyper intense area in the upper left section, contrasting with the adjacent cerebral tissue. This could indicate the presence of a lesion or abnormality. response 3: On the MRI image you provided, a hyper intense region is discernible in the upper left quadrant. Based on the image, this region seems to deviate from the typical cerebral tissue presentation. From this observation, the region may represent an aberration.

response 1: 根据提供的MRI图像，左上象限可见一处明显高信号区域，与周围脑组织形成鲜明对比。该区域可能提示存在潜在异常或病变。但确切性质及临床意义需通过更全面的医学评估来确定。

response 2: 您提供的MRI图像显示，左上区域存在一个与邻近脑组织对比明显的高信号区，这可能表明存在病变或异常。

response 3: 在您提供的MRI图像中，可辨识出左上象限存在高信号区域。根据图像表现，该区域似乎偏离正常脑组织特征。由此观察推测，该区域可能代表某种异常。

6.10.2 Case 2: PET scan for lung cancer detection

6.10.2 案例2: 用于肺癌检测的PET扫描

CASE 2

案例 2

• Picture modal: PET scan • Tasks: Cancer detection • Reference Answer: Lung cancer

• 图像模态：PET扫描
• 任务：癌症检测
• 参考答案：肺癌

CASE 2 Control Group

案例2 控制组

Prompt: Please describe this PET scan and tell me what disease is most likely involved in this picture (only answer the most likely one)

提示：请描述这张PET扫描图像，并告诉我图中最可能涉及的疾病（只回答最可能的一种）

GPT-4V Response:

GPT-4V 响应:

CASE 2 Control Group

案例2 对照组

Prompt: Focusing on the chest, please describe this PET scan and tell me what disease is most likely involved in this picture (only answer the most likely one)

提示：请聚焦胸部区域，描述这份PET扫描图像并告诉我该影像最可能提示哪种疾病（仅回答最可能的一种）

GPT-4V Response:

GPT-4V 响应:

• response 1: Focusing on the chest area, there are at least two distinct areas of increased tracer uptake. One is located in the region that would correspond to the media st in um, which is the central part of the thorax. The other is in the region that might correspond to the lung or pleura.

• response 1: 重点关注胸部区域，至少存在两处明显的示踪剂摄取增高区域。一处位于相当于纵隔 (mediastinum) 的区域，即胸腔中央部分；另一处位于可能对应肺部或胸膜的区域。

Given the presence of these bright spots in the chest, the most likely disease involved is lung cancer, possibly with media st in al involvement which could suggest lymph node metastasis. However, other possibilities like lymphoma or infections could also present with increased uptake in these areas, but these are less likely based on the singular bright spots seen in the image. It is important to note that a definitive diagnosis would require correlation with the patient’s clinical history, physical examination, and other diagnostic studies. A biopsy would be necessary to confirm the type of cancer or disease.

鉴于胸部存在这些亮点，最可能的疾病是肺癌，可能伴有纵隔受累，提示淋巴结转移。然而，其他可能性如淋巴瘤或感染也可能在这些区域出现摄取增加，但根据图像中单一的亮点来看，这些可能性较小。需要注意的是，明确诊断需要结合患者的临床病史、体格检查和其他诊断性检查。活检对于确认癌症或疾病类型是必要的。

• response 2: In the chest area of this scan, there are multiple areas that show increased tracer uptake (highlighted as bright spots). The presence of multiple bright spots can be associated with several conditions, but given that they are scattered throughout the chest, the most likely disease involved here is metastatic cancer. Metastases in the chest can arise from primary cancers located in the chest (like lung cancer) or from cancers that originated in other parts of the body and have spread to the chest.

• 回应2: 本次扫描的胸部区域显示多处示踪剂摄取增高区域(呈现为亮点)。多发性亮点的出现可能与多种病症相关，但鉴于其分散分布于整个胸部，最可能的关联疾病是转移性癌症。胸部转移灶可能源自胸部原发癌(如肺癌)，也可能由身体其他部位的原发癌扩散至胸部所致。

response 3: In this particular PET scan image, there are multiple areas of increased radio tracer uptake (hyper metabolic activity) in the region of the chest. The most prominent hyper metabolic activity is noted centrally in the chest area, which could represent the media st in um, the area between the lungs that contains the heart, great vessels, trachea, esophagus, and lymph nodes.

回复3：在这张特定的PET扫描图像中，胸部区域存在多处放射性示踪剂摄取增高(代谢亢进)区域。最显著的代谢亢进位于胸部中央区域，可能代表纵隔(mediastinum)——即两肺之间包含心脏、大血管、气管、食管和淋巴结的区域。

Given that there is focal, intense uptake in the chest without clear visualization of the surrounding anatomical structures, the most likely diagnosis in this scenario would be a media st in al tumor. The most common type of media st in al tumor that would show up on a PET scan is a lymphoma, especially if the hyper metabolic activity is associated with lymph nodes.

鉴于胸部存在局灶性强烈摄取而周围解剖结构显示不清，这种情况下最可能的诊断是纵隔肿瘤。PET扫描中最常见的纵隔肿瘤类型是淋巴瘤，特别是当高代谢活动与淋巴结相关时。

[论文翻译]提升GPT-4V在医疗任务中的表现：关于提示工程策略的综合研究

原文地址：https://arxiv.org/pdf/2312.04344

ENHANCING MEDICAL TASK PERFORMANCE IN GPT-4V: A COMPREHENSIVE STUDY ON PROMPT ENGINEERING STRATEGIES

提升GPT-4V在医疗任务中的表现：关于提示工程策略的综合研究

ABSTRACT

1 Introduction

1 引言

2 Methods

2 方法

2.1 Overview

2.1 概述

2.2 Testing procedure

2.2 测试流程

2.3 Testing dataset

2.3 测试数据集

3 Prompt Tips

3 个提示词技巧

4 Discussion

4 讨论

4.1 Tip 1: Concise language is more effective than complex descriptions, emphasizing succinct, task-relevant details in image analysis.

4.1 技巧1：简洁语言比复杂描述更有效，在图像分析中应强调与任务相关的简明细节

4.1.1 Demonstration Info

4.1.1 演示信息

4.1.2 Analysis

4.2 Tip 2: Providing the tasks can better help analyze the images

4.2 技巧2：提供任务能更好地辅助分析图像

4.2.1 Demonstration Info

4.2.1 演示信息

4.2.2 Analysis

4.3.1 Demonstration Info

4.3.1 演示信息

4.3.2 Analysis

4.4 Tip 4: Don’t expose your target at the very beginning when you start multi-round dialog

4.4 技巧4: 开始多轮对话时不要过早暴露目标

4.4.1 Demonstration Info

4.4.1 演示信息

4.4.2 Analysis

4.5 Tip 5: Describing appearances or characteristics will greatly enhance the performance.

4.5.1 Demonstration Info

4.5.1 演示信息

4.5.2 Analysis

4.6 Tip 6: Appearance descriptors should be non-conflicting with any image annotations to avoid misinterpretations.

4.6 技巧6: 外观描述符不应与任何图像标注冲突，以避免误解。

4.6.1 Demonstration Info

4.6.1 演示信息

4.6.2 Analysis

4.7 Tip 7: Clarifying the contextual relationships between sequential images enhances GPT-4v’s analytical accuracy and detail recognition.

4.7 技巧7: 明确连续图像间的上下文关系可提升GPT-4v的分析准确性与细节识别能力

4.7.1 Demonstration Info

4.7.1 演示信息

4.7.2 Analysis

4.8.1 Demonstration Info

4.8.2 Analysis

4.9.1 Demonstration Info

4.9.1 演示信息

4.9.2 Analysis

4.10 Tip 10: Directing GPT-4V’s focus to the interested areas will facilitate more targeted and relevant output.

4.10 技巧 10: 将 GPT-4V 的注意力引导至目标区域可生成更具针对性的输出

4.10.1 Demonstration Info

4.10.2 Analysis

4.10.2 分析

5 Summary

5 总结

References

6 Case demonstration

6 案例演示

CASE 1 Control Group

案例1 控制组

GPT-4V Response:

response 2

response 3

CASE 1 Complex Descriptions

GPT-4V Response:

response 1

response 2

response 3

CASE 1 Concise Description

GPT-4V Response:

response 1

• response 2