SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
Juexiao Zhou1,2,#, Xiaonan, Liyuan, Jiannan, Xiuying Chen1,2, Yuetan Chu1,2, Longxi Zhou1,2, Xingyu Liao1,2, Bin Zhang1,2, Xin Gao1,2,∗
Abstract—Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. First, there is a shortage of dermatologists available to diagnose patients, particularly in rural regions. Second, accurately interpreting skin disease images poses a considerable challenge. Third, generating patient-friendly diagnostic reports is usually a time-consuming and labor-intensive task for dermatologists. To tackle these challenges, we present SkinGPT-4, which is the world’s first interactive dermatology diagnostic system powered by an advanced visual large language model. SkinGPT-4 leverages a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (comprising 52,929 publicly available and proprietary images) along with clinical concepts and doctors’ notes. We designed a two-step training process to allow SkinGPT-4 to express medical features in skin disease images with natural language and make accurate diagnoses of the types of skin diseases. With SkinGPT-4, users can upload their own skin photos for diagnosis, and the system autonomously evaluates the images, identifies the characteristics and categories of the skin conditions, performs in-depth analysis, and provides interactive treatment recommendations. Meanwhile, SkinGPT-4’s local deployment capability and commitment to user privacy also render it an appealing choice for patients in search of a dependable and precise diagnosis of their skin ailments. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists, and showed that SkinGPT-4 could provide accurate diagnoses of skin diseases. Though SkinGPT-4 is not a substitute for doctors, it could enhance users’ comprehension of their medical conditions, facilitate improved communication between patients and doctors, expedite the diagnostic process for dermatologists, and potentially promote human-centred care and healthcare equity in underdeveloped areas.
Index Terms—Dermatology, Deep learning, Large language model
1 INTRODUCTION
Skin and subcutaneous diseases rank as the fourth major cause of nonfatal disease burden worldwide, affecting a considerable proportion of individuals, with a prevalence ranging from 30% to 70% across all ages and regions [1]. However, dermatologists are consistently in short supply, particularly in rural areas, and consultation costs are on the rise [2], [3], [4]. As a result, the responsibility of diagnosis often falls on non-specialists such as primary care physicians, nurse practitioners, and physician assistants, who may have limited knowledge and training [5] and low diagnostic accuracy [6], [7]. Store-and-forward teledermatology has grown dramatically in popularity as a way to expand the range of services available to medical professionals [8]. It involves transmitting digital images of the affected skin area (usually taken using a digital camera or smartphone) [9] and other relevant medical information from users to dermatologists. The dermatologist then reviews the case remotely and advises on diagnosis, workup, treatment, and follow-up recommendations [10], [11]. Nonetheless, the field of dermatology diagnosis faces three significant hurdles [12]. First, there is a shortage of dermatologists available to diagnose patients, particularly in rural regions. Second, accurately interpreting skin disease images poses a considerable challenge. Third, generating patient-friendly diagnostic reports is usually a time-consuming and labor-intensive task for dermatologists [4], [13].
Advancements in technology have led to the development of various tools and techniques to aid dermatologists in their diagnosis [13], [14], [15]. For example, recent advancements in deep learning [16], [17] have made it possible to develop artificial intelligence tools that aid in the diagnosis of skin disorders from images, including skin cancer classification [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], dermatopathology [28], [29], [30], predicting novel risk factors or epidemiology [31], [32], identifying onychomycosis [33], quantifying alopecia areata [34], classifying skin lesions from mpox virus infection [35], and so on [4]. Among these, most studies have concentrated on identifying skin lesions through dermoscopic images [36], [37], [38]. However, dermatoscopy is often not readily available outside of dermatology clinics. Some studies have explored the use of clinical photographs of skin cancer [18], onychomycosis [33], and skin lesions on educational websites [39]. Nevertheless, those methods are tailored to particular diagnostic objectives framed as classification tasks, and their outputs still require further analysis by dermatologists to issue reports and make clinical decisions. Those methods cannot automatically generate detailed reports in natural language or support interactive dialogues with patients. At present, no diagnostic system allows users to self-diagnose skin conditions by submitting images that are analyzed automatically and interactively to produce easy-to-understand text reports.
Over the past few months, the field of large language models (LLMs) has seen significant advancements [40], [41], offering remarkable language comprehension abilities and the potential to perform complex linguistic tasks. One of the most anticipated models is GPT-4 [42], a large-scale multimodal model that has demonstrated exceptional capabilities, such as generating accurate and detailed image descriptions, providing explanations for atypical visual occurrences, constructing websites based on handwritten textual descriptions, and even acting as a family doctor [43]. Despite these remarkable advancements, GPT-4 is closed-source and some of its features are still not accessible to the public; users need to pay to use some features through an API. As an accessible alternative, ChatGPT, which is also developed by OpenAI, has demonstrated the potential to assist in disease diagnosis through conversation with patients [44], [45], [46], [47], [48], [49]. By leveraging its advanced natural language processing capabilities, ChatGPT could interpret symptoms and medical history provided by patients and make suggestions for potential diagnoses or referrals to appropriate dermatological specialists [50]. However, ChatGPT currently only allows text input and does not support direct image input for diagnosis, which limits its usefulness for dermatological diagnosis.
The idea of providing skin images directly for automatic dermatological diagnosis and generating text reports is exciting because it could greatly help solve the three aforementioned challenges in dermatology diagnosis. However, no method currently accomplishes this. In related areas, ChatCAD [51] is one of the most advanced approaches: it designed various networks that take X-ray, CT, and MRI images and generate diverse outputs, which are then transformed into text descriptions. These descriptions are combined as inputs to ChatGPT to generate a condensed report and offer interactive explanations and medical recommendations based on the given image. However, the proposed vision-text models were limited to certain tasks. Moreover, ChatCAD users need to upload text descriptions through ChatGPT’s API, which could raise data privacy issues [41], [52], [53], as both medical images and text descriptions contain a great deal of patients’ private information [54], [55], [56], [57].
To address those issues, MiniGPT-4 [58] is the first open-source method that allows users to deploy locally to interface images with state-of-the-art LLMs and interact using natural language, training only a small alignment layer rather than fine-tuning either of the pre-trained large models. MiniGPT-4 aims to combine the power of a large language model with visual information obtained from a pre-trained vision encoder. To achieve this, the model uses Vicuna [59] as its language decoder, which is built on top of LLaMA [60] and is capable of performing complex linguistic tasks. To process visual information, it employs the same visual encoder used in BLIP-2 [61], which consists of a ViT [62] backbone combined with a pre-trained Q-Former. Both the language and vision models are open-source. To bridge the gap between the visual encoder and the language model, MiniGPT-4 utilizes a linear projection layer. However, MiniGPT-4 is trained on the combined dataset of Conceptual Caption [63], SBU [64], and LAION [65], which are unrelated to medical images, especially dermatological images. Therefore, it is still challenging to directly apply MiniGPT-4 to specific domains such as formal dermatology diagnosis.
Here, we propose SkinGPT-4, the world’s first dermatology diagnostic system powered by an advanced vision-based large language model (Figure 1). SkinGPT-4 leverages a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (comprising 52,929 publicly available and proprietary images) along with clinical concepts and doctors’ notes. We designed a two-step training process to develop SkinGPT-4 as shown in Figure 2. In the initial step, SkinGPT-4 aligns visual and textual clinical concepts, enabling it to recognize medical features within skin disease images and express those medical features with natural language. In the subsequent step, SkinGPT-4 learns to accurately diagnose the specific types of skin diseases. This comprehensive training methodology ensures the system’s proficiency in analyzing and classifying various skin conditions. With SkinGPT-4, users can upload their own skin photos for diagnosis. The system autonomously evaluates the images, identifies the characteristics and categories of the skin conditions, performs in-depth analysis, and provides interactive treatment recommendations (Figure 3). Moreover, SkinGPT-4’s local deployment capability and strong commitment to user privacy make it a trustworthy and precise diagnostic tool for patients seeking reliable assessments of their skin ailments. Meanwhile, we showed that SkinGPT-4 could empower patients to gain a clearer understanding of their symptoms, diagnosis, and treatment plans, which could help patients engage in more effective and economical consultations with dermatologists. With SkinGPT-4, patients can have more informed conversations with their doctors, leading to better treatment outcomes and a higher level of satisfaction. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists (Figure 4 and Supplementary information). The results showed that SkinGPT-4 consistently provided accurate diagnoses of skin diseases. It is important to note that while SkinGPT-4 is not a substitute for medical professionals, it greatly enhances users’ understanding of their medical conditions, facilitates improved communication between patients and doctors, expedites the diagnostic process for dermatologists, and has the potential to advance human-centred care and healthcare equity, particularly in underdeveloped regions [66]. In summary, SkinGPT-4 represents a significant leap forward in the field of dermatology diagnosis in the era of large language models.
Fig. 1. Illustration of SkinGPT-4. SkinGPT-4 incorporates a fine-tuned version of MiniGPT-4 on a vast collection (52,929) of both public and in-house skin disease images, accompanied by clinical concepts and doctors’ notes. With SkinGPT-4, users could upload their own skin photos for diagnosis, and SkinGPT-4 could autonomously determine the characteristics and categories of skin conditions, perform analysis, provide treatment recommendations, and allow interactive diagnosis. On the right is an example of interactive diagnosis.
2 RESULTS
2.1 The Overall Design of SkinGPT-4
SkinGPT-4 is an interactive system designed to provide a natural language-based diagnosis of skin disease images as shown in Figure 1. The process commences when the user uploads a skin image, which is encoded by the Vision Transformer (ViT) and Querying Transformer (Q-Former) models to comprehend its contents. The ViT model partitions the image into smaller patches and extracts vital features such as edges, textures, and shapes. The Q-Former then generates an embedding of the image from the features identified by the ViT, using a transformer-based architecture that allows the model to consider the context of the image. The alignment layer maps the visual information into the representation space of natural language, and the Vicuna component generates the text-based diagnosis. SkinGPT-4 is fine-tuned from MiniGPT-4 using a large collection of skin disease images along with clinical concepts and doctors’ notes to allow for interactive dermatological diagnosis. The system could provide an interactive and user-friendly way to help users self-diagnose skin diseases.
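The data flow described above can be sketched in a few lines. The following is a minimal illustration, not the SkinGPT-4 implementation: the function names `count_patches` and `linear_project`, the 224-pixel image with 14-pixel patches, and the toy weight matrix are all assumptions chosen for readability. It shows only two pieces of the pipeline: how a ViT-style encoder partitions an image into patches, and how a linear alignment layer maps visual tokens into the embedding space of the language model.

```python
from typing import List

Vector = List[float]


def count_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT-style encoder sees."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def linear_project(tokens: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Apply y = W x + b to every token; W has shape (out_dim, in_dim)."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in tokens
    ]


# Assumed sizes: a 224x224 input cut into 14x14 patches.
patches = count_patches(224, 14)

# Two toy visual tokens (in_dim = 2) mapped to a toy language space (out_dim = 3).
# Real models use far larger dimensions; these values are illustrative only.
visual_tokens = [[1.0, 2.0], [0.5, -1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
projected = linear_project(visual_tokens, weight, bias)

print(patches)
print(projected)
```

In a setup of this kind the projection is the only component that has to be learned to connect the two pre-trained models, which is why the alignment layer is small relative to the vision encoder and the language decoder.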
Fig. 2. Illustration of our datasets for two-step training of SkinGPT-4. The notes below each image indicate clinical concepts and types of skin diseases. In addition, we have detailed descriptions from the certified dermatologists for images in the step 2 dataset. To avoid causing discomfort, we used a translucent grey box to obscure the displayed skin disease images.
2.2 Interactive, Informative and Understandable Dermatology Diagnosis of SkinGPT-4
SkinGPT-4 offers several advantages for both patients and dermatologists. One notable benefit lies in its use of comprehensive and trustworthy medical knowledge specifically tailored to skin diseases. This empowers SkinGPT-4 to deliver interactive diagnoses, explanations, and recommendations for skin diseases (Supplementary Video), a task that is challenging for MiniGPT-4. Unlike MiniGPT-4, which lacks training with pertinent medical knowledge and domain-specific adaptation, SkinGPT-4 overcomes this limitation, enhancing its proficiency in the dermatological domain. To demonstrate the advantage of SkinGPT-4 over MiniGPT-4, we present two real-life examples of interactive diagnosis in Figure 3. Figure 3a shows an elderly patient with actinic keratosis on her face. Figure 3b shows a patient with eczema fingertips.
Fig. 3. Diagnosis generated by SkinGPT-4, SkinGPT-4 (step 1 only), SkinGPT-4 (step 2 only), MiniGPT-4 and Dermatologists. a. A case of actinic keratosis. b. A case of eczema fingertips.
For the actinic keratosis case (Figure 3a), MiniGPT-4 identified features such as small, red bumps and incorrectly diagnosed the skin disease as acne, while SkinGPT-4 identified features such as plaque, nodules, pustules, and scarring, and diagnosed the skin disease as actinic keratosis, which is a common skin condition caused by prolonged exposure to the sun’s ultraviolet (UV) rays [67]. During the interactive dialogue, SkinGPT-4 also suggested sun exposure as the cause of the skin disease, which was verified as correct by the certified dermatologist. For the eczema fingertips case (Figure 3b), MiniGPT-4 identified some features such as cracks and skin flakes, missed the type of the skin disease, and attributed the skin disease to dry weather and excessive hand washing. In comparison, SkinGPT-4 identified the features of the skin disease as dry, itchy, and flaky skin, and diagnosed the type of the skin disease as eczema fingertips, which was also verified by certified dermatologists.
Fig. 4. Clinical evaluation of SkinGPT-4 by certified offline and online dermatologists. a. Questionnaire-based assessment of SkinGPT-4 by offline dermatologists. b. Response time of SkinGPT-4 compared to consulting dermatologists online.
In summary, the absence of dermatological knowledge and domain-specific adaptation poses a significant challenge for MiniGPT-4 in achieving accurate dermatological diagnoses. In contrast, SkinGPT-4 accurately identified the characteristics of the skin diseases displayed in the images. It not only suggested potential disease types but also provided recommendations for potential treatments. This further highlights that domain-specific adaptation is crucial for SkinGPT-4 to work for dermatological diagnosis.
2.3 SkinGPT-4 Masters Medical Features to Improve Diagnosis with the Two-step Training
To further illustrate the capability of SkinGPT-4 in enhancing dermatological diagnosis through learning medical features in skin disease images, we conducted ablation studies, depicted in Figure 3, by training SkinGPT-4 using either solely the step 1 dataset or solely the step 2 dataset. As specified in the Method section and illustrated in Figure 2, we designed a two-step training process for SkinGPT-4. Initially, we used the step 1 dataset to familiarize SkinGPT-4 with the medical features present in dermatological images and to allow SkinGPT-4 to express medical features in skin disease images with natural language. Subsequently, we employed the step 2 dataset to train SkinGPT-4 to achieve a more precise diagnosis of disease types.
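The ablation design described above can be written down as a small configuration sketch. This is an illustration only: the `TrainingStep` class, the `ablation_variants` helper, and the goal strings are hypothetical names introduced here, and no actual training is performed. The sketch simply enumerates the three model variants compared in Figure 3 (the full two-step schedule and the two single-step schedules).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TrainingStep:
    name: str     # label used in the text, e.g. "step 1"
    dataset: str  # dataset consumed by this step
    goal: str     # what the step is meant to teach the model


# Hypothetical descriptors paraphrasing the two steps in the text.
STEP_1 = TrainingStep("step 1", "step 1 dataset", "describe medical features in natural language")
STEP_2 = TrainingStep("step 2", "step 2 dataset", "diagnose the disease type")


def ablation_variants(steps: Tuple[TrainingStep, ...]) -> Dict[str, List[TrainingStep]]:
    """Return the full schedule plus one single-step variant per step."""
    variants = {"SkinGPT-4": list(steps)}
    for step in steps:
        variants[f"SkinGPT-4 ({step.name} only)"] = [step]
    return variants


variants = ablation_variants((STEP_1, STEP_2))
for label, schedule in variants.items():
    # Each variant is trained on its schedule in order, then compared on the same cases.
    print(label, "->", [step.dataset for step in schedule])
```

Keeping the variants in one structure makes it explicit that the only difference between the compared models is which steps they were trained on.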
In the instance of actinic keratosis (Figure 3a), which is a hard case, SkinGPT-4 trained solely on the step 1 dataset demonstrated its proficiency in identifying pertinent medical features such as plaque, crust, erythema, and umbilicated lesions. These precise and comprehensive morphological descriptions accurately captured the characteristics of the skin disease depicted in the image. However, this variant erroneously diagnosed the skin condition as a viral infection, indicating the importance of incorporating the step 2 dataset for more accurate disease identification. In contrast, when trained solely on the step 2 dataset, SkinGPT-4 failed to produce accurate morphological descriptions of the skin disease and incorrectly attributed it to excessive sebum production. This highlights the necessity of incorporating the step 1 dataset to effectively recognize and comprehend the specific medical features essential for precise dermatological diagnoses. In comparison, SkinGPT-4 with our two-step training both identified the medical features, such as plaque, nodules, pustules, and scarring, and diagnosed the skin disease as actinic keratosis. For simple cases such as the eczema fingertips shown in Figure 3b, SkinGPT-4 could also provide more detailed descriptions of the skin disease image, encompassing the medical features, and accurately identify the type of skin disease. In conclusion, the two-step training process allows SkinGPT-4 to effectively comprehend and master medical features in dermatological images, thereby significantly enhancing the accuracy of diagnoses. This is particularly crucial for hard cases, where precise identification of medical features is paramount to accurately determining the type of disease.
2.4 Clinical Evaluation of SkinGPT-4 by Certified Dermatologists
To evaluate the reliability and robustness of SkinGPT-4, we conducted a comprehensive study involving 150 real-life cases and compared its diagnoses with those of certified dermatologists. The results, presented in Table 2, Figure 4, and the Supplementary information, demonstrated that SkinGPT-4 consistently provided accurate diagnoses that were in agreement with those of the certified dermatologists, including in the cases detailed in the Supplementary information.
Among the 150 cases, 78.76% of SkinGPT-4’s diagnoses were evaluated as correct or relevant by certified dermatologists. This figure comprises both strongly agree (73.13%) and agree (5.63%) ratings. Additionally, SkinGPT-4’s responses regarding the causes of the disease and potential treatments were considered informative (80.63%) and useful (83.13%) by the doctors. Furthermore, SkinGPT-4 was rated as a valuable tool for doctors in the diagnosis process (85%) and for patients in gaining a better understanding of their diseases (81.25%). The capability of SkinGPT-4 to support local deployment, ensuring user privacy, garnered high agreement (91.88%), further enhancing the willingness to use SkinGPT-4 (75%).
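As a quick consistency check on the figures above, the headline agreement rate is the sum of the two positive questionnaire categories. The snippet below is a small sketch that reproduces this arithmetic; the helper name `combined_agreement` is an assumption introduced for illustration, and `Decimal` is used so that the two-decimal percentages are added exactly rather than as binary floats.

```python
from decimal import Decimal


def combined_agreement(strongly_agree: str, agree: str) -> Decimal:
    """Sum two percentage strings exactly, avoiding binary float rounding."""
    return Decimal(strongly_agree) + Decimal(agree)


# Percentages reported in the text for diagnostic correctness or relevance.
total = combined_agreement("73.13", "5.63")
print(total)  # prints 78.76
```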
Overall, the study demonstrated that SkinGPT-4 delivers reliable diagnoses, aids doctors in the diagnostic process, facilitates patient understanding, and prioritizes user privacy, making it a valuable asset in the field of dermatology.
2.5 SkinGPT-4 Acts as a 24/7 On-call Family Doctor
In comparison to online consultations with dermatologists, which often entail waiting minutes for a response, SkinGPT-4 offers several advantages. First, it is available 24/7, ensuring constant access to medical advice. Additionally, SkinGPT-4 provides faster response times, typically within seconds, as depicted in Figure 4b, which makes it a swift and convenient option for patients requiring immediate diagnoses outside of regular office hours.
Moreover, SkinGPT-4’s ability to offer preliminary diagnoses empowers patients to make informed decisions about seeking in-person medical attention. This feature can help reduce unnecessary visits to the doctor’s office, saving patients both time and money. The potential to improve healthcare access is particularly significant in rural areas or regions experiencing a scarcity of dermatologists. In such areas, patients often face lengthy waiting times or must travel considerable distances to see a dermatologist [68]. By leveraging SkinGPT-4, patients can swiftly and conveniently receive preliminary diagnoses, potentially diminishing the need for in-person visits and alleviating the strain on healthcare syst