SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
Juexiao Zhou1,2,#, Xiaonan , , Liyuan , , Xiuying Chen1,2, Yuetan Chu1,2, Longxi Zhou1,2, Xingyu Liao1,2, Bin Zhang1,2, Xin Gao1,2,∗
Abstract—Skin and subcutaneous diseases rank among the leading contributors to the global burden of nonfatal disease, affecting a considerable portion of the population. Nonetheless, dermatology diagnosis faces three significant hurdles. First, there is a shortage of dermatologists available to diagnose patients, particularly in rural regions. Second, accurately interpreting skin disease images is a considerable challenge. Third, writing patient-friendly diagnostic reports is usually time-consuming and labor-intensive for dermatologists. To tackle these challenges, we present SkinGPT-4, the world’s first interactive dermatology diagnostic system powered by an advanced visual large language model. SkinGPT-4 is a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (52,929 publicly available and proprietary images) along with clinical concepts and doctors’ notes. We designed a two-step training process that allows SkinGPT-4 to describe the medical features of skin disease images in natural language and to accurately diagnose the type of skin disease. With SkinGPT-4, users can upload their own skin photos for diagnosis; the system autonomously evaluates the images, identifies the characteristics and categories of the skin conditions, performs in-depth analysis, and provides interactive treatment recommendations. SkinGPT-4’s support for local deployment and its commitment to user privacy also make it an appealing choice for patients seeking a dependable and precise diagnosis of their skin ailments. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, independently reviewed by certified dermatologists, and showed that SkinGPT-4 could provide accurate diagnoses of skin diseases.
Although SkinGPT-4 is not a substitute for doctors, it could enhance users’ comprehension of their medical conditions, improve communication between patients and doctors, expedite the diagnostic process for dermatologists, and potentially promote human-centred care and healthcare equity in underdeveloped areas.
Index Terms—Dermatology, Deep learning, Large language model
1 INTRODUCTION
Skin and subcutaneous diseases rank as the fourth leading cause of nonfatal disease burden worldwide, affecting a considerable proportion of individuals, with a prevalence ranging from 30% to 70% across all ages and regions [1]. However, dermatologists are consistently in short supply, particularly in rural areas, and consultation costs are rising [2], [3], [4]. As a result, the responsibility for diagnosis often falls on non-specialists such as primary care physicians, nurse practitioners, and physician assistants, who may have limited knowledge and training [5] and low diagnostic accuracy [6], [7]. Store-and-forward teledermatology has become increasingly popular as a way to expand the range of services available to medical professionals [8]. It involves transmitting digital images of the affected skin area (usually taken with a digital camera or smartphone) [9], together with other relevant medical information, from users to dermatologists. The dermatologist then reviews the case remotely and advises on diagnosis, workup, treatment, and follow-up [10], [11]. Nonetheless, dermatology diagnosis faces three significant hurdles [12]. First, there is a shortage of dermatologists available to diagnose patients, particularly in rural regions. Second, accurately interpreting skin disease images is a considerable challenge. Third, writing patient-friendly diagnostic reports is usually time-consuming and labor-intensive for dermatologists [4], [13].
Advancements in technology have led to the development of various tools and techniques to aid dermatologists in their diagnosis [13], [14], [15]. For example, recent advances in deep learning [16], [17] have made it possible to build artificial intelligence tools that help diagnose skin disorders from images, with applications such as skin cancer classification [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], dermatopathology [28], [29], [30], predicting novel risk factors or epidemiology [31], [32], identifying onychomycosis [33], quantifying alopecia areata [34], and classifying skin lesions from mpox virus infection [35], among others [4]. Most of these studies have concentrated on identifying skin lesions in dermoscopic images [36], [37], [38]. However, dermoscopy is often not readily available outside dermatology clinics. Some studies have explored clinical photographs of skin cancer [18], onychomycosis [33], and skin lesions from educational websites [39]. Nevertheless, these methods are tailored to specific diagnostic objectives framed as classification tasks, and their outputs still require further analysis by dermatologists to issue reports and make clinical decisions. They cannot automatically generate detailed reports in natural language or hold interactive dialogues with patients. At present, no diagnostic system allows users to self-diagnose skin conditions by submitting images and then automatically and interactively analyzes them and generates easy-to-understand text reports.
Over the past few months, the field of large language models (LLMs) has seen significant advancements [40], [41], offering remarkable language comprehension abilities and the potential to perform complex linguistic tasks. One of the most anticipated models is GPT-4 [42], a large-scale multimodal model that has demonstrated exceptional capabilities, such as generating accurate and detailed image descriptions, explaining unusual visual phenomena, building websites from handwritten textual descriptions, and even acting as a family doctor [43]. Despite these advances, GPT-4 is closed-source and some of its features are not yet available to the public; users must pay to access certain features through an API. As a more accessible alternative, ChatGPT, also developed by OpenAI, has demonstrated the potential to assist in disease diagnosis through conversations with patients [44], [45], [46], [47], [48], [49]. Using its advanced natural language processing capabilities, ChatGPT can interpret symptoms and medical history provided by patients and suggest potential diagnoses or referrals to appropriate dermatological specialists [50]. However, ChatGPT currently accepts only text input and does not support direct image input for diagnosis, which limits its usefulness for dermatological diagnosis.
Providing skin images directly for automatic dermatological diagnosis and text report generation is an exciting idea, because it could greatly help address the three challenges described above. However, no existing method accomplishes this. In related areas, ChatCAD [51] is one of the most advanced approaches: it uses various networks that take X-ray, CT, and MRI images and produce diverse outputs, which are then transformed into text descriptions. These descriptions are combined as inputs to ChatGPT to generate a condensed report and offer interactive explanations and medical recommendations based on the given image. However, its vision-text models are limited to certain tasks. Moreover, ChatCAD requires users to upload text descriptions through ChatGPT’s API, which could raise data privacy issues [41], [52], [53], because both medical images and text descriptions contain a great deal of private patient information [54], [55], [56], [57]. MiniGPT-4 [58] addresses these issues: it is the first open-source method that can be deployed locally to connect images with state-of-the-art LLMs and interact in natural language, and instead of fine-tuning either of its two pre-trained large models, it trains only a small alignment layer. MiniGPT-4 combines the power of a large language model with visual information obtained from a pre-trained vision encoder. To achieve this, it uses Vicuna [59] as its language decoder, which is built on top of LLaMA [60] and is capable of performing complex linguistic tasks. To process visual information, it employs the same visual encoder as BLIP-2 [61], which consists of a ViT [62] backbone combined with a pre-trained Q-Former. Both the language and vision models are open-source. To bridge the gap between the visual encoder and the language model, MiniGPT-4 uses a linear projection layer.
However, MiniGPT-4 is trained on a combination of the Conceptual Captions [63], SBU [64], and LAION [65] datasets, which are unrelated to medical images, especially dermatological images. It is therefore still challenging to apply MiniGPT-4 directly to specialized domains such as formal dermatological diagnosis.
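To give a sense of how small the trainable alignment layer is compared with the frozen models, the sketch below counts the parameters of one dense projection. The widths are our assumptions and are not stated in this paper: a 768-dimensional Q-Former output token (as in BLIP-2) and a 5120-dimensional embedding for a 13B-parameter Vicuna. The 13 billion figure is a nominal model size used only to compute the ratio.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Trainable parameters in a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


# Assumed widths (hypothetical for this sketch; check the released configs).
QFORMER_DIM = 768              # assumed width of each Q-Former output token
LLM_DIM = 5120                 # assumed hidden width of a 13B Vicuna model
LLM_PARAMS = 13_000_000_000    # nominal LLM size, used only for the ratio

projection = linear_params(QFORMER_DIM, LLM_DIM)
ratio = projection / LLM_PARAMS

print(projection)  # 3937280
print(f"{ratio:.4%} of the language model's size")
```

Under these assumptions the projection has about 3.9 million parameters, a few hundredths of a percent of the language model, which is why training only this layer is comparatively cheap.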
Here, we propose SkinGPT-4, the world’s first dermatology diagnostic system powered by an advanced vision-based large language model (Figure 1). SkinGPT-4 is a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (52,929 publicly available and proprietary images) together with clinical concepts and doctors’ notes. We designed a two-step training process for SkinGPT-4, as shown in Figure 2. In the first step, SkinGPT-4 aligns visual and textual clinical concepts, which enables it to recognize medical features in skin disease images and express them in natural language. In the second step, SkinGPT-4 learns to accurately diagnose specific types of skin disease. This training methodology is designed to make the system proficient at analyzing and classifying a wide range of skin conditions. With SkinGPT-4, users can upload their own skin photos for diagnosis. The system autonomously evaluates the images, identifies the characteristics and categories of the skin conditions, performs in-depth analysis, and provides interactive treatment recommendations (Figure 3). Moreover, SkinGPT-4’s support for local deployment and its strong commitment to user privacy make it a trustworthy and precise diagnostic tool for patients seeking reliable assessments of their skin ailments. We also show that SkinGPT-4 can help patients gain a clearer understanding of their symptoms, diagnosis, and treatment plans, which can make their consultations with dermatologists more effective and economical. With SkinGPT-4, patients can have more informed conversations with their doctors, potentially leading to better treatment outcomes and higher satisfaction. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists (Figure 4 and Supplementary information). The results showed that SkinGPT-4 consistently provided accurate diagnoses of skin diseases. Although SkinGPT-4 is not a substitute for medical professionals, it can greatly enhance users’ understanding of their medical conditions, improve communication between patients and doctors, expedite the diagnostic process for dermatologists, and potentially advance human-centred care and healthcare equity, particularly in underdeveloped regions [66]. In summary, SkinGPT-4 represents a significant leap forward for dermatology diagnosis in the era of large language models.

Fig. 1. Illustration of SkinGPT-4. SkinGPT-4 is a version of MiniGPT-4 fine-tuned on a large collection (52,929) of public and in-house skin disease images, accompanied by clinical concepts and doctors’ notes. With SkinGPT-4, users can upload their own skin photos for diagnosis, and SkinGPT-4 can autonomously determine the characteristics and categories of skin conditions, perform analysis, provide treatment recommendations, and support interactive diagnosis. On the right is an example of interactive diagnosis.
2 RESULTS
2.1 The Overall Design of SkinGPT-4
SkinGPT-4 is an interactive system designed to provide natural language-based diagnoses of skin disease images, as shown in Figure 1. The process begins when the user uploads a skin image, which is encoded by the Vision Transformer (ViT) and the Q-Former (Querying Transformer) to capture its contents. The ViT partitions the image into smaller patches and extracts vital features such as edges, textures, and shapes. The Q-Former then generates an embedding of the image from the features identified by the ViT, using a transformer-based architecture that allows the model to consider the context of the image. The alignment layer maps this visual information into the input space of the language model, and the Vicuna component generates the text-based diagnosis. SkinGPT-4 is fine-tuned from MiniGPT-4 on a large collection of skin disease images, together with clinical concepts and doctors’ notes, to enable interactive dermatological diagnosis. The system thus offers an interactive, user-friendly way to help users self-diagnose skin diseases.
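The data flow above can be sketched in a few lines of plain Python. This is a heavily simplified, hypothetical illustration, not SkinGPT-4’s code: raw pixel blocks stand in for ViT patch embeddings (there are no transformer layers), one single-head cross-attention step over random vectors stands in for the Q-Former’s learned queries, and all sizes are toy values. What it shows is structural: however many patches the image yields, the language model receives a fixed number of visual tokens, one per query, each projected to the language model’s width.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W single-channel image (list of rows) into non-overlapping
    patch x patch blocks and flatten each block into one vector, ViT-style."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    blocks = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            blocks.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return blocks


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head cross-attention: each query takes a softmax-weighted average
    of all patch vectors (used here as both keys and values)."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(scores)
        outputs.append([sum(w * f[k] for w, f in zip(weights, features))
                        for k in range(d)])
    return outputs


def project(vectors, weight, bias):
    """Linear alignment layer: map each vector to the language model's width."""
    return [[sum(v[i] * weight[i][j] for i in range(len(v))) + bias[j]
             for j in range(len(bias))] for v in vectors]


random.seed(0)
PATCH, N_QUERIES, LLM_DIM = 4, 3, 6       # toy sizes, not the real model's
image = [[random.random() for _ in range(16)] for _ in range(16)]

patches = patchify(image, PATCH)           # 16 patches, 16 values each
queries = [[random.gauss(0, 1) for _ in range(PATCH * PATCH)]
           for _ in range(N_QUERIES)]      # stand-ins for learned query tokens
visual = cross_attend(queries, patches)    # one vector per query
W = [[random.gauss(0, 0.1) for _ in range(LLM_DIM)] for _ in range(PATCH * PATCH)]
b = [0.0] * LLM_DIM
visual_tokens = project(visual, W, b)      # what the language model would receive

print(len(patches), len(visual_tokens), len(visual_tokens[0]))  # 16 3 6
```

Doubling the image side would quadruple the patch count but leave the number of visual tokens at `N_QUERIES`, which is the property that keeps the language model’s input length fixed.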

Fig. 2. Illustration of our datasets for two-step training of SkinGPT-4. The notes below each image indicate clinical concepts and types of skin diseases. In addition, we have detailed descriptions from the certified dermatologists for images in the step 2 dataset. To avoid causing discomfort, we used a translucent grey box to obscure the displayed skin disease images.
2.2 Interactive, Informative and Understandable Dermatology Diagnosis of SkinGPT-4
SkinGPT-4 offers many advantages to both patients and dermatologists. One notable benefit is its use of comprehensive and trustworthy medical knowledge specifically tailored to skin diseases. This enables SkinGPT-4 to deliver interactive diagnoses, explanations, and recommendations for skin diseases (Supplementary Video), a task that MiniGPT-4 struggles with. MiniGPT-4 lacks training on pertinent medical knowledge and domain-specific adaptation; SkinGPT-4 overcomes this limitation, which improves its proficiency in the dermatological domain. To demonstrate the advantage of SkinGPT-4 over MiniGPT-4, we present two real-life examples of interactive diagnosis in Figure 3. Figure 3a shows an elderly woman with actinic keratosis on her face, and Figure 3b shows a patient with fingertip eczema.

Fig. 3. Diagnoses generated by SkinGPT-4, SkinGPT-4 (step 1 only), SkinGPT-4 (step 2 only), MiniGPT-4, and dermatologists. a. A case of actinic keratosis. b. A case of fingertip eczema.

For the actinic keratosis case (Figure 3a), MiniGPT-4 identified features such as small red bumps and incorrectly diagnosed the condition as acne, whereas SkinGPT-4 identified features such as plaque, nodules, pustules, and scarring, and diagnosed actinic keratosis, a common skin condition caused by prolonged exposure to the sun’s ultraviolet (UV) rays [67]. During the interactive dialogue, SkinGPT-4 also identified sun exposure as the cause, which the certified dermatologist confirmed. For the fingertip eczema case (Figure 3b), MiniGPT-4 identified some features, such as cracks and skin flakes, but failed to identify the type of skin disease and attributed the condition to dry weather and excessive hand washing. In comparison, SkinGPT-4 described the affected skin as dry, itchy, and flaky and diagnosed fingertip eczema, a diagnosis that certified dermatologists also confirmed.

Fig. 4. Clinical evaluation of SkinGPT-4 by certified offline and online dermatologists. a. Questionnaire-based assessment of SkinGPT-4 by offline dermatologists. b. Response time of SkinGPT-4 compared to consulting dermatologists online.
In summary, the lack of dermatological knowledge and domain-specific adaptation makes accurate dermatological diagnosis a significant challenge for MiniGPT-4. In contrast, SkinGPT-4 accurately identified the characteristics of the skin diseases shown in the images. It not only suggested potential disease types but also recommended potential treatments. This further highlights that domain-specific adaptation is crucial for SkinGPT-4 to perform dermatological diagnosis.
2.3 SkinGPT-4 Masters Medical Features to Improve Diagnosis with the Two-step Training
To further illustrate how learning the medical features in skin disease images improves SkinGPT-4’s dermatological diagnosis, we conducted ablation studies in which SkinGPT-4 was trained on only the step 1 dataset or only the step 2 dataset (Figure 3). As specified in Methods and illustrated in Figure 2, SkinGPT-4 is trained in two steps. First, we used the step 1 dataset to familiarize SkinGPT-4 with the medical features present in dermatological images and to let it express those features in natural language. Then, we used the step 2 dataset to train SkinGPT-4 to diagnose disease types more precisely.
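The two steps differ mainly in what the model is asked to produce. The sketch below shows one hypothetical way to turn the step 1 annotations (clinical concepts) and the step 2 annotations (disease type plus an optional doctor’s note) into image/prompt/target pairs. The prompt wording, dictionary fields, and file names are our own illustrative assumptions, not SkinGPT-4’s actual training format, and the sketch covers only data ordering, not the optimization itself.

```python
from itertools import chain


def join_features(names):
    """Render ['plaque', 'crust', 'erythema'] as 'plaque, crust and erythema'."""
    if not names:
        return "no annotated features"
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def step1_example(image_path, concepts):
    """Step 1 pair (hypothetical format): clinical concepts -> a descriptive sentence."""
    features = join_features([c.lower() for c in concepts])
    return {
        "image": image_path,
        "prompt": "Describe the clinical features of the skin in this image.",
        "target": f"The image shows {features}.",
    }


def step2_example(image_path, disease, note=""):
    """Step 2 pair (hypothetical format): disease label plus optional doctor's note."""
    target = f"This is most likely {disease}."
    if note:
        target += " " + note
    return {
        "image": image_path,
        "prompt": "What skin disease is shown in this image?",
        "target": target,
    }


def two_step_schedule(step1_data, step2_data):
    """Order the data so that all step 1 pairs come before any step 2 pair."""
    return chain(step1_data, step2_data)


s1 = [step1_example("img_001.jpg", ["Plaque", "Crust", "Erythema"])]
s2 = [step2_example("img_002.jpg", "actinic keratosis",
                    "It is commonly caused by long-term sun exposure.")]
for ex in two_step_schedule(s1, s2):
    print(ex["target"])
# The image shows plaque, crust and erythema.
# This is most likely actinic keratosis. It is commonly caused by long-term sun exposure.
```

In the ablations described next, "step 1 only" corresponds to training on `s1`-style pairs alone and "step 2 only" to `s2`-style pairs alone.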
For the actinic keratosis case (Figure 3a), a difficult one, SkinGPT-4 trained solely on the step 1 dataset identified pertinent medical features such as plaque, crust, erythema, and umbilication. These precise and comprehensive morphological descriptions accurately captured the characteristics of the skin disease in the image. However, this step-1-only model erroneously diagnosed the condition as a viral infection, indicating that the step 2 dataset is needed for accurate disease identification. Conversely, when trained solely on the step 2 dataset, SkinGPT-4 failed to produce accurate morphological descriptions and instead incorrectly attributed the condition to excessive sebum production. This highlights the need for the step 1 dataset to recognize and understand the specific medical features essential for precise dermatological diagnosis. In comparison, SkinGPT-4 with our two-step training both identified the medical features (plaque, nodules, pustules, and scarring) and diagnosed the condition as actinic keratosis. For simpler cases such as the fingertip eczema in Figure 3b, SkinGPT-4 also provided more detailed descriptions of the image, covering the medical features and accurately identifying the type of skin disease. In conclusion, our two-step training process allows SkinGPT-4 to comprehend and master the medical features in dermatological images, which significantly enhances diagnostic accuracy; this is particularly important for hard cases, where precise identification of medical features is paramount to determining the type of disease.
2.4 Clinical Evaluation of SkinGPT-4 by Certified Dermatologists
To evaluate the reliability and robustness of SkinGPT-4, we conducted a comprehensive study on 150 real-life cases and compared SkinGPT-4’s diagnoses with those of certified dermatologists. The composition of the evaluation set is given in Table 2. The results (Figure 4 and Supplementary information) showed that SkinGPT-4 consistently provided accurate diagnoses that agreed with those of the certified dermatologists; all cases are detailed in the Supplementary information.
Across the 150 cases, certified dermatologists rated 78.76% of SkinGPT-4’s diagnoses as correct or relevant; this figure combines "strongly agree" (73.13%) and "agree" (5.63%) ratings. The doctors also rated SkinGPT-4’s responses about the causes of the disease and potential treatments as informative (80.63%) and useful (83.13%). Furthermore, SkinGPT-4 was rated as a valuable tool for doctors in the diagnostic process (85%) and for helping patients better understand their diseases (81.25%). SkinGPT-4’s support for local deployment, which protects user privacy, received high agreement (91.88%), and willingness to use SkinGPT-4 reached 75%.
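As a quick consistency check on the headline figure, the "correct or relevant" share is simply the sum of the two positive Likert categories reported above (the values are copied from the text; rounding to two decimals absorbs floating-point error):

```python
# Percentages as reported in the text above.
ratings = {"strongly agree": 73.13, "agree": 5.63}

# "Correct or relevant" pools the two positive Likert options.
correct_or_relevant = round(sum(ratings.values()), 2)
print(correct_or_relevant)  # 78.76
```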
Overall, the study demonstrated that SkinGPT-4 delivers reliable diagnoses, aids doctors in the diagnostic process, facilitates patient understanding, and prioritizes user privacy, making it a valuable asset in the field of dermatology.
2.5 SkinGPT-4 Acts as a 24/7 On-call Family Doctor
Compared with online consultations with dermatologists, which often involve waiting minutes for a response, SkinGPT-4 offers several advantages. First, it is available 24/7, ensuring constant access to medical advice. In addition, it responds faster, typically within seconds (Figure 4b), which makes it a swift and convenient option for patients who need an immediate diagnosis outside regular office hours.
Moreover, SkinGPT-4’s ability to offer preliminary diagnoses empowers patients to make informed decisions about seeking in-person medical attention. This can help reduce unnecessary visits to the doctor’s office, saving patients both time and money. The potential to improve healthcare access is particularly significant in rural areas or regions with a scarcity of dermatologists, where patients often face lengthy waiting times or must travel considerable distances to see a dermatologist [68]. With SkinGPT-4, such patients can quickly and conveniently receive preliminary diagnoses, potentially reducing the need for in-person visits and easing the strain on healthcare systems in these underserved regions.
3 METHODS
3.1 Dataset
Our datasets include two public datasets and our private in-house dataset, where the first public dataset was used for the step 1 training, and the second public dataset and our in-house dataset were used for the step 2 training.
TABLE 1 Characteristics of Step 1 Dataset. It is possible for a single image to have multiple medical concepts at the same time. The total number of samples is 3886.
| Clinical Concepts | Number of Samples |
|---|---|
| Erythema | 2139 |
| Plaque | 1966 |
| Papule | 1169 |
| Brown (Hyperpigmentation) | 759 |
| Scale | 686 |
| Crust | 497 |
| White (Hypopigmentation) | 257 |
| Yellow | 245 |
| Erosion | 200 |
| Nodule | 189 |
| Ulcer | 154 |
| Friable | 153 |
| Patch | 149 |
| Dome-shaped | 146 |
| Exudate | 144 |
| Scar | 123 |
| Pustule | 103 |
| Telangiectasia | 100 |
| Black | 90 |
| Purple | 85 |
| Atrophy | 69 |
| Bulla | 64 |
| Umbilicated | 49 |
| Vesicle | 46 |
| Warty/Papillomatous | 46 |
| Excoriation | 46 |
| Exophytic/Fungating | 42 |
| Xerosis | 35 |
| Induration | 33 |
| Fissure | 32 |
| Sclerosis | 27 |
| Pedunculated | 26 |
| Lichenification | 25 |
| Comedo | 24 |
| Wheal | 21 |
| Flat topped | 18 |
| Translucent | 16 |
| Macule | 13 |
| Salmon | 10 |
| Purpura/Petechiae | 10 |
| Acuminate | 8 |
| Cyst | 6 |
| Blue | 5 |
| Abscess | 5 |
| Poikiloderma | 5 |
| Burrow | 5 |
| Gray | 5 |
| Pigmented | 5 |
The first public dataset, SKINCON [69], is the first medical dataset densely annotated by domain experts (dermatologists) with annotations useful across multiple disease processes. It includes 3230 images from the Fitzpatrick 17k skin disease dataset, densely annotated with 48 clinical concepts (Table 1), 22 of which are represented by at least 50 images, and 656 skin disease images from the Diverse Dermatology Images dataset. The 48 clinical concepts defined by SKINCON are Vesicle, Papule, Macule, Plaque, Abscess, Pustule, Bulla, Patch, Nodule, Ulcer, Crust, Erosion, Excoriation, Atrophy, Exudate, Purpura/Petechiae, Fissure, Induration, Xerosis, Telangiectasia, Scale, Scar, Friable, Sclerosis, Pedunculated, Exophytic/Fungating, Warty/Papillomatous, Dome-shaped, Flat-topped, Brown (Hyperpigmentation), Translucent, White (Hypopigmentation), Purple, Yellow, Black, Erythema, Comedo, Lichenification, Blue, Umbilicated, Poikiloderma, Salmon, Wheal, Acuminate, Burrow, Gray, Pigmented, and Cyst.
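Because one image can carry several concept labels, the per-concept counts in Table 1 add up to more than the number of images. The sketch below transcribes Table 1 and checks that the two image sources give 3,886 images, that applying the 50-image threshold to the combined counts in Table 1 yields 22 concepts (the text states this threshold for the Fitzpatrick 17k subset), and that the label total exceeds the image total. The `multi_hot` encoding is a common way to represent such multi-label annotations; it is an illustrative assumption here, not a description of how SkinGPT-4 consumes them.

```python
# Per-concept sample counts transcribed from Table 1 (48 SKINCON concepts).
TABLE1 = {
    "Erythema": 2139, "Plaque": 1966, "Papule": 1169,
    "Brown (Hyperpigmentation)": 759, "Scale": 686, "Crust": 497,
    "White (Hypopigmentation)": 257, "Yellow": 245, "Erosion": 200,
    "Nodule": 189, "Ulcer": 154, "Friable": 153, "Patch": 149,
    "Dome-shaped": 146, "Exudate": 144, "Scar": 123, "Pustule": 103,
    "Telangiectasia": 100, "Black": 90, "Purple": 85, "Atrophy": 69,
    "Bulla": 64, "Umbilicated": 49, "Vesicle": 46,
    "Warty/Papillomatous": 46, "Excoriation": 46,
    "Exophytic/Fungating": 42, "Xerosis": 35, "Induration": 33,
    "Fissure": 32, "Sclerosis": 27, "Pedunculated": 26,
    "Lichenification": 25, "Comedo": 24, "Wheal": 21, "Flat topped": 18,
    "Translucent": 16, "Macule": 13, "Salmon": 10, "Purpura/Petechiae": 10,
    "Acuminate": 8, "Cyst": 6, "Blue": 5, "Abscess": 5, "Poikiloderma": 5,
    "Burrow": 5, "Gray": 5, "Pigmented": 5,
}
IMAGES = 3230 + 656  # Fitzpatrick 17k + Diverse Dermatology Images

frequent = [name for name, n in TABLE1.items() if n >= 50]
label_total = sum(TABLE1.values())

print(IMAGES)                 # 3886
print(len(frequent))          # 22
print(label_total > IMAGES)   # True: images carry several concept labels

CONCEPTS = list(TABLE1)  # fixed concept order for a 48-dim label vector


def multi_hot(labels):
    """Encode one image's concept labels as a 0/1 vector over all 48 concepts."""
    return [1 if c in labels else 0 for c in CONCEPTS]


print(sum(multi_hot({"Plaque", "Crust", "Erythema"})))  # 3
```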
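Because SKINCON is densely annotated, a single image can carry several of the 48 concepts, so each count in Table 1 is the number of images tagged with that concept, and the "at least 50 images" criterion is a threshold on those counts. The minimal sketch below illustrates this bookkeeping; the `annotations` mapping, image IDs, and function names are hypothetical toy stand-ins, not SKINCON's release format.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy annotations: image ID -> set of clinical concepts present.
# SKINCON's real files may be organized differently; this only illustrates
# dense, multi-label annotation (one image can carry several concepts).
annotations: Dict[str, Set[str]] = {
    "img_001": {"Papule", "Erythema", "Scale"},
    "img_002": {"Plaque", "Scale"},
    "img_003": {"Papule", "Erythema"},
    "img_004": {"Ulcer"},
}


def concept_counts(annos: Dict[str, Set[str]]) -> Counter:
    """Count how many images are annotated with each concept."""
    counts: Counter = Counter()
    for concepts in annos.values():
        counts.update(concepts)  # sets ensure each concept counts once per image
    return counts


def frequent_concepts(counts: Counter, min_images: int) -> List[str]:
    """Return concepts with at least `min_images` images, most frequent first."""
    kept = [c for c, n in counts.items() if n >= min_images]
    return sorted(kept, key=lambda c: (-counts[c], c))


counts = concept_counts(annotations)
print(frequent_concepts(counts, min_images=2))  # ['Erythema', 'Papule', 'Scale']
```

Applied to the full SKINCON annotations with `min_images=50`, the same filter would return the 22 concepts mentioned above.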
TABLE 2 Characteristics of Step 2 Dataset and Clinical Evaluation Dataset.
| Major Classes of Skin Disease | Number of Samples in Step 2 Dataset | Number of Samples in Clinical Evaluation Dataset |
|---|---|---|
| Acne and Rosacea | 840 | 10 |
| Malignant Lesions (Actinic Keratosis, Basal Cell Carcinoma, etc.) | 8166 | 10 |
| Dermatitis (Atopic Dermatitis, Eczema, Exanthems, Drug Eruptions, Contact Dermatitis, etc.) | 5262 | 10 |
| Bullous Disease | 448 | 10 |
| Bacterial Infections (Cellulitis, Impetigo, etc.) | 228 | 10 |
| Light Diseases (vitiligo, sun-damaged skin, etc.) | 568 | 10 |
| Connective Tissue Diseases (Lupus, etc.) | 420 | 10 |
| Benign Tumors (Seborrheic Keratoses, etc.) | 1916 | 10 |
| Melanoma Skin Cancer, Nevi, Moles | 23373 | 10 |
| Fungal Infections (Nail Fungus, Tinea Ringworm, Candidiasis, etc.) | 2340 | 10 |
| Psoriasis and Lichen Planus | 3460 | 10 |
| Infestations and Bites (Scabies, Lyme Disease, etc.) | 431 | 10 |
| Urticaria Hives | 212 | 10 |
| Vascular Tumors | 735 | 10 |
| Herpes | 405 | 10 |
| Others | 239 | / |
| Total | 49043 | 150 |
The second public dataset, Dermnet, contains 18,856 images, which our board-certified dermatologists further classified into 15 major classes plus an Others category: Acne and Rosacea, Malignant Lesions (Actinic Keratosis, Basal Cell Carcinoma, etc.), Dermatitis (Atopic Dermatitis, Eczema, Exanthems, Drug Eruptions, Contact Dermatitis, etc.), Bullous Disease, Bacterial Infections (Cellulitis, Impetigo, etc.), Light Diseases (vitiligo, sun-damaged skin, etc.), Connective Tissue Diseases (Lupus, etc.), Benign Tumors (Seborrheic Keratoses, etc.), Melanoma Skin Cancer (Nevi, Moles, etc.), Fungal Infections (Nail Fungus, Tinea Ringworm, Candidiasis, etc.), Psoriasis and Lichen Planus, Infestations and Bites (Scabies, Lyme Disease, etc.), Urticaria Hives, Vascular Tumors, Herpes, and Others.
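The relabeling described above amounts to a many-to-one mapping from fine-grained diagnoses to major classes. The sketch below encodes only the example subtypes named in the text; the dermatologists' complete mapping is not published, so `EXAMPLE_SUBTYPES` is illustrative, and routing every unlisted label to "Others" is an assumption of this sketch, not a documented rule.

```python
from collections import Counter
from typing import Dict, List

# The 15 major classes (short names from Table 2); "Others" is the catch-all.
MAJOR_CLASSES: List[str] = [
    "Acne and Rosacea", "Malignant Lesions", "Dermatitis", "Bullous Disease",
    "Bacterial Infections", "Light Diseases", "Connective Tissue Diseases",
    "Benign Tumors", "Melanoma Skin Cancer, Nevi, Moles", "Fungal Infections",
    "Psoriasis and Lichen Planus", "Infestations and Bites", "Urticaria Hives",
    "Vascular Tumors", "Herpes",
]

# Example subtypes listed in parentheses in the text (illustrative, not complete).
EXAMPLE_SUBTYPES: Dict[str, List[str]] = {
    "Malignant Lesions": ["Actinic Keratosis", "Basal Cell Carcinoma"],
    "Dermatitis": ["Atopic Dermatitis", "Eczema", "Exanthems",
                   "Drug Eruptions", "Contact Dermatitis"],
    "Bacterial Infections": ["Cellulitis", "Impetigo"],
    "Light Diseases": ["Vitiligo", "Sun Damaged Skin"],
    "Connective Tissue Diseases": ["Lupus"],
    "Benign Tumors": ["Seborrheic Keratoses"],
    "Melanoma Skin Cancer, Nevi, Moles": ["Nevi", "Moles"],
    "Fungal Infections": ["Nail Fungus", "Tinea Ringworm", "Candidiasis"],
    "Infestations and Bites": ["Scabies", "Lyme Disease"],
}

# Case-insensitive lookup: each major class maps to itself, each subtype to its class.
_LOOKUP: Dict[str, str] = {name.lower(): name for name in MAJOR_CLASSES}
for major, subtypes in EXAMPLE_SUBTYPES.items():
    for subtype in subtypes:
        _LOOKUP[subtype.lower()] = major


def to_major_class(label: str) -> str:
    """Map a fine-grained label to a major class; unknown labels go to 'Others' (assumption)."""
    return _LOOKUP.get(label.strip().lower(), "Others")


labels = ["Eczema", "Impetigo", "Herpes", "tinea ringworm", "Unlisted condition"]
print(Counter(to_major_class(lbl) for lbl in labels))
```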
Our private in-house dataset contains 30,187 pairs of skin disease images and corresponding doctors' descriptions. The complete Step 2 training dataset comprises a total of 49,043 image-text pairs, as shown in Table 2.
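The figures in Table 2 and in the dataset descriptions can be cross-checked with a few lines of arithmetic. The sketch below copies the Table 2 counts and confirms that Dermnet (18,856) plus the in-house data (30,187) gives the 49,043 Step 2 pairs, and that adding the 3,886 SKINCON images (3,230 + 656) gives 52,929; reading the abstract's 52,929 figure as this sum is our interpretation. It also shows how strongly the melanoma/nevi class dominates Step 2.

```python
# Step 2 sample counts per major class, copied from Table 2.
STEP2_COUNTS = {
    "Acne and Rosacea": 840,
    "Malignant Lesions": 8166,
    "Dermatitis": 5262,
    "Bullous Disease": 448,
    "Bacterial Infections": 228,
    "Light Diseases": 568,
    "Connective Tissue Diseases": 420,
    "Benign Tumors": 1916,
    "Melanoma Skin Cancer, Nevi, Moles": 23373,
    "Fungal Infections": 2340,
    "Psoriasis and Lichen Planus": 3460,
    "Infestations and Bites": 431,
    "Urticaria Hives": 212,
    "Vascular Tumors": 735,
    "Herpes": 405,
    "Others": 239,
}
CLINICAL_CASES_PER_CLASS = 10  # "Others" has no clinical evaluation cases

DERMNET_IMAGES = 18_856
IN_HOUSE_PAIRS = 30_187
SKINCON_IMAGES = 3_230 + 656  # Fitzpatrick 17k subset + Diverse Dermatology Images

step2_total = sum(STEP2_COUNTS.values())
clinical_total = CLINICAL_CASES_PER_CLASS * (len(STEP2_COUNTS) - 1)

# Cross-checks between Table 2, the dataset descriptions, and the abstract.
assert step2_total == DERMNET_IMAGES + IN_HOUSE_PAIRS == 49_043
assert step2_total + SKINCON_IMAGES == 52_929
assert clinical_total == 150

largest = max(STEP2_COUNTS, key=STEP2_COUNTS.get)
print(largest, f"{STEP2_COUNTS[largest] / step2_total:.1%}")  # Melanoma Skin Cancer, Nevi, Moles 47.7%
```

Nearly half of the Step 2 pairs fall into a single class, while the clinical evaluation set is balanced at 10 cases per class.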
3.2 The Two-Step Training of SkinGPT-4
SkinGPT-4 was trained on a large collection of skin disease images together with clinical concepts and doctors' notes (Figure 1). In the first step, we fine-tuned the pre-trained MiniGPT-4 model on the Step 1 training dataset, which consists of skin disease images paired with descriptions of their clinical concepts. Training on this dataset enabled the model to grasp the nuances of clinical concepts specific to skin diseases.
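The text does not specify how Step 1 concept labels were turned into the paired descriptions. As a hedged illustration only, one simple approach is to render each image's concept list through a fixed sentence template; the template wording and the function names below are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical caption template; the paper does not give the actual wording.
TEMPLATE = "The lesion shows the following clinical features: {}."


def concepts_to_description(concepts: Iterable[str], template: str = TEMPLATE) -> str:
    """Render a list of concept labels as one natural-language description."""
    names = [c.lower() for c in concepts]
    if not names:
        raise ValueError("an image needs at least one concept to build a description")
    if len(names) == 1:
        joined = names[0]
    else:
        joined = ", ".join(names[:-1]) + " and " + names[-1]
    return template.format(joined)


def build_step1_pairs(annotations: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Build (image_id, description) pairs; sorting keeps the output deterministic."""
    return [(img, concepts_to_description(sorted(concepts)))
            for img, concepts in sorted(annotations.items())]


print(concepts_to_description(["Papule", "Erythema", "Scale"]))
# The lesion shows the following clinical features: papule, erythema and scale.
```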
In the second step, we further fine-tuned the model on the Step 2 dataset, which comprises additional skin images and refined doctors' notes. This second stage allowed SkinGPT-4 to incorporate the medical insights contained in the doctors' notes and thereby to diagnose various skin diseases accurately.
By following this two-step fine-tuning approach, SkinGPT-4 attained an enhanced understanding of clinical concepts related to skin diseases and acquired the proficiency to generate accurate diagnoses.
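The key property of the two-step procedure is that Step 2 resumes from the Step 1 weights rather than from the original MiniGPT-4 checkpoint. The toy sketch below makes that ordering explicit. `toy_train` is a hypothetical placeholder for a real fine-tuning run, and having it update only a projection layer is an assumption borrowed from MiniGPT-4's design, which trains a single projection layer between a frozen vision encoder and a frozen LLM; the text does not state which parameters SkinGPT-4 updates. The stage dataset names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]  # toy stand-in: parameter group name -> value


@dataclass(frozen=True)
class Stage:
    name: str
    dataset: str  # hypothetical identifier of the stage's image-text pairs


def run_stages(initial: Weights, stages: List[Stage],
               train: Callable[[Weights, str], Weights]) -> Tuple[Weights, List[str]]:
    """Fine-tune sequentially; each stage starts from the previous stage's output."""
    weights = dict(initial)
    order: List[str] = []
    for stage in stages:
        weights = train(weights, stage.dataset)
        order.append(stage.name)
    return weights, order


def toy_train(weights: Weights, dataset: str) -> Weights:
    """Placeholder trainer: only the projection layer changes (assumption)."""
    updated = dict(weights)
    updated["projection"] += 1.0  # pretend one round of updates on `dataset`
    return updated


stages = [Stage("step1", "concept_pairs"), Stage("step2", "doctor_note_pairs")]
initial = {"projection": 0.0, "vision_encoder": 1.0, "llm": 1.0}
final, order = run_stages(initial, stages, toy_train)
print(order, final)  # ['step1', 'step2'] {'projection': 2.0, 'vision_encoder': 1.0, 'llm': 1.0}
```

Because `run_stages` threads the weights from one stage into the next, Step 2 sees the concept knowledge acquired in Step 1, which is the point of the two-step design.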
3.3 Model Training and Resources
During both training steps, the maximum number of epochs was set to 20, the number of iterations per epoch to 5,000, the number of warmup steps to 5,000, the batch size to 2, the learning rate to 1e-4, and the maximum text length to 160. The entire fine-tuning process took approximately 9 hours on two NVIDIA V100 (32 GB) GPUs, whereas inference required only one NVIDIA V100 (32 GB) GPU. SkinGPT-4 was developed using Python 3.7, PyTorch 1.9.1, and CUDA 11.4; for a complete list of dependencies, please refer to the repository given in the Code availability statement. Training and inference were conducted on a workstation equipped with 252 GB of RAM, 112 CPU cores, and two NVIDIA V100 GPUs.
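The hyperparameters above imply a concrete budget: 20 epochs of 5,000 iterations give 100,000 iterations per training step, with warmup spanning the first epoch. The sketch below computes this budget and a learning-rate curve. The text gives only the warmup length and the base rate, so the linear-warmup-then-cosine-decay shape and the `min_lr` default of 0 are assumptions, and `passes_over` assumes the batch size of 2 is the global batch (with two GPUs, a per-GPU batch of 2 would double the number of samples seen).

```python
import math

# Values stated in Section 3.3.
MAX_EPOCHS = 20
ITERS_PER_EPOCH = 5_000
WARMUP_STEPS = 5_000
BATCH_SIZE = 2
BASE_LR = 1e-4

TOTAL_ITERS = MAX_EPOCHS * ITERS_PER_EPOCH  # iterations per training step


def lr_at(step: int, min_lr: float = 0.0) -> float:
    """Linear warmup to BASE_LR, then cosine decay to `min_lr` (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_ITERS - WARMUP_STEPS))
    return min_lr + 0.5 * (BASE_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


def passes_over(dataset_size: int, batch_size: int = BATCH_SIZE) -> float:
    """Approximate passes over a dataset, assuming `batch_size` is the global batch."""
    return TOTAL_ITERS * batch_size / dataset_size


print(TOTAL_ITERS, TOTAL_ITERS * BATCH_SIZE)  # 100000 200000
for step in (0, 2_500, 5_000, 52_500, 100_000):
    print(step, f"{lr_at(step):.2e}")
print(f"Step 1 (3,886 SKINCON images): {passes_over(3_886):.1f} passes")
print(f"Step 2 (49,043 pairs): {passes_over(49_043):.1f} passes")
```

Under these assumptions, the small Step 1 dataset is revisited many more times than the Step 2 dataset within the same iteration budget.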
3.4 Clinical Evaluation of SkinGPT-4
To assess the reliability and effectiveness of SkinGPT-4, we assembled a clinical evaluation dataset of 150 real-life cases spanning the 15 major classes of skin disease in Table 2 (10 cases per class). We conducted interactive diagnosis sessions with SkinGPT-4 using four fixed prompts:
1. Could you describe the skin disease in this image for me?
2. Please provide a paragraph listing additional features you observed in the image.
3. Based on the previous information, please provide a detailed explanation of the cause of this skin disease.
4. What treatment and medication should be recommended for this case?
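Because the prompts are asked in sequence within one session, later prompts such as the third ("Based on the previous information...") depend on earlier answers, so the conversation history must be carried forward. The sketch below shows that loop; `ask` is a hypothetical callable standing in for whatever chat interface wraps the model, not an API from the SkinGPT-4 repository.

```python
from typing import Callable, List, Tuple

# The four evaluation prompts, verbatim from the list above.
PROMPTS: List[str] = [
    "Could you describe the skin disease in this image for me?",
    "Please provide a paragraph listing additional features you observed in the image.",
    "Based on the previous information, please provide a detailed explanation "
    "of the cause of this skin disease.",
    "What treatment and medication should be recommended for this case?",
]

Turn = Tuple[str, str]  # (user prompt, model answer)


def run_session(image_path: str,
                ask: Callable[[str, List[Turn], str], str]) -> List[Turn]:
    """Ask each prompt in order, passing all earlier turns as context."""
    history: List[Turn] = []
    for prompt in PROMPTS:
        answer = ask(image_path, history, prompt)  # hypothetical model wrapper
        history.append((prompt, answer))
    return history


def stub_ask(image_path: str, history: List[Turn], prompt: str) -> str:
    """Stand-in model that reports how many earlier turns it received."""
    return f"turn {len(history) + 1}"


session = run_session("lesion.jpg", stub_ask)
print([answer for _, answer in session])  # ['turn 1', 'turn 2', 'turn 3', 'turn 4']
```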
For the clinical evaluation, certified dermatologists were given the same four questions and asked to make diagnoses based on the skin disease images. The dermatologists also reviewed the reports generated by SkinGPT-4 and rated each item of the evaluation form (Figure 4a) on a five-point scale (strongly agree, agree, neutral, disagree, and strongly disagree).
For questions 3 and 5 of the evaluation form, we additionally collected the opinions of SkinGPT-4 users, who usually lack a strong background in dermatology, to assess whether SkinGPT-4 is friendly to general users. Together, these results allow a comprehensive evaluation of SkinGPT-4's performance from the perspectives of both certified dermatologists and patients.
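Each evaluation-form item is rated on the five-point scale above, and such ratings are commonly summarized as the share of each option plus the combined share of agreement. The sketch below shows one such summary; this aggregation is an assumption rather than the authors' documented procedure, and the example responses are toy values, not study data.

```python
from collections import Counter
from typing import Dict, Iterable

# Five-point scale used on the evaluation form.
LIKERT = ["strongly agree", "agree", "neutral", "disagree", "strongly disagree"]


def summarize(responses: Iterable[str]) -> Dict[str, float]:
    """Return the share of each option plus the combined agreement rate."""
    cleaned = [r.strip().lower() for r in responses]
    if not cleaned:
        raise ValueError("no responses")
    unknown = set(cleaned) - set(LIKERT)
    if unknown:
        raise ValueError(f"unexpected responses: {sorted(unknown)}")
    counts = Counter(cleaned)
    n = len(cleaned)
    shares = {option: counts[option] / n for option in LIKERT}
    shares["agree or strongly agree"] = shares["agree"] + shares["strongly agree"]
    return shares


toy = ["agree", "Strongly agree", "neutral", "agree"]  # toy values, not study data
print(summarize(toy)["agree or strongly agree"])  # 0.75
```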
4 CONCLUSION AND DISCUSSION
Our study showcases the promising potential of adding visual inputs to LLMs to enhance dermatological diagnosis. As more advanced LLMs such as GPT-4 become available, the accuracy and quality of diagnoses could be further improved. However, using LLMs such as ChatGPT and GPT-4 through an API raises potential privacy concerns, because users must upload their private data. SkinGPT-4 offers a solution to this privacy issue: because the model can be deployed locally, users can run it entirely within their own systems, which keeps their personal information secure and confidential.
During the course of a patient’s consultation with a dermatologist, the doctor often asks additional questions to gather crucial information that aids in arriving at a precise diagnosis. In contrast, SkinGPT-4 relies on the information provided by users to assist in the diagnostic process. Additionally, doctors often engage in empathetic interactions with patients, as the emotional connection could contribute to the diagnostic process. Due to these factors, it remains challenging for SkinGPT-4 to fully replace dermatologists at present. However, SkinGPT-4 still holds significant value as a tool for both patients and dermatologists. It can greatly expedite the diagnostic process and enhance the overall service delivery. By leveraging its capabilities, SkinGPT-4 empowers patients to obtain preliminary insights into their skin conditions and aids dermatologists in providing more efficient care. While it may not fully substitute for the expertise and empathetic nature of dermatologists, SkinGPT-4 serves as a valuable complementary resource in the field of dermatological diagnosis.
As LLM-based applications such as SkinGPT-4 continue to evolve and improve with more reliable medical training data, the potential for significant advancements in online medical services is enormous. SkinGPT-4 could play a critical role in improving access to healthcare and the quality of medical services for patients worldwide. We will continue our research in this field to further develop and refine this technology.
5 ACKNOWLEDGEMENTS
Special thanks: We thank Jun Chen, a co-author of MiniGPT-4, for discussions on this work.
Funding: Juexiao Zhou, Xiuying Chen, Yuetan Chu, Longxi Zhou, Xingyu Liao, Bin Zhang, and Xin Gao were supported in part by grants from the Office of Research Administration (ORA) at King Abdullah University of Science and Technology (KAUST) under award numbers FCC/1/1976-44-01, FCC/1/1976-45-01, REI/1/5202-01-01, REI/1/5234-01-01, REI/1/4940-01-01, RGC/3/4816-01-01, and REI/1/0018-01-01. Xiaonan He was supported by the National Natural Science Foundation of China (No. 62272327).
Competing Interests: The authors have declared no competing interests.
Author Contribution Statements: J.Z. and X.G. conceived the presented idea. J.Z. designed the computational framework and analysed the data. J.Z., L.S., J.X., X.C., Y.C., L.Z., X.L., B.Z. and X.H. conducted the clinical evaluation. X.G. supervised the findings of this work. J.Z., X.H., L.S., J.X. and X.G. took the lead in writing the manuscript and supplementary information. All authors discussed the results and contributed to the final manuscript.
Data availability: The data that support the findings of this study fall into two groups: shared data and restricted data. Shared data include the SKINCON dataset and the Dermnet dataset. The SKINCON dataset can be accessed at https://skincon-dataset.github.io/. The Dermnet dataset can be accessed at https://www.kaggle.com/datasets/shubhamgoel27/dermnet. The restricted in-house skin disease images used in this study are not publicly available due to restrictions in the data-sharing agreement.
Code availability: To promote academic exchange, and within a framework of data and privacy security, the SkinGPT-4 code is publicly available at https://github.com/JoshuaChou2018/SkinGPT-4. For non-commercial use, researchers can sign the license provided at the above link and contact J.Z. or X.G. to obtain the latest non-commercial trained model weights.
