Viewpoint
Utility of ChatGPT in Clinical Practice
Jialin Liu1,2,3, MD; Changyu Wang1,4, BSc; Siru Liu5, PhD
Corresponding Author:
Abstract
ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinic letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword, and its benefits and potential dangers must be carefully weighed and studied. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and outline the risks and challenges of its use. This discussion can help guide and support future research on ChatGPT-like artificial intelligence in health care.
(J Med Internet Res 2023;25:e48568) doi: 10.2196/48568
KEYWORDS
ChatGPT; artificial intelligence; large language models; clinical practice; natural language processing; NLP; doctor-patient; patient-physician; communication; challenges; barriers; recommendations; guidance; guidelines; best practices; risks
Introduction
ChatGPT is a large language model developed by OpenAI. It is based on the GPT architecture and uses deep learning techniques to generate natural language text [1,2]. The model has been developed using supervised and reinforcement learning strategies [3]. ChatGPT can generate coherent, grammatically correct text, which is an important development in artificial intelligence (AI) [4]. It shows great potential for using large language models and reinforcement learning from human feedback to improve clinical decision support (CDS) alert logic and potentially other medical areas involving complex clinical logic, a key step in the development of an advanced learning health care system. ChatGPT has quickly gained worldwide attention for its accurate, well-formulated responses to a wide variety of topics. As physicians, we have the opportunity to help guide and develop new ways of using this powerful tool. It can be used in research and development to analyze large amounts of medical data, identify trends, and provide insights into best clinical practices. Physicians should therefore consider how ChatGPT might fit into their clinical practice. Importantly, ChatGPT should be used as a tool to support physicians' clinical practice, not to replace them.
Despite the increasing popularity and performance of ChatGPT, there is still a lack of studies evaluating its use in clinical practice. At the same time, we should be aware that ChatGPT is a double-edged sword, with powerful functions and potential dangers. To better understand the application of ChatGPT in clinical practice, we introduce recent progress of ChatGPT in clinical practice to help interested researchers grasp the key aspects of this topic and to suggest possible future research directions. The purpose of this viewpoint is to provide an overview of recent advances in ChatGPT in clinical practice (Multimedia Appendix 1 [5-16]), to explore the future direction of ChatGPT in clinical practice, to highlight the risks and challenges of its use in clinical practice, and to propose appropriate mitigation strategies. Although ChatGPT has demonstrated promising prospects in clinical practice, further research is needed to refine and improve its capabilities. Integrating ChatGPT into existing electronic health record (EHR) systems has the potential to improve diagnostic accuracy, treatment planning, and patient outcomes. However, it is essential to regard ChatGPT as a valuable tool that supplements the expertise of health care professionals rather than replacing them.
Clinical Decision Support
Clinical decision-making is a complex process. It involves many factors, such as the physician’s clinical thinking, clinical reasoning, individual judgment, and the patient’s condition [17]. These factors can lead to cognitive biases, errors in reasoning, and preventable harm. AI-based CDS can effectively support physicians’ clinical decisions and improve treatment outcomes [18]. Current applications of ChatGPT in CDS include the following:
Differential diagnosis lists: Hirosawa et al [5] evaluated ChatGPT-3 against general internal medicine physicians on clinical vignettes covering 10 common chief complaints, comparing the generated differential diagnosis lists with the correct diagnoses. Within the 10-item differential diagnosis lists, the correct diagnosis rate of ChatGPT-3 was 28 out of 30 (93.3%). Within the 5-item differential diagnosis lists, the physicians’ correct diagnosis rate was superior to that of ChatGPT-3 (98.3% vs 83.3%; P=.03). In the 10-item differential diagnosis lists generated by ChatGPT-3, the rate of differential diagnoses consistent with the physicians’ lists was 62 out of 88 (70.5%). This study shows that the differential diagnosis lists generated by ChatGPT-3 have high diagnostic accuracy for clinical cases with common chief complaints.

Clinical decision-making: Rao et al [6] entered all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared the accuracy of differential diagnosis, diagnostic testing, final diagnosis, and management according to patient age, gender, and case acuity. ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes.
Cancer screening: Rao et al [7] compared ChatGPT responses with the American College of Radiology appropriateness criteria for breast pain and breast cancer screening. The ChatGPT prompt formats were open-ended (OE) and select all that apply (SATA). Breast cancer screening achieved an average OE score of 1.83 out of 2 and an average SATA correct rate of 88.9%; breast pain achieved an average OE score of 1.125 out of 2 and an average SATA correct rate of 58.3%. The results show the feasibility of using ChatGPT for radiologic decision-making, with the potential to improve clinical workflow.
CDS optimization: Liu et al [8] had 5 clinicians rate 36 CDS recommendations generated by ChatGPT and 29 recommendations generated by experts. Nine of the top 20 recommendations in the survey were generated by ChatGPT. The study found that ChatGPT-generated recommendations offered a unique perspective and were rated as highly understandable and relevant and moderately useful, but they scored low on acceptability and showed bias, inversion, and redundancy. Such recommendations can be an important complement in optimizing CDS alerts: identifying potential improvements to alert logic, supporting their implementation, or even helping experts develop their own recommendations for CDS improvements.
ChatGPT has been evaluated for CDS applications. It has been shown to generate accurate differential diagnosis lists, support clinical decision-making, optimize CDS alerts, and provide insights for cancer screening decisions. Further research could focus on developing advanced models that integrate ChatGPT with existing CDS systems. Such models could leverage the extensive medical literature, clinical guidelines, and patient data to support physicians in making accurate diagnoses, formulating treatment plans, and predicting patient outcomes. Combining the expertise of health care professionals with the capabilities of ChatGPT can provide comprehensive and personalized decision support.
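To make the integration idea above concrete, the sketch below shows how a CDS platform might assemble a chat-style prompt asking a large language model to critique an alert rule. Everything here is a hypothetical illustration, not an interface from any cited study: the function name and prompt wording are invented, and the `alert_rule`, `firing_rate`, and `override_rate` parameters are assumed inputs. A real system would send this payload to a model API and route the reply through expert review before any change to alert logic.

```python
def build_cds_review_messages(alert_rule, firing_rate, override_rate):
    """Assemble a chat-completion payload asking an LLM to critique a CDS alert.

    All names and prompt text here are illustrative assumptions, not part of
    any published system. No patient data should be placed in such a prompt.
    """
    system_msg = (
        "You are a clinical informatics assistant. Suggest concrete, "
        "evidence-based improvements to clinical decision support alerts. "
        "Do not give patient-specific medical advice."
    )
    user_msg = (
        f"Alert logic: {alert_rule}\n"
        f"Firing rate: {firing_rate:.1%} of eligible encounters\n"
        f"Override rate: {override_rate:.1%}\n"
        "Suggest ways to reduce alert fatigue while preserving patient safety."
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]
```

The returned list follows the message format commonly used by chat-completion APIs; the expert-in-the-loop review emphasized throughout this viewpoint remains essential regardless of how the prompt is constructed.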
Question-Answer (Medical Queries)
Intelligent question-answering is often used to provide information about diseases or to discuss the results of clinical tests. The use of intelligent question-answering in clinical practice has various benefits for health care systems, such as support for health care professionals and patients, triage, disease screening, health management, consultation, and training of health care professionals [19]. ChatGPT can be used for intelligent question-answering in health care. However, it should be noted that the answers may change over time and with different question prompts and that harmful biases may appear in answers [9]. It is important to use ChatGPT responsibly to ensure that it helps, and does not harm, users seeking disease knowledge and information. Below are some examples of ChatGPT’s application in medical queries, demonstrating its potential for answering questions about various diseases:
Common retinal diseases: Potapenko et al [10] conducted a study to evaluate the accuracy of ChatGPT in providing information on common retinal diseases: age-related macular degeneration, diabetic retinopathy, retinal vein occlusion, retinal artery occlusion, and central serous chorioretinopathy. A total of 100 responses were obtained through a series of questions covering the disease summary, prevention, treatment options, and prognosis for each disease. The results indicate that ChatGPT provides highly accurate general information (median score 5, IQR 4-5, range 3-5), disease prevention information (median 4, IQR 4-5, range 4-5), prognosis information (median 5, IQR 4-5, range 3-5), and treatment options (median 3, IQR 2-3, range 2-5). Reliability statistics showed a Cronbach α of .910 (95% CI .867-.940). Of the 100 responses evaluated, 45 were rated as very good with no inaccuracies, 26 had minor harmless inaccuracies, 17 contained inaccuracies that could potentially be misinterpreted, and 12 had potentially harmful errors.
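Ratings like those above are conventionally summarized as a median with an IQR and range. For readers replicating such an evaluation, a minimal sketch using Python's standard library (with invented ratings, not the study's data) is:

```python
from statistics import median, quantiles

def score_summary(scores):
    """Summarize Likert-style accuracy ratings as median, IQR, and range."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "median": median(scores),
        "iqr": (q1, q3),
        "range": (min(scores), max(scores)),
    }
```

For example, `score_summary([5, 4, 5, 3, 4, 5, 5, 4])` yields a median of 4.5 with an IQR of 4-5 and a range of 3-5, the same reporting format used by Potapenko et al.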
Obstetrics and gynecology: Grünebaum et al [9] presented a series of 14 questions on obstetrics and gynecology to ChatGPT and evaluated the answer to each. The study shows that ChatGPT is valuable for users seeking preliminary information on almost any topic in the field. The answers are generally convincing and informative and do not contain a significant number of errors or misinformation. A major drawback is that the data on which the model is trained do not appear to be easily updatable.
Hepatic disease: Yeo et al [11] investigated the accuracy and reproducibility of ChatGPT in answering questions about knowledge, management, and emotional support for cirrhosis and hepatocellular carcinoma (HCC). The responses to the 164 questions were independently assessed by 2 transplant hepatologists and reviewed by a third reviewer. The results showed that ChatGPT had extensive knowledge of cirrhosis (79.1% correct) and HCC (74% correct). However, only a small proportion of responses (47.3% for cirrhosis and 41.1% for HCC) was rated as comprehensive. Performance was better in basic knowledge, lifestyle, and treatment than in diagnosis and prevention. Regarding quality measures, the model answered 76.9% of questions correctly but failed to provide specific decision cutoff points and treatment durations. ChatGPT may have a role as a supplementary information tool for patients and physicians to improve outcomes.
Cancer: Johnson et al [12] used questions from the “Common Cancer Myths and Misconceptions” web page to assess the accuracy of ChatGPT and National Cancer Institute (NCI) answers to the questions. The results showed an overall accuracy of 100% for NCI answers and 96.9% for ChatGPT answers to questions 1 to 13 (κ=−0.03, SE 0.08). There was no significant difference in word count or readability between NCI and ChatGPT answers. ChatGPT provided accurate information about common cancer myths and misconceptions.
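The κ value reported above is a chance-corrected agreement statistic (Cohen's kappa): a value near 0 means agreement no better than chance, which can occur even at high raw accuracy when one category dominates. A minimal implementation, shown here on hypothetical rater labels rather than the study's data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must be the same length")
    n = len(rater_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled items independently,
    # computed from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[lab] / n) * (count_b[lab] / n)
              for lab in set(count_a) | set(count_b))
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters constant and identical
    return (p_o - p_e) / (1 - p_e)
```

For instance, two raters agreeing on 3 of 4 binary labels with the marginals below give κ=0.5: `cohens_kappa(["acc", "acc", "inacc", "acc"], ["acc", "inacc", "inacc", "acc"])`.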
The use of ChatGPT in answering medical queries has shown promise in assisting health care professionals by providing reliable information and guidance. However, ChatGPT’s responses are generated based on patterns and knowledge learned from training data, and it does not currently have up-to-date medical information or take into account specific patient situations. Therefore, health care providers should exercise caution and independently verify key information obtained from ChatGPT to ensure accuracy and appropriateness for individual patients. Careful and responsible use, as well as continued research and development, are necessary to maximize its benefits and minimize potential limitations.
Medical Documentation
Overview
Writing medical documents is a tedious and time-consuming process for health care providers. At the same time, errors in medical documentation are common [20,21]. Correctly documenting and exchanging clinical information between physician and patient is paramount. Medical documentation requires a high level of accuracy, so recorders should be able to capture and accurately record all medical information discussed during the interview. ChatGPT is an effective tool for medical documentation [13,22]. Using ChatGPT as a language assistant or providing templates can significantly reduce the time and improve the accuracy of medical documentation for clinicians [2]. The following four subsections illustrate specific areas where ChatGPT can be effectively applied, including the generation of patient clinic letters, radiology reports, medical notes, and discharge summaries, demonstrating its potential to simplify medical documentation and improve clinician efficiency.
Patient Clinic Letters
Using skin cancer as an example, Ali et al [14] evaluated the readability, factual accuracy, and humanness of clinic letters to patients generated by ChatGPT. Of the 38 hypothetical clinical scenarios created, 7 involved basal cell carcinoma, 11 squamous cell carcinoma, and 20 malignant melanoma. The overall median accuracy of the clinical information in the letters was 7 (range 1-9), and the overall median humanness of the writing style was 7 (range 5-9). The weighted κ was 0.80 for accuracy (P<.001) and 0.77 for humanness (P<.001). This assessment demonstrates that ChatGPT can generate clinic letters with high overall accuracy and humanness. In addition, the reading level of these letters is generally similar to that of letters currently written by doctors.
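The "reading level" of such letters is typically estimated with a standard readability formula; one common choice is the Flesch Reading Ease score, 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). The sketch below is a rough illustration with a deliberately crude vowel-group syllable counter; it does not reproduce the cited study's exact methodology, and production use should rely on a validated readability library.

```python
import re

def naive_syllables(word):
    """Very rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores indicate easier reading."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))
```

Scores above roughly 80 correspond to plain, conversational English, the level commonly targeted for patient-facing letters.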
Radiology Reports
Jeblick et al [15] surveyed 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Of all the ratings, 75% were “agree” or “strongly agree.”