Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
BING YIN, Amazon, USA
XIA HU, Department of Computer Science, Rice University, USA
This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources for LLMs, regularly updated, can be found at https://github.com/Mooler0410/LLMsPracticalGuide.
CCS Concepts: • Computing methodologies → Natural language processing; Natural language generation; Machine translation.
Additional Key Words and Phrases: Large Language Models, Natural Language Processing, Practical Guide, ChatGPT
1 INTRODUCTION
In recent years, the rapid development of Large Language Models has been revolutionizing the field of natural language processing [12, 128, 131]. These powerful models have shown great potential in addressing a variety of NLP tasks, ranging from natural language understanding (NLU) to generation tasks, even paving the way to Artificial General Intelligence (AGI). However, utilizing these models effectively and efficiently requires a practical understanding of their capabilities and limitations, as well as of the data and tasks involved in NLP.
To provide a guide for practitioners and end-users, this work focuses on the practical aspects of working with LLMs in downstream NLP tasks. This guide aims to provide practical advice on why or why not to choose LLMs for a given task, as well as guidance on how to select the most suitable LLM, taking into account factors such as model sizes, computational requirements, and the availability of domain-specific pre-trained models. This work offers a thorough understanding of LLMs from a practical perspective, thereby empowering practitioners and end-users with the practical knowledge needed to successfully leverage the power of LLMs for their own NLP tasks.
Our work is structured as follows. First, our work offers a brief introduction to LLMs by discussing the most important models, such as GPT-style and BERT-style architectures. Then, we delve into the critical factors that influence model performance from the data perspective, including pre-training data, training/tuning data, and test data. Last and most importantly, we dive deep into various concrete NLP tasks, offering insights into the applicability of LLMs for knowledge-intensive tasks, traditional NLU tasks, and generation tasks, along with the emergent abilities that these models possess and challenging real-world scenarios. We provide detailed examples to highlight both the successful use cases and the limitations of LLMs in practice.
To analyze the abilities of large language models, we compare them with fine-tuned models. At present, there is no universally recognized definition for LLMs and fine-tuned models. With practical utility in mind, we propose the following working definitions: LLMs are huge language models pretrained on massive datasets without being tuned on data for specific tasks; fine-tuned models are typically smaller language models which are also pretrained and then further tuned on a smaller, task-specific dataset to optimize their performance on that task.
This work summarizes the following main practical guides for using LLMs:
2 PRACTICAL GUIDE FOR MODELS
This section provides a brief introduction to state-of-the-art LLMs. These models differ in their training strategies, model architectures, and use cases. To provide a clearer understanding of the LLM landscape, we categorize them into two types: encoder-decoder or encoder-only language models and decoder-only language models. In Figure 1, we show the detailed evolution process of language models. From the evolutionary tree, we make the following interesting observations:
a) Decoder-only models have been gradually dominating the development of LLMs. In the early stage of LLM development, decoder-only models were not as popular as encoder-only and encoder-decoder models. However, after 2021, with the introduction of the game-changing GPT-3, decoder-only models experienced a significant boom. Meanwhile, after the initial explosive growth brought about by BERT, encoder-only models gradually began to fade away.

Fig. 1. The evolutionary tree of modern LLMs traces the development of language models in recent years and highlights some of the most well-known models. Models on the same branch have closer relationships. Transformer-based models are shown in non-grey colors: decoder-only models in the blue branch, encoder-only models in the pink branch, and encoder-decoder models in the green branch. The vertical position of the models on the timeline represents their release dates. Open-source models are represented by solid squares, while closed-source models are represented by hollow ones. The stacked bar plot in the bottom right corner shows the number of models from various companies and institutions.
b) OpenAI consistently maintains its leadership position in LLMs, both currently and potentially in the future. Other companies and institutions are struggling to catch up with OpenAI in developing models comparable to GPT-3 and the current GPT-4. This leadership position may be attributed to OpenAI's steadfast commitment to its technical path, even when it was not widely acknowledged initially.

c) Meta contributes significantly to open-source LLMs and promotes research on LLMs. When considering contributions to the open-source community, particularly those related to LLMs, Meta stands out as one of the most generous commercial companies, as all the LLMs it has developed are open-sourced.

d) LLMs exhibit a tendency towards closed-sourcing. In the early stages of LLM development (before 2020), the majority of models were open-sourced. However, with the introduction of GPT-3, companies have increasingly
Table 1. Summary of Large Language Models.
| Architecture | Training | Model type | Pretrain task | Representative LLMs |
|---|---|---|---|---|
| Encoder-Decoder or Encoder-only (BERT-style) | Masked Language Models | Discriminative | Predict masked words | ELMo [80], BERT [28], RoBERTa [65], DistilBERT [90], BioBERT [57], XLM [54], XLNet [119], ALBERT [55], ELECTRA [24], T5 [84], GLM [123], XLM-E [20], ST-MoE [133], AlexaTM [95] |
| Decoder-only (GPT-style) | Autoregressive Language Models | Generative | Predict next word | GPT-3 [16], OPT [126], PaLM [22], BLOOM [92], MT-NLG [93], GLaM [32], Gopher [83], Chinchilla [41], LaMDA [102], GPT-J [107], LLaMA [103], GPT-4 [76], BloombergGPT [117] |
opted to close-source their models, such as PaLM, LaMDA, and GPT-4. Consequently, it has become more difficult for academic researchers to conduct experiments on LLM training, and API-based research could become the predominant method in the academic community.

e) Encoder-decoder models remain promising, as this type of architecture is still being actively explored, and most of them are open-sourced. Google has made substantial contributions to open-source encoder-decoder architectures. However, the flexibility and versatility of decoder-only models seem to make Google's insistence on this direction less promising.
We also briefly summarize the characteristics and the representative LLMs of each type in Table 1.
2.1 BERT-style Language Models: Encoder-Decoder or Encoder-only
As natural language data is readily available, unsupervised training paradigms have been proposed to better utilize extremely large datasets, motivating the unsupervised learning of natural language. One common approach is to predict masked words in a sentence while considering the surrounding context; this training paradigm is known as the Masked Language Model (MLM). This type of training allows the model to develop a deeper understanding of the relationships between words and the contexts in which they are used. These models are trained on large text corpora using techniques such as the Transformer architecture and have achieved state-of-the-art results in many NLP tasks, such as sentiment analysis and named entity recognition. Notable examples of Masked Language Models include BERT [28], RoBERTa [65], and T5 [84]. MLMs have become an important tool in the field of natural language processing due to their success in a wide range of tasks.
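The masked-word objective described above can be sketched in a few lines: a fraction of the tokens in a sentence is replaced by a `[MASK]` symbol, and the model must recover the originals from the surrounding context. The toy function below only illustrates how such training pairs are formed; the `mask_tokens` name, the 15% rate, and whole-word masking are simplifications (real BERT-style pipelines mask subword tokens and sometimes substitute random words instead of `[MASK]`):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=1):
    """Form one masked-LM training pair: hide a random subset of tokens
    and record, per position, the original word the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # position -> original word to recover
        else:
            masked.append(tok)
    return masked, targets

tokens = "the model learns context from both directions".split()
masked, targets = mask_tokens(tokens)
```

Because the targets can sit anywhere in the sequence, the model sees context from both directions, which is what makes this objective discriminative rather than generative.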
2.2 GPT-style Language Models: Decoder-only
Although language models are typically task-agnostic in architecture, these methods require fine-tuning on datasets of the specific downstream task. Researchers found that scaling up language models significantly improves few-shot, and even zero-shot, performance [16]. The most successful models for better few-shot and zero-shot performance are autoregressive language models, which are trained by generating the next word in a sequence given the preceding words. These models have been widely used for downstream tasks such as text generation and question answering. Examples of autoregressive language models include GPT-3 [16], OPT [126], PaLM [22], and BLOOM [92]. The game changer, GPT-3, for the first time demonstrated reasonable few-/zero-shot performance via prompting and in-context learning, showing the superiority of autoregressive language models. There are also models optimized for specific tasks, such as CodeX [2] for code generation and BloombergGPT [117] for the financial domain. The recent breakthrough is ChatGPT, which refines GPT-3 specifically for conversational tasks, resulting in more interactive, coherent, and context-aware conversations for various real-world applications.
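The next-word pretraining objective of GPT-style models can be illustrated by how training pairs are derived from a sequence: every prefix is paired with the single token that follows it. A minimal sketch (the `next_word_pairs` helper is illustrative, not from any library):

```python
def next_word_pairs(tokens):
    """Autoregressive pretraining pairs: every prefix of the sequence
    is matched with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_word_pairs("language models predict the next word".split())
```

Since each prediction conditions only on the preceding words, the same model can generate text at inference time simply by sampling one token and appending it to the context.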
3 PRACTICAL GUIDE FOR DATA
In this section, we discuss the critical role that data plays in selecting appropriate models for downstream tasks. The impact of data on model effectiveness starts during the pre-training stage and continues through the training and inference stages.
Remark 1
3.1 Pretraining data
Pretraining data plays a pivotal role in the development of large language models. As the foundation of the remarkable capabilities [5, 47] of LLMs, the quality, quantity, and diversity of pretraining data significantly influence the performance of LLMs [124]. Commonly used pretraining data consists of a myriad of text sources, including books, articles, and websites. The data is carefully curated to ensure a comprehensive representation of human knowledge, linguistic nuances, and cultural perspectives. The importance of pretraining data lies in its capacity to inform the language model with a rich understanding of word knowledge, grammar, syntax, and semantics, as well as the ability to recognize context and generate coherent responses. The diversity of pretraining data also plays a crucial role in shaping the model's performance, and the selection of LLMs highly depends on the components of the pretraining data. For example, PaLM [22] and BLOOM [92] excel in multilingual tasks and machine translation thanks to an abundance of multilingual pretraining data. Moreover, PaLM's performance in question answering tasks is enhanced by incorporating a considerable amount of social media conversations and the Books corpus [22]. Likewise, the code execution and code completion capabilities of GPT-3.5 (code-davinci-002) are amplified by the integration of code data in its pretraining dataset. In brief, when selecting LLMs for downstream tasks, it is advisable to choose a model pretrained on a similar field of data.
3.2 Finetuning data
When deploying a model for downstream tasks, it is essential to consider three primary scenarios based on the availability of annotated data: zero, few, and abundant. In this section, we provide a succinct overview of the appropriate models to employ for each scenario.
Zero annotated data: In scenarios where annotated data is unavailable, utilizing LLMs in a zero-shot setting proves to be the most suitable approach. LLMs have been shown to outperform previous zero-shot methods [120]. Additionally, the absence of a parameter update process ensures that catastrophic forgetting [49] is avoided since the language model parameters remain unaltered.
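In practice, a zero-shot call reduces to prompt construction: the task is described in natural language with no examples attached, and no parameters are updated. A minimal, hypothetical prompt template (the format, wording, and `zero_shot_prompt` name are illustrative, not a prescribed API):

```python
def zero_shot_prompt(instruction, text):
    """Zero-shot prompting: describe the task in plain language and
    attach the input; no labeled examples, no weight updates."""
    return f"{instruction}\n\nInput: {text}\nAnswer:"

prompt = zero_shot_prompt(
    "Classify the sentiment of the input as positive or negative.",
    "The movie was a complete waste of time.",
)
```

The resulting string would be sent to the LLM as-is; because the model's weights never change, the catastrophic-forgetting concern mentioned above does not arise.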
Few annotated data: In this case, the few-shot examples are directly incorporated into the input prompt of LLMs, which is known as in-context learning, and these examples can effectively guide LLMs to generalize to the task. As reported in [16], one-shot and few-shot performance makes significant gains, even matching the performance of SOTA fine-tuned open-domain models, and LLMs' zero-/few-shot ability can be improved further by scaling [16]. Alternatively, some few-shot learning methods have been developed to enhance fine-tuned models, such as meta-learning [56] or transfer learning [88]. However, their performance might be inferior to that of LLMs due to fine-tuned models' smaller scale and overfitting.
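In-context learning likewise amounts to prompt construction: a handful of labeled demonstrations are concatenated ahead of the query, and the model is expected to continue the pattern. A hedged sketch (the template and the `few_shot_prompt` helper are illustrative, not from any library):

```python
def few_shot_prompt(instruction, examples, query):
    """In-context learning: labeled demonstrations are placed directly
    in the prompt; the model continues the pattern for the final query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Label: {label}", ""]
    lines += [f"Input: {query}", "Label:"]
    return "\n".join(lines)

demo = few_shot_prompt(
    "Decide whether the premise entails the hypothesis (yes or no).",
    [("A dog runs. / An animal moves.", "yes"),
     ("A man sleeps. / A man cooks.", "no")],
    "A child sings. / Someone makes a sound.",
)
```

Note that the demonstrations only condition the forward pass; nothing is fine-tuned, so the same base model serves every task.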
Abundant annotated data: With a substantial amount of annotated data for a particular task available, both fine-tuned models and LLMs can be considered. In most cases, fine-tuning the model can fit the data well, although LLMs can be used to meet constraints such as privacy [99]. In this scenario, the choice between a fine-tuned model and an LLM is task-specific and also depends on many factors, including desired performance, computational resources, and deployment constraints.
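The three scenarios above can be condensed into a rough decision helper. The numeric cutoff and the return labels are of course illustrative simplifications, not prescriptions from this paper:

```python
def choose_approach(n_labeled, privacy_constraint=False):
    """Condensed rule of thumb from this section: no labels -> zero-shot
    LLM; a handful -> few-shot in-context learning; abundant labels ->
    fine-tune, unless deployment constraints (e.g. privacy) favor an LLM."""
    if n_labeled == 0:
        return "zero-shot LLM"
    if n_labeled < 100:  # "few" is task-dependent; 100 is an arbitrary cutoff
        return "few-shot in-context learning"
    return "LLM" if privacy_constraint else "fine-tuned model"
```

In reality the abundant-data branch also weighs desired performance, compute budget, and latency, as noted above; the helper captures only the data-availability axis.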
In a brief summary: LLMs are more versatile w.r.t. the data availability, while fine-tuned models can be considered with abundant annotated data.
3.3 Test data/user data
When deploying LLMs for downstream tasks, we often face challenges stemming from distributional differences between the test/user data and the training data. These disparities may encompass domain shifts [132], out-of-distribution variations [31], or even adversarial examples [82]. Such challenges significantly hinder fine-tuned models' effectiveness in real-world applications: they fit a specific distribution and have a poor ability to generalize to OOD data. However, LLMs perform quite well in such scenarios because they do not have an explicit fitting process. Moreover, recent advancements have further enhanced the ability of language models in this regard. The Reinforcement Learning from Human Feedback (RLHF) method has notably enhanced LLMs' generalization capabilities [77]. For example, InstructGPT demonstrates proficiency in following various instructions for a wide range of tasks and occasionally complying with instructions in different languages, even though such instructions are scarce. Similarly, ChatGPT exhibits consistent advantages on most adversarial and out-of-distribution (OOD) classification and translation tasks [109]. Its superiority in understanding dialogue-related texts led to an impressive performance on the DDXPlus dataset [101], a medical diagnosis dataset designed for OOD evaluation.
4 PRACTICAL GUIDE FOR NLP TASKS
In this section, we discuss in detail the use cases and non-use cases of LLMs in various downstream NLP tasks, along with the corresponding model abilities. In Figure 2, we summarize all discussions into a decision flow, which can serve as a guide for quick decisions when facing a task.
4.1 Traditional NLU tasks
Traditional NLU tasks are some fundamental tasks in NLP including text classification, named entity recognition (NER), entailment prediction, and so on. Many of them are designed to serve as intermediate steps in larger AI systems, such as NER for knowledge graph construction.

Fig. 2. The decision flow for choosing LLMs or fine-tuned models for users' NLP applications. The decision flow helps users assess whether their downstream NLP applications at hand meet specific conditions and, based on that evaluation, determine whether LLMs or fine-tuned models are the most suitable choice for their applications. During the decision process in the figure, Y means meeting the condition, and N means not meeting the condition. The yellow circle for Y of the last condition means there is no model working well on this kind of application.
Remark 2
Fine-tuned models are generally a better choice than LLMs in traditional NLU tasks, but LLMs can help when strong generalization ability is required.
4.1.1 No use case. In most natural language understanding tasks, such as tasks in GLUE[106] and SuperGLUE[105], fine-tuned models still have better performance, if such tasks come with rich well-annotated data and contain very few out-of-distribution examples on test sets. For different tasks and datasets, the gap between small fine-tuned models and LLMs varies.
In text classification, on most datasets, LLMs perform slightly worse than fine-tuned models. For sentiment analysis, such as on IMDB [69] and SST [94], fine-tuned models and LLMs perform equally well. For toxicity detection, another iconic text classification task, the gap is much larger: no LLM performs well on this task, and on Civil Comments [13] even the best one is only better than random guessing [59]. On the other hand, most popular fine-tuned models can obtain much better performance [33], and the Perspective API is still one of the best tools for detecting toxicity. This API is powered by a multilingual BERT-based model, which is tuned on publicly available toxicity data, and several smaller single-language CNNs distilled from this model. This might be due to the fact that toxicity is defined by subtle nuances in linguistic expressions, which large language models are unable to accurately comprehend solely from the provided input.
The trend of performance gaps is similar in some other tasks. For natural language inference (NLI) tasks, on most datasets, such as on RTE [106] and SNLI [14], fine-tuned models perform better than LLMs, while on some data such as CB [105], LLMs have obtained comparable performance with fine-tuned models [22]. For question answering (QA), on
SQuADv2 [86], QuAC [21] and many other datasets, fine-tuned models have superior performance, while on CoQA [87], LLMs perform as well as fine-tuned models [22].
In information retrieval (IR) tasks, LLMs have not been widely exploited yet. One major reason is that IR tasks are fundamentally different from others: there is no natural way to transform the thousands of candidate texts into the few-/zero-shot form required by LLMs. The existing evaluation results on MS MARCO (regular/TREC) [73] show that methods based on fine-tuned models perform better [59]. In this evaluation, the LLMs rank passages in an unorthodox way that requires the LLMs to produce probabilities for the passages one by one.
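The "one passage at a time" scoring scheme just described can be sketched as follows: each candidate passage receives a relevance score from the model, and passages are sorted by that score. Below, a toy term-overlap function stands in for the LLM's per-passage probability; both `rank_passages` and `overlap_score` are illustrative names, not a real retrieval API:

```python
def rank_passages(query, passages, score_fn):
    """Score every candidate passage for the query one by one, then sort
    by descending relevance; score_fn stands in for an LLM call that
    returns, e.g., the probability the passage answers the query."""
    scored = [(score_fn(query, p), p) for p in passages]
    return [p for _, p in sorted(scored, key=lambda s: s[0], reverse=True)]

def overlap_score(query, passage):
    """Toy stand-in scorer: fraction of query terms found in the passage."""
    q = {w.strip(".,").lower() for w in query.split()}
    p = {w.strip(".,").lower() for w in passage.split()}
    return len(q & p) / len(q)

ranked = rank_passages(
    "capital of france",
    ["Berlin is the capital of Germany.", "Paris is the capital of France."],
    overlap_score,
)
```

The cost of this scheme is exactly what the text calls unorthodox: one model call per passage, which scales poorly to the thousands of candidates typical of IR benchmarks.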
For some low-level intermediate tasks, which are intended not for regular users but for high-level tasks, such as named entity recognition (NER) and dependency parsing, there are not enough results from LLMs, because most current evaluations of LLMs focus on practical tasks. According to available evaluation results, for the NER task, CoNLL03 [89] is still a challenge for LLMs [81]: the performance of fine-tuned models is around twice that of LLMs. These intermediate tasks may vanish soon, because LLMs can take over high-level tasks without the help of those intermediate tasks (e.g., dependency parsing for coding tasks; NER for some text generation tasks).
In brief, for most traditional NLU tasks, a fine-tuned model is a better choice in terms of both performance on benchmark datasets and computational cost: the scale of LLMs is usually $10\times$ or even $100\times$ larger than that of fine-tuned models. One possible cause of the inferior performance of LLMs on certain tasks is the design of instructions/prompts; transforming inputs from tasks like IR and sentence labeling into a few-/zero-shot instruction form is non-trivial, and better ways to adapt language models to traditional NLP tasks may emerge in the future. On the other hand, the upper limit of the capabilities of fine-tuned models has not been reached, and methods like FLAN-tuning [67] can further boost performance on NLU tasks. Another interesting finding is that on NLU tasks, after fine-tuning, masked language models such as T5 [85] are better than most autoregressive language models at the same scale, though some recent results imply that this gap can be bridged by scaling [22].
4.1.2 Use case. However, there are still some NLU tasks suitable for LLMs.
One representative task is miscellaneous text classification [59]. In contrast to classic domain-specific text classification tasks such as sentiment analysis, miscellaneous text classification deals with a diverse range of topics and categories that may not have a clear or strong relationship with one another. It is closer to real-world cases and hard to format for fine-tuned models. Another is Adversarial NLI (ANLI) [74], a difficult dataset composed of adversarially mined natural language inference questions in three rounds (R1, R2, and R3). LLMs have shown superior performance on ANLI, especially on R2 and R3. Both examples demonstrate the exceptional ability of LLMs to generalize well on out-of-distribution and sparsely annotated data in traditional NLP tasks, surpassing that of fine-tuned models, as discussed in Section 3.3 above.
4.2 Generation tasks
Natural Language Generation broadly encompasses two major categories of tasks, with the goal of creating coherent, meaningful, and contextually appropriate sequences of symbols. The first type focuses on converting input texts into new symbol sequences, as exemplified by tasks like paragraph summarization and machine translation. The second type, "open-ended" generation, aims to generate text or symbols from scratch to accurately match input descriptions, such as crafting emails, composing news articles, creating fictional stories, and writing code.
Remark 3
Due to their strong generation ability and creativity, LLMs show superiority at most generation tasks.
4.2.1 Use case. Generation tasks require models to have a comprehensive understanding of the input contents or requirements and a certain level of creativity. This is what LLMs excel at.
For summarization tasks, although LLMs do not have an obvious advantage over fine-tuned models under traditional automatic evaluation metrics such as ROUGE [60], human evaluation results indicate that humans tend to prefer the results generated by LLMs [38, 127] over those of fine-tuned models. For example, on CNN/DailyMail [71] and XSUM [72], fine-tuned models like BRIO [66] and Pegasus [125] have much better performance than any LLMs w.r.t. ROUGE, but LLMs like OPT [126] perform far better in human evaluation considering all aspects, including faithfulness, coherence, and relevance [127]. This demonstrates the superiority of LLMs in summarization tasks. On the other hand, it implies that current summarization benchmarks do not contain high-quality summaries, or that the automatic metrics are not suitable for evaluating summarization.
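For reference, ROUGE-1 is simply unigram overlap between a candidate summary and a reference, combined into precision/recall/F1. A minimal pure-Python version (real evaluations use the official ROUGE toolkit, which adds stemming, ROUGE-2/L variants, and multi-reference support):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Minimal ROUGE-1 F1: clipped unigram overlap between a candidate
    summary and a single reference summary."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # per-word counts clipped to the reference
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the cat sat", "the cat sat on the mat")
```

A metric this shallow rewards word copying rather than faithfulness or coherence, which helps explain the gap between ROUGE rankings and human preferences reported above.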
In machine translation (MT), LLMs can perform competent translation, although the average performance is slightly worse than that of some commercial translation tools [45] under automatic metrics such as BLEU [78]. LLMs are particularly good at translating low-resource language texts into English: for example, on the Romanian-English translation task of WMT'16 [11], zero-shot or few-shot LLMs outperform the SOTA fine-tuned model [22]. This is mainly because English resources compose the main part of the pre-training data. BLOOM [92] is pre-trained on more multilingual data, leading to better translation quality in both rich-resource and low-resource translation. Another interesting finding is that BLOOM achieves good translation quality among Romance languages, even for translation from Galician, which is not included in the pre-training data. One reasonable explanation is that texts from other languages in the same language group help the LLM learn from their similarity. If more multilingual texts were added to the pre-training data, the translation capability might improve further.
在机器翻译(MT)领域,大语言模型(LLM)能够胜任翻译任务,但根据BLEU[78]等自动评估指标,其平均表现略逊于部分商业翻译工具[45]。大语言模型尤其擅长将某些低资源语言文本翻译为英文,例如在WMT'16罗马尼亚语-英语翻译任务中[11],零样本或少样本的大语言模型表现优于经过精调的SOTA模型[22]。这主要得益于英语资源构成了预训练数据的主体部分。BLOOM[92]使用了更多多语言数据进行预训练,从而在富资源和低资源翻译任务中都获得了更优的翻译质量。另一个有趣的发现是,BLOOM在罗曼语族语言间实现了高质量的翻译,甚至对于预训练数据未包含的加利西亚语翻译也表现良好。合理的解释是,同语系中某些语言的文本能帮助大语言模型从相似性中学习更多知识。若能在预训练数据中加入更多多语言文本,其翻译能力有望进一步提升。
Additionally, LLMs are highly skilled in open-ended generation. For example, news articles generated by LLMs are almost indistinguishable from real news articles to humans [16]. LLMs are remarkably adept at code synthesis as well. Whether for text-to-code generation, such as HumanEval [18] and MBPP [7], or for code repair, such as DeepFix [39], LLMs perform quite well. GPT-4 can even solve 25% of Leetcode problems, which are not trivial for most human coders [76]. With training on more code data, the coding capability of LLMs can be improved further [22]. While LLMs perform well on such tasks, the code they generate should be tested carefully to catch subtle bugs, which is one of the main challenges of applying LLMs to code synthesis.
此外,大语言模型(LLM)在开放式生成任务中表现出色。例如,它们生成的新闻文章几乎与人类撰写的真实新闻难以区分[16]。大语言模型在代码合成方面也展现出非凡能力,无论是文本到代码生成(如HumanEval[18]和MBPP[7]),还是代码修复(如DeepFix[39])都能出色完成。GPT-4甚至能解决Leetcode中25%的问题[76],这对多数人类程序员也非易事。随着代码训练数据的增加,大语言模型的编程能力还能持续提升[22]。但需注意,尽管在这些任务中表现优异,大语言模型生成的代码仍需仔细测试以发现潜在漏洞,这是其应用于代码合成领域的主要挑战之一。
4.2.2 No use case. Fine-tuned models, such as DeltaLM + Zcode [118], still perform best on most rich-resource translation and extremely low-resource translation tasks. In rich-resource machine translation, fine-tuned models slightly outperform LLMs [22, 92]. And in extremely low-resource machine translation, such as English-Kazakh translation, fine-tuned models perform significantly better than LLMs.
4.2.2 无适用场景。经过微调 (fine-tuned) 的模型(如 DeltaLM + Zcode [118])在大多数高资源翻译和极低资源翻译任务中仍保持最佳表现。在高资源机器翻译场景中,微调模型略微优于大语言模型 [22, 92];而在极低资源机器翻译(如英语-哈萨克语翻译)中,微调模型显著优于大语言模型。
4.3 Knowledge-intensive tasks
4.3 知识密集型任务
Knowledge-intensive NLP tasks refer to a category of tasks that rely strongly on background knowledge, domain-specific expertise, or general real-world knowledge. These tasks go beyond simple pattern recognition or syntax analysis, and are highly dependent on the memorization and proper utilization of knowledge about specific entities, events, and common sense of our real world.
知识密集型 NLP (Natural Language Processing) 任务指一类高度依赖背景知识、领域专业知识或现实世界常识的任务。这类任务超越了简单的模式识别或句法分析,其核心在于对特定实体、事件及现实世界常识的记忆与合理运用。
Remark 4
备注 4
(1) LLMs excel at knowledge-intensive tasks due to their massive real-world knowledge. (2) LLMs struggle when the knowledge requirements do not match their learned knowledge, or when they face tasks that only require contextual knowledge, in which case fine-tuned models can work as well as LLMs.
(1) 大语言模型(LLM)因其庞大的现实世界知识储备,在知识密集型任务中表现出色。
(2) 当任务所需知识与模型所学不匹配,或仅需上下文知识的场景时,大语言模型表现欠佳,此时经过微调的模型能达到与大语言模型相当的效果。
4.3.1 Use case. In general, with billions of training tokens and parameters, LLMs have much more real-world knowledge than fine-tuned models.
4.3.1 用例。通常来说,经过数十亿训练token和参数训练的大语言模型(LLM)比微调模型具备更丰富的现实世界知识。
Closed-book question-answering tasks require the model to answer a given question about factual knowledge without any external information, so they demand the memorization of real-world knowledge in the model. LLMs perform better on nearly all such datasets, including Natural Questions [52], Web Questions [9], and TriviaQA [46]. On TriviaQA, even zero-shot LLMs are still much better [22].
闭卷问答任务要求模型在没有任何外部信息的情况下回答关于事实知识的问题。这需要模型记忆现实世界的知识。大语言模型在几乎所有数据集上都表现更好,例如 Natural Questions [52]、Web Questions [9] 和 TriviaQA [46]。在 TriviaQA 上,即使是零样本的大语言模型也仍然表现优异 [22]。
The massive multitask language understanding (MMLU) benchmark [40] is also highly knowledge-intensive. It contains multiple-choice questions spanning 57 different subjects and requires general knowledge from the model. It is quite challenging even for LLMs, although the newly released GPT-4 [76] outperforms existing models by a considerable margin in English with a satisfactory 86.5% accuracy.
大规模多任务语言理解 (MMLU) [40] 同样具有高度知识密集型特征。该评测包含涵盖57个不同学科的多选题,要求模型具备广泛的知识储备。即便对于大语言模型而言,这也极具挑战性——尽管新发布的GPT-4 [76] 在英语测试中以86.5%的准确率显著超越现有模型。
Also, some tasks in Big-bench [96], which are designed to probe LLMs and extrapolate their future capabilities, rely heavily on the memorization of real-world knowledge. On such tasks, the performance of some LLMs is better than the average human level, and even comparable to the best human performance. For example, the task Hindu knowledge requires models to give facts about Hindu mythology, Periodic Elements requires predicting an element name from the periodic table, and Physics tests the physics knowledge of models by asking for the formula needed to solve a given physics problem.
此外,Big-bench[96]中的某些任务旨在探究大语言模型并推断其未来能力,这些任务严重依赖对现实世界知识的记忆。在此类任务中,部分大语言模型的表现优于人类平均水平,甚至可与人类最佳表现相媲美。例如:
- Hindu knowledge任务要求模型提供印度教神话相关的事实
- Periodic Elements任务需要根据元素周期表预测元素名称
- Physics任务通过要求模型给出解决特定物理问题所需的公式来测试其物理知识
4.3.2 No use case. Some other tasks require knowledge different from that learned by LLMs; the knowledge they need is not the real-world knowledge LLMs have acquired. In such tasks, LLMs are not notably superior.
4.3.2 无适用场景。存在某些任务所需知识与大语言模型所学知识存在本质差异,这类任务需要的并非大语言模型所掌握的现实世界知识。在此类任务中,大语言模型并不具备显著优势。
Some tasks only require the model to capture the self-contained knowledge in the contexts. The knowledge in the contexts from the input is enough for the model to make predictions. For these tasks, small fine-tuned models can work pretty well. One such task is machine reading comprehension (MRC). An MRC task provides several paragraphs and requires the model to predict the answer to questions based on these paragraphs. We’ve discussed MRC in the previous section because it’s also a traditional NLU task.
某些任务仅需模型捕捉上下文中的自包含知识。输入上下文中的知识足以让模型进行预测。对于这类任务,经过微调的小型模型就能表现良好。机器阅读理解 (MRC) 就是这样一种任务。MRC 任务提供若干段落,要求模型基于这些段落预测问题的答案。由于它也是传统自然语言理解任务,我们已在前文讨论过 MRC。
Another scenario is that the real-world knowledge within LLMs is useless for the task, or the required knowledge is even counterfactual to the real world. As a result, LLMs cannot work well on such tasks. In some cases, inconsistent knowledge may even make LLMs worse than random guessing. For example, in Big-Bench, the Mnist ascii task requires the model to tell which digit an ASCII art represents; the capability this task requires has nothing to do with real-world knowledge. Also, in the Inverse Scaling Phenomenon competition [70], the task redefine math redefines a common symbol and requires the model to choose between the original meaning and the meaning derived from the redefinition. What it requires conflicts with the LLMs' knowledge, so LLMs even perform worse than random guessing.
另一种情况是大语言模型中关于现实世界的知识对任务无用,甚至所需知识与现实世界相悖。因此,大语言模型在此类任务上表现不佳。某些情况下,矛盾的知识甚至会导致模型表现不如随机猜测。例如在Big-Bench中,Mnist ascii任务要求模型识别ASCII艺术所表示的数字,该任务所需能力与现实世界知识毫无关联。此外,在逆向缩放现象竞赛[70]中,任务redefine math重新定义了常见符号,要求模型在原始含义与重新定义的含义之间做出选择。该任务需求与大语言模型的知识相冲突,导致模型表现甚至不及随机猜测。
As an alternative to the real-world knowledge stored in LLMs, access to extra knowledge can be allowed, so that models obtain enough knowledge for a task via retrieval augmentation. The basic idea of retrieval augmentation is to add an extra information-retrieval step prior to making predictions, in which some useful texts related to the task are retrieved from a large corpus. Then, the model makes predictions based on both the input contexts and the retrieved texts. With the retrieved additional information, a closed-book task can become "open-book". In such a scenario, fine-tuned models are quite good despite much smaller sizes, because the required knowledge can be obtained by retrieval. For example, on Natural Questions [52], with an extra corpus, retrieval-augmented models [44, 48] are much better than any other methods.
作为大语言模型中现实世界知识的替代方案,允许访问额外知识,模型因此可以通过检索增强获取任务所需的足够知识。检索增强的基本思路是在预测前增加一个信息检索步骤,从大型语料库中检索与任务相关的有用文本。随后,模型将结合输入上下文和检索到的文本进行预测。通过检索附加信息,闭卷任务可转化为"开卷"模式。在此场景下,经过微调的较小规模模型表现优异,因为所需知识可通过检索获取。例如在Natural Questions [52]数据集上,配备额外语料库的检索增强模型[44, 48]显著优于其他方法。
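The retrieval-augmentation recipe above can be sketched in a few lines. The term-overlap scorer here stands in for a real retriever (e.g., BM25 or a dense retriever), and the corpus and prompt format are illustrative assumptions, not any system's actual API.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by simple term overlap with the query (stand-in for a real retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_terms & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_open_book_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved evidence so the model can answer from context rather than memory."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The Nile is the longest river in Africa.",
    "Mount Everest is the highest mountain on Earth.",
    "The Amazon river carries the largest volume of water.",
]
prompt = build_open_book_prompt("Which river is the longest in Africa?", corpus)
print(prompt)
```

The resulting prompt turns a closed-book question into an "open-book" one, which is why a small fine-tuned reader paired with a retriever can match much larger LLMs on tasks like Natural Questions.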
4.4 Abilities Regarding Scaling
4.4 关于扩展的能力
Scaling of LLMs (e.g., parameters, training computation, etc.) can greatly empower pretrained language models. With the model scaling up, a model generally becomes more capable across a range of tasks. Reflected in some metrics, the performance shows a power-law relationship with the model scale. For example, the cross-entropy loss used to measure language-modeling performance decreases linearly with the exponential increase in model scale, a relationship known as the "scaling law" [41, 47]. For some crucial abilities, such as reasoning, scaling has gradually transformed these abilities from a very low state to a usable state, even approaching human capabilities. In this section, we provide an overview of the usage of LLMs in terms of their abilities and behaviors along with scaling.
大语言模型的扩展(如参数量、训练计算量等)能显著增强预训练语言模型的能力。随着模型规模扩大,模型在各类任务中的表现通常会更出色。从某些指标来看,性能与模型规模呈现幂律关系。例如用于衡量语言建模性能的交叉熵损失(cross-entropy loss)会随着模型规模的指数增长而线性下降,这种现象也被称为"扩展定律(scaling-law)" [41, 47]。对于一些关键能力(如推理),模型扩展已逐渐将这些能力从极低水平提升至可用状态,甚至接近人类水平。本节我们将从大语言模型的能力和行为随规模扩展的变化角度,概述其应用情况。
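The scaling-law relationship can be illustrated numerically. The constants below are hypothetical, chosen only to mimic the reported power-law shape $L(N) = (N_c/N)^{\alpha}$, not fitted values from any paper:

```python
import math

# Illustrative power-law scaling, L(N) = (N_c / N) ** ALPHA, with
# hypothetical constants in the spirit of published scaling-law fits.
ALPHA = 0.076
N_C = 8.8e13  # hypothetical critical scale (number of parameters)

def loss(n_params: float) -> float:
    """Cross-entropy loss as a power law in the parameter count."""
    return (N_C / n_params) ** ALPHA

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> loss {loss(n):.3f}")

# "Linear with the exponential increase in scale": equal steps in log(N)
# yield equal drops in log(loss), i.e. a straight line on log-log axes.
drop1 = math.log(loss(1e8)) - math.log(loss(1e9))
drop2 = math.log(loss(1e10)) - math.log(loss(1e11))
```

Each 10x increase in parameters lowers log-loss by the same constant `ALPHA * ln(10)`, which is what the straight lines in scaling-law plots show.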
Remark 5
备注 5
4.4.1 Use Case with Reasoning. Reasoning, which involves making sense of information, drawing inferences, and making decisions, is one of the essential aspects of human intelligence. It is challenging for NLP. Many existing reasoning tasks can be classified into commonsense reasoning and arithmetic reasoning.
4.4.1 带推理的用例。推理是人类智能的核心能力之一,涉及信息理解、推断和决策制定,这对自然语言处理(NLP)极具挑战性。现有推理任务主要可分为常识推理和算术推理两类。
Arithmetic reasoning/problem solving. The arithmetic reasoning capability of LLMs benefits greatly from the scaling of model size. For GPT-3, the ability to perform two-digit addition only becomes apparent when the number of parameters exceeds 13B [16]. Tasks that test arithmetic reasoning are trivial for humans and are designed to challenge the capability of translating natural language into mathematical symbols and performing multi-step inference. On GSM8k [26], SVAMP [79], and AQuA [61], LLMs, as generalists, perform competitively with most methods that have task-specific designs. And GPT-4 outperforms any other method [76], even some huge models tuned particularly for arithmetic problems [104]. Nevertheless, it should be noted that, without the intervention of external tools, LLMs may occasionally make mistakes in basic calculations, although chain-of-thought (CoT) prompting [115] can significantly improve their ability in calculations.
算术推理/问题求解。大语言模型(LLM)的算术推理能力极大受益于模型规模的扩展。对于GPT-3而言,两位数加法能力仅在参数量超过130亿时才显现[16]。这些测试算术推理的任务对人类而言微不足道,其设计初衷是为了挑战将自然语言转化为数学符号及多步推理的能力。在GSM8k[26]、SVAMP[79]和AQuA[61]数据集上,作为通用模型的大语言模型与大多数专门设计的任务方法相比具有竞争力。而GPT-4的表现超越所有其他方法[76],甚至优于某些专门针对算术问题调优的大型模型[104]。不过值得注意的是,在没有外部工具介入的情况下,大语言模型偶尔会在基础计算中出现错误,尽管思维链(CoT)提示[115]能显著提升其计算能力。
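A minimal sketch of how a CoT prompt differs from a standard few-shot prompt: the CoT exemplar spells out the intermediate reasoning steps, which the model is encouraged to imitate before giving its answer. The exemplar text below is illustrative, not drawn from any benchmark.

```python
# Standard few-shot exemplar: question followed directly by the answer.
STANDARD_EXEMPLAR = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?\n"
    "A: 11\n"
)

# Chain-of-thought exemplar: the same question, but the answer includes
# the intermediate reasoning steps before the final result.
COT_EXEMPLAR = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.\n"
    "The answer is 11.\n"
)

def make_prompt(question: str, cot: bool = True) -> str:
    exemplar = COT_EXEMPLAR if cot else STANDARD_EXEMPLAR
    return f"{exemplar}\nQ: {question}\nA:"

print(make_prompt("A farm has 3 pens with 4 hens each. How many hens?"))
```

The only difference between the two prompts is the worked-out reasoning in the exemplar; empirically that difference is what unlocks multi-step arithmetic in sufficiently large models [115].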
Commonsense reasoning. Commonsense reasoning not only requires LLMs to remember factual knowledge but also requires them to perform several inference steps over the facts. Commonsense reasoning improves gradually with the growth of model size. Compared to fine-tuned models, LLMs keep their superiority on most datasets, such as StrategyQA [36] and ARC-C [25]. Especially on ARC-C, which contains difficult questions from science exams for grades 3 to 9, GPT-4 has come close to 100% (96.3%) [76].
常识推理。常识推理不仅要求大语言模型记住事实知识,还需要对事实进行多步推理。随着模型规模的增长,常识推理能力逐步提升。相比微调模型,大语言模型在大多数数据集上保持优势,例如StrategyQA [36]和ARC-C [25]。尤其在包含3至9年级科学考试难题的ARC-C数据集上,GPT-4已接近100% (96.3%) [76]的表现。
4.4.2 Use Cases with Emergent Abilities. Scaling of models also endows them with some unprecedented, fantastic abilities that go beyond the power-law rule. These are called "emergent abilities". As defined in [113], emergent abilities of LLMs are abilities that are not present in smaller-scale models but are present in large-scale models. This means such abilities cannot be predicted by extrapolating the performance improvements of smaller-scale models; the model suddenly gains good performance on some tasks once the scale exceeds a certain range. Emergent abilities are typically unpredictable and surprising, emerging on tasks randomly or unexpectedly. We examine concrete examples of the emergent abilities of LLMs and provide them as an important reference for deciding whether to leverage LLMs' emergent abilities.
4.4.2 涌现能力 (emergent ability) 的用例
模型规模的扩展还会赋予模型一些超越幂律法则的前所未有的非凡能力,这些能力被称为"涌现能力"。如文献 [113] 所定义,大语言模型的涌现能力是指小规模模型不具备、而大规模模型具备的能力。这意味着此类能力无法通过外推小规模模型的性能改进来预测,一旦模型规模超过某个临界范围,其某些任务的性能会突然显著提升。涌现能力通常具有不可预测性和突发性,会导致任务随机或意外地显现。我们研究了大语言模型涌现能力的具体案例,为判断是否利用其涌现能力提供了重要参考依据。
Handling word manipulation is a typical emergent ability. It refers to the ability to learn symbolic manipulations, such as reversed words [16], in which the model is given a word spelled backwards and must output the original word. For example, GPT-3 [16] shows emergent ability on word sorting and word unscrambling tasks, and PaLM [22] exhibits emergent ability on ASCII word recognition and the hyperbaton task. The logical abilities of language models also tend to emerge as the model scales up, such as logical deduction, logical sequences, and logic grid puzzles. Additionally, other tasks, such as advanced coding (e.g., auto debugging, code-line description) and concept understanding (e.g., novel concepts, simple Turing concepts), are also use cases for the emergent abilities of large language models.
处理词语操作是一种典型的涌现能力。它指的是学习符号操作的能力,例如反向单词任务 [16],即模型接收一个倒序拼写的单词后必须输出原始单词。例如,GPT-3 [16] 在单词排序和单词解乱任务中展现了涌现能力,而PaLM [22] 则在ASCII单词识别和hyperbaton任务上表现出涌现能力。语言模型的逻辑能力(如逻辑推理、逻辑序列和逻辑网格谜题)往往会随着模型规模扩大而涌现。此外,其他任务(如高级编程(例如自动调试、代码行描述)和概念理解(例如新概念、简单图灵概念))也是大语言模型涌现能力的应用场景。
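The reversed-word task format can be sketched in a couple of lines; the helper name is ours, for illustration only. The model sees the scrambled input and must recover the target, a purely symbolic manipulation that only appears at scale.

```python
# Reversed-word task [16]: given a word spelled backwards, recover the original.
def make_reversed_word_example(word: str) -> tuple[str, str]:
    """Return an (input, target) pair for the reversed-word task."""
    return word[::-1], word

prompt_input, target = make_reversed_word_example("language")
print(prompt_input, "->", target)  # egaugnal -> language
```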
4.4.3 No-Use Cases and Understanding. Although in most cases, as discussed above, larger models bring better performance, there are still many exceptions that should be considered when choosing the appropriate model.
4.4.3 非适用场景与理解。尽管如上文所述,大多数情况下更大规模的模型会带来更好的性能,但在选择合适模型时仍需考虑许多例外情况。
On certain tasks, as the size of LLMs increases, performance begins to decrease. Examples include Redefine-math, which tests whether language models can work with common symbols when they are redefined to mean something else; Into-the-unknown, which requires the model to choose which piece of information would help answer a question; and Memo-trap, which asks an LM to write a phrase that starts like a famous quote but ends differently. This is called the Inverse Scaling Phenomenon. Another interesting phenomenon observed in the scaling of LLMs is the U-shaped Phenomenon [114]. As the name implies, this phenomenon refers to performance on certain tasks initially improving as LLM size increases, then declining, before eventually improving again. Examples include Hindsight-neglect, which tests whether language models can assess whether a bet was worth taking based on its expected value; NegationQA, which negates a part of each question in an existing multiple-choice dataset to see whether language models are sensitive to negation; and Quote-repetition, which asks models to repeat back sentences given in the prompt, with few-shot examples to help them recognize the task. Hence, the risk of diminishing performance should be noted, and if the task resembles those just discussed, careful consideration should be given to whether or not to use huge LLMs.
在某些任务中,随着大语言模型 (LLM) 规模的增大,其性能反而开始下降。例如:
- **Redefine-math**:测试语言模型能否在常见符号被重新定义含义时正确使用它们;
- **Into-the-unknown**:要求模型选择哪条信息有助于回答问题;
- **Memo-trap**:让语言模型以模仿名句开头但结尾改写的方式书写短语。
这种现象被称为**逆缩放现象 (Inverse Scaling Phenomenon)**。
另一项在大语言模型扩展中观察到的有趣现象称为**U型现象 (U-shaped Phenomenon)** [114]。顾名思义,这种现象指的是随着大语言模型规模增大,其在某些任务上的表现会先提升后下降,最终再次回升,例如:
- **Hindsight-neglect**:测试语言模型能否根据期望值判断赌注是否值得;
- **NegationQA**:对现有多选题数据集的每个问题部分进行否定,观察语言模型对否定的敏感性;
- **Quote-repetition**:要求模型重复提示中给定的句子,并提供少样本示例以帮助识别任务。
因此,需注意性能下降的风险。若任务类似上述讨论的案例,应慎重考虑是否使用超大规模的大语言模型。
Gaining a deeper understanding of emergent abilities, the inverse scaling phenomenon, and the U-shape phenomenon in LLMs is essential for advancing research in this field. In a certain sense, the U-shape phenomenon suggests that small-scale models and huge-scale models make predictions with different internal mechanisms. From this perspective, the U-shape phenomenon can be seen as a transformation of the inverse scaling phenomenon due to some emergent abilities of sufficiently large models [114]. GPT-4 [76] exhibits a reversal of the inverse scaling phenomenon in some cases, such as on the task called Hindsight Neglect. The explanation for these behaviors of LLMs during scaling is still an open problem, and several hypotheses have been proposed. For emergent abilities, one explanation is that a task may involve multiple key steps and the LLM cannot handle the task until it is large enough to handle every step; another explanation focuses on the granularity of evaluation metrics [113]. For the inverse scaling phenomenon and U-shape phenomenon, the explanations mainly focus on the model's over-reliance on information from its prior rather than the input prompts, valid but misleading few-shot examples, and distracting easier sub-tasks within a hard task [114].
深入理解大语言模型(LLM)中的涌现能力、逆缩放现象和U型现象对推动该领域研究至关重要。从某种意义上说,U型现象表明小规模模型与超大规模模型采用不同的内部机制进行预测。从这个角度看,U型现象可视为当模型达到足够规模时,某些涌现能力导致的逆缩放现象转化[114]。GPT-4[76]在某些任务(如后见之明忽视任务)中表现出逆缩放现象的逆转。目前对大语言模型在扩展过程中这些行为的解释仍是一个开放性问题。
现有几种假设解释这些现象:对于涌现能力,一种解释认为任务可能包含多个关键步骤,只有当模型规模足够处理所有步骤时才能完成任务;另一种解释则关注评估指标的粒度问题[113]。针对逆缩放现象和U型现象,主要解释包括:模型过度依赖先验知识而非输入提示、有效但具有误导性的少样本示例,以及困难任务中干扰性的简单子任务[114]。
4.5 Miscellaneous tasks
4.5 其他任务
This section explores miscellaneous tasks not covered in the previous discussions, to better understand LLMs' strengths and weaknesses.
本节探讨无法归入前文讨论的各类任务,以更好地理解大语言模型的优势与不足。
Remark 6
备注 6
4.5.1 No use case. LLMs generally struggle with some tasks due to differences in objectives and training data.
4.5.1 无适用场景。由于目标与训练数据的差异,大语言模型 (LLM) 通常难以处理某些任务。
Although LLMs have achieved remarkable success in various natural language processing tasks, their performance in regression tasks has been less impressive. For example, ChatGPT’s performance on the GLUE STS-B dataset, which is a regression task evaluating sentence similarity, is inferior to a fine-tuned RoBERTa performance [130]. The Regression tasks typically involve predicting a continuous value rather than a discrete label, posing unique challenges for LLMs. One primary reason for their subpar performance is the inherent difference between the language modeling objective and the regression task objective. LLMs are designed to predict the next word in a sequence or generate coherent text, with their pre-training focused on capturing linguistic patterns and relationships. Consequently, their internal representations may not be well-suited for modeling continuous numerical outputs. Besides, LLMs have predominantly been trained on text data, focusing on capturing the intricacies of natural language processing. As a result, their performance on multimodal data, which involves handling multiple data types such as text, images, audio, video, actions, and robotics, remains largely unexplored. And fine-tuned multimodal models, like BEiT[110] and PaLI [19], still dominate many tasks such as visual question answering (VQA) and image captioning. Nonetheless, the recently introduced GPT-4 [76] has taken the step in multimodal fusion, but there is still a lack of detailed evaluation of its capabilities.
虽然大语言模型在各种自然语言处理任务中取得了显著成功,但在回归任务中的表现却不尽如人意。例如,ChatGPT在评估句子相似度的回归任务GLUE STS-B数据集上的表现不如经过微调的RoBERTa模型[130]。回归任务通常需要预测连续值而非离散标签,这给大语言模型带来了独特挑战。其表现不佳的主要原因在于语言建模目标与回归任务目标之间存在本质差异。大语言模型的设计初衷是预测序列中的下一个词或生成连贯文本,其预训练侧重于捕捉语言模式和关系,因此它们的内部表征可能不适合建模连续数值输出。此外,大语言模型主要基于文本数据进行训练,专注于捕捉自然语言处理的复杂性,导致其在涉及文本、图像、音频、视频、动作和机器人等多模态数据处理方面的表现仍鲜有研究。而经过微调的多模态模型(如BEiT[110]和PaLI[19])仍在视觉问答(VQA)和图像描述等多项任务中占据主导地位。尽管如此,最新推出的GPT-4[76]已迈出多模态融合的步伐,但目前仍缺乏对其能力的详细评估。
4.5.2 Use case. LLMs are particularly suitable for certain tasks.
4.5.2 使用场景。大语言模型 (LLM) 特别适合完成某些特定任务。
LLMs are very good at mimicking humans, acting as chatbots, and performing various kinds of tasks. The LLM-powered ChatGPT is surprising for its consistency, reliability, informativeness, and robustness during multiple utterances with humans. The human-feedback procedure plays an important role in acquiring such abilities.
大语言模型非常擅长模仿人类行为,既能充当聊天机器人,又能执行各类任务。由大语言模型驱动的ChatGPT因其在与人类多轮对话中表现出的连贯性、可靠性、信息量和稳健性而令人惊叹。人类反馈机制对于获得这些能力起着关键作用。
LLMs can act both as good annotators and as data generators for data augmentation, as in [27, 29, 99, 121, 122]. Some LLMs have been found to be as good as human annotators [37] on some tasks. And texts collected from GPT-3.5 (text-davinci-003) have been used as human-like instruction-following demonstrations to train other language models [100].
大语言模型 (LLM) 既能作为优质标注工具,又能充当数据增强的生成器,例如 [27, 29, 99, 121, 122] 所示。研究发现某些任务中部分大语言模型的标注质量可与人类标注员媲美 [37]。从 GPT-3.5 (text-davinci-003) 采集的文本已被用作类人指令跟随示范数据,用于训练其他语言模型 [100]。
LLMs can also be used for quality assessment on some NLG tasks, such as summarization and translation. On summarization tasks, GPT-4 as an evaluator achieves a higher correlation with humans than other methods by a large margin [64]. Some other LLM-based evaluators [34, 50, 64, 108] also show good human alignment on more NLG tasks, especially compared with traditional automatic metrics. However, the LLM evaluator may be biased towards LLM-generated texts [64].
大语言模型也可用于某些自然语言生成(NLG)任务的质量评估,例如摘要和翻译。在摘要任务中,GPT-4作为评估器与人类评价的相关性大幅领先其他方法[64]。其他基于大语言模型的评估器[34, 50, 64, 108]也在更多NLG任务中展现出良好的人类对齐性,尤其相较于传统自动指标。但LLM评估器可能对LLM生成的文本存在偏好[64]。
Also, as discussed above, some abilities of LLMs bring bonuses beyond performance improvement, such as interpretability. The CoT reasoning ability of LLMs can show how an LLM reaches a prediction, which is a good instance-level interpretation, while it also improves performance.
此外,正如我们前文所述,大语言模型的某些能力不仅能提升性能,还会带来额外优势,例如可解释性。大语言模型的思维链 (CoT) 推理能力可以展示其得出预测的过程,这在实例层面提供了良好的解释,同时这种能力也提升了模型性能。
4.6 Real world "tasks"
4.6 现实世界的"任务"
In the last part of this section, we would like to discuss the usage of LLMs and fine-tuned models in real-world "tasks". We use the term "tasks" loosely, as real-world scenarios often lack the well-formatted definitions found in academia. Many requests to models cannot even be treated as NLP tasks. Models face challenges in the real world from three perspectives:
在本节的最后部分,我们将讨论大语言模型(LLM)和微调模型在实际"任务"中的应用。此处"任务"的定义较为宽泛,因为现实场景往往缺乏学术界那样明确定义的任务框架。许多对模型的请求甚至无法被视为自然语言处理(NLP)任务。从以下三个维度来看,模型在现实世界中面临着挑战:
Essentially, these challenges in the real world come from the fact that users' requests deviate significantly from the distribution of any NLP dataset designed for a specific task. Public NLP datasets are not reflective of how models are used [77].
本质上,这些现实世界中的挑战源于用户的请求明显偏离了为特定任务设计的任何NLP数据集的分布。公开的NLP数据集并不能反映模型的实际使用情况[77]。
Remark 7
备注 7
LLMs are better suited to handle real-world scenarios compared to fine-tuned models. However, evaluating the effectiveness of models in the real world is still an open problem.
大语言模型比微调模型更适合处理现实场景。然而,评估模型在现实世界中的有效性仍是一个开放性问题。
Handling such real-world scenarios requires coping with ambiguity, understanding context, and handling noisy input. Compared to fine-tuned models, LLMs are better equipped for this because they have been trained on diverse datasets that encompass various writing styles, languages, and domains. Additionally, LLMs demonstrate a strong ability to generate open-domain responses, making them well suited for these scenarios. Fine-tuned models, on the other hand, are often tailored to specific, well-defined tasks and may struggle to adapt to new or unexpected user requests. They rely heavily on clear objectives and well-formed training data that specify the types of instructions the models should learn to follow. Fine-tuned models may struggle with noisy input due to their narrower focus on specific distributions and structured data. An additional system is often required as an assistant for fine-tuned models to process unstructured context, determine possible intents, and refine model responses accordingly.
处理这类现实场景需要应对模糊性、理解上下文并处理噪声输入。与微调模型相比,大语言模型更适合这类任务,因为它们经过多样化数据集的训练,涵盖各种写作风格、语言和领域。此外,大语言模型展现出强大的开放领域响应生成能力,使其特别适配此类场景。而微调模型通常针对特定、明确定义的任务进行优化,可能难以适应新的或意外的用户请求。它们高度依赖清晰的目标和格式规范的训练数据,这些数据需明确指定模型应学会遵循的指令类型。由于微调模型更专注于特定分布和结构化数据,它们在处理噪声输入时可能会遇到困难。通常需要额外的系统作为微调模型的辅助,以处理非结构化上下文、确定可能的意图并相应优化模型响应。
Additionally, some mechanisms such as instruction tuning [91, 112] and human alignment tuning [77] further boost the capabilities of LLMs to better comprehend and follow user instructions. These methods improve the model's ability to generate helpful, harmless, and honest responses while maintaining coherence and consistency [77, 91, 112]. While both methods can make LLMs generalize better to unseen tasks and instructions, it has been noticed that human labelers prefer models tuned for human alignment [77] to models tuned with instructions from public NLP tasks, such as FLAN [112] and T0 [91]. The reason may be similar to the reasons for fine-tuned models' inferiority: public NLP tasks/datasets are designed for easy and automatic evaluation, and they can only cover a small part of real-world usage. One of the main issues in real-world scenarios is how to evaluate whether a model is good or not. Without any formalized tasks or metrics, the evaluation of model effectiveness can only rely on feedback from human labelers. Considering the complexity and cost of human evaluation, there is no massive and systematic comparison between fine-tuned models and LLMs yet. Nevertheless, the huge success and popularity of LLMs such as ChatGPT has confirmed the superiority of LLMs to some extent.
此外,指令微调 [91, 112] 和人类对齐微调 [77] 等机制进一步提升了LLM (Large Language Model) 理解和遵循用户指令的能力。这些方法增强了模型生成有用、无害且诚实回答的能力,同时保持连贯性和一致性 [77, 91, 112]。虽然这两种方法都能让LLM更好地泛化到未见过的任务和指令,但研究发现人类标注者更倾向于选择经过人类对齐微调的模型 [77],而非基于公开NLP任务(如FLAN [112] 和T0 [91])指令微调的模型。其原因可能与微调模型的劣势类似:公开NLP任务/数据集是为便于自动化评估而设计的,仅能覆盖现实应用场景的一小部分。
现实场景中的核心问题之一是如何评估模型优劣。在没有标准化任务或指标的情况下,模型效果评估只能依赖人类标注者的反馈。考虑到人工评估的复杂性和成本,目前尚未出现微调模型与LLM之间大规模系统化的对比研究。尽管如此,chatGPT等LLM的巨大成功和普及,已在一定程度上证实了其优越性。
5 OTHER CONSIDERATIONS
5 其他注意事项
Although LLMs are suitable for various downstream tasks, there are other factors to consider, such as efficiency and trustworthiness. Our discussion of efficiency encompasses the training cost, inference latency, and parameter-efficient tuning strategies for LLMs. Meanwhile, our examination of trustworthiness includes robustness & calibration, fairness & biases, potential spurious correlations, and safety challenges in LLMs.
尽管大语言模型 (LLM) 适用于各种下游任务,但仍需考虑其他因素,例如效率和可信度。我们对效率的讨论涵盖了大语言模型的训练成本、推理延迟和参数高效调优策略。同时,我们对可信度的研究包括鲁棒性与校准、公平性与偏见、潜在的伪相关性以及大语言模型中的安全挑战。
Remark 8
备注 8
5.1 Efficiency
5.1 效率
In real-world deployment, performance, cost, and latency are all important considerations, not just model performance. While some parameter-efficient methods have been developed, practitioners must balance efficiency with effectiveness in practice.
在实际部署中,性能、成本和延迟都是重要的考量因素,而不仅仅是模型的性能。虽然已经开发出一些参数高效的方法,但实践者必须在效率与效果之间取得平衡。
Cost. LLMs have grown increasingly larger in recent years, with models such as GPT-1, GPT-2, and GPT-3 featuring 117 million, 1.5 billion, and 175 billion parameters, respectively. The cost of training an LLM is heavily influenced by its size: estimates suggest that training the 11B-parameter variant of T5 costs well over $\$1.3$ million for a single run, while a single training run of GPT-3 175B requires $\$4.6$ million [3]. The energy consumption for training large models is equally impressive: the total energy consumed in training a transformer model with 6B parameters to completion is estimated at around $103.5\mathrm{MWh}$ [30], and Google reports that training PaLM consumed about 3.4 GWh over roughly two months [6]. Furthermore, the dataset size also scales rapidly with the size of the model, with GPT-3 175B trained on 499 billion tokens [16]. Another key metric reflecting computing cost is FLOPs: GPT-3 175B requires $3.14\times10^{23}$ FLOPs, while a T5 11B model only requires $3.30\times10^{22}$, about 10 times less. In addition to these costs, hardware requirements are substantial. OpenAI has collaborated with Microsoft on a supercomputer hosted in the Microsoft Azure cloud, consisting of 285k CPU cores and 10k high-end GPUs, to support the training of large models. For users of the OpenAI API, pricing varies based on the model and usage; for example, GPT-3.5-turbo charges $\$0.002$ per 1k tokens for chat service, while for users who require custom models, training costs $\$0.03$ per 1k tokens and usage costs $\$0.12$ per 1k tokens [4]. Therefore, for users who cannot afford such large costs, such as small startups and individual users, a small fine-tuned model is a better and more reasonable choice.
成本。近年来,大语言模型的参数量持续攀升,GPT-1、GPT-2和GPT-3的参数量分别达到1.17亿、15亿和1750亿。模型训练成本与其规模密切相关:据估算,训练110亿参数的T5变体单次成本超过130万美元,而GPT-3 1750亿参数的训练单次耗资高达460万美元[3]。大型模型的能耗同样惊人,完整训练60亿参数Transformer模型约消耗103.5兆瓦时[30],Google报告显示训练PaLM在两个月内耗电约3.4吉瓦时[6]。数据集规模也随模型体量激增,GPT-3 1750亿的训练使用了4990亿token[16]。反映计算成本的另一关键指标是浮点运算量,GPT-3 1750亿需3.14×10²³次运算,而T5 110亿仅需3.30×10²²次,相差十倍。硬件需求同样庞大,OpenAI与Microsoft合作部署的Azure超级计算机包含28.5万CPU核心和1万高端GPU。对于OpenAI API用户,GPT-3.5-turbo聊天服务每千token收费0.002美元;定制模型训练每千token0.03美元,使用费0.12美元[4]。因此,初创公司或个人用户等预算有限的群体更适合选择经过微调的小型模型。
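A quick back-of-the-envelope check of the figures quoted above, using the rates cited in the text:

```python
# FLOPs comparison quoted above: GPT-3 175B vs. T5 11B.
gpt3_flops = 3.14e23
t5_11b_flops = 3.30e22
ratio = gpt3_flops / t5_11b_flops  # roughly 10x more compute

# OpenAI API pricing as quoted in the text ($ per 1k tokens).
chat_rate = 0.002   # gpt-3.5-turbo chat service
usage_rate = 0.12   # custom-model usage
tokens = 1_000_000

chat_cost = tokens / 1000 * chat_rate     # $2 per 1M tokens
custom_cost = tokens / 1000 * usage_rate  # $120 per 1M tokens

print(f"FLOPs ratio: {ratio:.1f}x")
print(f"1M tokens: chat ${chat_cost:.2f}, custom ${custom_cost:.2f}")
```

The 60x gap between the two per-token rates is one concrete reason a small fine-tuned model can be the more reasonable choice for cost-sensitive users.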
Latency. Latency is a crucial factor to consider in real-world applications of LLMs. Inference time is a commonly used metric for latency, and it is highly dependent on the model size, architecture, and token size. For instance, the inference time for the GPT-J 6B model is 0.077s, 0.203s, and 0.707s when the max token size is set to 2, 8, and 32, respectively. Additionally, when the max token size is fixed at 32, the inference time for the InstructGPT model (davinci v2) is 1.969s. As LLMs are often too large to run on a single user's machine, companies provide LLM services via APIs. API latency can vary depending on the user's location, and the average latency of the OpenAI API service for a single request can range from a few hundred milliseconds to several seconds. In scenarios where high latency is not acceptable, large LLMs may not be appropriate. For example, scalability is critical in many information retrieval applications: to deploy information retrieval systems on the web, search engines require very efficient inference. The idealized denoised inference time for the InstructGPT davinci v2 (175B*) model is 0.21s per request (i.e., per query-passage pair to be scored), which is too slow for web search engines.
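When deciding whether an LLM service meets a latency budget, a simple measurement harness can help. In the sketch below, `call` stands in for any real model invocation (an API request or a local forward pass); the mock sleep used in the example is purely illustrative.

```python
import statistics
import time

def measure_latency(call, n_requests: int = 20):
    """Time repeated invocations and report mean and worst-case latency."""
    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), max(samples)

# Mock "model call" taking ~10 ms; replace with a real request or forward pass.
mean_s, worst_s = measure_latency(lambda: time.sleep(0.01))
print(f"mean {mean_s * 1000:.1f} ms, worst {worst_s * 1000:.1f} ms")
```

For latency-sensitive deployments, the worst case (or a high percentile) usually matters more than the mean, since it bounds the user-visible tail of response times.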
Parameter-Efficient Tuning. In practice, we may tune a model on specific datasets. Parameter-Efficient Tuning (PET), also known as parameter-efficient fine-tuning (PEFT), is an efficient technique that tunes a small portion of model parameters (or extra parameters) while freezing most parameters of the pre-trained LLM. The main goal of PET is to greatly decrease the computational and storage costs while preserving the performance of the original model. Common PET techniques include LoRA [42], Prefix Tuning [58], and P-Tuning [62, 63]. As an illustration, the LoRA method keeps the weights of the pre-trained model frozen and adds low-rank matrices to every layer of the Transformer architecture. This approach considerably reduces the number of parameters that must be trained for downstream tasks, thereby increasing overall efficiency. Alpaca-LoRA proposes integrating Low-Rank Adaptation (LoRA) into LLaMA-Alpaca, which enables running LLaMA within hours on a single RTX 4090. All these PET methods can be helpful either for fine-tuning a model for a specific task or for tuning LLMs to meet special requirements such as human alignment.
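To illustrate why LoRA shrinks the trainable parameter count, here is a minimal NumPy sketch of a LoRA-augmented linear layer. The class name, rank, and scaling are our own illustrative choices, not a reference implementation:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A.
    Only A and B (r * (d_in + d_out) parameters) would be trained."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        rng = np.random.default_rng(0)
        self.W = rng.standard_normal((d_out, d_in))     # frozen pre-trained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # B is zero at init, so the layer initially matches the frozen model.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d_in=1024, d_out=1024, r=8)
frozen = layer.W.size                     # 1,048,576 frozen parameters
trainable = layer.A.size + layer.B.size   # 16,384 trainable parameters (~1.6%)
print(frozen, trainable)
```

The zero-initialized `B` matrix is the standard LoRA trick: the adapted layer starts out exactly equal to the pre-trained one, and training only moves it away as needed.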
5.2 Trustworthiness
Given that LLMs are now involved in sensitive areas such as healthcare, finance, and law, it is crucial to ensure that they are trustworthy and capable of producing reliable output.
Robustness and Calibration. The accuracy and robustness of LLMs are shown to be strongly correlated [59]: models that achieve high accuracy in a scenario also tend to have good robustness. However, zero-shot robustness degrades after the model is tuned on extra application-specific task data [116]. This may be due to overfitting, which leads to poor generalizability given the extremely high complexity of the model and the limited training samples from downstream tasks [43]. In a similar vein, it has been observed that fine-tuning a model can result in significant miscalibration, owing to over-parameterization [51]. Therefore, fine-tuned models may not be an optimal choice when robustness and calibration are critical considerations. However, human alignment has been found to be a potential solution for enhancing model robustness: InstructGPT davinci v2 $(175\mathrm{B}^{\ast})$ has been shown to outperform other models in terms of robustness. On the other hand, achieving optimal calibration of the model depends on the scenario and the adaptation procedure employed.
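Calibration can be quantified with the expected calibration error (ECE): the gap between a model's stated confidence and its actual accuracy, averaged over confidence bins. A minimal sketch, assuming confidence scores and correctness indicators are available for a held-out set:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: |accuracy - mean confidence| per confidence bin,
    weighted by bin size. Lower means better calibrated."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# A perfectly calibrated toy model: 80% confidence, 80% of answers correct.
conf = [0.8] * 10
hits = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(round(expected_calibration_error(conf, hits), 3))  # 0.0
```

A fine-tuned model that is overconfident (say, 95% confidence at 80% accuracy) would score a correspondingly larger ECE, which is the kind of miscalibration the paragraph above warns about.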
Fairness and Bias. LLMs have been shown to exhibit disparate treatment and impact, perpetuating societal biases and potentially leading to discrimination [10, 17]. To ensure fairness and equity for all users, it is crucial to address these issues in the development and deployment of NLP models. Disparities in performance between demographic groups can serve as an indicator of fairness problems. LLMs are particularly susceptible to fairness issues, as significant performance disparities have been observed across demographic categories such as dialect, religion, gender, and race [59]. However, research has shown that aligning models with human instructions can improve LLM performance regardless of their size, with the InstructGPT model (davinci v2) exhibiting smaller performance disparities than other LLMs [23].
Spurious Biases. The shortcut learning problem has been observed in various natural language understanding tasks under the pre-training and fine-tuning paradigm, where models heavily rely on spurious correlations between inputs and labels in the fine-tuning data for prediction [31, 35, 98]. For example, in reading comprehension tasks, fine-tuned models tend to focus on lexical matching between words in the question and the original passage, neglecting the intended reading comprehension task itself [53]. In contrast, large language models are not directly trained on fine-tuning datasets, which makes it less likely for them to learn shortcut features present in those datasets, thereby enhancing their generalization capabilities. However, LLMs are not infallible and may exhibit some shortcut learning during in-context learning. For example, recent preliminary studies have begun investigating the robustness of prompt-based methods in large-scale language models [111, 129]. One such study evaluates the few-shot learning performance of GPT-3 on text classification and information extraction tasks [129], revealing that the examined LLMs are susceptible to majority label bias and position bias, where they tend to predict answers based on the frequency or position of the answers in the training data. Moreover, these LLMs exhibit common token bias, favoring answers that are prevalent in their pre-training corpus. Recent studies show that this positional bias can be mitigated by selecting proper prompts [68]. In summary, while LLMs significantly reduce the shortcut learning problem prevalent in fine-tuned models, they still exhibit some shortcut learning issues and should be approached with caution when deployed in downstream applications.
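One simple precaution against majority label bias and position bias in in-context learning is to balance and shuffle the few-shot demonstrations before building the prompt. The sketch below is a hypothetical illustration (function name and toy data are our own), not a method from the cited studies:

```python
import random
from collections import Counter

def balance_demonstrations(examples, per_label=2, seed=0):
    """Subsample (text, label) demonstrations so every label appears
    equally often, then shuffle their order in the prompt. This removes
    the majority-label skew and randomizes answer positions."""
    random.seed(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    balanced = []
    for label, items in by_label.items():
        balanced.extend(random.sample(items, min(per_label, len(items))))
    random.shuffle(balanced)
    return balanced

# A skewed pool: 6 positive vs. 2 negative examples.
pool = [("great movie", "pos")] * 6 + [("terrible plot", "neg")] * 2
demos = balance_demonstrations(pool, per_label=2)
print(Counter(label for _, label in demos))  # pos and neg each appear twice
```

This addresses only the prompt-side biases; common token bias stems from the pre-training corpus and requires other mitigations, such as the prompt-selection methods of [68].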
5.3 Safety challenges
LLMs have demonstrated extremely strong capabilities in many areas such as reasoning, knowledge retention, and coding. As they become more powerful and human-like, their potential to influence people's opinions and actions in significant ways grows. As a result, some new safety challenges to our society should be considered, and they have attracted considerable attention in recent works [75, 76].
Hallucinations. The potential for LLMs to "hallucinate," i.e., generate nonsensical or untruthful content, can have significant negative impacts on the quality and reliability of information in various applications. As LLMs become increasingly convincing and believable, users may develop an overreliance on them and trust them to provide accurate information in areas with which the users themselves are only somewhat familiar. This can be particularly dangerous if the model produces content that is entirely false or misleading, leading to incorrect decisions or actions taken based on that information. Such outcomes can have serious consequences in many domains, such as healthcare, finance, or public policy, where the accuracy and reliability of information are critical. To mitigate these issues, reinforcement learning from human feedback (RLHF) is widely used [75, 77], and LLMs themselves have been integrated into the loop [75].
Harmful content. Due to the high coherence, quality, and plausibility of texts generated by LLMs, harmful content from LLMs can cause significant harm, including hate speech, discrimination, incitement to violence, false narratives, and even social engineering attacks. Implementing safeguards to detect and correct such content can mitigate these harms [97]. LLMs can also have dual-use potential by providing illicit information on request, leading to risks such as the proliferation of weapons [75] and even terrorist attack planning. It is crucial to ensure that these LLMs are used responsibly, with safeguards in place to prevent harm. In existing work, feedback from humans also plays an important role in eliminating harmful outputs.
Privacy. LLMs can face serious security issues, a prime example being user privacy. It is reported that Samsung employees, while using ChatGPT to process their work, inadvertently leaked top-secret data, including the source code of a new program and internal meeting minutes related to hardware. The Italian data protection agency declared that OpenAI, the developer of ChatGPT, illicitly gathered personal user data, leading Italy to become the first government to prohibit ChatGPT over privacy concerns [1].
6 CONCLUSION AND FUTURE CHALLENGES
Recent advances in large language models have been revolutionizing the field of natural language processing. Effectively using LLMs requires understanding their capabilities and limitations for various NLP tasks. This work presents a practical guide to working with LLMs for downstream NLP tasks. We first discuss prominent models such as GPT-style and BERT-style architectures and the factors influencing their performance. We then explore using LLMs for downstream tasks, including knowledge-intensive tasks, NLU tasks, and NLG tasks, providing concrete examples of successes and limitations. This practical guide offers insights into LLMs and best practices for harnessing them across NLP tasks. We hope it enables researchers and practitioners to leverage their potential and drive innovation in language technologies.
In the following, we outline the future challenges of LLMs:
• Evaluation of proposed models on real-world "datasets". Existing deep learning models are primarily evaluated on standard academic datasets, such as ImageNet, which have been milestones in deep learning development. However, standard academic datasets have limitations and cannot exactly reflect real-world performance. As models advance, it is crucial to assess them on more diverse, complex, and realistic data that reflect real-world needs. Evaluating models on real-world "datasets", in addition to academic ones, will provide a more rigorous test of their capabilities, as well as a better understanding of their effectiveness in real-world applications. This ensures that the models are capable of addressing real-world challenges and delivering practical solutions.
• Model Alignment. Ensuring that increasingly powerful and autonomous models align with human values and priorities is essential. Methods must be developed to guarantee that these models behave as intended and do not optimize for undesirable outcomes. It is crucial to integrate alignment techniques from the start of the model development process. Model transparency and interpretability are also important factors for evaluating and ensuring alignment. Additionally, as we look toward the future, an even more daunting challenge looms: aligning superhuman systems. While this task is currently beyond our needs, it is important to consider and prepare for the potential implications of aligning such advanced systems, as they may present unique complexities and ethical concerns [8, 15].
• Safety Alignment. While discussion of AI existential risks is important, concrete research is needed to guarantee the safe development of advanced AI. This includes techniques for interpretability, scalable oversight and governance, and formal verification of model properties. Safety should be considered not just an add-on but an integral part of the model-building process.
• Performance Prediction with Scaling. It is difficult to anticipate how model performance will change as model size and complexity increase dramatically. Developing methods to better predict model performance after scaling up, or as new architectures are developed, would allow for more efficient use of resources and accelerated progress. Some possibilities include: training a smaller "seed" model and extrapolating its growth, simulating the effects of increased scale or model tweaks, and benchmarking iterations of the model at different scales to build scaling laws. These could provide insight into the performance of models even before they are built.
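The "seed model" idea above can be sketched as fitting a power law L = a * N^b to observations from small runs and extrapolating to a larger model. The sketch below uses synthetic data that follows an assumed law, so the fit recovers the exponent exactly; real loss measurements would be noisy and the extrapolation correspondingly uncertain.

```python
import numpy as np

# Hypothetical (parameter count, loss) observations from small "seed" runs.
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
loss = 10.0 * params ** -0.1          # synthetic data following L = a * N^b

# Fit log(L) = log(a) + b * log(N) by least squares, then extrapolate.
b, log_a = np.polyfit(np.log(params), np.log(loss), deg=1)
a = np.exp(log_a)
predicted = a * (1e11) ** b           # projected loss at 100B parameters
print(f"fitted exponent b = {b:.3f}, predicted loss = {predicted:.3f}")
```

Fitting in log-log space turns the power law into a straight line, which is why a simple linear fit suffices; the same approach underlies published scaling-law analyses, though those fit far more careful functional forms.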
