Why AI agents are the next frontier of generative AI




By moving from information to action (think virtual coworkers able to complete complex workflows), the technology promises a new wave of productivity and innovation.

By Lareina Yee, Michael Chui, and Roger Roberts with Stephen Xu

Over the past couple of years, the world has marveled at the capabilities and possibilities unleashed by generative AI (gen AI). Foundation models such as large language models (LLMs) can perform impressive feats, extracting insights and generating content across numerous mediums, such as text, audio, images, and video. But the next stage of gen AI is likely to be more transformative.

We are beginning an evolution from knowledge-based, gen AI-powered tools (say, chatbots that answer questions and generate content) to gen AI-enabled "agents" that use foundation models to execute complex, multistep workflows across a digital world. In short, the technology is moving from thought to action.

Broadly speaking, "agentic" systems refer to digital systems that can independently interact in a dynamic world. While versions of these software systems have existed for years, the natural-language capabilities of gen AI unveil new possibilities, enabling systems that can plan their actions, use online tools to complete those tasks, collaborate with other agents and people, and learn to improve their performance. Gen AI agents eventually could act as skilled virtual coworkers, working with humans in a seamless and natural manner. A virtual assistant, for example, could plan and book a complex personalized travel itinerary, handling logistics across multiple travel platforms. Using everyday language, an engineer could describe a new software feature to a programmer agent, which would then code, test, iterate, and deploy the tool it helped create.

Agentic systems traditionally have been difficult to implement, requiring laborious, rule-based programming or highly specific training of machine-learning models. Gen AI changes that. When agentic systems are built using foundation models (which have been trained on extremely large and varied unstructured data sets) rather than predefined rules, they have the potential to adapt to different scenarios in the same way that LLMs can respond intelligibly to prompts on which they have not been explicitly trained. Furthermore, using natural language rather than programming code, a human user could direct a gen AI-enabled agent system to accomplish a complex workflow. A multiagent system could then interpret and organize this workflow into actionable tasks, assign work to specialized agents, execute these refined tasks using a digital ecosystem of tools, and collaborate with other agents and humans to iteratively improve the quality of its actions.

In this article, we explore the opportunities that the use of gen AI agents presents. Although the technology remains in its nascent phase and requires further technical development before it's ready for business deployment, it's quickly attracting attention. In the past year alone, Google, Microsoft, OpenAI, and others have invested in software libraries and frameworks to support agentic functionality. LLM-powered applications such as Microsoft Copilot, Amazon Q, and Google's upcoming Project Astra are shifting from being knowledge-based to becoming more action-based. Companies and research labs such as Adept, crewAI, and Imbue also are developing agent-based models and multiagent systems. Given the speed with which gen AI is developing, agents could become as commonplace as chatbots are today.

What value can agents bring to businesses?

The value that agents can unlock comes from their potential to automate a long tail of complex use cases characterized by highly variable inputs and outputs: use cases that have historically been difficult to address in a cost- or time-efficient manner. Something as simple as a business trip, for example, can involve numerous possible itineraries encompassing different airlines and flights, not to mention hotel rewards programs, restaurant reservations, and off-hours activities, all of which must be handled across different online platforms. While there have been efforts to automate parts of this process, much of it still must be done manually. This is in large part because the wide variation in potential inputs and outputs makes the process too complicated, costly, or time-intensive to automate.

Gen AI-enabled agents can ease the automation of complex and open-ended use cases in three important ways:

Agents can manage multiplicity. Many business use cases and processes are characterized by a linear workflow, with a clear beginning and series of steps that lead to a specific resolution or outcome. This relative simplicity makes them easily codified and automated in rule-based systems. But rule-based systems often exhibit "brittleness": they break down when faced with situations not contemplated by the designers of the explicit rules. Many workflows, by contrast, are far less predictable, marked by unexpected twists and turns and a range of possible outcomes; these workflows require special handling and nuanced judgment that makes rule-based automation challenging. Gen AI agent systems, because they are based on foundation models, have the potential to handle a wide variety of less-likely situations for a given use case, adapting in real time to perform the specialized tasks required to bring a process to completion.
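As a minimal illustration of the brittleness point, the Python sketch below assumes a hypothetical travel-disruption workflow. The rule-based handler knows only the events its designers enumerated and fails on anything else; the agentic handler keeps the same rules but routes unanticipated events to a general-purpose handler. `call_foundation_model` is a hypothetical stub standing in for a model call, not a real API.

```python
from typing import Callable, Dict

# Explicit rules: each situation the designers anticipated maps to a fixed action.
RULES: Dict[str, str] = {
    "flight_delayed": "rebook_next_flight",
    "hotel_full": "book_alternate_hotel",
}


def rule_based_handler(event: str) -> str:
    """Rule-based automation: fails on any event the rules do not cover."""
    if event not in RULES:
        raise ValueError(f"no rule for event: {event!r}")
    return RULES[event]


def call_foundation_model(event: str) -> str:
    """Hypothetical stand-in for a foundation-model call that proposes an action.

    A real agent would send the event plus context to a model; this stub returns
    a placeholder so the sketch runs without any external service.
    """
    return f"model_plan_for:{event}"


def agentic_handler(event: str,
                    fallback: Callable[[str], str] = call_foundation_model) -> str:
    """Agentic design: apply known rules when they match, otherwise defer to the model."""
    if event in RULES:
        return RULES[event]
    return fallback(event)


if __name__ == "__main__":
    print(agentic_handler("flight_delayed"))        # rebook_next_flight
    print(agentic_handler("airport_closed_storm"))  # model_plan_for:airport_closed_storm
    try:
        rule_based_handler("airport_closed_storm")
    except ValueError as err:
        print(err)  # no rule for event: 'airport_closed_storm'
```

Keeping deterministic rules for the common cases and reserving the model for the long tail is one plausible design; the `fallback` parameter also makes it easy to swap in a human-escalation path instead.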

Agent systems can be directed with natural language. Currently, to automate a use case, it first must be broken down into a series of rules and steps that can be codified. These steps are typically translated into computer code and integrated into software systems, an often costly and laborious process that requires significant technical expertise. Because agentic systems use natural language as a form of instruction, even complex workflows can be encoded more quickly and easily. What's more, the process can potentially be done by nontechnical employees, rather than software engineers. This makes it easier to integrate subject matter expertise, grants wider access to gen AI and AI tools, and eases collaboration between technical and nontechnical teams.

Agents can work with existing software tools and platforms. In addition to analyzing and generating knowledge, agent systems can use tools and communicate across a broader digital ecosystem. For instance, an agent can be directed to work with software applications (such as plotting and charting tools), search the web for information, collect and compile human feedback, and even leverage additional foundation models. Digital-tool use is both a defining characteristic of agents (it's one way that they can act in the world) and a way in which their gen AI capabilities can uniquely be brought to bear. Foundation models can learn how to interface with tools, whether through natural language or other interfaces. Without foundation models, these capabilities would require extensive manual efforts to integrate systems (for example, using extract, transform, and load tools) or tedious manual efforts to collate outputs from different software systems.
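Tool use is commonly wired up as a loop: the model emits a structured "tool call" (a tool name plus arguments), and the host program runs the matching function and hands the result back. The Python sketch below assumes that convention; the JSON shape and the `average` and `search_web` tools are hypothetical, and the model's output is hard-coded instead of generated.

```python
import json
from typing import Any, Callable, Dict, List

# Registry of tools the agent may call, keyed by the name the model uses.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def average(values: List[float]) -> float:
    """Stand-in for an analysis or charting tool: compute a mean."""
    return sum(values) / len(values)


@tool
def search_web(query: str) -> str:
    """Hypothetical web-search tool; a real one would call a search service."""
    return f"[stub results for {query!r}]"


def execute_tool_call(raw_call: str) -> Any:
    """Parse a model-emitted JSON tool call and dispatch it to the registered tool."""
    call = json.loads(raw_call)
    name, args = call["tool"], call.get("arguments", {})
    if name not in TOOLS:
        # Return the error as data so the model can see it and try again.
        return {"error": f"unknown tool {name!r}"}
    return TOOLS[name](**args)


if __name__ == "__main__":
    # In a real agent, this string would be generated by the foundation model.
    model_output = '{"tool": "average", "arguments": {"values": [3, 5, 10]}}'
    print(execute_tool_call(model_output))  # 6.0
```

Returning an error object rather than raising keeps the agent loop alive: the model can read the error and choose a different tool on its next turn.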

How gen AI-enabled agents could work

Agents can support high-complexity use cases across industries and business functions, particularly for workflows involving time-consuming tasks or requiring various specialized types of qualitative and quantitative analysis. Agents do this by recursively breaking down complex workflows and performing subtasks across specialized instructions and data sources to reach the desired goal. The process generally follows these four steps (Exhibit 1):

User provides instruction: A user interacts with the AI system by giving a natural-language prompt, much like one would instruct a trusted employee. The system identifies the intended use case, asking the user for additional clarification when required.

Agent system plans, allocates, and executes work: The agent system processes the prompt into a workflow, breaking it down into tasks and subtasks, which a manager subagent assigns to other specialized subagents. These subagents, equipped with necessary domain knowledge and tools, draw on prior "experiences" and codified domain expertise, coordinating with each other and using organizational data and systems to execute these assignments.

Agent system iteratively improves output: Throughout the process, the agent may request additional user input to ensure accuracy and relevance. The process may conclude with the agent providing final output to the user, iterating on any feedback shared by the user.

Agent executes action: The agent executes any necessary actions in the world to fully complete the user-requested task.

Exhibit 1: Agents enabled by generative AI soon could function as hyperefficient virtual coworkers. (McKinsey & Company)
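The four steps can be sketched as a small orchestration loop. This Python sketch makes several simplifying assumptions: the manager splits a request on the word "and" and routes subtasks by keyword, an empty request stands in for the case that needs clarification, and a caller-supplied `review` function (returning `None` when satisfied, or feedback text otherwise) stands in for user or critic feedback. `Subagent`, `DEMO_TEAM`, and the routing keywords are hypothetical; in a real system each role would be backed by a foundation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Subagent:
    """A specialist for one kind of subtask; `run` stands in for a model-backed agent."""
    name: str
    run: Callable[[str], str]


def plan(instruction: str) -> List[str]:
    """Manager: split the request into subtasks (naively, on the word 'and')."""
    return [part.strip() for part in instruction.split(" and ") if part.strip()]


def allocate(subtask: str, team: Dict[str, Subagent]) -> Subagent:
    """Manager: route a subtask to the first specialist whose keyword it mentions."""
    for keyword, agent in team.items():
        if keyword in subtask:
            return agent
    return team["general"]


def orchestrate(instruction: str, team: Dict[str, Subagent],
                review: Callable[[str], Optional[str]],
                max_rounds: int = 3) -> List[str]:
    """Run the four steps: instruction, plan/allocate/execute, iterate, act."""
    # Step 1: ask for clarification instead of guessing when the request is empty.
    if not instruction.strip():
        raise ValueError("Please describe the task you want completed.")
    actions = []
    for subtask in plan(instruction):                 # Step 2: plan and allocate
        agent = allocate(subtask, team)
        output = agent.run(subtask)
        for _ in range(max_rounds):                   # Step 3: revise on feedback
            feedback = review(output)
            if feedback is None:
                break
            output = agent.run(f"{subtask} ({feedback})")
        actions.append(f"{agent.name}: {output}")    # Step 4: action to execute
    return actions


# Hypothetical team for a travel request; "general" catches unmatched subtasks.
DEMO_TEAM = {
    "flight": Subagent("flight_agent", lambda t: f"done [{t}]"),
    "hotel": Subagent("hotel_agent", lambda t: f"done [{t}]"),
    "general": Subagent("general_agent", lambda t: f"done [{t}]"),
}

if __name__ == "__main__":
    request = "book a flight to Boston and find a hotel near the venue"
    for action in orchestrate(request, DEMO_TEAM, review=lambda output: None):
        print(action)
```

With the demo request, the manager produces two subtasks and routes them to `flight_agent` and `hotel_agent`. Passing a stricter `review` function makes a subagent rerun its subtask with the feedback appended, up to `max_rounds` times.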

Art of the possible: Three potential use cases

What do these kinds of systems mean for businesses? The following three hypothetical use cases offer a glimpse of what could be possible in the not-too-distant future.

Use case 1: Loan underwriting

Financial institutions prepare credit-risk memos to assess the risks of extending credit or a loan to a borrower. The process involves compiling, analyzing, and reviewing various forms of information pertaining to the borrower, loan type, and other factors. Given the multiplicity of credit-risk scenarios and analyses required, this tends to be a time-consuming and highly collaborative effort, requiring a relationship manager to work with the borrower, stakeholders, and credit analysts to conduct specialized analyses, which are then submitted to a credit manager for review and additional expertise.

Potential agent-based solution: An agentic system, comprising multiple agents that each assume a specialized, task-based role, could potentially be designed to handle a wide range of credit-risk scenarios. A human user would initiate the process by using natural language to provide a high-level work plan of tasks with specific rules, standards, and conditions. This team of agents would then break down the work into executable subtasks.

One agent, for example, could act as the relationship manager to handle communications between the borrower and financial institutions. An executor agent could compile the necessary documents and forward them to a financial analyst agent that would, say, examine debt from cash flow statements and calculate relevant financial ratios, which would then be reviewed by a critic agent to identify discrepancies and errors and provide feedback. This process of breakdown, analysis, refinement, and review would be repeated until the final credit memo is completed (Exhibit 2).
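A minimal sketch of the analyst and critic roles, assuming the financial analyst agent computes two common credit ratios (the debt service coverage ratio, DSCR, and debt-to-EBITDA) and the critic agent checks them against policy thresholds. The ratio choices, the threshold values, and all names are illustrative assumptions rather than the method described here; in practice both roles would be model-backed and the thresholds would come from the institution's credit policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BorrowerFinancials:
    """Figures an executor agent might compile from the borrower's statements."""
    net_operating_income: float
    annual_debt_service: float  # principal plus interest due in the year
    total_debt: float
    ebitda: float


# Illustrative policy thresholds (assumptions, not taken from the article).
MIN_DSCR = 1.25
MAX_DEBT_TO_EBITDA = 4.0


def analyst_agent(f: BorrowerFinancials) -> Dict[str, float]:
    """Financial-analyst role: compute coverage and leverage ratios."""
    if f.annual_debt_service <= 0 or f.ebitda <= 0:
        # Reject bad inputs (e.g. to send back to the executor) instead of dividing by zero.
        raise ValueError("debt service and EBITDA must be positive to compute ratios")
    return {
        "dscr": f.net_operating_income / f.annual_debt_service,
        "debt_to_ebitda": f.total_debt / f.ebitda,
    }


def critic_agent(ratios: Dict[str, float]) -> List[str]:
    """Critic role: flag ratios that breach policy so the memo can be revised."""
    issues = []
    if ratios["dscr"] < MIN_DSCR:
        issues.append(f"DSCR {ratios['dscr']:.2f} below {MIN_DSCR}")
    if ratios["debt_to_ebitda"] > MAX_DEBT_TO_EBITDA:
        issues.append(f"debt/EBITDA {ratios['debt_to_ebitda']:.2f} above {MAX_DEBT_TO_EBITDA}")
    return issues


def review_round(f: BorrowerFinancials) -> Dict[str, object]:
    """One analysis-and-critique pass; the critic's issues feed the next refinement round."""
    ratios = analyst_agent(f)
    return {"ratios": ratios, "issues": critic_agent(ratios)}


if __name__ == "__main__":
    sample = BorrowerFinancials(net_operating_income=1_500_000, annual_debt_service=1_000_000,
                                total_debt=5_000_000, ebitda=1_000_000)
    print(review_round(sample))
```

With the sample figures, DSCR is 1.5 (above the assumed 1.25 floor) and debt-to-EBITDA is 5.0 (above the assumed 4.0 ceiling), so the critic flags only the leverage ratio.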

Unlike simpler gen AI architectures, agents can produce high-quality content, reducing review cycle times by 20 to 60 percent. Agents are also able to traverse multiple systems and make sense of data pulled from multiple sources. Finally, agents can show their work: credit analysts can quickly drill into any generated text or numbers, accessing the complete chain of tasks and the data sources used to produce the generated insights. This facilitates the rapid verification of outputs.
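One way to support this kind of drill-down is to carry provenance with every value: the task that produced it, the source documents it came from, and the intermediate values it was derived from. The Python sketch below assumes a simple in-memory record; `Traced`, `divide`, and the document names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Traced:
    """A value plus the task that produced it, its source documents, and its inputs."""
    value: float
    task: str
    sources: Tuple[str, ...] = ()
    inputs: Tuple["Traced", ...] = ()

    def lineage(self, depth: int = 0) -> List[str]:
        """Walk back through the inputs, one indented line per task."""
        origin = ", ".join(self.sources) or "derived"
        lines = ["  " * depth + f"{self.task} = {self.value} (sources: {origin})"]
        for parent in self.inputs:
            lines.extend(parent.lineage(depth + 1))
        return lines


def divide(numerator: Traced, denominator: Traced, task: str) -> Traced:
    """A derived metric that keeps links to the values it was computed from."""
    return Traced(numerator.value / denominator.value, task,
                  inputs=(numerator, denominator))


# Hypothetical extracted figures; the document names are placeholders.
noi = Traced(1_500_000.0, "extract net operating income", ("cash_flow_statement.pdf",))
debt_service = Traced(1_000_000.0, "extract annual debt service", ("loan_schedule.xlsx",))
dscr = divide(noi, debt_service, "compute DSCR")

if __name__ == "__main__":
    print("\n".join(dscr.lineage()))
```

Printing `dscr.lineage()` lists the "compute DSCR" step first, followed by the two extraction steps and the document each figure came from.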

Use case 2: Code documentation and modernization

Legacy software applications and systems at large enterprises often pose security risks and can slow the pace of business innovation. But modernizing these systems can be complex, costly, and time-intensive, requiring engineers to review and understand millions of lines of the older codebase, manually document its business logic, and then translate this logic to an updated codebase and integrate it with other systems.

Exhibit 2: Illustrative use case: credit-risk memos. Financial institutions often spend 1 to 4 weeks creating a credit-risk memo; generative AI (gen AI) agents could cut the time spent on creating credit-risk memos by 20 to 60 percent.

Potential agent-based solution: AI agents have the potential to significantly streamline this process. A specialized agent could be deployed as a legacy-software expert, analyzing old code and documenting and translating various code segments. Concurrently, a quality assurance agent could critique this documentation and produce test cases, helping the AI system to iteratively refine its output and ensure its accuracy and adherence to organizational standards. The repeatable nature of this process, meanwhile, could produce a flywheel effect, in which components of the agent framework are reused for other software migrations across the organization, significantly improving productivity and reducing the overall cost of software development.
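The quality assurance agent's test cases can be illustrated with differential testing: generate inputs, run the legacy routine and its translation side by side, and report every disagreement as feedback for the next iteration. This Python sketch assumes the legacy logic can be called directly (here, a hypothetical interest calculation that truncates to whole cents) and uses seeded random inputs; all function names and the rounding rule are illustrative.

```python
import random
from typing import Callable, List, Tuple

Interest = Callable[[int, int], int]


def legacy_interest_cents(balance_cents: int, rate_bp: int) -> int:
    """Stand-in for legacy logic: interest in cents, truncated; rate in basis points."""
    return balance_cents * rate_bp // 10_000


def translated_interest_cents(balance_cents: int, rate_bp: int) -> int:
    """A first-draft translation that (wrongly) rounds instead of truncating."""
    return round(balance_cents * rate_bp / 10_000)


def revised_interest_cents(balance_cents: int, rate_bp: int) -> int:
    """Revised translation after QA feedback: integer division restores truncation."""
    return (balance_cents * rate_bp) // 10_000


def qa_agent(legacy: Interest, candidate: Interest,
             n_cases: int = 200, seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Generate random test cases; return (balance, rate, expected, actual) mismatches."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        balance, rate = rng.randint(0, 10_000_000), rng.randint(0, 2_000)
        expected, actual = legacy(balance, rate), candidate(balance, rate)
        if expected != actual:
            failures.append((balance, rate, expected, actual))
    return failures


if __name__ == "__main__":
    first_pass = qa_agent(legacy_interest_cents, translated_interest_cents)
    print(f"{len(first_pass)} mismatches, e.g. {first_pass[:1]}")
    after = qa_agent(legacy_interest_cents, revised_interest_cents)
    print(f"{len(after)} mismatches after revision")  # 0 mismatches after revision
```

The first-pass translation disagrees with the legacy routine wherever rounding and truncation diverge (a balance of 150 cents at 500 basis points gives 7 cents in the legacy code but 8 in the translation); once the revision restores truncation, the mismatch list is empty.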

Use case 3: Online marketing campaign creation

Designing, launching, and running an online marketing campaign tends to involve an array of different software tools, applications, and platforms. And the workflow for an online marketing campaign is highly complex. Business objectives and market trends must be translated into creative campaign ideas. Written and visual material must be created and customized for different segments and geographies.