Investigating Tax Evasion Emergence Using Dual Large Language Model and Deep Reinforcement Learning Powered Agent-based Simulation

Teddy Lazebnik1, Labib Shami2,∗

1 Department of Cancer Biology, Cancer Institute, University College London, London, UK
2 Department of Economics, Western Galilee College, Acre, Israel
∗ Corresponding author: labibs@wgalil.ac.il

Abstract

Tax evasion, usually the largest component of an informal economy, has been a persistent challenge throughout history with significant socio-economic implications. Many socio-economic studies investigate its dynamics, including influencing factors, the role and influence of taxation policies, and the prediction of the tax evasion volume over time. These studies take such behavior as given, as observed in the real world, neglecting the “big bang” of such activity in a population. To this end, computational economics studies have adopted developments in computer simulation in general, and recent innovations in artificial intelligence (AI) in particular, to simulate and study the appearance of the informal economy in various socio-economic settings. This study presents a novel computational framework to examine the dynamics of tax evasion and the emergence of informal economic activity. Employing an agent-based simulation powered by Large Language Models and Deep Reinforcement Learning, the framework is uniquely designed to allow informal economic behaviors to emerge organically, without presupposing their existence or explicitly signaling agents about the possibility of evasion. This provides a rigorous approach for exploring the socio-economic determinants of compliance behavior. The experimental design, comprising model validation and exploratory phases, demonstrates the framework’s robustness in replicating theoretical economic behaviors. Findings indicate that individual personality traits, external narratives, enforcement probabilities, and the perceived efficiency of public goods provision significantly influence both the timing and extent of informal economic activity. The results underscore that efficient public goods provision and robust enforcement mechanisms are complementary; neither alone is sufficient to curtail informal activity effectively. By modeling the emergence of informal economic behavior without presupposing its existence, this research advances the theoretical and practical understanding of tax compliance, offering critical policy insights for designing equitable tax systems and fostering sustainable economic governance.

Keywords: informal economy; socio-economic simulation; computational behavioral economy; economic decision-making; emergent behavior analysis.

1 Introduction

The study of informal economic activities has long fascinated researchers and policymakers alike due to its significant impact on economic stability, taxation policies, and societal well-being (Shami, 2019; Gyomai and van de Ven, 2014). Despite some positive contributions, such as informal wealth redistribution, informal economic activity undermines tax revenues and public goods provision. Studying informal economic activities can lead to better estimation of economic indicators, such as GDP, which in turn significantly affects macroeconomic policies (Gyomai et al., 2012). In addition, the coexistence of informal and formal economies can erode trust in public institutions, contribute to the misuse of social insurance programs, and reduce tax revenues, as evidenced in previous studies (Schneider, 2016).

To tackle this challenge, scholars have proposed a wide range of models of the informal economy and its dynamics as a whole, as well as models and methods to measure the size of the informal economy (Schneider et al., 2010; Breusch, 2005a; Enste and Schneider, 2002; Schneider and Enste, 2000). Nonetheless, these studies frequently fall short because authors tend to emphasize different aspects of the informal economy, failing to capture its entire dynamics and the root causes that generate it (Ha et al., 2021; Schneider and Buehn, 2016; Elgin and Schneider, 2016; Breusch, 2005b).

As an indicator of this “shooting in the dark” scenario, estimates over time, even when based on the same data, exhibit considerable variability (Schneider and Buehn, 2018; Thai and Turkina, 2013). These estimates are obtained using a diverse set of methods: the direct approach, which assesses the magnitude of the informal economy through either voluntary survey responses or tax audit techniques (Cantekin and Elgin, 2017; Feld and Larsen, 2012; Feld and Schneider, 2010); the indirect approach, which uses macroeconomic methods and diverse economic and non-economic indicators, mainly drawn from the formal economy, to provide insight into the evolution of the informal economy over time (Tanzi, 1980, 1983; Ferwerda et al., 2010; Ardizzi et al., 2014); and the modeling approach, which uses statistical and data-driven models to estimate the informal economy as an unobservable (latent) variable (Elgin and Erturk, 2019; Andrews et al., 2011; Elgin and Schneider, 2016).

Of these approaches, the modeling approach is considered the most accurate and applicable in real-world settings (Schneider and Buehn, 2017; Schneider and Enste, 2000). Indeed, a growing body of work in recent years uses the modeling approach to estimate the dynamics and size of the informal economy (Kireenko and Nevzorova, 2015; Alanon and Gomez-Antonio, 2005). In particular, machine learning (ML) and deep learning (DL) based models have been shown to be powerful tools for studying the size of the informal economy (Shami and Lazebnik, 2023; Lazebnik, 2024; Felix et al., 2023; Ivas and Tefoni, 2023), marking the first step toward using artificial intelligence (AI) to study the informal economy. Nevertheless, these models focus on the informal economy’s size and the macroeconomic indicators responsible for it, ignoring the more basic question of how the informal economy is established, changes, and adapts to the formal economy and government policies.

In a more general sense, the central challenge in any economic model lies in its ability to effectively replicate the phenomenon it aims to investigate, based on the assumptions and structure defined by its developers. This critique is particularly salient in theoretical models addressing informal economic activity as such models often presuppose the existence of informal economic phenomena and incorporate this presumption into their foundational parameters, thereby undermining their core purpose: to simulate and analyze the emergence of the phenomenon rather than to assume its presence (Ferraro et al., 2005; Bodenhorn, 1956). This methodological flaw is not a trivial matter; by assuming the existence of the phenomenon, these models are inherently limited in their capacity to elucidate the underlying causes of its formation. Consequently, they fail to contribute meaningfully to our understanding and risk becoming analytically redundant. This study seeks to address this gap by proposing a model that refrains from presupposing the existence of informal economic activity. Instead, it builds upon fundamental characteristics of economic behavior to explore how an informal economy might emerge alongside a formal economy within the broader economic system.

To this end, in this study, we explore the informal economy from a microeconomic perspective using an in silico methodology. Specifically, we take advantage of recent advances in AI in the form of Large Language Models (LLMs), which exhibit reasoning, decision-making, and world-understanding performance similar to that of humans (Ke et al., 2024; Gilhooly, 2023; Ivey et al., 2024). Formally, we developed an agent-based simulation (ABS) with a heterogeneous population and a central government, operating in a monetized economy, such that each agent has a unique personality powered by a combined LLM and Deep Reinforcement Learning (DRL) model. Using this approach, we investigate the emergence of informal economic activity, focusing specifically on tax evasion, across different scenarios, as well as its dynamics following various economic interventions by the central government. Our objective is to model the emergence and behavior of an informal economy, in the form of tax evasion, where rational agents, equipped with limited knowledge, interact in a dynamic environment characterized by transactions, risks, and tax obligations. The novelty of the proposed study lies in the use of a multi-agent AI model to study informal economy dynamics that emerge from the LLM’s world knowledge, rather than from pre-programmed actions defined by the modeler that artificially allow and encourage the simulated agents to engage in informal economic activity.

The remainder of the paper is organized as follows. Section 2 provides an overview of the computational methods used as part of the model as well as the economic theory of informal economy, in general, and tax evasion, in particular. Section 3 formally introduces the proposed AI-driven agent-based simulation model. Section 4 outlines the experimental settings using the proposed model inspired by the socio-economic settings in the United States (US) and presents the obtained results for the experiments. Section 5 discusses the economic applications of the obtained results and suggests possible future work. Section 6 concludes briefly.

2 Related work

In this section, we outline the economic and computational background of this study. We first review the economic theory of the establishment and dynamics of the informal economy, followed by an overview of macroeconomic informal economy models. Afterward, we focus on the computational methods adopted for the proposed model, including the agent-based simulation approach, LLMs and their use as decision-making tools, and deep reinforcement learning as a method that allows AI agents to solve complex tasks in dynamic environments.

2.1 The economic rationale behind tax evasion

Tax evasion is a significant challenge for governments globally, impacting tax revenue collection and undermining public trust in tax systems (Sandmo, 2005). Formally, tax evasion is the illegal act of deliberately avoiding paying taxes owed to the government by underreporting income, inflating deductions, or concealing money or assets (Slemrod, 1985). Research into the motivations behind tax evasion has highlighted various economic, psychological, and institutional factors (Elffers et al., 1987; Khlif and Achek, 2015). In particular, factors such as demographic characteristics, personality traits, perceptions of tax fairness, and cultural contexts influence taxpayers’ attitudes toward tax evasion (Khlif and Achek, 2015).

Characterizing the likelihood of an individual evading taxes is complex and often incorporates opposing aspects of economic reasoning (Weigel et al., 1987). For instance, economists have posited a relationship between tax rates and tax evasion, suggesting that higher levels of taxation create a stronger incentive to avoid tax obligations (Alexi et al., 2023). Similarly, at a given point in time, taxpayers subject to high marginal tax rates may experience greater financial rewards from tax evasion compared to those facing lower rates, potentially leading to more evasion among the former. However, this relationship is not straightforward and may be disrupted by the principle of diminishing marginal utility of money. While the potential rewards of tax resistance are greater for high-income taxpayers, they may ascribe lower economic value to these gains compared to lower-income taxpayers, who might perceive the additional income as addressing more immediate financial needs (Hofmann et al., 2017).

When focusing on income taxes (government-imposed tax on an individual or entity’s earnings), due to its global utilization and influence on the economy (Graham et al., 2012), the seminal work by Allingham and Sandmo (1972) on income tax evasion provides a theoretical framework that has significantly influenced subsequent research in the field. Their model conceptualizes tax evasion as a decision under uncertainty, where taxpayers weigh the potential benefits of evasion against the risks of detection and penalties, assuming that taxpayers are amoral, risk-averse, and driven by utility maximization. Their model is based on Becker’s (1968) theory of crime, emphasizing rational decision-making through empirical testing and econometric modeling. The authors posit that higher tax rates increase the incentive for individuals to evade taxes, as the potential financial benefits of evasion become more substantial. Similarly, Wentworth and Rickel (1985) suggest that individuals consider the economic benefit of tax evasion against the risk of detection and penalties, and Dean et al. (1980) found that perceived high tax levels and unfairness in tax burdens were commonly cited reasons for tax evasion.
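
The trade-off described above can be sketched numerically. The block below is a minimal illustration of the Allingham and Sandmo (1972) setup, in which a taxpayer with true income W declares X, pays tax at rate θ on the declared amount, is audited with probability p, and, if audited, pays a penalty at rate π on the undeclared amount W − X. The logarithmic utility function (one possible risk-averse choice), the grid search, the function names, and all parameter values are assumptions made for illustration only; they are not taken from the paper.

```python
import math

def expected_utility(declared, income, tax_rate, audit_prob, penalty_rate,
                     utility=math.log):
    """Expected utility of declaring `declared` out of a true `income`.

    Not audited: income - tax_rate * declared
    Audited:     the same, minus penalty_rate * (income - declared)
    The log utility is an illustrative assumption (risk-averse taxpayer).
    """
    undeclared = income - declared
    net_if_not_audited = income - tax_rate * declared
    net_if_audited = net_if_not_audited - penalty_rate * undeclared
    return ((1 - audit_prob) * utility(net_if_not_audited)
            + audit_prob * utility(net_if_audited))

def best_declaration(income, tax_rate, audit_prob, penalty_rate, steps=1000):
    """Grid search for the declared income that maximizes expected utility."""
    candidates = [income * i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda x: expected_utility(x, income, tax_rate,
                                       audit_prob, penalty_rate),
    )

if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show both regimes.
    for p in (0.05, 0.9):
        x = best_declaration(income=100.0, tax_rate=0.3,
                             audit_prob=p, penalty_rate=0.6)
        print(f"audit probability {p}: declared income {x:.1f}")
```

In this sketch, full declaration is optimal whenever the expected penalty rate pπ is at least the tax rate θ, and some evasion becomes optimal once pπ falls below θ, which is the comparison of benefits against detection risk that the model formalizes.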

Nevertheless, the relationship between tax rates and tax evasion cannot fully explain the observed socio-economic dynamics, because at its core it rests on the concept of diminishing marginal utility of income, which does not seem to capture the entire story (Pommerehne and Weck-Hannemann, 1996a; Adebisi et al., 2013; Sury, 2015; Wallschutzky, 1984; Dergham and Al-Omour, 2010). This complexity is further explored in studies incorporating behavioral economics perspectives, such as prospect theory (Levy, 1992), to understand taxpayers’ decision-making processes. For example, Piolatto and Rablen (2017) examine how elements of prospect theory, including loss aversion and probability weighting, influence tax evasion behavior, challenging traditional expected utility models. The authors revisit the Yitzhaki puzzle (Yitzhaki, 1974), which suggests that tax evasion decreases as the marginal tax rate increases, a counterintuitive result under standard expected utility theory. They explore whether prospect theory, which accounts for psychological factors, can resolve this puzzle. The findings indicate that while prospect theory introduces new dimensions to understanding tax evasion, it does not universally overturn the Yitzhaki puzzle without specific conditions or modifications to the reference level used in the model.

The empirical study by McGee and Maranjyan (2006) in Armenia found that taxpayers justified evasion when they believed their government did not use tax revenue responsibly. A study by Uadiale and Noah (2010) in Nigeria shows how individuals’ ethical beliefs and perceptions of the social contract influence their tax compliance (Temitope et al., 2010). People who view tax compliance as a moral obligation are less likely to evade taxes, while those who consider tax payments as optional or unjust are more inclined to evade them. This moral reasoning, where taxpayers justify evasion as a response to perceived governmental misuse of funds, reflects a broader ethical dilemma facing taxpayers. Similarly, Green (2008) explores how individuals rationalize tax evasion by framing it as a reaction to government inefficiencies or a corrupt tax system. A key variable examined by Dean et al. (1980) is the perception that government tax revenues are not efficiently allocated to finance public goods and services. According to the study’s findings, approximately 62% of respondents expressed a negative view regarding the government’s effective use of tax revenues.

Our approach in the current study combines both worlds and gives expression to both the utility-maximization approach and behavioral-social considerations in the individual’s decision.

2.2 Large language models

An LLM is an AI model designed to process and generate human-like text using DL techniques in general, and the Transformer neural network architecture in particular (Zhao et al., 2024). It typically involves a neural network architecture with numerous layers and parameters that is trained on large text datasets (Gruetzemacher and Paradice, 2022), which can reach 110 billion words. The training process involves learning the statistical patterns and relationships within the text data, allowing the model to generate coherent and contextually relevant responses to input text or prompts (Chang et al., 2024).

LLMs have driven significant advancements in natural language processing and are now integral to various products with millions of users, including the coding assistant Copilot by Microsoft, the Bing search engine, and more recently, ChatGPT by OpenAI (Chen et al., 2023; Egli, 2023; Youssef, 2023). The combination of memorization and compositionality has enabled LLMs to perform tasks such as language understanding and both conditional and unconditional text generation at an unprecedented level of performance (Kocon et al., 2023). This progress paves the way for more sophisticated and higher-bandwidth human-computer interactions (Huang and Tan, 2023; Nadkarni et al., 2011; Sallam, 2023; Rosenfeld and Lazebnik, 2024; Lazebnik and Rosenfeld, 2024).

LLMs have demonstrated impressive potential in achieving reasoning and planning capabilities comparable to humans (Espejel et al., 2023; Guo et al., 2023). This aligns perfectly with human expectations for autonomous agents that can perceive their surroundings, make decisions, and take actions accordingly (Wang et al., 2024). Consequently, LLM-based agents have garnered significant attention and development to comprehend and generate human-like instructions, enabling sophisticated interactions and decision-making across various contexts (Mehandru et al., 2023; Zhang et al., 2024; Chen et al., 2024). Inspired by the remarkable capabilities of individual LLM-based agents, researchers have proposed LLM-based multi-agents to harness collective intelligence and specialized profiles and skills from multiple agents (Cheng et al., 2024; Wu et al., 2024). Compared to systems relying on a single LLM-powered agent, multiagent systems offer advanced capabilities by segmenting LLMs into distinct agents with unique capabilities and facilitating interactions among these diverse agents to effectively simulate complex real-world environments (de Zarza et al., 2023). In this framework, multiple autonomous agents collaborate in planning, discussions, and decision-making, mimicking the cooperative nature of human group work in problem-solving tasks (Rasal and Hauer, 2024).

The multi-agent LLM approach leverages the communicative abilities of LLMs, utilizing their text generation and response capabilities. Moreover, it taps into LLMs’ broad knowledge across domains and potential for specialization in specific tasks (Zhang et al., 2023). Recent studies have shown promising results in employing LLM-based multi-agents for various tasks such as software development (Nam et al., 2024), multi-robot systems (Luan et al., 2024), and society simulation (Gao et al., 2024).

2.3 Deep reinforcement learning

Reinforcement learning (RL) is a type of ML where an agent (or group of agents) learns to make decisions by interacting with an environment to maximize cumulative rewards (Abdellatif et al., 2018). The key components of RL are the agent, environment, actions, states, and rewards (Abdellatif et al., 2018). The agent is the learner or decision-maker, while the environment represents everything the agent interacts with, including other agents. States are the different situations in which the agent can be, and actions are the choices the agent can make. The agent receives rewards or punishments as feedback based on its actions, guiding it to learn optimal behaviors over time. The agent uses a policy, which is a strategy mapping states to actions, to maximize the total expected reward, often utilizing value functions to estimate the long-term benefit of actions (El-Bouri et al., 2021). Through exploration (trying new actions) and exploitation (using known actions that yield high rewards), the agent improves its policy, aiming to achieve the best possible outcomes in the environment (El-Bouri et al., 2021).
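
The components listed above (agent, environment, states, actions, rewards, policy, value function, exploration and exploitation) can be made concrete with a small sketch. The block below uses tabular Q-learning with an epsilon-greedy policy on a hypothetical five-state corridor; the environment, the hyperparameters, and every function name are assumptions chosen for illustration and are unrelated to the model proposed in this paper.

```python
import random

N_STATES = 5       # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Environment: deterministic corridor with a reward of 1 at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q_row, rng):
    """Exploitation: pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:      # exploration: try a random action
                action = rng.choice(ACTIONS)
            else:                           # exploitation: use current estimates
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    q_table = train()
    policy = [greedy(row, random.Random(0)) for row in q_table[:-1]]
    print("greedy action per non-terminal state:", policy)
```

Here the Q-table plays the role of the value function, and the greedy choice over it is the learned policy mapping states to actions.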

Deep reinforcement learning (DRL) extends RL by incorporating deep neural networks (Sainath et al., 2015) to handle complex decision-making tasks (Mao et al., 2016a,b; Hurtado Sánchez et al., 2022; Giupponi et al., 2005). In DRL, an agent interacts with an environment to maximize cumulative rewards, just like in traditional RL. However, DRL leverages deep learning to efficiently process high-dimensional input and approximate the optimal policy or value functions numerically. The key components (agent, environment, actions, states, and rewards) remain the same (Mao et al., 2016a). DRL has multiple implementations, each with unique strengths and limitations (Hao et al., 2023; Stooke and Abbeel, 2019; Kahn et al., 2018). For instance, Deep Q-Networks use a deep neural network to approximate the Q-value function, which estimates the expected reward for taking a particular action in a given state (Fan et al., 2020). By using experience replay and target networks, Deep Q-Networks can stabilize learning and are effective in environments like video games where the state space is large and complex. Proximal Policy Optimization is a policy gradient method that improves training stability by using a clipped objective function to limit the size of policy updates (Gu et al., 2022). It balances exploration and exploitation and is known for its robustness and efficiency in continuous control tasks such as robotic manipulation and locomotion. Actor-Critic methods are a family of methods that involve two neural networks: the actor, which selects actions, and the critic, which evaluates them by estimating the value function (Grondman et al., 2012). This approach allows the agent to learn both the policy and the value function concurrently, leading to improved learning efficiency and effectiveness in environments with continuous action spaces.
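
To illustrate what limiting the size of policy updates means for Proximal Policy Optimization, the following sketch computes the standard clipped surrogate objective for a batch of samples. It works on plain Python lists rather than neural-network outputs; the function name, the default clipping range of 0.2, and the sample values are assumptions for illustration, and a full training loop would additionally need a policy network and an advantage estimator.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch.

    ratios[i]     = new_policy_prob / old_policy_prob for sample i
    advantages[i] = estimated advantage of the action taken in sample i
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from moving beyond the clip range.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)

if __name__ == "__main__":
    # Hypothetical batch: the first sample's ratio exceeds the clip range.
    print(ppo_clip_objective([1.5, 0.9], [1.0, -0.5]))
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + ε, so the optimizer gains nothing from pushing the new policy further away from the old one in a single update.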

DRL is commonly used in multi-agent tasks in general, and as part of ABS in particular (Zhang et al., 2023; Hernandez-Leal et al., 2019; Du and Ding, 2021). For example, Bushaj et al. (2022) used DRL for resource allocation and intervention policies in pandemic control settings based on ABS. Lazebnik (2023) combined ABS with DRL for hospital staff and resource allocation. The author shows that the model aligns well with results from expert-driven models while also coping with limited knowledge of the state in a highly stochastic environment. Vargas-Perez et al. (2023) developed a DRL agent that represents a brand as part of an ABS of a market, with the goal of obtaining a marketing investment strategy that improves awareness of the corresponding brand in a given marketing scenario. The authors compared the policy obtained by the agent with that of a human expert, showing statistically good agreement between the two. Zheng et al. (2020) proposed a detailed, large-scale ABS with DRL agents for taxation policy optimization. The authors did not use any economic models or assumptions but rather allowed an economy to emerge from labor cost with skill-related pricing in a heterogeneous population, with a central government collecting bracketed income taxes.

2.4 Agent-based simulation

Agent-based simulation (ABS) is a computational approach for capturing the (spatio-)temporal dynamics of multiple agents (Polhill et al., 2021; Epstein, 1999; Bonabeau, 2002). An ABS typically comprises two main components: an environment and a population of agents, which can be either homogeneous or heterogeneous (Zhou et al., 2022; Raberto et al., 2001). ABS involves three types of interactions between agents and their environment: spontaneous, agent-agent, and agent-environment interactions. Spontaneous interactions occur between an agent and itself, depending solely on the agent’s current state and time. Agent-agent interactions involve two or more agents, altering the state of at least one of the participating agents. Agent-environment interactions involve agents and their environment, resulting in changes to the state of the agent, the environment, or both. Notably, ABS can be computationally reduced to the population protocol model (Aspnes and Ruppert, 2009) and is thus Turing-complete (North, 2014; Laubenbacher et al., 2007), meaning that ABS can represent any dynamics solvable by a computer.
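
The three interaction types can be sketched as a minimal simulation loop. The block below is a toy example only: the `Agent` and `Environment` classes, the integer wealth and income attributes, the one-unit transfer between two randomly chosen agents, and the one-unit levy paid into a public pool are all hypothetical and do not describe the model proposed in this paper.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    wealth: int
    income: int    # heterogeneous across agents

@dataclass
class Environment:
    public_pool: int = 0

def spontaneous(agent, t):
    """Spontaneous interaction: depends only on the agent's own state and time."""
    agent.wealth += agent.income

def agent_agent(payer, payee, amount=1):
    """Agent-agent interaction: changes the state of both participants."""
    paid = min(amount, payer.wealth)
    payer.wealth -= paid
    payee.wealth += paid

def agent_environment(agent, env, levy=1):
    """Agent-environment interaction: changes both the agent and the environment."""
    paid = min(levy, agent.wealth)
    agent.wealth -= paid
    env.public_pool += paid

def run(n_agents=10, steps=20, seed=0):
    rng = random.Random(seed)
    agents = [Agent(wealth=5, income=rng.randint(1, 3)) for _ in range(n_agents)]
    env = Environment()
    for t in range(steps):
        for agent in agents:
            spontaneous(agent, t)
        payer, payee = rng.sample(agents, 2)   # one random pairwise transfer
        agent_agent(payer, payee)
        for agent in agents:
            agent_environment(agent, env)
    return agents, env

if __name__ == "__main__":
    population, environment = run()
    print("public pool:", environment.public_pool)
    print("agent wealth:", [a.wealth for a in population])
```

Because transfers and levies only move money around, the sum of all agents' wealth and the public pool always equals the initial endowments plus the income earned, which gives a simple consistency check for the loop.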

ABS has become a prominent tool for studying complex economic phenomena arising from individual agent interactions (Poledna et al., 2023; Evans et al., 2021; Canese et al., 2021; Epstein and Axtell, 1996; Axelrod, 1998). Studies have employed ABS to model tax policy influence on the economy (Alexi et al., 2023), corruption (Zausinova et al., 2020), and the emergence of formal economies (Axtell, 2007; Tesfatsion, 2002). For example, Lazebnik et al. (2021) used ABS to simulate the spread of a pandemic and its influence on the economy, as well as the usage of different pandemic intervention policies and their epidemiological-economic effectiveness for different configurations. Gorochowski et al. (2012) show that ABS is an effective modeling method for the interactions between cells as well as bacterial populations in synthetic biology. Lanham et al. (2014) used ABS to study crisis de-escalation activities in complex social networks, showing that ABS was able to capture the heterogeneity in the population from real data.

Traditional approaches, such as statistical methods, struggle to disentangle social interaction effects from exogenous and correlated influences, a challenge that ABS overcomes by enabling virtual experiments that isolate specific mechanisms (Manski, 2000). Unlike descriptive statistical data analysis, ABS focuses on the generative processes underlying tax compliance, providing a deeper understanding of causality (Hedström, 2005). Early ABS applications for tax compliance developed models incorporating heterogeneous agents and probabilistic audits validated against real-world data (Mittone and Patelli, 2000; Davis et al., 2003; Bloomquist, 2004, 2006). Subsequent advancements introduced memory and social imitation, along with autonomous tax inspectors to model compliance with indirect taxes (Antunes et al., 2005).

Moreover, physics-inspired models replace particle interactions with behavioral contagion, as in the work of Zaklan et al. (2008) on tax evasion dynamics. Further, the SIMULFIS model introduced by Noguera et al. (2014) integrates rational choice, fairness concerns, and social contagion, emphasizing the importance of social mechanisms often neglected in deterrence-based theories. SIMULFIS employs a decision algorithm composed of four sequential filters: opportunity, normative, rational choice, and social influence. These filters reflect recent advancements in behavioral social science, moving beyond traditional utility-maximizing functions to incorporate fairness and social influence. Virtual experiments conducted with SIMULFIS revealed that audits are more effective than fines in improving compliance, and that publicizing tax compliance levels can positively influence behavior. Overall, ABS provides a robust tool for understanding tax compliance dynamics, aiding policymakers in designing effective strategies.

Recently, ABS has been greatly enhanced by the emergence of data-driven models, such as ML and DL models, which give agents adaptive behavior that is not explicitly defined by the modeler and thereby allow the simulation of more realistic dynamics (Ciatto et al., 2020; Wang and Usher, 2005). For instance, Jang et al. (2018) used an ABS with agents powered by a DRL-based model to explore traffic flow dynamics for various traffic simulations. Collins et al. (2014) proposed a framework for training deep reinforcement learning models in agent-based price-order-book simulations that yield non-trivial policies under diverse conditions with market impact. Joubert et al. (2022) proposed an ABS with agents powered by a reinforcement learning model with memory to simulate street robbery, showing that the simulation was able to recreate reported dynamics from the real world.

In addition, recent studies have focused on integrating LLMs into ABS to further extend the capabilities of previous simulations (Gau et al., 2024). For instance, Park et al. (2022) developed a system that creates a simulated community consisting of a thousand personas (agents). This system takes the designer’s vision for the community, including its goals, rules, and member personas, and simulates it, generating behaviors such as posting, replying, and even anti-social actions. Extending this work, Gao et al. (2023) created extensive networks with 8,563 and 17,945 agents, designed to simulate social networks centered on the topics of Gender Discrimination and Nuclear Energy, respectively. With a more direct focus on economic dynamics, Li et al. (2023) utilized LLMs for macroeconomic simulation, employing prompt-engineering-driven agents that mimic human decision-making. The authors show that this approach significantly improves the realism of economic simulations compared to rule-based methods or other AI agents. Li et al. (2023) introduced a financial trading simulation in which agents interact through conversations and are equipped with a layered memory system, debate mechanisms, and individualized trading characters.

These models share three properties that control the behavior of the AI agents in the simulation: the agent-environment interface, the agents’ personalities, and the agents’ capabilities acquisition (Gau et al., 2024). Below, we briefly discuss the different methods for each of them, with their strengths and limitations.

2.4.1 Agents-Environment interface

The operational environments define the specific contexts or settings in which the LLM-driven agents are deployed and interact, such as a financial market, as an abstract environment, or a settlement, as a physical environment. The Agents-Environment interface describes how agents interact with and perceive their environment. This interface enables agents to understand their surroundings, make decisions, and learn from the results of their actions. These environments can be roughly divided into two main groups: “sandbox” and “real world”. The sandbox is a virtual environment created by humans, where agents can freely interact and experiment with different actions and strategies. However, in the context of AI agents interacting with each other, the definition of a sandbox environment can be extended to the inner world of the agent, where it can strategize, for instance by computing possible actions it may try in the actual simulation’s environment (Ahlgren et al., 2020; Truby et al., 2022). On the other hand, the real world is an environment where agents interact with physical entities and obey real-world physics and constraints. The level of detail of the real world and the exact rules enforced depend on the context of the simulation, commonly balancing computational power, relevance, and realism (Kadian et al., 2020).

2.4.2 Agents personality

In LLM-powered ABS systems, agents are characterized by their traits, actions, and skills, all designed to achieve specific goals. These agents take on distinct roles within different systems, each role thoroughly described by its characteristics, capabilities, behaviors, and constraints. For example, in business environments, agents are profiled as companies with diverse capabilities and objectives, each uniquely influencing the course of the economy. Generally speaking, one can divide agent personality generation for LLM-powered ABS into three methods: pre-defined, model-generated, and data-derived. In the pre-defined case, agent profiles are explicitly and manually defined by the modeler (Gau et al., 2024). This method allows a high degree of control over the agents’ personalities but limits the diversity and scale of the simulation, due to the time and resources required to apply it to a large-scale simulation.

2.4.3 Agents capabilities acquisition

Agent capabilities acquisition in LLM-powered ABS systems is crucial for enabling dynamic learning and evolution. This process relies on various types of feedback and strategies for agents to adapt effectively. Feedback is typically textual and can come from the environment, from interactions between agents, or from a pre-defined model, each providing critical information that helps agents understand the impact of their actions and adapt to complex problems (Wang et al., 2023). In some scenarios, no feedback is provided, especially when the focus is on result analysis rather than agent planning. To enhance their capabilities, agents can use memory modules to store and retrieve information from past interactions, self-evolve by modifying their goals and strategies based on feedback and communication logs, or dynamically generate new agents to address specific challenges (Nascimento et al., 2023).

3 Large Language Model Powered Agent-Based Simulation For Informal Economy

Capturing the entire socio-economic dynamics of a modern monetary-based economy is extremely complex as it requires capturing highly integrated and ever-changing social, political, cultural, and technological dynamics that are reflected by economic activity (Niedzwiedz et al., 2012; Biswas and Nautiyal, 2023; Bouchaud, 2013). As such, in the proposed model we will focus on the minimal number of mechanisms and agent types required to obtain the central economic activity to sustain a relatively stable socio-economic infrastructure.

In this section, we first outline the economic theories that operated as the design motivation for the proposed model. Next, we outline the economic process occurring in the simulation. Finally, we formally define the individuals in the economy as the agents and the government as part of the simulation’s environment. In particular, we present the decision-making process utilized by the two types of agents. Fig. 1 presents a schematic view of the ABS design of the socio-economic dynamics.

Figure 1: A schematic view of the proposed ABS design of the socio-economic dynamics. The economy comprises a population of individuals and a central government. Individuals receive an income, on which they should pay income taxes, and participate in buy-sell interactions, on which they should pay sales taxes. The government collects taxes, as self-reported by the individuals, and uses them to fund both public goods and enforcement. The latter is used to validate the reported taxes and punish individuals who did not pay their taxes in full.

3.1 Design motivation

A monetary-based economy with a central government arises to tackle two main phenomena that naturally occur in resource allocation problems with heterogeneous multi-agent scenarios: the double coincidence of wants (Berentsen and Rocheteau, 2003) and the provision of public goods and services (Anand, 2004). Namely, societies agree to operate under a monetary-based economy with a central government in order to benefit from a common agreement about the utility of goods, while the central government uses a portion of their income (i.e., taxes) to provide more utility than each individual in the society could generate independently. According to “classical” economic theory, an informal economy emerges in such socio-economic conditions when individuals in the population agree with the monetary-based economy while disagreeing with or exploiting the central government’s role, avoiding taxes while still enjoying the utility of public goods (Farhi and Gabaix, 2020; Crocker and Slemrod, 2005).

Following this line of thought, and in order to provide the minimal-complexity model that can capture the emergence of informal economic activity in terms of tax evasion, one needs to answer the following three questions. First, what actions do individuals in the population perform that are identified as economic-related actions? Second, how are such actions associated with taxation by the government? Third, what utility-modifying goods (causing positive utility) do individuals in the population obtain from the government? These questions are based on a formal economy and do not take into consideration that an informal economy occurs in parallel to a formal one. As such, once an informal economy emerges, a fourth question emerges as well: what mechanisms can the government use to prevent individuals from participating in the informal economy (causing negative utility for the individuals participating in it)?

Answering these questions is an active field of study with a rapidly growing body of work (Inman and Rubinfeld, 1996; Eilat and Zinnes, 2002; Choi and Thum, 2005; Beckert, 2003). For our simulation, we focused on a relatively simple configuration. We assume that every individual in the economy receives income from their economic activity and can purchase goods and services accordingly. Corresponding to these two actions, the government can enforce income and sales taxes, respectively. The government uses its tax revenue to produce and supply an abstract utility (public goods) that is heterogeneous across the individuals in the population (Pauly, 1973; Groves and Ledyard, 1977).

Below, we formalize these ideas into a mathematical framework. Initially, we define the socio-economic environment as the economy using two mechanisms - economic transactions and taxation. In addition, the government’s enforcement and taxation reporting are integrated into the “rational” decision-making process of the government. The population of individuals (agents) is also formalized with their AI-driven decision-making process.

3.2 The economy

The economy is based on two main mechanisms: economic transactions and taxation. For simplicity, economic transactions always involve one or two agents and are limited to income and buy-sell operations. The income is provided every $\theta_{i}\in\mathbb{N}$ steps in time, in amount $s_{i}\in\mathbb{R}^{+}$, for the $i^{th}$ individual agent. We assume each agent’s income, if any, is fixed over time. The buy-sell operations occur for a list of goods, $G$, where each agent has a desire, $d\in\mathbb{N}^{|G|}$, to buy them. Like the individuals’ incomes, we assume that the prices of goods are constant over time and that, given these prices, the supply satisfies the entire population’s demand for each good (or service). Similarly, the agent’s desire distribution $(d)$ is constant over time. Any monetary transaction is made instantly.

The government collects taxes, and every economic transaction requires self-reporting of the amount of tax that the agent carrying out the activity must pay by law. The report not only includes the fact the transaction occurred but also the selling price, and therefore, the tax amount required to be paid by the agent. In a similar manner, income tax is taken from one’s income, as reported by the agent obtaining the income. The market structure we chose to use follows the assumption that the supply of each product or service is carried out under conditions of perfect competition so that firms’ profits are zero. Simply put, in perfectly competitive markets, firms are considered “price takers”, meaning they accept the market price as given and cannot influence it. This leads to firms producing at a level where price equals both marginal cost and average total cost, resulting in zero economic profit in the long run. This outcome, in theory, is due to the absence of barriers to entry and exit, allowing new firms to enter the market if existing firms are earning positive economic profits, which increases supply and drives prices down until only normal profits remain (Kolmar and Kolmar, 2022; Kreps, 2020).

3.3 Government

In our simulation, the government is operating as a knowledge-limited, central, and rational agent. The government’s primary objective is to optimize the welfare of the citizens (agents), which is reflected by maximizing the overall lifetime utility of all agents in the economy. By adjusting tax policies, allocating funds to the provision of public goods, and enforcing measures against informal economic activities, the government aims to achieve this objective. We separate the provision of public goods from enforcement actions to prevent tax evasion (which is considered a public good by itself) in order to determine the impact of changes in enforcement policy on the size of the informal economy in the sensitivity analyses we will conduct later.

Formally, the government is represented by the following tuple $(m,\mu,\lambda,\nu,\xi)$ where $m\in\mathbb{R}^{+}$ represents the government’s current budget; $\mu$ denotes the sales tax policy, indicating the rate at which tax is applied to goods and services prices within the economy; $\lambda$ denotes the income tax policy; $\nu$ reflects the government’s efficiency in converting tax revenues into public goods; and $\xi$ signifies the enforcement policy of informal economic-related activities. As such, the government’s objective takes the form

where $T<\infty$ is the number of steps in time considered for the simulation, $m(t)$ is the government’s budget at the $t^{th}$ step in time, $\rho\in(0,1)$ is a discount factor, and $u_{a}^{t}$ is a concave, continuous, non-decreasing utility function of the agent $a\in A$ in the population at time $t$. $u_{a}^{t}$ is a function of $d\in\mathbb{N}^{\kappa}\subseteq G$ that details the list of goods the agent wishes to acquire in each $\theta\in\mathbb{N}$ steps in time and for $\kappa\in\mathbb{N}$ private goods in the economy, where $G$ is the list of all private goods in the economy. Each agent’s ability to purchase quantities of private goods is affected by the government’s tax policy (sales tax, $\mu$, and income tax, $\lambda$, policies). Furthermore, each agent’s utility function is affected by the quantities of public goods that the government provides, with these quantities being affected by the amount of tax and by how these quantities are converted into benefits (utility) for the agents, denoted by $\nu$.
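
The displayed objective itself does not survive in this text. One form consistent with the symbols defined above, offered as an assumption rather than as the authors’ exact expression, is a discounted sum of agent utilities maximized over the four policies. The non-negative budget constraint on $m(t)$ and the summation starting at $t=0$ are likewise assumptions:

$$\max_{\mu,\lambda,\nu,\xi}\ \sum_{t=0}^{T}\rho^{t}\sum_{a\in A}u_{a}^{t}\quad\text{subject to}\quad m(t)\geq 0\ \ \text{for all}\ t\in\{0,\dots,T\}.$$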

To this end, the sales tax policy ($\mu:\mathbb{R}^{+}\to\mathbb{R}^{+}$) takes the form of a percentage of the good’s price, which is added on top of the price and paid by the buyer agent to the seller agent. It is the responsibility of the seller agent to report the tax charged to the buyer agent. The sales tax percentage can be arbitrarily large, starting from zero percent, but is constant across all the goods in the economy (Keen, 2013).

The income tax ($\lambda:\mathbb{R}^{+}\to\mathbb{R}^{+}$) can take one of two forms. First, a fixed income tax, which is a percentage (ranging from $0\%$ to $100\%$) of each income, as applied in countries like Russia, the Czech Republic, and Bulgaria (Ivanova et al., 2005; Vasilev, 2015); and second, a progressive income tax system where each “step” in the income is taxed at a different percentage (commonly a monotonically increasing one), as applied in countries like Israel, Switzerland, and the United States (Pommerehne and Weck-Hannemann, 1996b; Kopczuk, 2005). For the latter case, the policy is represented by a list of tuples such that the first value indicates the income threshold and the second value is the taxation rate.
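
The two income tax forms can be sketched as follows. The function names are hypothetical, and the progressive variant assumes that each threshold is the lower bound of its bracket and that a rate applies only to the slice of income inside that bracket; the text does not specify either detail.

```python
def flat_income_tax(income: float, rate: float) -> float:
    """Fixed income tax: a single percentage applied to the whole income."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return income * rate


def progressive_income_tax(income: float, brackets: list[tuple[float, float]]) -> float:
    """Progressive income tax over (threshold, rate) tuples.

    Assumption: each threshold is the lower bound of its bracket and the rate
    applies only to the slice of income between this threshold and the next.
    """
    brackets = sorted(brackets)
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        if income <= lower:
            break
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        tax += (min(income, upper) - lower) * rate
    return tax
```

For instance, with the illustrative policy `[(0, 0.1), (1000, 0.2), (5000, 0.3)]`, an income of 6000 is taxed as 1000 at 10%, 4000 at 20%, and 1000 at 30%.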

The public goods policy, $\nu:\mathbb{R}^{+}\rightarrow\mathbb{R}^{|\chi|}$, is represented by the amount of money allocated to a list of public goods of size $\chi\in\mathbb{N}$, such that each public good has some utility to each agent in the population. This association is formally represented by a function $\nu_{i}:\mathbb{R}^{+}\to\mathbb{R}^{|A|}$ for the $i^{th}$ public good and reflects the government’s efficiency in converting tax revenues into public goods. Different public goods have different utility to subsets of the population. For example, adding a road to a city is very beneficial to the city’s residents, somewhat beneficial for individuals crossing the city, and not beneficial at all for individuals who do not use this road. For realism, we assume a linear utility increase with respect to the amount of funds invested by the government in each public good. Moreover, it is assumed that the utility distribution for each individual in the population is constant and that the utility is obtained at each step in time.
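
A minimal sketch of the linear public goods mapping is given below. The weight matrix is a hypothetical way to encode the constant, agent-specific benefit per unit of money spent on each good; the text only states that utility is linear in the invested funds and heterogeneous across agents.

```python
def public_goods_utility(allocation: list[float], weights: list[list[float]]) -> list[float]:
    """Return the per-agent utility from public goods.

    allocation[i] is the money allocated to public good i.
    weights[i][a] is a hypothetical constant benefit of agent a per unit of
    money spent on good i; utility is linear in spending, as assumed in the text.
    """
    if len(allocation) != len(weights):
        raise ValueError("one weight row is required per public good")
    n_agents = len(weights[0]) if weights else 0
    utility = [0.0] * n_agents
    for spend, row in zip(allocation, weights):
        if len(row) != n_agents:
            raise ValueError("every weight row must cover all agents")
        for a, w in enumerate(row):
            utility[a] += w * spend
    return utility
```

In the road example, a resident, a pass-through traveler, and a non-user could carry weights such as 1.0, 0.5, and 0.0 for that good.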

The enforcement policy (i.e., tax evasion penalty policy) $\xi:\mathbb{R}^{+}\rightarrow A$ is a function that receives funding and returns the portion of the population the government is able to investigate. By investigating an individual, the real amount of taxes the individual should have paid over their entire history is revealed. Any difference between the amount of taxes an individual should have paid and the amount actually paid is denoted by $\psi$. An individual with $\psi>0$ upon investigation is punished by a fine linearly proportional to $\psi$, with rate $\alpha\in\mathbb{R}^{+}$, while the historical taxes themselves are waived. We assume that the investigated subset of agents is chosen randomly from the population.
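
The audit step can be sketched as below. The `audit_share` argument stands in for the output of $\xi$ at some funding level, since the mapping from funding to audit coverage is not specified here; the function name and the dictionary-based bookkeeping are also hypothetical.

```python
import random


def audit(owed: dict[str, float], paid: dict[str, float], audit_share: float,
          alpha: float, rng: random.Random) -> dict[str, float]:
    """Randomly audit a share of agents and return the fine per evader.

    owed/paid map agent ids to lifetime taxes due and actually paid.
    audit_share stands in for the output of xi given some funding (assumption).
    The fine is alpha * psi, where psi = owed - paid for the audited agent.
    """
    agents = sorted(owed)
    n_audited = int(audit_share * len(agents))
    fines = {}
    for agent in rng.sample(agents, n_audited):
        psi = owed[agent] - paid.get(agent, 0.0)
        if psi > 0:
            fines[agent] = alpha * psi
    return fines
```

Only the fine is collected in this sketch; the unpaid historical taxes are not recovered, matching the waiver described above.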

Importantly, all four policies are pre-defined and static over time.

3.4 Individuals

We assume a fixed-size population of agents $(A)$. Each agent in the population $(a\in A)$ is defined by a timed finite state machine (Al-Saawy et al., 2009), which is formally captured by the tuple $a:=(\beta,\theta,s,d,\zeta,\eta,\upsilon,\psi)$ where $\beta\in\mathbb{R}^{+}$ denotes the current amount of money the agent possesses; $\theta\in\mathbb{N}$ indicates the number of simulation steps between two salaries; $s\in\mathbb{R}^{+}$ indicates the amount of money the agent gets from income, so that any gap between the agent’s income (after income taxes) and the total expenditure on private goods is added to the amount $\beta$ available to the individual in the next period (i.e., savings); $d\in\mathbb{N}^{\kappa}\subseteq D$ details the list of goods the agent wishes to acquire in each $\theta\in\mathbb{N}$ steps in time and for $\kappa\in\mathbb{N}$ private goods in the economy, where $D$ is the list of all private goods in the economy, of size $K$; $\zeta\in\mathbb{R}^{+}$ measures the agent’s propensity to take risks in their economic activities, affecting their economic decisions; $\eta\in\mathbb{N}$ represents the planning horizon in terms of steps in time (indicating how far ahead the agent plans for future economic activities); $\upsilon\in\mathbb{R}^{+}$ indicates the cognitive ability of the agent, represented by the amount of noise the deep reinforcement learning (DRL) model receives during the agent’s learning process, where higher noise levels simulate lower cognitive ability and lead to less precise decision-making; and $\psi$ is the agent’s personality, expressed as free text. Moreover, it is assumed that agents are fully aware of their state and the four government policies. Importantly, both types of interaction, income and sell-buy, are recorded by the agent. Namely, income transactions are recorded as “Obtained an income $s$ at time $t$” and sell-buy transactions as “buy a product for a price $p$”.
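
The agent tuple and its transaction log can be sketched as a small data structure. The class name, field names, and the two methods are illustrative choices rather than the authors’ implementation, and taxes are deliberately left out so that only the state and the logging are shown.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """Sketch of the agent tuple; field names are illustrative, not the authors'."""
    money: float                 # beta: money currently held
    income_period: int           # theta: steps between two incomes
    income: float                # s: amount received per income event
    desires: list[int]           # d: desired quantity of each private good
    risk_propensity: float       # zeta
    planning_horizon: int        # eta
    cognitive_noise: float       # upsilon: noise injected during DRL learning
    personality: str             # psi: free-text personality
    log: list[str] = field(default_factory=list)

    def receive_income(self, t: int) -> None:
        """Credit the income on every income_period-th step and log it."""
        if t % self.income_period == 0:
            self.money += self.income
            self.log.append(f"Obtained an income {self.income} at time {t}")

    def buy(self, price: float) -> bool:
        """Buy one product if affordable, and log the transaction."""
        if price > self.money:
            return False
        self.money -= price
        self.log.append(f"buy a product for a price {price}")
        return True
```

The `log` list plays the role of the recorded interaction history that is later passed to the decision models.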

Figure 2: A schematic view of the decision process of an individual agent. Income and buy-sell transactions occur to and by the agent, which must report them and pay taxes to the government. The decision process starts with an LLM, which produces an initial suggestion for the amount of taxes the agent should pay by taking into account the agent’s state, personality, historical actions, and government policies. The same information, together with the LLM’s suggestion and the risk-loving factor, is used by the DRL model to produce the final decision.

The individual’s decision-making process is divided into two decisions: how much income tax to pay and how much sales tax to pay. To make these two decisions, each agent is provided with a combined LLM and DRL model. The DRL operates as the “rational” mind, while the LLM operates as the subconscious of the agent. The agent’s state (including personality), the four government policies, and previous economic interactions (including income, sell-buy, and tax reports) are first provided to an LLM, which is requested to return the amount of taxes the agent should pay, once for income and once for sell-buy. The LLM is based on the LLAMA-2 model, which is considered one of the best-performing open-source LLMs (Touvron et al., 2023). In particular, previous studies report promising economic-reasoning results with LLAMA-2 (Raman et al., 2024; Yu et al., 2024). Technically, the LLM is queried with a question of how much taxes the agent should report, formalized as follows:

LLM Prompt


What is the amount of taxes I should pay? Make sure to return a single positive number.
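The query step can be sketched as follows: the agent's context is concatenated with the prompt above, and the first number in the reply is extracted. This is only an illustration under stated assumptions. The functions `build_query`, `parse_tax_amount`, and `suggest_tax`, the context layout, and the parsing rule are all hypothetical, and `llm` stands for any callable that maps a prompt string to a reply string, not a specific LLAMA-2 API.

```python
import re
from typing import Callable, List, Optional

PROMPT = ("What is the amount of taxes I should pay? "
          "Make sure to return a single positive number.")

# Matches numbers such as "300", "1,250.50", or ".5" (assumed reply formats).
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?|\.\d+")


def build_query(state: str, policies: str, history: List[str]) -> str:
    """Concatenate the agent's context with the fixed question (layout is assumed)."""
    context = "\n".join(
        [f"Agent state: {state}", f"Government policies: {policies}", "History:", *history]
    )
    return f"{context}\n\n{PROMPT}"


def parse_tax_amount(reply: str) -> Optional[float]:
    """Return the first number found in the reply, or None if there is none."""
    match = _NUMBER.search(reply)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def suggest_tax(llm: Callable[[str], str], state: str, policies: str,
                history: List[str]) -> Optional[float]:
    """Query the model once; the caller would do this for income and for sell-buy."""
    return parse_tax_amount(llm(build_query(state, policies, history)))
```

Returning `None` for replies without a number leaves the fallback policy (retry, default to full payment, and so on) to the caller, since the text does not specify one.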


The LLM’s suggestion, as well as the inputs to the LLM, is then provided to a DRL model. Specifically, the DRL is based on the Deep Q-Network (DQN) algorithm (Fan et al., 2020). We chose DQN for two main reasons. First, DQN has an off-policy learning mechanism, meaning it can learn from past experiences stored in a replay buffer. This ability to reuse past data makes it more sample-efficient than on-policy methods, which discard past data once used. Moreover, sampling the replay buffer at random breaks the correlation between consecutive samples, which leads to more stable and efficient learning. Second, DQN uses a target network, a delayed copy of the Q-network that is used to predict target Q-values. This stabilizes training by reducing the correlation between the action-value estimates and the target values, mitigating the risk of divergence and making learning more robust. Fig. 2 presents a schematic view of the decision-making process and the feedback loop. A more detailed description of the model’s training and inference is provided in the Appendix.
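The two DQN ingredients named above, a replay buffer sampled at random and a delayed target network, can be sketched as follows. This is a minimal sketch and not the authors' training code: a linear Q-function in NumPy stands in for the deep network so the example stays self-contained, and all names (`ReplayBuffer`, `LinearQ`, `td_targets`, `dqn_step`, `sync_target`) and hyperparameters are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Stores past transitions so they can be reused (off-policy learning)."""

    def __init__(self, capacity: int):
        self.data = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the correlation between consecutive steps.
        s, a, r, s2, d = zip(*random.sample(list(self.data), batch_size))
        return (np.array(s, dtype=float), np.array(a), np.array(r, dtype=float),
                np.array(s2, dtype=float), np.array(d, dtype=float))


class LinearQ:
    """Stand-in for the deep Q-network: Q(s, .) = s @ W (an assumption)."""

    def __init__(self, n_features: int, n_actions: int, rng):
        self.W = rng.normal(scale=0.1, size=(n_features, n_actions))

    def __call__(self, states):
        return states @ self.W


def td_targets(target_net, rewards, next_states, dones, gamma):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    return rewards + gamma * (1.0 - dones) * target_net(next_states).max(axis=1)


def dqn_step(q_net, target_net, buffer, batch_size, gamma, lr):
    """One gradient step on 0.5 * mean((Q(s, a) - y)^2); returns the mean squared error."""
    s, a, r, s2, d = buffer.sample(batch_size)
    y = td_targets(target_net, r, s2, d, gamma)
    err = q_net(s)[np.arange(len(a)), a] - y
    grad = np.zeros((q_net.W.shape[1], q_net.W.shape[0]))  # (actions, features)
    np.add.at(grad, a, err[:, None] * s)  # accumulate per chosen action
    q_net.W -= lr * grad.T / len(a)
    return float(np.mean(err ** 2))


def sync_target(target_net, q_net):
    """Refresh the delayed copy; called only every few steps in practice."""
    target_net.W = q_net.W.copy()
```

Because the targets come from `target_net` rather than from the network being updated, they stay fixed between calls to `sync_target`, which is the stabilizing effect described in the paragraph above.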


4 Experiments


In this section, we outline the experiments conducted using the proposed model to investigate the emergence of the informal economy and its properties. First, we set the model’s parameters following as closely as possible the socio-economic configuration of the US. Second, we define the evaluation metrics used to evaluate the informal economy size and properties. Finally, we outline the experimental rationale as well as the statistical analysis applied to the simulations’ results.


4.1 Model parameters


To implement the proposed model, one must establish the parameter values and define the government policies. We adopt the case of the US, which is considered the leading global economy (Reuveny and Thompson, 2001) and has an estimated informal economy of approximately 7% of its GDP as of 2023$^{2}$.


Income. Based on the 2024 Current Population Survey Annual Social and Economic Supplements (CPS ASEC) conducted by the Census Bureau$^{3}$, the 2023 household income deciles in the US are presented in Table 1. Thus, the annual income range in the simulation was set between \$18,980 and \$316,100, divided into deciles.


Table 1: Income at selected percentiles in 2023 dollars, US.


Decile	1	2	3	4	5	6	7	8	9	10
Income (USD)	18,980	33,000	47,910	62,200	80,610	101,000	127,300	165,300	234,900	316,100
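One way the decile values of Table 1 could be turned into agent incomes is sketched below. The assignment rule is an assumption: the text only states that the income range is divided into deciles, so spreading agents evenly across deciles and giving each agent its decile's table value is a hypothetical choice, as are the names `assign_incomes` and `income_per_event`.

```python
from typing import List

# Annual household income by decile, 2023 USD (Table 1).
DECILE_INCOME_USD = [18_980, 33_000, 47_910, 62_200, 80_610,
                     101_000, 127_300, 165_300, 234_900, 316_100]


def assign_incomes(n_agents: int) -> List[int]:
    """Spread agents as evenly as possible over the ten deciles (assumed rule)."""
    return [DECILE_INCOME_USD[(i * 10) // n_agents] for i in range(n_agents)]


def income_per_event(annual_income: float, events_per_year: int) -> float:
    """Split an annual income into equal income events (assumed to be evenly spaced)."""
    return annual_income / events_per_year
```

For example, `assign_incomes(10)` reproduces the table row exactly, one agent per decile.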

Goods. The Bureau of Labor Statistics (BLS) in the US produces the Consumer Price Index (CPI) as a measure of price change faced by consumers. For use alongside the published indexes, BLS publishes the relative importance (RI) of the 204 components in the CPI, which is the expenditure weight of an individual component expressed as a percentage of all items within the U.S.$^{4}$ Accordingly, the number of goods in the consumer basket in 2023 is 204, and their normalized prices are shown in Table 6 in the appendix.


Income tax. In 2023, total federal receipts were \$4.4 trillion, about 16.5 percent of gross domestic product (GDP) of the US. The largest sources of revenues are the individual income tax and payroll taxes, followed by the corporate income tax, customs duties, and excise taxes. To cover any shortfalls between revenues and spending, the government issues debt. The federal government collects taxes on the wages and salaries earned by individuals, income from investments (for example, interest, dividends, and capital gains), and other income. Individual income taxes are the largest single source of federal revenues, constituting around one-half of all receipts. As a percentage of GDP, individual income taxes have ranged from 6 to 10 percent over the past 50 years, averaging around 8 percent of GDP. Tax liabilities vary considerably by income. Both employers and employees contribute payroll taxes, also known as social insurance taxes. Payroll taxes are the second-largest component of federal revenues and account for approximately one-third of total tax receipts, or approximately 6 percent of GDP. Payroll taxes help fund Social Security, Medicare, and unemployment insurance. For Social Security, employers and employees each contribute 6.2 percent of every paycheck, up to a maximum amount (\$168,600 in 2024). For Medicare, employers and employees each contribute an additional 1.45 percent, with no income limit. The Affordable Care Act added another 0.9 percent in payroll taxes on earnings over \$200,000 for individuals or \$250,000 for couples. Employers also pay the federal unemployment tax, which finances state-run unemployment insurance programs. The government collects taxes on the profits of corporations. In 2022, most corporate income was taxed at 21 percent at the federal level (before adjustments). When combined with state and local corporate taxes, the average statutory tax rate was 25.8 percent, although most corporations pay less than the statutory rate because of exemptions, deductions, and other adjustments to income. Corporate taxes amount to approximately 9.9 percent of all tax revenues, or approximately 1.6 percent of GDP. Taxes on certain goods such as tobacco, alcohol, and motor fuels also contribute to federal revenues. Those excise taxes are imposed at the point of sale and add to the prices that consumers pay for such goods. Revenues from excise taxes amount to approximately 2 percent of all tax revenues, or approximately 0.3 percent of GDP. The government collects revenues from duties and tariffs on imports. Those revenues amount to approximately 2 percent of all tax revenues, or approximately 0.3 percent of GDP. Federal revenues that come from other sources (such as estate and gift taxes and the deposit of earnings from the Federal Reserve System, among others) amount to approximately 2 percent of all tax revenues, or approximately 0.3 percent of GDP. Table 2 presents the federal income tax rates for a single taxpayer in the US for 2023$^{5}$.
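As a worked example of the payroll-tax arithmetic described above, the employee-side contribution can be computed from the two base rates. This is a sketch under simplifying assumptions: it covers only the Social Security and base Medicare contributions, omits the Affordable Care Act surtax on high earnings and the employer-only unemployment tax, and uses the hypothetical function name `employee_payroll_tax`.

```python
SS_RATE = 0.062              # Social Security rate paid by the employee
SS_WAGE_BASE_2024 = 168_600  # earnings above this are not subject to Social Security tax
MEDICARE_RATE = 0.0145       # base Medicare rate, no income limit


def employee_payroll_tax(wages: float) -> float:
    """Employee-side Social Security plus base Medicare contribution, in USD."""
    social_security = SS_RATE * min(wages, SS_WAGE_BASE_2024)
    medicare = MEDICARE_RATE * wages
    return social_security + medicare
```

For the fifth-decile income of \$80,610 this gives about \$6,166.67 (7.65 percent of wages), while for the tenth-decile income of \$316,100 the Social Security cap applies and the total is \$15,036.65.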


Sales taxes. Sales tax in the United States is a consumption-based tax applied at the state and local levels, serving as a significant revenue source for public services such as education, infrastructure, and public safety (Alshira’h et al., 2020). It is governed primarily by state laws, with 45 states and the District of Columbia imposing a statewide sales tax, while five states (Alaska, Delaware, Montana, New Hampshire, and Oregon) do not. Many states allow local governments to levy additional sales taxes, resulting in widely varying combined rates, sometimes exceeding 10%. Sales tax generally applies to tangible personal property and selected services, although exemptions for essentials like groceries and prescription drugs are common. Compliance requires businesses to collect sales tax at the point of sale and remit it to the appropriate tax authorities, based on their physical or economic “nexus” within the state. Use tax complements sales tax for goods purchased tax-free in other jurisdictions but used locally. Following the 2018 Supreme Court decision in South Dakota v. Wayfair, Inc., states gained broader authority to mandate sales tax collection from out-of-state and online retailers, addressing the challenges posed by e-commerce. Tax rates range from 2.9% in Colorado to 7.25% in California, with significant variation depending on local rates. States may also offer sales tax holidays for specific items like school supplies or energy-efficient appliances, alongside permanent exemptions for certain goods and services. Chapter 7 of the
