[Paper Translation] ARTEMIS-DA: An Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics


Original paper: https://arxiv.org/pdf/2412.14146v3


ARTEMIS-DA: An Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics

ARTEMIS-DA:面向数据分析多步洞察合成的高级推理与转换引擎


Figure 1: ARTEMIS-DA showcasing Q1: advanced reasoning for complex queries, Q2: predictive modeling for text classification, Q3: data visualization for insights, and Q4: efficient data transformation for manipulating datasets.

图 1: ARTEMIS-DA 展示 Q1: 复杂查询的高级推理, Q2: 文本分类的预测建模, Q3: 洞察数据可视化, 以及 Q4: 高效数据集操作的数据转换。

ABSTRACT

摘要

This paper presents the Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics (ARTEMIS-DA), a novel framework designed to augment Large Language Models (LLMs) for solving complex, multi-step data analytics tasks. ARTEMIS-DA integrates three core components: the Planner, which dissects complex user queries into structured, sequential instructions encompassing data preprocessing, transformation, predictive modeling, and visualization; the Coder, which dynamically generates and executes Python code to implement these instructions; and the Grapher, which interprets generated visualizations to derive actionable insights. By orchestrating the collaboration between these components, ARTEMIS-DA effectively manages sophisticated analytical workflows involving advanced reasoning, multi-step transformations, and synthesis across diverse data modalities. The framework achieves state-of-the-art (SOTA) performance on benchmarks such as WikiTable Questions and TabFact, demonstrating its ability to tackle intricate analytical tasks with precision and adaptability. By combining the reasoning capabilities of LLMs with automated code generation, execution, and visual analysis, ARTEMIS-DA offers a robust, scalable solution for multi-step insight synthesis, addressing a wide range of challenges in data analytics.

本文提出了一种用于多步骤数据分析洞察合成的高级推理与转换引擎(ARTEMIS-DA),这是一种旨在增强大语言模型(LLM)解决复杂多步骤数据分析任务能力的新型框架。ARTEMIS-DA整合了三个核心组件:规划器(Planner)将复杂用户查询分解为包含数据预处理、转换、预测建模和可视化的结构化顺序指令;编码器(Coder)动态生成并执行Python语言代码来实现这些指令;图表解析器(Grapher)通过解读生成的可视化结果来获取可操作的见解。通过协调这些组件之间的协作,ARTEMIS-DA能有效管理涉及高级推理、多步骤转换以及跨多种数据模态合成的复杂分析工作流。该框架在WikiTable Questions和TabFact等基准测试中达到了最先进(SOTA)性能,展示了其精确且自适应地处理复杂分析任务的能力。通过将大语言模型的推理能力与自动化代码生成执行及视觉分析相结合,ARTEMIS-DA为多步骤洞察合成提供了一个强大、可扩展的解决方案,解决了数据分析领域的诸多挑战。

1 Introduction

1 引言

The advent of Large Language Models (LLMs), such as GPT-3 [1], GPT-4 [15], and Llama 3 [5], has revolutionized the fields of artificial intelligence (AI) and natural language processing (NLP). These models have demonstrated remarkable capabilities in complex interpretation, reasoning, and generating human-like language, achieving success in diverse applications such as language translation, summarization, content generation, and question answering. Their ability to process and generate coherent text has also made them powerful tools for tasks requiring nuanced understanding and reasoning. However, while LLMs have been extensively explored in these domains, their potential for transforming the field of data analytics remains underutilized. The inherent reasoning and code-generation capabilities of LLMs suggest immense promise for enabling non-technical users to interact with complex datasets using natural language, bridging the gap between advanced analytics and accessibility.

大语言模型(LLM)如GPT-3[1]、GPT-4[15]和Llama 3[5]的出现,彻底改变了人工智能(AI)和自然语言处理(NLP)领域。这些模型在复杂解释、推理和生成类人语言方面展现出卓越能力,在语言翻译、摘要生成、内容创作和问答系统等多样化应用中取得成功。其处理并生成连贯文本的能力,也使它们成为需要细致理解和推理任务的强大工具。然而,尽管大语言模型在这些领域已得到广泛探索,它们在变革数据分析领域的潜力仍未得到充分利用。大语言模型固有的推理和代码生成能力,预示着让非技术用户通过自然语言与复杂数据集交互的巨大前景,从而弥合高级分析与可访问性之间的鸿沟。

Recent efforts, including TableLLM [12], CABINET [18], and Chain-of-Table [22], have begun to explore this intersection. These studies focus on table-based question answering and reasoning tasks, leveraging LLMs to interpret and respond to queries grounded in structured datasets. While these works demonstrate the potential of LLMs for data analytics, they primarily address single-step tasks, leaving multi-step analytical processes (such as complex data transformation, predictive modeling, and visualization) relatively unexplored. These tasks often require sequential reasoning, complex task decomposition, and precise execution across diverse operations, presenting challenges that current frameworks are not yet equipped to manage effectively.

近期研究,包括TableLLM [12]、CABINET [18]和Chain-of-Table [22],已开始探索这一交叉领域。这些研究聚焦于基于表格的问答和推理任务,利用大语言模型(LLM)来解析和回应基于结构化数据集的查询。尽管这些工作展现了大语言模型在数据分析中的潜力,但它们主要针对单步任务,而多步骤分析流程(如复杂数据转换、预测建模和可视化)仍相对未被探索。这些任务通常需要顺序推理、复杂任务分解以及跨多样化操作的精准执行,当前框架尚无法有效应对这些挑战。

To address this gap, we propose the Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics (ARTEMIS-DA), a comprehensive framework explicitly designed to unlock the potential of LLMs for advanced multi-step data analytics tasks. ARTEMIS-DA introduces a tri-component architecture consisting of a Planner, a Coder, and a Grapher, each playing a crucial, complementary role. The Planner acts as the framework’s coordinator, interpreting complex user queries and decomposing them into a sequence of structured instructions tailored to the dataset and analytical goals. These instructions encompass tasks such as data pre-processing, transformation, predictive analysis, and visualization. By leveraging the natural language understanding and reasoning capabilities of LLMs, the Planner ensures clarity and structure in task execution, addressing the intricacies of multi-step analytical workflows.

为解决这一空白,我们提出了高级推理与转换引擎(ARTEMIS-DA)——一个专为释放大语言模型在多步骤数据分析任务中的潜力而设计的综合框架。ARTEMIS-DA采用三组件架构:规划器(Planner)、编码器(Coder)和绘图器(Grapher),每个组件都承担着关键且互补的角色。规划器作为框架协调中枢,负责解析复杂用户查询并将其分解为适应数据集和分析目标的结构化指令序列,涵盖数据预处理、转换、预测分析和可视化等任务。通过利用大语言模型的自然语言理解与推理能力,规划器确保多步骤分析工作流执行时的清晰性与结构性。

The Coder, in turn, translates the Planner’s instructions into executable Python code, dynamically generating and executing the necessary operations within a Python environment. Whether performing data transformations, training predictive models, or creating visualizations, the Coder bridges the gap between high-level task specifications and low-level computational execution. This dynamic interplay between the Planner and Coder components enables ARTEMIS-DA to handle analytical tasks autonomously, requiring minimal user intervention.

编码器 (Coder) 将规划器 (Planner) 的指令转化为可执行的 Python 代码,在 Python 环境中动态生成并执行必要操作。无论是执行数据转换、训练预测模型还是创建可视化图表,编码器都能弥合高层级任务描述与低层级计算执行之间的鸿沟。规划器与编码器组件间的这种动态交互,使 ARTEMIS-DA 能够以最少用户干预自主处理分析任务。
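The generate-then-execute behaviour attributed to the Coder can be illustrated with a minimal sketch. The text does not say how ARTEMIS-DA actually runs generated code, so everything here is an assumption: the helper `run_generated_code` is hypothetical, the two code strings stand in for LLM output, and a real system would need sandboxing before executing untrusted code.

```python
import contextlib
import io
import traceback


def run_generated_code(code: str, namespace: dict) -> str:
    """Execute a code string and return what it printed (or the traceback text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            # A shared namespace lets later steps reuse variables from earlier ones.
            exec(code, namespace)
        except Exception:
            # Errors are returned as text so a planner could react to them.
            buffer.write(traceback.format_exc())
    return buffer.getvalue()


# Hypothetical stand-ins for code an LLM might generate for two instructions.
namespace = {}
step_1 = "rows = [{'type': 'Movie'}, {'type': 'TV Show'}, {'type': 'Movie'}]"
step_2 = "print(sum(1 for r in rows if r['type'] == 'Movie'))"

run_generated_code(step_1, namespace)
output = run_generated_code(step_2, namespace)
print(output)
```

Returning captured output as plain text, including tracebacks, is one simple way to give a planning component something to inspect after each step.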

The Grapher is another key component of ARTEMIS-DA, analyzing the generated graphs and visualizations to extract valuable insights. By interpreting visual representations of data, the Grapher enables a deeper understanding of the underlying patterns and trends. The Grapher works in close coordination with the Planner and Coder, facilitating the seamless integration of data analysis, visualization, and insight extraction into a unified framework. Together, these components empower ARTEMIS-DA to deliver actionable insights, facilitating natural language interactions with complex datasets and enhancing accessibility for users without programming expertise.

Grapher是ARTEMIS-DA的另一核心组件,通过分析生成的图表和可视化结果来提取有价值的洞见。该组件通过解读数据的视觉呈现,帮助用户更深入地理解底层模式和趋势。Grapher与Planner、Coder紧密协作,将数据分析、可视化和洞见提取无缝整合到统一框架中。这些组件共同使ARTEMIS-DA能够提供可操作的见解,支持用户以自然语言交互方式处理复杂数据集,并降低非编程专业人士的使用门槛。

ARTEMIS-DA’s effectiveness is underscored by its state-of-the-art (SOTA) performance on benchmarks such as WikiTable Questions [17] and TabFact [3]. These results highlight its ability to manage nuanced analytical tasks requiring advanced multi-step reasoning. Beyond answering structured dataset queries, ARTEMIS-DA demonstrates versatility in transforming datasets, visualizing results, and conducting predictive modeling, positioning it as a transformative tool in LLM-powered data analytics.

ARTEMIS-DA的有效性通过其在WikiTable Questions [17]和TabFact [3]等基准测试中的最先进(SOTA)性能得到凸显。这些结果彰显了其处理需要高级多步推理的细致分析任务的能力。除了回答结构化数据集查询外,ARTEMIS-DA还展现了在转换数据集、可视化结果以及进行预测建模方面的多功能性,使其成为大语言模型(LLM)驱动数据分析领域的变革性工具。

The remainder of this paper is structured as follows: Section 2 reviews related work on LLM applications in data analytics and automated task decomposition. Section 3 presents the architecture of ARTEMIS-DA, focusing on its components and their respective roles. Section 4 evaluates the framework’s performance on established benchmarks, while Section 5 demonstrates ARTEMIS-DA’s capabilities in generating visualizations and predictive modeling. Finally, Section 6 concludes with insights into future directions for advancing LLM-powered data analytics and further enhancing LLMs for other complex tasks that require multi-step reasoning.

本文其余部分的结构如下:第2节回顾了大语言模型(LLM)在数据分析和自动化任务分解方面的相关研究。第3节介绍ARTEMIS-DA的架构,重点阐述其组件及各自功能。第4节评估该框架在标准基准测试中的表现,第5节则展示ARTEMIS-DA在生成可视化视图和预测建模方面的能力。最后,第6节总结未来发展方向,包括推进基于大语言模型的数据分析技术,以及进一步优化大语言模型以应对其他需要多步推理的复杂任务。

Through the development of ARTEMIS-DA, we contribute a significant advancement to the emerging field of LLM-driven data analytics, offering a fully integrated end-to-end system capable of managing complex, multi-step analytical queries with minimal user intervention. This framework aims to redefine the accessibility, efficiency, and scope of data analytics, establishing a new paradigm for LLM-assisted insight synthesis.

通过开发ARTEMIS-DA,我们为大语言模型(LLM)驱动数据分析这一新兴领域做出了重要贡献,提供了一个完全集成的端到端系统,能够以最少的用户干预管理复杂的多步骤分析查询。该框架旨在重新定义数据分析的可访问性、效率和范围,为大语言模型辅助的洞察合成建立新范式。

2 Related Works

2 相关工作

In recent years, Large Language Models (LLMs) have demonstrated promise in addressing complex tasks in natural language processing and reasoning. However, when applied to structured data, such as large tables, unique challenges arise, requiring specialized frameworks and adaptations. Recent studies have made significant strides in enhancing LLMs’ reasoning capabilities with tabular data, providing the foundation for the ARTEMIS-DA framework proposed in this paper.

近年来,大语言模型(LLM)在解决自然语言处理和推理领域的复杂任务方面展现出潜力。然而当应用于结构化数据(如大型表格)时,会面临独特挑战,需要专门的框架和适配方案。最新研究在提升大语言模型处理表格数据的推理能力方面取得重大进展,这为本文提出的ARTEMIS-DA框架奠定了基础。

Tabular Reasoning with Pre-trained Language Models. Traditional approaches with pre-trained language models such as TaBERT[27], TAPAS[7], TAPEX[11], ReasTAP[30], and PASTA[6] were developed to combine free-form questions with structured tabular data. These models achieved moderate success by integrating table-based and textual training, enhancing tabular reasoning capabilities. However, their ability to generalize under table perturbations remains limited[2, 31]. Solutions such as LETA[31] and LATTICE[21] addressed these limitations using data augmentation and order-invariant attention. However, they require white-box access to models, making them incompatible with SOTA black-box LLMs.

基于预训练语言模型的表格推理。传统方法如TaBERT[27]、TAPAS[7]、TAPEX[11]、ReasTAP[30]和PASTA[6]旨在将自由形式问题与结构化表格数据相结合。这些模型通过整合基于表格和文本的训练,提升了表格推理能力,取得了中等程度的成功。然而,它们在表格扰动下的泛化能力仍然有限[2, 31]。LETA[31]和LATTICE[21]等解决方案通过数据增强和顺序不变注意力机制解决了这些限制,但需要白盒访问模型,因此无法兼容最先进的(SOTA)黑盒大语言模型。

Table-Specific Architectures for LLMs. Table-specific models further refined the use of structured tabular data by emphasizing row and column positioning. TableFormer[25], for example, introduced positional embeddings to capture table structure, mitigating the impact of structural perturbations. Despite these advances, frameworks like StructGPT[9] highlighted that generic LLMs still struggle with structured data unless enhanced with explicit symbolic reasoning. Recently proposed frameworks such as AutoGPT[20] and Data Copilot[29] began addressing table-specific challenges by incorporating advanced reasoning techniques. However, their performance remains constrained across diverse scenarios due to their reliance on generic programming for task execution.

面向大语言模型的表格专用架构。表格专用模型进一步优化了结构化表格数据的使用,强调行列定位。例如 TableFormer[25] 引入了位置嵌入 (positional embeddings) 来捕捉表格结构,从而减轻结构扰动的影响。尽管取得了这些进展,但 StructGPT[9] 等框架指出,通用大语言模型仍难以处理结构化数据,除非通过显式符号推理进行增强。最近提出的框架如 AutoGPT[20] 和 Data Copilot[29] 开始通过整合高级推理技术来解决表格专用挑战,但由于依赖通用编程来执行任务,其性能在不同场景中仍受限。

Noise Reduction in Table-Based Question Answering. Handling noise in large, complex tables is another active research area. CABINET[18] introduced a Content Relevance-Based Noise Reduction strategy, significantly improving LLM performance by employing an Unsupervised Relevance Scorer (URS). By filtering irrelevant information and focusing on content relevant to user queries, CABINET achieved notable accuracy improvements on datasets like WikiTable Questions[17] and FeTaQA[13], underscoring the importance of noise reduction in reliable table-based reasoning.

基于表格问答的噪声消除。处理大型复杂表格中的噪声是另一个活跃的研究领域。CABINET[18]提出了一种基于内容相关性的噪声消除策略,通过采用无监督相关性评分器(URS)显著提升了大语言模型的性能。该方法通过过滤无关信息并聚焦与用户查询相关的内容,在WikiTable Questions[17]和FeTaQA[13]等数据集上实现了显著的准确率提升,印证了噪声消除对可靠表格推理的重要性。

Few-Shot and Zero-Shot Learning for Tabular Reasoning. Few-shot and zero-shot learning methods have shown considerable promise in tabular understanding tasks. Chain-of-Thought (CoT) prompting[23] improved LLMs’ sequential reasoning capabilities. Building on this, studies such as[4, 26] integrated symbolic reasoning into CoT frameworks, enhancing query decomposition and understanding. However, these techniques are not tailored to tabular structures, leading to performance gaps. Chain-of-Table[22] addressed these limitations by introducing an iterative approach to transforming table contexts, enabling more effective reasoning over structured data and achieving state-of-the-art results on benchmarks such as WikiTable Questions[17] and TabFact[3].

少样本和零样本学习在表格推理中的应用。少样本和零样本学习方法在表格理解任务中展现出显著潜力。思维链 (Chain-of-Thought, CoT) 提示技术[23]提升了大语言模型的序列推理能力。基于此,[4, 26]等研究将符号推理融入CoT框架,强化了查询分解与理解能力。然而这些技术未针对表格结构进行优化,导致性能差距。Chain-of-Table[22]通过引入迭代式表格上下文转换方法解决了这些局限,实现对结构化数据更高效的推理,并在WikiTable Questions[17]和TabFact[3]等基准测试中取得最优结果。

Programmatic Approaches to Tabular Question Answering. Leveraging programmatic techniques has significantly advanced the field of table-based question answering by enabling models to interpret and process structured data more effectively. Text-to-SQL models such as TAPEX[11] and OmniTab[10] trained large language models (LLMs) to translate natural language questions into SQL operations, demonstrating the potential for automated interaction with tabular datasets. Despite their promise, these models struggled with noisy and large tables due to limited query comprehension. Programmatically enhanced solutions, such as Binder[4] and LEVER[14], improved performance by generating and verifying SQL or Python code. However, their reliance on single-pass code generation restricted their adaptability to queries requiring dynamic reasoning.

基于编程方法的表格问答。利用编程技术显著推动了基于表格的问答领域发展,使模型能更高效地解析和处理结构化数据。TAPEX[11] 和 OmniTab[10] 等文本转SQL (Text-to-SQL) 模型通过训练大语言模型将自然语言问题转换为SQL操作,展现了与表格数据集自动化交互的潜力。尽管前景广阔,这些模型因查询理解能力有限,在处理噪声多或规模大的表格时表现欠佳。Binder[4] 和 LEVER[14] 等编程增强方案通过生成并验证SQL或Python代码提升了性能,但其依赖单次代码生成的特性限制了应对需要动态推理的查询时的适应性。

Building on these advancements and addressing the limitations of previous models, we introduce the Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics (ARTEMIS-DA). Unlike prior models, ARTEMIS-DA features a tri-component architecture consisting of a Planner, a Coder, and a Grapher, which collaboratively address complex, multi-step analytical queries. The Planner generates dynamic task sequences tailored to specific datasets and queries, encompassing data transformation, predictive analysis, and visualization. The Coder translates these sequences into Python code and executes them in real time. Finally, the Grapher extracts actionable insights from the visualizations produced by the Coder, enhancing result interpretability and ensuring the seamless integration of visual insights into user workflows.

基于这些进步并针对先前模型的局限性,我们推出了用于数据分析中多步洞察合成的先进推理与转换引擎(ARTEMIS-DA)。与之前模型不同,ARTEMIS-DA采用三组件架构,包含规划器(Planner)、编码器(Coder)和绘图器(Grapher),协同处理复杂的多步骤分析查询。规划器生成针对特定数据集和查询定制的动态任务序列,涵盖数据转换、预测分析和可视化。编码器将这些序列转换为Python语言代码并实时执行。最后,绘图器从编码器生成的可视化结果中提取可操作洞察,提升结果可解释性,并确保视觉洞察无缝融入用户工作流程。

This seamless integration of the Planner, Coder, and Grapher components enables ARTEMIS-DA to tackle intricate tasks requiring sequential reasoning, synthesis, and visualization, achieving state-of-the-art performance on benchmarks such as WikiTable Questions[17] and TabFact[3]. ARTEMIS-DA extends LLM functionality in data analytics, providing a robust, automated solution that empowers technical and non-technical users to interact with complex datasets through natural language.

Planner、Coder和Grapher组件的无缝集成使ARTEMIS-DA能够处理需要顺序推理、综合和可视化的复杂任务,在WikiTable Questions[17]和TabFact[3]等基准测试中实现了最先进的性能。ARTEMIS-DA扩展了大语言模型在数据分析中的功能,提供了一个强大的自动化解决方案,使技术用户和非技术用户都能通过自然语言与复杂数据集交互。

3 ARTEMIS-DA Framework

3 ARTEMIS-DA 框架

The Advanced Reasoning and Transformation Engine for Multi-Step Insight Synthesis in Data Analytics (ARTEMIS-DA) is designed to address the challenges of complex data analytics by combining advanced reasoning capabilities with dynamic, real-time code generation, execution and visual analysis. The framework, shown in Figure 2, consists of three core components: the Planner, the Coder and the Grapher. Working in unison, these components decompose complex queries into sequential tasks, automatically generate and execute the required code for each task, and synthesize insights based on generated graphs with minimal user intervention.

高级推理与多步洞察合成引擎(ARTEMIS-DA)通过将先进推理能力与动态实时代码生成、执行及可视化分析相结合,旨在解决复杂数据分析的挑战。如图 2 所示,该框架包含三个核心组件:规划器 (Planner)、编码器 (Coder) 和绘图器 (Grapher)。这些组件协同工作,将复杂查询分解为顺序任务,自动生成并执行每个任务所需的代码,并基于生成的图表以最少用户干预合成洞察。

For the experiments in this paper, all three components utilize the LLaMA 3 70B model[5], which demonstrated superior performance across benchmarks. The Grapher component also employs the LLaMA 3.2 Vision 90B model for understanding generated graphs. However, the framework is model-agnostic and can be adapted to work with other state-of-the-art large language models (LLMs), such as GPT-4[15] or Mixtral-8x7B[8].

本文实验中,所有三个组件均采用LLaMA 3 70B模型[5],该模型在各项基准测试中展现出卓越性能。Grapher组件还额外使用LLaMA 3.2 Vision 90B模型来理解生成的图表。但该框架具有模型无关性,可适配其他前沿大语言模型(如GPT-4[15]或Mixtral-8x7B[8])。
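The model assignment described above can be written down as a small configuration object, which also makes the model-agnostic claim concrete. This is only an illustrative sketch: the `ComponentModels` class and its field names are hypothetical, and the strings are descriptive labels taken from the text rather than identifiers for any real API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ComponentModels:
    """Which model backs each component (labels only, not API model IDs)."""
    planner: str
    coder: str
    grapher_text: str
    grapher_vision: str


# Setup reported for the experiments in the text.
PAPER_SETUP = ComponentModels(
    planner="LLaMA 3 70B",
    coder="LLaMA 3 70B",
    grapher_text="LLaMA 3 70B",
    grapher_vision="LLaMA 3.2 Vision 90B",
)

# Swapping in another backend then means building a modified copy.
ALTERNATIVE = replace(PAPER_SETUP, planner="GPT-4", coder="GPT-4")
print(ALTERNATIVE)
```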

The following sections provide an in-depth exploration of the Planner, Coder and Grapher components, detailing their roles and interactions. Together, they exemplify ARTEMIS-DA’s ability to streamline end-to-end data analytics workflows effectively and efficiently.

以下章节将深入探讨Planner、Coder和Grapher组件,详细说明它们的作用及交互方式。这些组件共同体现了ARTEMIS-DA高效简化端到端数据分析工作流程的能力。

3.1 Planner Component

3.1 规划器组件

The Planner serves as the central reasoning and task-decomposition unit within the ARTEMIS-DA framework, converting user queries into structured sequences of tasks. Its primary responsibilities include parsing user inputs, organizing them into a logical workflow, and interpreting the outputs of executed code and generated visual insights to guide subsequent steps in the analytical process. The Planner is adept at managing complex, multifaceted queries that require diverse operations, such as data transformation, predictive modeling, and visualization. By leveraging the advanced reasoning capabilities of large language models (LLMs), it decomposes intricate requirements into optimized task sequences aligned with the input data, prior outputs, and specific details of the user’s query, ensuring clarity for seamless execution by the Coder and Grapher as required.

Planner(规划器)作为ARTEMIS-DA框架中的核心推理与任务分解单元,负责将用户查询转换为结构化任务序列。其主要职责包括解析用户输入、将其组织为逻辑工作流,并解释执行代码的输出与生成的可视化洞察以指导分析流程的后续步骤。该组件擅长处理需要多样化操作(如数据转换、预测建模和可视化)的复杂多维查询。通过利用大语言模型(LLM)的高级推理能力,它能将复杂需求分解为与输入数据、先前输出及用户查询细节相匹配的优化任务序列,确保Coder(编码器)和Grapher(图表生成器)按需无缝执行。


Figure 2: ARTEMIS-DA Framework

图 2: ARTEMIS-DA 框架

For instance, when tasked with analyzing sales patterns and forecasting future trends, the Planner systematically outlines essential steps, such as data pre-processing, feature engineering, model training, evaluation and forecasting, organizing them into a coherent workflow. This process is dynamically tailored to the dataset’s properties and the nuances of the query, enabling an adaptive approach to multi-step analytical tasks. Each task specification is then relayed to the Coder or Grapher in sequence, ensuring a smooth and collaborative workflow across the components of the ARTEMIS-DA framework.

例如,当需要分析销售模式并预测未来趋势时,Planner会系统性地列出关键步骤(如数据预处理、特征工程、模型训练、评估与预测),并将其组织成连贯的工作流。该流程会根据数据集特性和查询细节进行动态调整,从而实现对多步骤分析任务的适应性处理。随后,每个任务说明会依次传递给Coder或Grapher,确保ARTEMIS-DA框架各组件间形成流畅的协作工作流。
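To make the sales-forecasting example concrete, the plan can be pictured as an ordered list of task records, each addressed to one component. The sketch below is hand-written for illustration: the `Task` class, the `sketch_sales_plan` function, and the instruction wording are all assumptions, and in the framework as described the Planner produces such steps with an LLM and adapts them to the dataset and to earlier outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    step: int
    component: str    # "Coder" or "Grapher"
    instruction: str  # natural-language instruction for that component


def sketch_sales_plan() -> List[Task]:
    """Hand-written stand-in for a plan an LLM-backed Planner might produce."""
    instructions = [
        ("Coder", "Pre-process the sales data (parse dates, handle missing values)."),
        ("Coder", "Engineer features such as month and lagged sales."),
        ("Coder", "Train a forecasting model on the historical period."),
        ("Coder", "Evaluate the model on a held-out period."),
        ("Coder", "Forecast future sales and plot actual vs. predicted values."),
        ("Grapher", "Analyze the forecast plot and summarize the trend."),
    ]
    return [Task(i, comp, text) for i, (comp, text) in enumerate(instructions, start=1)]


for task in sketch_sales_plan():
    print(f"{task.step}. [{task.component}] {task.instruction}")
```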

3.2 Coder Component

3.2 编码器组件

Following the Planner’s generation of a structured task sequence, the Coder translates these tasks into executable Python code. It processes instructions for each step (such as loading data, creating visualizations, or training predictive models), producing contextually relevant and functionally precise code tailored to the task’s requirements. Operating in real time within a Python environment, the Coder executes each code snippet, generating intermediate outputs that are fed back to the Planner for further analysis and informed decision-making.

在规划器生成结构化任务序列后,编码器将这些任务转换为可执行的Python语言代码。它处理每个步骤的指令——例如加载数据、创建可视化或训练预测模型——生成与任务需求相匹配、上下文相关且功能精确的代码。编码器在Python语言环境中实时运行,执行每个代码片段,生成中间输出并反馈给规划器进行进一步分析和决策制定。

The Coder’s capabilities encompass a broad range of analytical operations, from basic data cleaning and summarization to advanced predictive modeling and visualization. For instance, when the user requests predictive analysis on time-series data, the Planner decomposes the query into multiple simple steps such as splitting the data, training a suitable model, and visualizing the results. The Coder generates and executes code for each step, with the Planner overseeing all steps in the process. This iterative exchange between the Planner and Coder ensures accurate execution of each analytical step, maintaining a continuous feedback loop until the analysis is fully completed.

编码器(Coder)的能力涵盖广泛的分析操作,从基础的数据清洗和摘要到高级预测建模和可视化。例如,当用户请求对时间序列数据进行预测分析时,规划器(Planner)会将查询分解为多个简单步骤,包括数据拆分、训练合适模型和结果可视化。编码器(Coder)为每个步骤生成并执行代码,规划器则监督整个流程中的所有步骤。规划器与编码器之间的这种迭代交互确保了每个分析步骤的准确执行,通过持续反馈循环直至分析完全完成。

3.3 Grapher Component

3.3 图表组件

The Grapher serves as a critical component for deriving actionable insights from visual data, responding to instructions generated by the Planner. Upon receiving a directive to analyze a generated graph from the Planner, the Grapher processes the visual output and provides insights in a structured question-and-answer format. Its analytical capabilities encompass a broad spectrum, ranging from basic data extraction from plots to advanced interpretation and trend analysis of complex plots.

绘图器 (Grapher) 是从视觉数据中提取可操作见解的关键组件,负责响应规划器 (Planner) 生成的指令。当收到分析生成图表的指令时,绘图器会处理视觉输出,并以结构化问答形式提供见解。其分析能力涵盖广泛领域,包括从图表中提取基础数据,到对复杂图表进行高级解读和趋势分析。

For instance, if the Planner requests an analysis of time-series data, and the Coder generates a corresponding line plot, the Grapher can identify trends, detect anomalies, and highlight significant observations. This feedback enables the Planner to refine its understanding and produce more nuanced insights. Additionally, the Grapher supports a wide variety of graph types, including bar charts, pie charts, scatter plots, and heatmaps, among others, ensuring versatility across diverse data visualization needs. By seamlessly integrating graph interpretation into the workflow, the Grapher enhances the framework’s capacity to provide meaningful, data-driven conclusions.

例如,若规划器 (Planner) 请求分析时间序列数据,而编码器 (Coder) 生成了相应的折线图,绘图器 (Grapher) 便能识别趋势、检测异常并突出关键观察结果。这种反馈使规划器能优化其理解并产生更细致的洞察。此外,绘图器支持多种图表类型,包括柱状图、饼图、散点图和热力图等,确保满足多样化的数据可视化需求。通过将图表解读无缝集成到工作流中,绘图器增强了该框架提供有意义、数据驱动结论的能力。

3.4 Workflow and Interaction between Components

3.4 组件间的工作流程与交互

The ARTEMIS-DA framework employs a systematic workflow to process user queries with precision and efficiency. This process, illustrated in Figure 3, highlights the seamless interaction between the Planner, Coder, and Grapher components as outlined below:

ARTEMIS-DA框架采用系统化工作流程来精准高效地处理用户查询。如图3所示,该流程突出了Planner、Coder与Grapher组件之间的无缝协作,具体如下:

  1. Input: The workflow begins when the user submits a dataset and a natural language query describing their analytical objectives. The Coder starts by generating the Python code to load the dataset and display the column types and the first five rows of the dataset.
     输入:工作流始于用户提交数据集及描述分析目标的自然语言查询。Coder首先生成Python代码加载数据集,并显示列类型及数据集前五行。
  2. Decomposition: The Planner analyzes the query and decomposes it into a structured sequence of tasks, leveraging outputs from df.info() and df.head() to extract actionable context. For instance, a query to compare media categories might involve counting the number of TV shows and movies, creating a pie chart for visualization, and analyzing the generated chart for insights. Each task is methodically assigned to the Coder or Grapher as appropriate.
     分解:Planner分析查询并将其分解为结构化的任务序列,利用df.info()和df.head()的输出提取可操作上下文。例如,比较媒体类别的查询可能涉及统计电视剧和电影的数量、创建饼图进行可视化,以及分析生成的图表以获取洞察。每个任务会被系统性地分配给Coder或Grapher。
  3. Execution: The Coder translates the Planner’s instructions into executable Python code. Tasks are executed sequentially, producing intermediate outputs. In the sample query, the Coder generates the code to count the specified categories, generate the pie chart, and prepare the visual output for analysis.
     执行:Coder 将 Planner 的指令转换为可执行的 Python语言 代码。任务按顺序执行,生成中间输出。在示例查询中,Coder 生成用于统计指定类别、生成饼图以及准备可视化输出以进行分析的代码。
  4. Analysis: The Grapher processes the Planner’s instructions to derive insights from generated visuals. In the sample query, the Grapher analyzes the pie chart to calculate and report the proportions of TV shows and movies, presenting insights in a structured format.
     分析:Grapher处理Planner的指令,从生成的视觉内容中提取洞察。在示例查询中,Grapher分析饼图以计算并报告电视节目和电影的比例,以结构化格式呈现洞察结果。
  5. Feedback Loop: Intermediate outputs, such as computed values, visualizations, and insights, are returned to the Planner. The Planner evaluates these results to determine if additional steps are necessary. If further actions are required, new instructions are dynamically generated for the Coder or Grapher, maintaining a responsive feedback loop to achieve the task.
     反馈循环:计算值、可视化和洞察等中间输出结果会返回给规划模块 (Planner)。规划模块评估这些结果以判断是否需要额外步骤。如需进一步操作,则会动态生成新的指令给编码模块 (Coder) 或绘图模块 (Grapher),通过响应式反馈循环持续优化任务执行。
  6. Finalization: Once all tasks are completed, the Planner aggregates the results, refines the insights, and compiles the final output. The final result, enhanced with additional insights if necessary, is presented to the user along with a tag, indicating the successful conclusion of the workflow.
     最终定稿:当所有任务完成后,规划器(Planner)会汇总结果、提炼洞察并汇编最终输出。必要时会加入额外洞察进行增强的最终成果将呈现给用户,并附上标签以表明工作流已成功完成。

Figure 3: Workflow of the ARTEMIS-DA framework showcasing its multi-component collaboration.

图 3: ARTEMIS-DA框架的工作流程,展示其多组件协作。
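The workflow steps can be sketched as a small control loop in which the Planner repeatedly chooses the next instruction, routes it to the Coder or Grapher, and receives the result back as feedback. This is a sketch under stated assumptions, not the authors' implementation: `run_workflow`, the `"Done"` marker, the `max_steps` guard, and the scripted stand-ins for the three LLM-backed components are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (component, instruction) pairs; component is "Coder", "Grapher", or "Done".
Instruction = Tuple[str, str]


def run_workflow(
    query: str,
    planner: Callable[[str, List[str]], Instruction],
    coder: Callable[[str], str],
    grapher: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Route planner instructions to the coder or grapher until the planner finishes."""
    history: List[str] = []  # intermediate outputs fed back to the planner
    handlers: Dict[str, Callable[[str], str]] = {"Coder": coder, "Grapher": grapher}
    for _ in range(max_steps):
        component, instruction = planner(query, history)
        if component == "Done":
            history.append(instruction)  # final answer compiled by the planner
            break
        history.append(handlers[component](instruction))
    return history


# Scripted stand-ins for the three LLM-backed components (all hypothetical).
SCRIPT: List[Instruction] = [
    ("Coder", "Load the dataset; show column types and the first five rows."),
    ("Coder", "Count TV shows and movies; draw a pie chart."),
    ("Grapher", "Report the proportions shown in the pie chart."),
]


def scripted_planner(query: str, history: List[str]) -> Instruction:
    if len(history) < len(SCRIPT):
        return SCRIPT[len(history)]
    return ("Done", f"Answer to {query!r} based on {len(history)} intermediate outputs.")


def stub_coder(instruction: str) -> str:
    return f"[code output] {instruction}"


def stub_grapher(instruction: str) -> str:
    return f"[graph insight] {instruction}"


trace = run_workflow("Compare media categories", scripted_planner, stub_coder, stub_grapher)
for line in trace:
    print(line)
```

Passing the full history to the planner on every turn is one simple way to realize the feedback loop; the step cap is added here only so the sketch cannot loop forever.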

3.5 Advantages of the ARTEMIS-DA Framework

3.5 ARTEMIS-DA框架的优势

The ARTEMIS-DA framework represents a significant advancement in data analytics by integrating sophisticated natural language understanding with precise computational execution and visual insight synthesis. Its tri-component architecture, comprising the Planner, Coder, and Grapher, enables the framework to efficiently handle complex, multi-step analytical tasks, all while providing an intuitive interface suitable for users across various levels of technical expertise.

ARTEMIS-DA框架通过将复杂的自然语言理解、精确的计算执行与可视化洞察合成相结合,代表了数据分析领域的重大进步。其三组件架构——包括规划器(Planner)、编码器(Coder)和绘图器(Grapher)——使该框架能够高效处理复杂的多步骤分析任务,同时为不同技术水平的用户提供直观的界面。

ARTEMIS-DA achieves state-of-the-art performance on challenging datasets such as WikiTable Questions[17], TabFact[3], and FeTaQA[13], demonstrating its versatility in extracting comprehensive, data-driven insights. The framework’s ability to dynamically interpret user queries, generate and execute Python code in real-time, and analyze generated visuals further enhances its adaptability and practical value. These features collectively position ARTEMIS-DA as a cutting-edge solution for addressing the modern challenges of data analytics with precision, efficiency, and user-centered design.

ARTEMIS-DA 在 WikiTable Questions[17]、TabFact[3] 和 FeTaQA[13] 等具有挑战性的数据集上实现了最先进的性能,展现了其提取全面数据驱动洞察的通用性。该框架能动态解释用户查询、实时生成并执行 Python语言 代码、分析生成的可视化结果,进一步提升了适应性和实用价值。这些特性共同使 ARTEMIS-DA 成为以精准性、高效性和用户为中心的设计应对现代数据分析挑战的前沿解决方案。

4 Experiments and Evaluation

4 实验与评估

This section provides an overview of the benchmark datasets utilized for evaluation, the metrics employed to compare the performance of the models, the results achieved by the ARTEMIS-DA framework, and a comprehensive analysis of these results.

本节概述了用于评估的基准数据集、比较模型性能的指标、ARTEMIS-DA框架取得的成果以及对这些成果的全面分析。

4.1 Datasets

4.1 数据集

The ARTEMIS-DA framework is evaluated on three table-based reasoning datasets: WikiTable Questions [17], TabFact [3], and FeTaQA [13]. We evaluate ARTEMIS-DA exclusively on the test sets of these datasets without any training or fine-tuning on the training sets. The details of each dataset are as follows:

ARTEMIS-DA框架在三个基于表格的推理数据集上进行了评估:WikiTable Q