[Paper translation]: LLM4AD: A Platform for Algorithm Design with Large Language Model


Original paper: https://arxiv.org/pdf/2412.17287v1


LLM4AD: A Platform for Algorithm Design with Large Language Model

Abstract

We introduce LLM4AD, a unified Python platform for algorithm design (AD) with large language models (LLMs). LLM4AD is a generic framework with modularized blocks for search methods, algorithm design tasks, and the LLM interface. The platform integrates numerous key methods and supports a wide range of algorithm design tasks across domains including optimization, machine learning, and scientific discovery. We have also designed a unified evaluation sandbox to ensure secure and robust assessment of algorithms. Additionally, we have compiled a comprehensive suite of support resources, including tutorials, examples, a user manual, online resources, and a dedicated graphical user interface (GUI), to make LLM4AD easier to use. We believe this platform will serve as a valuable tool for fostering future development in the emerging research direction of LLM-assisted algorithm design.

Keywords: Algorithm design, large language models, optimization, machine learning, scientific discovery

1 Introduction

Algorithms are pivotal in solving diverse problems across fields such as industry, economics, healthcare, and technology (Kleinberg, 2006; Cormen et al., 2022). Traditionally, algorithm design has been a labor-intensive process requiring deep expertise. In the last three years, the use of large language models for algorithm design (LLM4AD) has emerged as a promising research area with the potential to fundamentally transform how algorithms are designed, optimized, and implemented (Liu et al., 2024b). The capabilities and flexibility of LLMs have shown potential to enhance the algorithm design process, including performance prediction (Hao et al., 2024), heuristic generation (Liu et al., 2024a), code optimization (Hemberg et al., 2024), and even the creation of new algorithmic concepts (Girotra et al., 2023). This approach not only reduces the human effort required in the design phase but also boosts the creativity and efficiency of the solutions produced (Liu et al., 2024a; Romera-Paredes et al., 2024).

Despite the rapid emergence of LLM4AD methods (Liu et al., 2024b) and the expanding range of application domains (Romera-Paredes et al., 2024; Liu et al., 2024a; Ye et al., 2024; Yao et al., 2024b; Guo et al., 2024a,b), this area faces three challenges:

This paper introduces LLM4AD, a unified Python library for LLM-based algorithm design that addresses these gaps. The platform integrates numerous key methods and supports a wide range of algorithm design tasks across domains including optimization, machine learning, and scientific discovery. We have also designed a unified evaluation sandbox to ensure secure and robust assessment of algorithms. Additionally, we have compiled a comprehensive suite of support resources, including tutorials, examples, a user manual, online resources, and a graphical user interface (GUI), to enhance the usability of LLM4AD. We believe this platform will be a valuable tool for fostering the use and comparison of methods in the emerging research direction of LLM-based algorithm design. The code is available at: https://github.com/Optima-CityU/LLM4AD.

2 LLM4AD

2.1 Framework

As illustrated in Figure 1, the platform consists of three blocks: 1) Search methods, 2) LLM interface, and 3) Task evaluation interface.

• Search methods: We build the pipeline on an iterative search framework in which a population is maintained and elite algorithms survive.

– Multiple Objectives: The task of designing algorithms may involve one or more objectives, such as optimizing performance and efficiency. Our approach incorporates both single-objective and multi-objective search methods.

– Population Size: In many search methods, e.g., neighbourhood search methods, the population size can be set to one.
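The elite-survival step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the `Individual` type and the `update_population` function are hypothetical names, and a higher objective value is assumed to be better.

```python
from typing import List, Tuple

# Hypothetical representation: (algorithm source code, objective value).
# Assumption: a higher objective value is better.
Individual = Tuple[str, float]


def update_population(
    population: List[Individual],
    offspring: List[Individual],
    pop_size: int,
) -> List[Individual]:
    """Merge parents and offspring, then keep the `pop_size` best (elites)."""
    # Drop duplicate programs so that one algorithm cannot fill the population.
    merged = {code: score for code, score in population + offspring}
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:pop_size]


parents = [("a", 1.0), ("b", 3.0)]
children = [("c", 2.0), ("b", 3.0)]
survivors = update_population(parents, children, pop_size=2)
single = update_population(parents, children, pop_size=1)
```

With `pop_size=1` the same update reduces to keeping only the current best candidate, which matches the neighbourhood-search case mentioned in the text.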


Figure 1: LLM4AD platform overview.

2.2 Usage

2.2.1 Script Usage

• Run: Run the search process. Logs will be recorded and displayed according to the Profiler settings.

One example script is as follows.

```python
from llm4ad.task.optimization.online_bin_packing import OBPEvaluation
from llm4ad.tools.llm.llm_api_https import HttpsApi
from llm4ad.method.eoh import EoH, EoHProfiler


def main():
    # LLM interface: fill in the API host, key, and model name.
    llm = HttpsApi(host="xxx",
                   key="sk-xxx",
                   model="xxx",
                   timeout=20)

    # Algorithm design task: online bin packing.
    task = OBPEvaluation()

    # Search method: EoH, with a profiler that writes logs to logs/eoh.
    method = EoH(llm=llm,
                 profiler=EoHProfiler(log_dir='logs/eoh', log_style='simple'),
                 evaluation=task,
                 max_sample_nums=20,
                 max_generations=10,
                 pop_size=4,
                 num_samplers=1,
                 num_evaluators=1,
                 debug_mode=False)

    method.run()


if __name__ == '__main__':
    main()
```

2.2.2 GUI Usage

LLM4AD provides an easy-to-use graphical user interface (GUI). Through this GUI, users can configure settings, execute experiments, and monitor results without any coding knowledge. This interface simplifies user interaction, making the LLM4AD platform more accessible and easier to use.

The GUI is launched by executing the run_gui.py Python script. As shown in Figure 2, the main window of the GUI includes six components: 1) Menu bar; 2) Configuration panel; 3) Results dashboard; 4) Run button; 5) Stop button; 6) Log files button.

The Menu bar offers quick access to resources such as the documentation or the website of the LLM4AD platform, through clickable buttons that redirect users to the relevant pages. To conduct experiments via the GUI, users should:

• Set up the LLM interface. Set the parameters of the LLM interface in the Configuration panel. These parameters include the internet protocol (IP) address of the application programming interface (API) provider, an API key, and the name of the LLM.


Figure 2: Graphical user interface (GUI) for LLM4AD.

• Set up the search method and the algorithm design task. Users select the search method and the algorithm design task by clicking. For the chosen method and task, specific parameters such as max samples (the maximum number of LLM invocations) can be configured.

After setting all configurations, the experiment can be started by clicking the Run button. The Results dashboard then displays the experimental results, such as the convergence curve of the objective values and the currently best-performing algorithm along with its objective value. During an experiment, users can stop the process using the Stop button or access detailed experimental results through the Log files button.

The current version of the GUI supports only a single method under a single LLM configuration per experiment. In the future, we plan to extend the GUI to enable batch experiments.

2.3 Search Methods

Search methods are crucial for effective LLM-based algorithm design. Recent studies have shown that standalone LLMs, even when enhanced with various prompt engineering techniques, are often insufficient for many algorithm design tasks (Zhang et al., 2024a). We have integrated a variety of search methods, including simple sampling, commonly used single-objective evolutionary search methods, multi-objective evolutionary search, and various neighborhood searches.

• Single-objective Search

• Multi-objective Search

– Multi-objective evolutionary search: MEoH (Yao et al., 2024a), NSGA-II (Deb et al., 2002), MOEA/D (Zhang and Li, 2007)
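Multi-objective methods in the NSGA-II family compare candidates by Pareto dominance rather than by a single score. The sketch below illustrates that comparison only; it is not the platform's code, the function names are hypothetical, and all objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere.

    Assumption: every objective is minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Example objective vectors, e.g. (optimality gap, running time).
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = non_dominated(candidates)
```

Here `(3.0, 3.0)` is dominated by `(2.0, 2.0)`, so it is the only candidate left out of the front.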

An abstract base method is provided that defines the essential format and functions shared by these methods, while remaining flexible enough for users to extend it and implement custom search methods.
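To illustrate what such an abstract base method might look like, here is a minimal sketch. Every name in it (`BaseSearch`, `build_prompt`, `HillClimb`, `sample_fn`, `evaluate_fn`) is hypothetical and chosen for illustration; the actual base class in LLM4AD may be organized differently.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional, Tuple


class BaseSearch(ABC):
    """Hypothetical base class: subclasses only decide how to build the prompt."""

    def __init__(
        self,
        sample_fn: Callable[[str], str],
        evaluate_fn: Callable[[str], Optional[float]],
        max_samples: int,
    ) -> None:
        self.sample_fn = sample_fn      # stand-in for an LLM call
        self.evaluate_fn = evaluate_fn  # stand-in for a task evaluator
        self.max_samples = max_samples
        self.best: Optional[Tuple[str, float]] = None

    @abstractmethod
    def build_prompt(self) -> str:
        """Return the prompt used to request the next candidate."""

    def run(self) -> Optional[Tuple[str, float]]:
        for _ in range(self.max_samples):
            candidate = self.sample_fn(self.build_prompt())
            score = self.evaluate_fn(candidate)
            # Keep the candidate only if it is valid and beats the best so far.
            if score is not None and (self.best is None or score > self.best[1]):
                self.best = (candidate, score)
        return self.best


class HillClimb(BaseSearch):
    """Neighbourhood-style search: always ask for a variation of the best."""

    def build_prompt(self) -> str:
        return self.best[0] if self.best else "seed"


# Toy stand-ins: the "LLM" appends a character, the "task" scores by length.
result = HillClimb(lambda prompt: prompt + "x", len, max_samples=3).run()
```

The shared loop lives in the base class, so a custom method only overrides `build_prompt`; this is one way to get the extensibility the text describes.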

Each method is equipped with three profilers: 1) base profiler, 2) TensorBoard profiler, and 3) Weights & Biases (wandb) profiler, to meet diverse user requirements.

2.4 Evaluation Interface and Tasks

2.4.1 Tasks

As illustrated in Figure 1, LLM4AD is applicable to a broad range of algorithm design domains, including:

• Optimization: combinatorial optimization (Liu et al., 2024a; Ye et al., 2024), continuous optimization, surrogate-based optimization (Yao et al., 2024b).

• Machine learning: agent design (Hu et al., 2024), computer vision (Guo et al., 2024a).

• Scientific discovery: biology (Shojaee et al., 2024), chemistry, physics, fluid dynamics (Zhang et al., 2024b), and Feynman equations (Matsubara et al., 2022).

• Others: game theory, mathematics (Romera-Paredes et al., 2024), etc.

As illustrated in Table 1, the platform includes a diverse collection of over 20 tasks (to be extended to 160+ tasks soon) from domains such as optimization, machine learning, and scientific discovery. These tasks are quick to evaluate and have clearly defined formulations for easy comparison.

2.4.2 Examples

We also offer a varied set of example algorithm design tasks. These examples are used for 1) demonstrating different settings and 2) showcasing more complex local algorithm design tasks.

2.4.3 Evaluation Sandbox

A secure evaluation sandbox is provided, enabling safe and configurable evaluation of generated code. It includes optional optimizations and safety features such as timeout handling and protected division.
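The two safety features named above can be sketched as follows. This is an illustration under assumptions, not the sandbox shipped with LLM4AD: `protected_div` and `run_with_timeout` are hypothetical names, the fallback value of 1.0 follows a common genetic-programming convention, and isolation is approximated here by running the candidate in a separate Python process.

```python
import subprocess
import sys
from typing import Optional


def protected_div(a: float, b: float, eps: float = 1e-12) -> float:
    """Divide a by b, returning 1.0 instead of raising when b is (near) zero."""
    return a / b if abs(b) > eps else 1.0


def run_with_timeout(code: str, timeout_seconds: float) -> Optional[str]:
    """Run `code` in a fresh interpreter; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None  # candidate exceeded its time budget
    if result.returncode != 0:
        return None  # candidate crashed
    return result.stdout.strip()


ok = run_with_timeout("print(1 + 1)", timeout_seconds=30)
hung = run_with_timeout("while True: pass", timeout_seconds=0.5)
```

Running candidates in a child process means an infinite loop or crash in generated code cannot take down the search loop itself; a failed evaluation simply yields `None`.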

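Table 1: Algorithm design tasks in LLM4AD. There are 160+ tasks added or being added (those being added are marked with *).

| Task type | Task name (abbreviation, number of tasks) |
| --- | --- |
| Optimization | Capacitated Vehicle Routing Problem (CVRP, 2 tasks), Open Vehicle Routing Problem (OVRP, 2 tasks), Online Bin Packing (OBP, 1 task), Traveling Salesman Problem (TSP, 2 tasks), Vehicle Routing Problem with Time Windows (VRPTW, 2 tasks), Admissible Set (SET, 1 task), Flow Shop Scheduling Problem* (FSSP, 2 tasks), Evolutionary Algorithm* (EA, 1 task), Multi-objective Evolutionary Algorithm (MEA, 1 task), Maximum Cut Problem* (MCP, 1 task), Knapsack Problem* (MKP, 1 task), Surrogate-based Optimization (1 task) |
| Machine learning | Acrobot (ACRO, 1 task), Mountain Car (CAR, 1 task), Moon Lander (ML, 1 task), Cart Pole (CARP, 1 task), Mountain Car Continuous* (CARC, 1 task), Pendulum* (PEN, 1 task) |
| Scientific discovery | Bacterial Growth (BACT, 1 task), Nonlinear Oscillators (OSC, 2 tasks), Material Stress Behavior (MSB, 1 task), Ordinary Differential Equations (ODE, 16 tasks), SRSD-Feynman Easy Set* (SRSD-E, 30 tasks), SRSD-Feynman Medium Set* (SRSD-M, 40 tasks), SRSD-Feynman Hard Set* (SRSD-H, 50 tasks) |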

2.5 LLM Interface

We have provided a general LLM interface tailored for iterative algorithm search. This interface supports two types of demo interactions:

Both interfaces are modularized to ensure efficiency and control, with features including parallel processing, time control, and failure detection.
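The three features listed above (parallel processing, time control, and failure detection) can be combined in a few lines. The following is a sketch under assumptions rather than the LLM4AD interface: `sample_in_parallel` and `fake_llm` are hypothetical, and `fake_llm` stands in for a real API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List, Optional


def sample_in_parallel(
    query_fn: Callable[[str], str],
    prompts: List[str],
    timeout_seconds: float,
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Query all prompts concurrently; failed or late queries yield None."""
    results: List[Optional[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_fn, prompt) for prompt in prompts]
        for future in futures:
            try:
                # Time control: wait at most `timeout_seconds` for this result.
                results.append(future.result(timeout=timeout_seconds))
            except FutureTimeout:
                results.append(None)  # too slow
            except Exception:
                results.append(None)  # failure detection: the query raised
    # Note: worker threads are not cancelled; leaving the `with` block
    # still waits for slow queries to finish.
    return results


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    if prompt == "fail":
        raise RuntimeError("simulated API error")
    if prompt == "slow":
        time.sleep(1.0)
    return prompt.upper()


responses = sample_in_parallel(fake_llm, ["a", "fail", "slow"], timeout_seconds=0.1)
```

Returning `None` for a failed or late query lets the search loop skip that sample and continue, instead of aborting the whole run on one bad API call.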

3 Benchmark Results

3.1 Settings

We select four search methods from our platform and run them under consistent benchmark settings. All compared methods are initialized with the respective template algorithm on each problem. Table 2 summarizes the benchmark hyper-parameter settings.

We investigate a subset of nine algorithm design tasks provided by our platform, encompassing machine learning, combinatorial optimization, and scientific discovery scenarios. The included tasks are summarized in Table 3.