[论文翻译]: 基于大语言模型的算法设计平台


原文地址:https://arxiv.org/pdf/2412.17287v1


LLM4AD: A Platform for Algorithm Design with Large Language Model

LLM4AD: 基于大语言模型的算法设计平台

Abstract

摘要

We introduce LLM4AD, a unified Python platform for algorithm design (AD) with large language models (LLMs). LLM4AD is a generic framework with modularized blocks for search methods, algorithm design tasks, and LLM interface. The platform integrates numerous key methods and supports a wide range of algorithm design tasks across various domains including optimization, machine learning, and scientific discovery. We have also designed a unified evaluation sandbox to ensure a secure and robust assessment of algorithms. Additionally, we have compiled a comprehensive suite of support resources, including tutorials, examples, a user manual, online resources, and a dedicated graphical user interface (GUI) to enhance the usability of LLM4AD. We believe this platform will serve as a valuable tool for fostering future development in the emerging research direction of LLM-assisted algorithm design.

我们介绍了LLM4AD,一个用于大语言模型(LLMs)算法设计(AD)的统一Python平台。LLM4AD是一个通用框架,包含模块化的搜索方法、算法设计任务和LLM接口。该平台集成了众多关键方法,并支持跨多个领域的广泛算法设计任务,包括优化、机器学习和科学发现。我们还设计了一个统一的评估沙盒,以确保算法的安全和稳健评估。此外,我们编制了一套全面的支持资源,包括教程、示例、用户手册、在线资源和专用的图形用户界面(GUI),以增强LLM4AD的使用。我们相信,该平台将成为促进LLM辅助算法设计这一新兴研究方向未来发展的宝贵工具。

Keywords: Algorithm design, large language models, optimization, machine learning, scientific discovery

关键词:算法设计,大语言模型,优化,机器学习,科学发现

1 Introduction

1 引言

Algorithms are pivotal in solving diverse problems across various fields such as industry, economics, healthcare, and technology (Kleinberg, 2006; Cormen et al., 2022). Traditionally, algorithm design has been a labor-intensive process requiring deep expertise. In the last three years, the use of large language models for algorithm design (LLM4AD) has emerged as a promising research area with the potential to fundamentally transform how algorithms are designed, optimized, and implemented (Liu et al., 2024b). The remarkable capabilities and flexibility of LLMs have shown potential in enhancing the algorithm design process, including performance prediction (Hao et al., 2024), heuristic generation (Liu et al., 2024a), code optimization (Hemberg et al., 2024), and even the creation of new algorithmic concepts (Girotra et al., 2023). This approach not only reduces the human effort required in the design phase but also boosts the creativity and efficiency of the solutions produced (Liu et al., 2024a; Romera-Paredes et al., 2024).

算法在解决工业、经济、医疗和技术等各个领域的多样化问题中起着关键作用 (Kleinberg, 2006; Cormen et al., 2022)。传统上,算法设计是一个需要深厚专业知识的劳动密集型过程。在过去三年中,使用大语言模型进行算法设计 (LLM4AD) 已成为一个具有前景的研究领域,有可能从根本上改变算法的设计、优化和实施方式 (Liu et al., 2024b)。大语言模型的卓越能力和灵活性在增强算法设计过程中显示出潜力,包括性能预测 (Hao et al., 2024)、启发式生成 (Liu et al., 2024a)、代码优化 (Hemberg et al., 2024),甚至新算法概念的创建 (Girotra et al., 2023)。这种方法不仅减少了设计阶段所需的人力投入,还提高了所生成解决方案的创造力和效率 (Liu et al., 2024a; Romera-Paredes et al., 2024)。

Despite the rapid emergence of LLM4AD methods (Liu et al., 2024b) and the expanding range of application domains (Romera-Paredes et al., 2024; Liu et al., 2024a; Ye et al., 2024; Yao et al., 2024b; Guo et al., 2024a,b), this area faces three challenges:

尽管LLM4AD方法(Liu et al., 2024b)迅速涌现,应用领域也在不断扩大(Romera-Paredes et al., 2024; Liu et al., 2024a; Ye et al., 2024; Yao et al., 2024b; Guo et al., 2024a,b),但这一领域仍面临三大挑战:

This paper introduces LLM4AD, a unified Python library for LLM-based algorithm design that addresses these gaps. The platform integrates numerous key methods and supports a wide range of algorithm design tasks across various domains, including optimization, machine learning, and scientific discovery. We have also designed a unified evaluation sandbox to ensure a secure and robust assessment of algorithms. Additionally, we have compiled a comprehensive suite of support resources, including tutorials, examples, a user manual, online resources, and a graphical user interface (GUI) to enhance the usability of LLM4AD. We believe this platform will serve as a valuable tool by fostering usage and comparison in the emerging research direction on LLM-based algorithm design. The code is available at: https://github.com/Optima-CityU/LLM4AD.

本文介绍了LLM4AD,一个用于基于大语言模型(LLM)算法设计的统一Python库,旨在填补这些空白。该平台集成了众多关键方法,并支持跨多个领域的广泛算法设计任务,包括优化、机器学习和科学发现。我们还设计了一个统一的评估沙盒,以确保算法的安全和稳健评估。此外,我们编制了一套全面的支持资源,包括教程、示例、用户手册、在线资源和图形用户界面(GUI),以增强LLM4AD的可用性。我们相信,该平台将成为一个有价值的工具,促进基于大语言模型算法设计这一新兴研究方向的使用和比较。代码可在以下网址获取:https://github.com/Optima-CityU/LLM4AD

2 LLM4AD

2 LLM4AD

2.1 Framework

2.1 框架

As illustrated in Figure 1, the platform consists of three blocks: 1) Search methods, 2) LLM interface, and 3) Task evaluation interface.

如图 1 所示,该平台由三个模块组成:1) 搜索方法,2) 大语言模型接口,3) 任务评估接口。

• Search methods: We build the pipeline on an iterative search framework, in which a population is maintained and elite algorithms survive.

• 搜索方法:我们使用迭代搜索框架构建管道,其中维护一个种群并保留精英算法。

– Multiple Objectives: The task of designing algorithms may involve one or more objectives, such as optimizing performance and efficiency. Our approach incorporates both single-objective and multi-objective search methods.
– Population Size: In many search methods, e.g., neighbourhood search methods, the population size can be set to one.

– 多目标:设计算法的任务可能涉及一个或多个目标,例如优化性能和效率。我们的方法结合了单目标和多目标搜索方法。
– 种群大小:在许多搜索方法中,例如邻域搜索方法,种群大小可以设置为1。
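
The iterative framework described above can be sketched in a few lines. This is a minimal illustration and not the platform's implementation: `elitist_search`, `toy_propose`, and `toy_evaluate` are hypothetical names, the "algorithms" are plain digit strings, and the proposal step that would normally query an LLM is replaced by a random perturbation so the sketch runs offline. Lower scores are assumed to be better.

```python
import random
from typing import Callable, List, Tuple

Candidate = Tuple[float, str]  # (score, algorithm source); lower score is better


def elitist_search(
    propose: Callable[[List[Candidate]], str],
    evaluate: Callable[[str], float],
    seed: str,
    pop_size: int = 4,
    max_samples: int = 20,
) -> Candidate:
    """Keep the best `pop_size` candidates seen so far and return the best one."""
    population: List[Candidate] = [(evaluate(seed), seed)]
    for _ in range(max_samples):
        child = propose(population)          # an LLM query would go here (assumption)
        population.append((evaluate(child), child))
        population.sort(key=lambda c: c[0])  # ascending: best candidate first
        population = population[:pop_size]   # elitist survival
    return population[0]


# Toy stand-ins so the sketch runs without an LLM.
def toy_propose(population: List[Candidate]) -> str:
    parent = random.choice(population)[1]
    return str(int(parent) + random.choice([-3, -1, 1, 3]))


def toy_evaluate(code: str) -> float:
    return float(abs(int(code) - 42))


random.seed(0)
best_score, best_code = elitist_search(toy_propose, toy_evaluate, seed="0")
```

Calling the same function with `pop_size=1` keeps only the incumbent, which corresponds to the neighbourhood-search setting mentioned above.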


Figure 1: LLM4AD platform overview.

图 1: LLM4AD 平台概览。

2.2 Usage

2.2 使用

2.2.1 Script Usage

2.2.1 脚本使用

• Run: Run the search process. Logs will be recorded and displayed according to the Profiler settings.

• 运行:运行搜索过程。日志将根据 Profiler 设置进行记录和显示。

One example script is as follows.

一个示例脚本如下。

    from llm4ad.task.optimization.online_bin_packing import OBPEvaluation
    from llm4ad.tools.llm.llm_api_https import HttpsApi
    from llm4ad.method.eoh import EoH, EoHProfiler


    def main():
        # Placeholders: fill in the API host, key, and model name.
        llm = HttpsApi(host="xxx",
                       key="sk-xxx",
                       model="xxx",
                       timeout=20)

        task = OBPEvaluation()

        method = EoH(llm=llm,
                     profiler=EoHProfiler(log_dir='logs/eoh', log_style='simple'),
                     evaluation=task,
                     max_sample_nums=20,
                     max_generations=10,
                     pop_size=4,
                     num_samplers=1,
                     num_evaluators=1,
                     debug_mode=False)

        method.run()


    if __name__ == '__main__':
        main()

2.2.2 GUI Usage

2.2.2 GUI 使用

LLM4AD provides an easy-to-use graphical user interface (GUI). Through this GUI, users can easily configure settings, execute experiments, and monitor results without any coding knowledge. This interface simplifies user interaction, making the LLM4AD platform more accessible and easier to use.

LLM4AD 提供了一个易于使用的图形用户界面 (GUI)。通过这个 GUI,用户可以轻松配置设置、执行实验并监控结果,而无需任何编程知识。该界面简化了用户交互,使 LLM4AD 平台更易于访问和使用。

The GUI is launched by executing the run_gui.py Python script. As shown in Figure 2, the main window of the GUI includes six components: 1) Menu bar; 2) Configuration panel; 3) Results dashboard; 4) Run button; 5) Stop button; 6) Log files button.

通过执行 run_gui.py Python 脚本启动 GUI。如图 2 所示,GUI 的主窗口包括六个组件:1) 菜单栏;2) 配置面板;3) 结果仪表盘;4) 运行按钮;5) 停止按钮;6) 日志文件按钮。

The Menu bar offers quick access to various resources, such as documentation or the website of the LLM4AD platform, through clickable buttons that redirect users to the relevant pages. To conduct experiments via the GUI, users should

菜单栏提供了快速访问各种资源的途径,例如文档或 LLM4AD 平台的网站,通过可点击的按钮将用户重定向到相关页面。要通过 GUI 进行实验,用户应

• Set up LLM interface. Set up the parameters of the LLM interface in the Configuration panel. These parameters include the internet protocol (IP) address of the application programming interface (API) provider, an API key, and the name of the LLM.

• 设置大语言模型 (LLM) 接口。在配置面板中设置大语言模型接口的参数。这些参数包括应用程序编程接口 (API) 提供商的互联网协议 (IP) 地址、API 密钥以及大语言模型的名称。


Figure 2: Graphical user interface (GUI) for LLM4AD.

图 2: LLM4AD 的图形用户界面 (GUI)

• Set up Search method and Algorithm design task. Users can also select the search method and the algorithm design task by clicking. For the chosen method and task, specific parameters such as max samples (the maximum number of LLM invocations) can be configured.

• 设置搜索方法和算法设计任务。用户也可以通过点击选择搜索方法和算法设计任务。对于选定的方法和任务,可以配置特定参数,例如最大样本数(大语言模型调用的最大次数)。

After setting all configurations, the experiment can be started by clicking the Run button. The Results dashboard then displays the experimental results such as the convergence curve of the objective values and the currently best-performing algorithm along with its corresponding objective value. During an experiment, users can stop the process using the Stop button or access detailed experimental results through the Log files button.

设置所有配置后,可以通过点击运行按钮开始实验。结果仪表板随后会显示实验结果,例如目标值的收敛曲线以及当前表现最佳的算法及其对应的目标值。在实验过程中,用户可以使用停止按钮停止进程,或通过日志文件按钮访问详细的实验结果。

The current version of the GUI supports only one method under one LLM configuration per experiment. In the future, we plan to extend the GUI to enable batch experiments.

当前版本的 GUI 每次仅支持在单一的大语言模型 (LLM) 配置下使用单一方法进行实验。未来,我们计划扩展 GUI 以支持批量实验。

2.3 Search Methods

2.3 搜索方法

Search methods are crucial for effective LLM-based algorithm design. Recent studies have shown that standalone LLMs, even when enhanced with various prompt engineering techniques, are often insufficient for many algorithm design tasks (Zhang et al., 2024a). We have integrated a variety of search methods, including simple sampling, commonly used single-objective evolutionary search methods, multi-objective evolutionary search, and various neighborhood searches.

搜索方法对于基于大语言模型 (LLM) 的算法设计至关重要。最近的研究表明,即使通过各种提示工程 (prompt engineering) 技术增强,独立的大语言模型在许多算法设计任务中仍然不足 (Zhang et al., 2024a)。我们整合了多种搜索方法,包括简单采样、常用的单目标进化搜索方法、多目标进化搜索以及各种邻域搜索。

• Single-objective Search

• 单目标搜索

• Multi-objective Search

• 多目标搜索

– Multi-objective evolutionary search: MEoH (Yao et al., 2024a), NSGA-II (Deb et al., 2002), MOEA/D (Zhang and Li, 2007)

– 多目标进化搜索:MEoH (Yao et al., 2024a), NSGA-II (Deb et al., 2002), MOEA/D (Zhang and Li, 2007)
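
Multi-objective search compares candidates by Pareto dominance rather than by a single score. The sketch below shows only that comparison, which underlies dominance-based methods such as NSGA-II; it is not the implementation of any method listed above. The function names and the two example objectives (optimality gap and running time, both minimised) are assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (gap, runtime in seconds) pairs for five candidate heuristics.
scores = [(0.10, 5.0), (0.20, 1.0), (0.15, 6.0), (0.30, 0.5), (0.20, 2.0)]
front = non_dominated(scores)
```

Here `(0.15, 6.0)` is dominated by `(0.10, 5.0)` and `(0.20, 2.0)` by `(0.20, 1.0)`, so three candidates remain on the front.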

An abstract base method is provided to modularize the essential format and functions of these methods, maintaining flexibility to facilitate easy extension and implementation of custom search methods by users.

提供了一个抽象基方法,用于模块化这些方法的基本格式和功能,保持灵活性,以便用户轻松扩展和实现自定义搜索方法。

Each method is equipped with three profilers: 1) base profiler, 2) TensorBoard profiler, and 3) Weights & Biases (wandb) profiler, to meet diverse user requirements.

每种方法都配备了三种分析器:1) 基础分析器,2) TensorBoard 分析器,以及 3) Weights & Biases (wandb) 分析器,以满足不同用户的需求。

2.4 Evaluation Interface and Tasks

2.4 评估界面与任务

2.4.1 Tasks

2.4.1 任务

As illustrated in Figure 1, LLM4AD is applicable to a broad range of algorithm design domains including

如图 1 所示,LLM4AD 适用于广泛的算法设计领域,包括

• Optimization: combinatorial optimization (Liu et al., 2024a; Ye et al., 2024), continuous optimization, surrogate-based optimization (Yao et al., 2024b).
• Machine learning: agent design (Hu et al., 2024), computer vision (Guo et al., 2024a).
• Scientific discovery: biology (Shojaee et al., 2024), chemistry, physics, fluid dynamics (Zhang et al., 2024b) and Feynman Equation (Matsubara et al., 2022).
• Others: game theory, mathematics (Romera-Paredes et al., 2024), etc.

• 优化:组合优化 (Liu et al., 2024a; Ye et al., 2024)、连续优化、基于代理的优化 (Yao et al., 2024b)。
• 机器学习:智能体设计 (Hu et al., 2024)、计算机视觉 (Guo et al., 2024a)。
• 科学发现:生物学 (Shojaee et al., 2024)、化学、物理学、流体动力学 (Zhang et al., 2024b) 和费曼方程 (Matsubara et al., 2022)。
• 其他:博弈论、数学 (Romera-Paredes et al., 2024) 等。

As illustrated in Table 1, the platform includes a diverse collection of over 20 tasks (there will be 160+ tasks soon) from various domains such as optimization, machine learning, and scientific discovery. These tasks are quick to evaluate and have clearly defined formulations for easy comparison.

如表 1 所示,该平台包含了来自优化、机器学习和科学发现等多个领域的 20 多个任务(很快将增加到 160 多个任务)。这些任务评估速度快,且具有明确的公式定义,便于比较。

2.4.2 Examples

2.4.2 示例

We also offer a varied set of example algorithm design tasks. These examples are used for 1) demonstrating different settings and 2) showcasing more complex tasks on local algorithm design tasks.

我们还提供了一系列示例算法设计任务。这些示例用于:1) 展示不同的设置;2) 展示本地算法设计任务中更复杂的任务。

2.4.3 Evaluation Sandbox

2.4.3 评估沙盒

A secure evaluation sandbox is provided, enabling the safe and configurable evaluation of generated code. This includes optional optimizations and safety features such as timeout handling and protected division.

提供了一个安全的评估沙箱,支持对生成代码进行安全且可配置的评估。这包括可选的优化和安全功能,例如超时处理和受保护的除法。
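
The two safety features named above, timeout handling and protected division, can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the platform's sandbox: `protected_div` and `run_with_timeout` are hypothetical helpers, the fallback value of 1.0 for division by zero is an arbitrary choice, and running code in a child process bounds its running time but does not by itself restrict what the code may do.

```python
import subprocess
import sys
from typing import Optional


def protected_div(a: float, b: float, default: float = 1.0) -> float:
    """Division that returns `default` instead of raising when the divisor is zero."""
    return a / b if b != 0 else default


def run_with_timeout(source: str, timeout_seconds: float) -> Optional[str]:
    """Run `source` in a separate Python process.

    Returns the process's stdout, or None if it times out or exits with an error.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout.strip() if done.returncode == 0 else None


ok = run_with_timeout("print(sum(range(10)))", timeout_seconds=10)
stuck = run_with_timeout("while True: pass", timeout_seconds=1)
```

With this helper, an endless loop in a generated algorithm costs one timeout period instead of stalling the whole search.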

Table 1: Algorithm design tasks in LLM4AD.

There are $\mathbf{160+}$ tasks added or being added (marked with $*$ ).

表 1: LLM4AD 中的算法设计任务。

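| Task type | Task name (abbreviation, number of tasks) |
| --- | --- |
| Optimization | Capacitated Vehicle Routing Problem (CVRP, 2 tasks), Open Vehicle Routing Problem (OVRP, 2 tasks), Online Bin Packing (OBP, 1 task), Traveling Salesman Problem (TSP, 2 tasks), Vehicle Routing Problem with Time Window (VRPTW, 2 tasks), Admissible Sets (SET, 1 task), Flow Shop Scheduling Problem* (FSSP, 2 tasks), Evolutionary Algorithm* (EA, 1 task), Multi-objective Evolutionary Algorithm (MEA, 1 task), Maximum Cut Problem* (MCP, 1 task), Knapsack Problem* (MKP, 1 task), Surrogate-based Optimization (1 task) |
| Machine learning | Acrobot (ACRO, 1 task), Mountain Car (CAR, 1 task), Moon Lander (ML, 1 task), Cart Pole (CARP, 1 task), Mountain Car Continuous* (CARC, 1 task), Pendulum* (PEN, 1 task) |
| Scientific discovery | Bacterial Growth (BACT, 1 task), Nonlinear Oscillators (OSC, 2 tasks), Material Stress Behavior (MSB, 1 task), Ordinary Differential Equations (ODE, 16 tasks), SRSD-Feynman Easy Set* (SRSD-E, 30 tasks), SRSD-Feynman Medium Set* (SRSD-M, 40 tasks), SRSD-Feynman Hard Set* (SRSD-H, 50 tasks) |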

目前已有或正在添加的任务数量为 $\mathbf{160+}$ 个(标记为 $*$ 的任务)。

2.5 LLM Interface

2.5 大语言模型接口

We have provided a general LLM interface tailored for iterative algorithm search. This interface supports two types of demo interactions:

我们提供了一个专为迭代算法搜索定制的大语言模型接口。该接口支持两种类型的演示交互:

Both interfaces are modularized to ensure efficiency and control, with features including parallel processing, time control, and failure detection.

两个接口都进行了模块化设计,以确保效率和控制,功能包括并行处理、时间控制和故障检测。
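
As a rough illustration of parallel processing, time control, and failure detection around an LLM call, the sketch below sends several prompts concurrently and records `None` for any request that fails or exceeds the waiting time. The names `sample_in_parallel` and `fake_llm` are hypothetical and do not come from the platform; `fake_llm` stands in for a real API call so the example runs offline. One caveat of this simple design: the timeout bounds how long the caller waits for a result, but a worker thread that never returns would still be awaited when the pool shuts down.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def sample_in_parallel(
    query: Callable[[str], str],
    prompts: List[str],
    timeout_seconds: float = 20.0,
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Send prompts concurrently; a slow or failing request yields None."""
    results: List[Optional[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query, p) for p in prompts]
        for future in futures:
            try:
                results.append(future.result(timeout=timeout_seconds))
            except Exception:
                # Covers both timeouts and errors raised inside `query`.
                results.append(None)
    return results


def fake_llm(prompt: str) -> str:
    """Offline stand-in for an LLM API call (assumption for this sketch)."""
    if "fail" in prompt:
        raise RuntimeError("simulated API error")
    return f"def heuristic():  # for: {prompt}\n    return 0"


replies = sample_in_parallel(fake_llm, ["task A", "please fail", "task B"])
```

The search loop can then skip the `None` entries and continue with the remaining samples.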

3 Benchmark Results

3 基准测试结果

3.1 Settings

3.1 设置

We choose four search methods in our platform with consistent benchmark settings. We initialize all compared methods with the respective template algorithm on each problem. Table 2 summarizes the benchmark hyper-parameter settings.

我们在平台上选择了四种搜索方法,并采用一致的基准设置。我们在每个问题上使用各自的模板算法初始化所有比较方法。表 2 总结了基准超参数设置。

We investigate a subset of nine algorithm design tasks provided by our platform, encompassing machine learning, combinatorial optimization, and scientific discovery scenarios. The included tasks are summarized in Table 3.

我们调查了平台提供的九种算法设计任务,涵盖了机器学习、组合优化和科学发现场景。包含的任务总结在表 3 中。

We use a diverse set of eight open-source and closed-source general-purpose LLMs, i.e., Llama-3.1-8B, Yi-34b-Chat, GLM-3-Turbo, Claude-3-Haiku, Doubao-pro-4k, GPT-3.5-Turbo, GPT-4o-Mini, and Qwen-Turbo, and compare their results on nine automated algorithm design tasks.

我们使用了八种开源和闭源的通用大语言模型 (LLM),即 Llama-3.1-8B、Yi-34b-Chat、GLM-3-Turbo、Claude-3-Haiku、Doubao-pro-4k、GPT-3.5-Turbo、GPT-4o-Mini 和 Qwen-Turbo,并在九个自动化算法设计任务上比较了它们的结果。

Table 2: Summary of benchmark settings.

表 2: 基准设置总结

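| Description | Setting |
| --- | --- |
| Maximum number of function evaluations (#FE) | 2,000 |
| Population size (for EoH) | 10 |
| Number of islands, number of samples per prompt (for FunSearch) | 10, 4 |
| Number of independent runs per experiment | 3 |
| Maximum evaluation time per algorithm (to handle invalid algorithms, such as infinite loops) | 50 seconds |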

Table 3: Tested algorithm design tasks.

表 3: 测试的算法设计任务

| Task type | Task name (abbreviation) |
| --- | --- |
| Machine learning | Acrobot (ACRO), Mountain Car (CAR) |
| Combinatorial optimization | Capacitated Vehicle Routing Problem (CVRP), Online Bin Packing (OBP), Traveling Salesman Problem (TSP), Vehicle Routing Problem with Time Window (VRPTW) |
| Scientific discovery | Bacterial Growth (BACT), Admissible Sets (SET), Nonlinear Oscillators (OSC) |

3.2 Results on Different Tasks

3.2 不同任务的结果

Fig. 3 demonstrates the convergence curve of the performance of the top-1 algorithms generated by GPT-4o-Mini. The performance is measured by the objective score on each task. The mean and standard deviation performance over three independent runs are denoted as markers and shaded areas, respectively. We draw observations from the experimental results that:

图 3 展示了由 GPT-4o-Mini 生成的 top-1 算法的性能收敛曲线。性能通过每个任务的目标得分来衡量。三次独立运行的平均性能和标准差分别用标记和阴影区域表示。我们从实验结果中得出以下观察:

Table 4: Overview of the LLMs evaluated in the experiment. We use performance on “HumanEval” and “MMLU” to indicate the capabilities of LLMs on code and general knowledge.

表 4: 实验中评估的大语言模型概览。我们使用“HumanEval”和“MMLU”的表现来指示大语言模型在代码和通用知识方面的能力。

| Model | Open source | HumanEval | MMLU |
| --- | --- | --- | --- |
| Llama-3.1-8B (Dubey et al., 2024) | ✓ | 72.6 | 69.4 |
| Yi-34b-Chat (Young et al., 2024) | ✓ | 75.2 | 76.8 |
| GLM-3-Turbo (GLM et al., 2024) | ✓ | 70.1 | 74.3 |
| Claude-3-Haiku (Anthropic, 2024) | × | 75.9 | 75.2 |
| Doubao-pro-4k | × | 73.0 | 78.0 |
| GPT-3.5-Turbo (Ye et al., 2023) | × | 60.3 | 70.0 |
| GPT-4o-Mini (Achiam et al., 2023) | × | 87.2 | 82.0 |
| Qwen-Turbo (Yang et al., 2024) | × | 86.6 | 86.1 |

3.3 Results with Different LLMs

3.3 不同大语言模型的结果

Fig. 4 and Fig. 5 compare the performance of various LLMs on three tasks. We can summarize from the results that:

图 4 和图 5 比较了各种大语言模型 (LLM) 在三个任务上的性能。我们可以从结果中总结出:

4 Extensibility

4 可扩展性

4.1 Add New Methods

4.1 添加新方法

LLM4AD platform encourages users to perform further optimization and customization to existing algorithm design methods. Our support for developers can be summarized in two perspectives as follows:

LLM4AD平台鼓励用户对现有算法设计方法进行进一步优化和定制。我们对开发者的支持可以从以下两个角度进行总结:

• LLM4AD has fully open-sourced code implementations of various evolutionary search-based algorithm design methods for reference. The source code incorporates implementations of various population management strategies such as genetic algorithm (implemented in EoH), island model (implemented in FunSearch), and 1+1 search (implemented in (1+1)-EPS). In addition, LLM4AD has integrated diverse test cases (algorithm design tasks) and LLMs with unified interfaces, enabling prompt validation and debugging during development.

• LLM4AD 完全开源了各种基于进化搜索的算法设计方法的代码实现,供参考。源代码包含了多种种群管理策略的实现,例如遗传算法(在 EoH 中实现)、岛屿模型(在 FunSearch 中实现)和 1+1 搜索(在 (1+1)-EPS 中实现)。此外,LLM4AD 还集成了多样化的测试用例(算法设计任务)和具有统一接口的大语言模型,便于在开发过程中进行快速验证和调试。


Figure 3: Convergence curve comparison on the performance (measured by fitness score) of the top-1 algorithms generated by GPT-4o-Mini. The performance averaged over three independent runs is denoted with markers (lower is better), while the standard deviations of scores are highlighted with the shaded regions.

图 3: GPT-4o-Mini 生成的 top-1 算法在性能(通过适应度得分衡量)上的收敛曲线对比。三次独立运行的平均性能用标记表示(越低越好),而得分的标准差用阴影区域突出显示。

• LLM4AD provides useful tools and APIs to help with code manipulation, secure evaluation, and task definition. In addition, LLM4AD has released detailed documentation as well as Jupyter notebooks for each module, aiming to demonstrate and visualize the functionality and effect of each module and each API. We believe this will foster in-depth comprehension of our search and code-manipulating pipeline.

• LLM4AD 提供了有用的工具和 API,帮助进行代码操作、安全评估和任务定义。此外,LLM4AD 还发布了详细的文档 $\cdot^{1}$ 以及每个模块的 jupyter-notebooks,旨在展示和可视化每个模块和每个 API 的功能和效果。我们相信这将促进对我们搜索和代码操作管道的深入理解。


Figure 4: Convergence curve comparison on the performance of the top-1 heuristics generated by various LLMs. The mean score aggregated over three independent runs is denoted with markers (lower is better), while the standard deviations of scores are highlighted with the shaded regions.

图 4: 不同大语言模型生成的 top-1 启发式算法性能的收敛曲线对比。三次独立运行的平均得分用标记表示(越低越好),而得分的标准差用阴影区域突出显示。


Figure 5: Radar plot on the performance of the top-1 algorithm generated by the EoH method using different LLMs. The radius of each vertex is calculated by the normalized fitness value over three independent runs; hence, a smaller radius/enclosed area indicates better performance.

图 5: 使用不同大语言模型生成的 EoH 方法中 top-1 算法性能的雷达图。每个顶点的半径由三次独立运行的归一化适应度值计算得出;因此,半径/封闭区域越小表示性能越好。

4.2 Add New Tasks

4.2 添加新任务

LLM4AD is designed to be easily applicable to customized algorithm design tasks and has unified the evaluation interface for each algorithm design task. There are two major steps to apply LLM4AD to a specified algorithm design task:

LLM4AD 旨在轻松适用于定制化算法设计任务,并为每个算法设计任务统一了评估接口。将 LLM4AD 应用于特定算法设计任务主要分为两个步骤:

• Extend the llm4ad.base.Evaluation interface class and override the evaluate_program() method, which defines how to measure the objective score of a searched algorithm. Users can also restrict the timeout seconds during evaluation and enable numba.jit acceleration by setting the corresponding arguments.
• Specify an executable template program that comprises Python package imports, a function signature with input/output types, a doc-string with the exact meaning of each argument, and a function body that shows an example implementation. Since the template program is later assembled into the prompt content during the search, an informative and precise doc-string is required.

• 扩展 llm4ad.base.Evaluation 接口类并重写 evaluate program() 方法,该方法定义了如何衡量搜索算法的目标分数。用户还可以通过设置相应的参数来限制评估期间的超时秒数,并使用 numba.jit 进行加速。
• 指定一个可执行的模板程序,该程序包括 Python 包的导入、带有输入/输出类型的函数调用、包含每个参数确切含义的文档字符串 (doc-string) 以及展示示例实现的函数体。由于模板程序将在后续的搜索过程中组装到提示内容中,因此需要提供信息丰富且精确的文档字符串。
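
The two steps above can be sketched without installing the platform. In the block below, `EvaluationSketch` is a hypothetical stand-in for `llm4ad.base.Evaluation`, and its constructor arguments and the `evaluate_program(program_str, callable_func)` signature are assumptions rather than the library's confirmed API. The toy task is a small online bin packing instance with integer sizes, and the objective (number of bins used, lower is better) is chosen only for illustration.

```python
from typing import Callable, List

# Step 2: an executable template with imports, types, a precise doc-string, and a body.
TEMPLATE_PROGRAM = '''
from typing import List


def priority(item: int, bins: List[int]) -> List[float]:
    """Score each bin for placing the item; the highest-scoring bin is chosen.

    Args:
        item: size of the item to place.
        bins: remaining capacity of each bin that can still hold the item.
    Returns:
        One score per bin, in the same order as `bins`.
    """
    return [float(-(capacity - item)) for capacity in bins]
'''


class EvaluationSketch:
    """Hypothetical stand-in for llm4ad.base.Evaluation (assumed shape)."""

    def __init__(self, template_program: str, timeout_seconds: int = 20):
        self.template_program = template_program
        self.timeout_seconds = timeout_seconds

    def evaluate_program(self, program_str: str, callable_func: Callable) -> float:
        raise NotImplementedError


# Step 1: override evaluate_program() to define the objective score.
class BinPackingEvaluation(EvaluationSketch):
    def __init__(self) -> None:
        super().__init__(TEMPLATE_PROGRAM, timeout_seconds=20)
        self.capacity = 10
        self.items = [5, 7, 5, 3]

    def evaluate_program(self, program_str: str, callable_func: Callable) -> float:
        bins: List[int] = []  # remaining capacity of each open bin
        for item in self.items:
            feasible = [i for i, c in enumerate(bins) if c >= item]
            if not feasible:
                bins.append(self.capacity - item)  # open a new bin
                continue
            scores = callable_func(item, [bins[i] for i in feasible])
            best = feasible[max(range(len(feasible)), key=lambda k: scores[k])]
            bins[best] -= item
        return float(len(bins))  # number of bins used; lower is better


namespace: dict = {}
exec(TEMPLATE_PROGRAM, namespace)  # trusted template only; generated code needs a sandbox
score = BinPackingEvaluation().evaluate_program(TEMPLATE_PROGRAM, namespace["priority"])
```

On this instance the template heuristic packs the four items into two bins, so `score` is `2.0`.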

Once the modified evaluation instance is passed into the search method, LLM4AD will automatically invoke the specified evaluation method and perform a secure evaluation of each algorithm code, guarding against algorithms that could harm the search pipeline, e.g., by aborting the search or looping endlessly.

一旦修改后的评估实例被传入搜索方法,LLM4AD 将自动调用指定的评估方法,并对每个算法代码进行安全评估(防止可能对搜索管道有害的算法,例如中止搜索或无限循环)。

4.3 Add New LLM Sampler

4.3 添加新的大语言模型采样器

A sampler defines and specifies the method to access an LLM. For instance, users can choose to either query remote LLMs (e.g., GPT-4o) using HTTPS requests, or run a locally deployed LLM using inference libraries (e.g., transformers and vLLM). To increase the extensibility of samplers, LLM4AD defines an interface llm4ad.base.Sampler in which the draw_sample() function is left unimplemented. Users can customize their sampler by overriding this method and passing the user-defined sampler instance to a search method.

采样器定义并指定了访问大语言模型 (LLM) 的方法。例如,用户可以选择使用 HTTPS 请求查询远程 LLM(如 GPT-4o),或使用推理库(如 transformers 和 vLLM)推理本地部署的 LLM。为了提高采样器的可扩展性,LLM4AD 定义了一个接口 llm4ad.base.Sampler,其中 draw_sample() 函数未实现。用户可以通过重写该方法并将用户自定义的采样器实例传递给搜索方法来自定义采样器。
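
A custom sampler in the spirit of the paragraph above could look like the following sketch. `SamplerSketch` is a hypothetical stand-in for `llm4ad.base.Sampler`, and the assumed signature `draw_sample(prompt) -> str` may differ from the library's. `CannedSampler` is an illustrative offline sampler that replays fixed replies, which can be handy for testing a search pipeline without API calls; an HTTPS or local-model sampler would place its request logic in the same method.

```python
from itertools import cycle
from typing import Iterable, List


class SamplerSketch:
    """Hypothetical stand-in for llm4ad.base.Sampler (assumed shape)."""

    def draw_sample(self, prompt: str) -> str:
        raise NotImplementedError


class CannedSampler(SamplerSketch):
    """Replays a fixed list of replies in order, looping when exhausted."""

    def __init__(self, replies: Iterable[str]) -> None:
        self._replies = cycle(replies)
        self.prompts_seen: List[str] = []

    def draw_sample(self, prompt: str) -> str:
        self.prompts_seen.append(prompt)  # keep prompts for later inspection
        return next(self._replies)


sampler = CannedSampler(["def f(): return 1", "def f(): return 2"])
drawn = [sampler.draw_sample(f"prompt {i}") for i in range(3)]
```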

5 Conclusion

5 结论

In conclusion, LLM4AD stands as a comprehensive and unified Python platform tailored for the design of algorithms using large language models. It features a generic framework with modularized components, including search methods, algorithm design tasks, and an LLM interface, catering to a broad spectrum of domains such as optimization, machine learning, and scientific discovery. The platform is enriched with a robust evaluation sandbox to ensure secure and reliable algorithm assessment, alongside a wealth of support resources like tutorials, examples, a user manual, online resources, and a dedicated GUI. These elements collectively enhance the user experience and utility of LLM4AD. We are confident that LLM4AD will significantly contribute to the advancement and standardization of LLM-based algorithm design, promoting extensive usage and facilitating comparative research in this emerging field. Through these efforts, LLM4AD aims to accelerate innovation and exploration in algorithm design, leveraging the capabilities of large language models.

总之,LLM4AD 是一个专为使用大语言模型设计算法而量身定制的全面且统一的 Python 平台。它具有一个通用框架,包含模块化组件,如搜索方法、算法设计任务和大语言模型接口,适用于优化、机器学习和科学发现等多个领域。该平台配备了一个强大的评估沙盒,以确保算法评估的安全性和可靠性,同时还提供了丰富的支持资源,如教程、示例、用户手册、在线资源和专用 GUI。这些元素共同提升了 LLM4AD 的用户体验和实用性。我们相信,LLM4AD 将极大地推动基于大语言模型的算法设计的进步和标准化,促进其广泛应用,并推动这一新兴领域的比较研究。通过这些努力,LLM4AD 旨在利用大语言模型的能力,加速算法设计的创新和探索。

6 Acknowledgment

6 致谢

We thank Zhiling Mao, Shunyu Yao, Yiming Yao, Ping Guo, and Zhiyuan Yang for their suggestions and help in the development of LLM4AD.

感谢毛志凌、姚顺宇、姚一鸣、郭平和杨志远在 LLM4AD 开发过程中提供的建议和帮助。

References

参考文献
