MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking
Abstract
Recent advancements in large language models (LLMs) have demonstrated their impressive abilities in various reasoning and decision-making tasks. However, the quality and coherence of the reasoning process can still benefit from enhanced introspection and self-reflection. In this paper, we introduce Multiplex CoT (Chain of Thought), a method that enables LLMs to simulate a form of self-review while reasoning, by initiating double Chain of Thought (CoT) thinking. Multiplex CoT leverages the power of iterative reasoning, where the model generates an initial chain of thought and subsequently critiques and refines this reasoning with a second round of thought generation. This recursive approach allows for more coherent, logical, and robust answers, improving the overall decision-making process. We demonstrate how this method can be effectively implemented using simple prompt engineering in existing LLM architectures, achieving an effect similar to that of the Learning-Refinement Model (LRM) without the need for additional training. Additionally, we present a practical guide for implementing the method in Google Colab, enabling easy integration into real-world applications.
1 Introduction
Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling in tasks ranging from translation to text generation. However, these models often struggle with producing coherent, logical reasoning when faced with complex decision-making scenarios. One of the key limitations of LLMs is their inability to critically reflect on their own thought process, which can lead to inconsistencies and errors in the final output. While recent research has explored methods for improving reasoning in LLMs, including Chain of Thought (CoT) reasoning and fine-tuning approaches, there is still room for improvement in terms of the model’s ability to refine and critique its own reasoning.
In this paper, we propose Multiplex CoT, a novel method for enhancing LLM reasoning by prompting the model to perform a self-reflection process. The technique involves generating an initial CoT and then initiating a second round of reasoning, which critiques and refines the initial chain of thought. By employing this iterative process, the model can simulate a form of self-review, leading to more coherent and logical outputs. Importantly, this method does not require additional training but instead utilizes a simple prompt engineering approach, making it easy to implement in existing LLM architectures.
2 Background
2.1 Chain of Thought (CoT) Reasoning
Chain of Thought (CoT) reasoning has been proposed as a technique to improve the logical coherence of LLM outputs. The method involves prompting the model to produce a step-by-step sequence of thoughts, which guides the reasoning process and helps the model arrive at more accurate conclusions. CoT has been shown to significantly improve performance in tasks that require complex reasoning, such as mathematical problem-solving and commonsense reasoning.
2.2 Learning-Refinement Models (LRM)
Learning-Refinement Models (LRM) aim to improve model performance by iteratively refining the outputs through multiple training steps. These models typically involve a feedback loop where the initial predictions are revised based on some form of error analysis or critique. While LRM-based approaches have proven effective in certain contexts, they often require additional training and fine-tuning, which can be computationally expensive and time-consuming.
2.3 Self-Reflection in AI
Self-reflection is a cognitive process in which an agent reviews its own reasoning to identify errors or inconsistencies. While traditional LLMs are not equipped for self-reflection, recent work in meta-learning and reinforcement learning has explored ways to enable models to reflect on their actions. This line of research has shown promise in improving decision-making processes, particularly in scenarios where error correction or refinement is crucial.
3 Multiplex CoT: A Double Chain of Thought Approach
Multiplex CoT combines the benefits of CoT reasoning with a self-reflection mechanism. The process is outlined as follows:

Step 1 (Initial CoT): the model generates a step-by-step chain of thought for the given problem.

Step 2 (Review and Refinement): the model critiques the initial chain of thought, identifies any inconsistencies or errors, and produces a refined final answer.
This two-phase process mimics human-like self-reflection, where the first phase involves generating ideas, and the second phase focuses on evaluating and refining those ideas. The method is designed to work seamlessly with existing LLM architectures, without requiring any changes to the underlying model parameters.
3.1 Prompt Engineering
To implement Multiplex CoT, we utilize a simple prompt engineering technique. By structuring the input prompt to request both the initial reasoning and a follow-up critique, the model is able to generate and refine its reasoning within the same inference cycle. The prompt is designed to encourage the model to "think twice" about its initial response, leading to better overall performance.
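For concreteness, here is a minimal Python sketch of this prompt structure (the paper's Colab guide is not reproduced here). The template mirrors the example prompt in Section 3.2; `llm_generate` is a hypothetical stand-in for whatever completion call your LLM provider exposes.

```python
# Minimal sketch of the Multiplex CoT prompt structure. `llm_generate` is a
# placeholder for any completion API, not a specific library call.

MULTIPLEX_COT_TEMPLATE = (
    "Please solve the following problem: {question}\n"
    "First, generate a Chain of Thought for how you would arrive at the answer.\n"
    "Then, review your answer and critique it. If you find any inconsistencies\n"
    "or errors, correct them and provide the final answer."
)

def multiplex_cot(question: str, llm_generate) -> str:
    """Elicit both reasoning rounds in a single inference cycle via one prompt."""
    prompt = MULTIPLEX_COT_TEMPLATE.format(question=question)
    return llm_generate(prompt)
```

Because both rounds are elicited by a single prompt, no change to model weights or decoding is required; the self-review happens entirely within one generation.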
3.2 Example
Consider the following example.

Prompt:
Please solve the following problem: What is the capital of France? First, generate a Chain of Thought for how you would arrive at the answer. Then, review your answer and critique it. If you find any inconsistencies or errors, correct them and provide the final answer.
The model might respond with:

Step 1 (Initial CoT):
France is a country in Western Europe. The capital city of France is Paris. Therefore, the answer is Paris.
Step 2 (Review and Refinement):
Reviewing the initial chain: each step follows from the previous one, and the claim that Paris is the capital of France is factually correct. No inconsistencies or errors are found, so the final answer stands: Paris.
4 Mathematical Analysis of Multiplex CoT: Refining the Reasoning Process
To mathematically validate the effectiveness of Multiplex CoT in improving the reasoning quality of large language models (LLMs), we will introduce formal definitions for the concepts of logical consistency, coherence, and error correction rate. These metrics provide a quantitative way to assess the impact of self-reflection on reasoning quality.
4.1 Logical Consistency and Coherence
The key advantage of Multiplex CoT lies in its ability to iteratively improve the reasoning process by reviewing and refining the initial output. We define logical consistency as the number of valid logical connections between consecutive reasoning steps. If $s_{i}$ represents the $i$-th step in the Chain of Thought, and $\mathbb{I}(s_{i},s_{i+1})$ is an indicator function that returns 1 if there is a logical connection between $s_{i}$ and $s_{i+1}$, then the logical consistency $C$ for a single Chain of Thought is:
$$
C_{\mathrm{CoT}}=\sum_{i=1}^{n-1}\mathbb{I}(s_{i},s_{i+1})
$$
where $n$ is the total number of steps in the reasoning chain. A higher value of $C_{\mathrm{CoT}}$ indicates that the reasoning steps are logically consistent and connected.
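For concreteness, the following Python sketch (not from the paper) computes $C_{\mathrm{CoT}}$ for a list of reasoning steps. The pairwise indicator $\mathbb{I}$ is left as a caller-supplied callable, since the paper does not fix how a logical connection between steps is judged; an NLI model, a heuristic, or human annotation could all serve.

```python
from typing import Callable, List

def logical_consistency(steps: List[str],
                        indicator: Callable[[str, str], int]) -> int:
    """C_CoT: number of consecutive step pairs (s_i, s_{i+1}) that the
    indicator judges to be logically connected (1) rather than not (0)."""
    return sum(indicator(steps[i], steps[i + 1])
               for i in range(len(steps) - 1))
```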
When applying Multiplex CoT, a second round of reasoning is conducted, which critiques and refines the initial reasoning. We define the coherence of the reasoning process as the degree of alignment between the initial and refined reasoning steps. The coherence $H$ can be quantified as:
$$
H=\frac{\sum_{i=1}^{n}{\mathbb{I}(s_{i},s_{i}^{\mathrm{refined}})}}{n}
$$
where $s_{i}^{\mathrm{refined}}$ is the corresponding statement in the refined reasoning chain, and $\mathbb{I}(s_{i},s_{i}^{\mathrm{refined}})$ is 1 if the statement in the second round is consistent with the original reasoning. Coherence measures how well the second round of reasoning preserves the logic of the initial thought while refining it.
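A matching sketch for $H$, under the simplifying assumption (implicit in the formula) that the refined chain has the same number of steps $n$ as the initial one:

```python
from typing import Callable, List

def coherence(initial_steps: List[str],
              refined_steps: List[str],
              indicator: Callable[[str, str], int]) -> float:
    """H: fraction of initial steps s_i whose refined counterpart is judged
    consistent with them. Assumes aligned chains of equal length n."""
    n = len(initial_steps)
    assert len(refined_steps) == n, "formula assumes aligned chains of length n"
    return sum(indicator(s, r)
               for s, r in zip(initial_steps, refined_steps)) / n
```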
The overall improvement in reasoning due to Multiplex CoT can be defined as:
$$
\mathrm{Improvement~in~Reasoning~Quality}=\frac{C_{\mathrm{Refined}}-C_{\mathrm{CoT}}}{C_{\mathrm{CoT}}}
$$
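Here $C_{\mathrm{Refined}}$ denotes the logical consistency of the refined chain, computed in the same way as $C_{\mathrm{CoT}}$. As an illustration with hypothetical numbers: if the initial chain yields $C_{\mathrm{CoT}} = 4$ valid connections and refinement raises this to $C_{\mathrm{Refined}} = 5$, the improvement in reasoning quality is $(5-4)/4 = 0.25$, i.e. a 25% relative gain.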