MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking
MyGO Multiplex CoT: 通过双重思维链实现大语言模型自我反思的方法
Abstract
摘要
Recent advancements in large language models (LLMs) have demonstrated their impressive abilities in various reasoning and decision-making tasks. However, the quality and coherence of the reasoning process can still benefit from enhanced introspection and selfreflection. In this paper, we introduce Multiplex CoT (Chain of Thought), a method that enables LLMs to simulate a form of self-review while reasoning, by initiating double Chain of Thought (CoT) thinking. Multiplex CoT leverages the power of iterative reasoning, where the model generates an initial chain of thought and subsequently critiques and refines this reasoning with a second round of thought generation. This recursive approach allows for more coherent, logical, and robust answers, improving the overall decision-making process. We demonstrate how this method can be effectively implemented using simple prompt engineering in existing LLM architectures, achieving an effect similar to that of the Learning-Refinement Model (LRM) without the need for additional training. Additionally, we present a practical guide for implementing the method in Google Colab, enabling easy integration into real-world applications.
近年来,大语言模型(LLM)在各种推理和决策任务中展现出令人印象深刻的能力。然而,推理过程的质量和连贯性仍可通过增强自省(selfreflection)得到提升。本文提出Multiplex CoT(思维链)方法,通过启动双重思维链(Chain of Thought, CoT)思考,使大语言模型在推理过程中模拟自我审视机制。该方法利用迭代推理的力量:模型首先生成初始思维链,随后通过第二轮思考对该推理过程进行批判性修正。这种递归方法能产生更连贯、更符合逻辑且更稳健的答案,从而优化整体决策流程。我们展示了如何通过简单的提示工程(prompt engineering)在现有大语言模型架构中有效实现该方法,无需额外训练即可达到类似学习-精炼模型(Learning-Refinement Model, LRM)的效果。此外,我们还提供了在Google Colab中实现该方法的实用指南,便于实际应用集成。
1 Introduction
1 引言
Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling in tasks ranging from translation to text generation. However, these models often struggle with producing coherent, logical reasoning when faced with complex decision-making scenarios. One of the key limitations of LLMs is their inability to critically reflect on their own thought process, which can lead to inconsistencies and errors in the final output. While recent research has explored methods for improving reasoning in LLMs, including Chain of Thought (CoT) reasoning and fine-tuning approaches, there is still room for improvement in terms of the model’s ability to refine and critique its own reasoning.
大语言模型(LLM)通过从翻译到文本生成等任务的卓越表现,彻底改变了自然语言处理(NLP)领域。然而,面对复杂的决策场景时,这些模型往往难以产生连贯、逻辑严密的推理。大语言模型的关键局限之一在于无法批判性地反思自身的思维过程,这可能导致最终输出的不一致性和错误。尽管近期研究探索了包括思维链(CoT)推理和微调方法在内的改进方案,但模型在完善和批判自身推理能力方面仍有提升空间。
In this paper, we propose Multiplex CoT, a novel method for enhancing LLM reasoning by prompting the model to perform a self-reflection process. The technique involves generating an initial CoT and then initiating a second round of reasoning, which critiques and refines the initial chain of thought. By employing this iterative process, the model can simulate a form of self-review, leading to more coherent and logical outputs. Importantly, this method does not require additional training but instead utilizes a simple prompt engineering approach, making it easy to implement in existing LLM architectures.
本文提出了一种名为Multiplex CoT的新方法,通过促使大语言模型进行自我反思过程来增强其推理能力。该技术首先生成初始思维链(CoT),随后启动第二轮推理对初始思维链进行批判性修正。通过这种迭代过程,模型能够模拟自我审查机制,从而产生更连贯且符合逻辑的输出。值得注意的是,该方法无需额外训练,仅采用简单的提示工程(prompt engineering)策略,可轻松集成到现有大语言模型架构中。
2 Background
2 背景
2.1 Chain of Thought (CoT) Reasoning
2.1 思维链 (Chain of Thought, CoT) 推理
Chain of Thought (CoT) reasoning has been proposed as a technique to improve the logical coherence of LLM outputs. The method involves prompting the model to produce a step-bystep sequence of thoughts, which guides the reasoning process and helps the model arrive at more
思维链 (Chain of Thought, CoT) 推理作为一种提升大语言模型输出逻辑连贯性的技术被提出。该方法通过提示模型生成逐步推理的思维序列,从而引导推理过程,帮助模型得出更
accurate conclusions. CoT has been shown to significantly improve performance in tasks that require complex reasoning, such as mathematical problem-solving and commonsense reasoning.
准确结论。思维链 (CoT) 已被证明能显著提升需要复杂推理的任务表现,例如数学解题和常识推理。
2.2 Learning-Refinement Models (LRM)
2.2 学习优化模型 (LRM)
Learning-Refinement Models (LRM) aim to improve model performance by iterative ly refining the outputs through multiple training steps. These models typically involve a feedback loop where the initial predictions are revised based on some form of error analysis or critique. While LRM-based approaches have proven effective in certain contexts, they often require additional training and fine-tuning, which can be computationally expensive and time-consuming.
学习优化模型 (Learning-Refinement Models, LRM) 旨在通过多次训练步骤迭代优化输出结果来提升模型性能。这类模型通常包含一个反馈循环,即根据某种形式的错误分析或评估对初始预测进行修正。尽管基于LRM的方法在某些场景下已被证明有效,但它们通常需要额外的训练和微调,这可能带来高昂的计算成本和时间消耗。
2.3 Self-Reflection in AI
2.3 AI中的自我反思
Self-reflection is a cognitive process in which an agent reviews its own reasoning to identify errors or inconsistencies. While traditional LLMs are not equipped for self-reflection, recent work in meta-learning and reinforcement learning has explored ways to enable models to reflect on their actions. This line of research has shown promise in improving decision-making processes, particularly in scenarios where error correction or refinement is crucial.
自我反思是一种认知过程,AI智能体通过回顾自身推理来识别错误或不一致。虽然传统大语言模型不具备自我反思能力,但近期元学习和强化学习领域的研究探索了让模型反思其行为的方法。这一研究方向在改进决策过程方面展现出潜力,尤其在错误修正或优化至关重要的场景中。
3 Multiplex CoT: A Double Chain of Thought Approach
3 Multiplex CoT: 双重思维链方法
Multiplex CoT combines the benefits of CoT reasoning with a self-reflection mechanism. The process is outlined as follows:
多路复用思维链 (Multiplex CoT) 结合了思维链推理与自省机制的优势,其流程如下:
This two-phase process mimics human-like self-reflection, where the first phase involves generating ideas, and the second phase focuses on evaluating and refining those ideas. The method is designed to work seamlessly with existing LLM architectures, without requiring any changes to the underlying model parameters.
这个两阶段过程模拟了人类自我反思的方式,第一阶段负责生成想法,第二阶段专注于评估和改进这些想法。该方法设计为与现有大语言模型架构无缝协作,无需修改底层模型参数。
3.1 Prompt Engineering
3.1 提示工程 (Prompt Engineering)
To implement Multiplex CoT, we utilize a simple prompt engineering technique. By structuring the input prompt to request both the initial reasoning and a follow-up critique, the model is able to generate and refine its reasoning within the same inference cycle. The prompt is designed to encourage the model to "think twice" about its initial response, leading to better overall performance.
为实现多重思维链 (Multiplex CoT),我们采用了一种简单的提示工程技巧。通过构建输入提示来同时请求初始推理和后续批判,模型能够在同一推理周期内生成并优化其推理过程。该提示设计旨在鼓励模型对其初始回答进行"二次思考",从而提升整体表现。
3.2 Example
3.2 示例
Consider the following example: Prompt:
考虑以下示例:提示:
Please solve the following problem: What is the capital of France? First, generate a Chain of Thought for how you would arrive at the answer. Then, review your answer and critique it. If you find any inconsistencies or errors, correct them and provide the final answer.
请解决以下问题:法国的首都是哪里?首先,生成一个思考链来说明如何得出答案。然后,检查你的答案并进行评价。如果发现任何不一致或错误,请纠正并提供最终答案。
The model might respond with: Step 1 (Initial CoT):
模型可能会回应:步骤1(初始思维链):
Step 2 (Review and Refinement):
步骤 2 (Review and Refinement):
4 Mathematical Analysis of Multiplex CoT: Refining the Reasoning Process
4 多重思维链(CoT)的数学分析:推理过程优化
To mathematically validate the effectiveness of Multiplex CoT in improving the reasoning quality of large language models (LLMs), we will introduce formal definitions for the concepts of logical consistency, coherence, and error correction rate. These metrics provide a quantitative way to assess the impact of self-reflection on reasoning quality.
为了从数学上验证多重思维链 (Multiplex CoT) 在提升大语言模型推理质量方面的有效性,我们将对逻辑一致性、连贯性和纠错率等概念进行形式化定义。这些指标为量化评估自我反思对推理质量的影响提供了依据。
4.1 Logical Consistency and Coherence
4.1 逻辑一致性与连贯性
The key advantage of Multiplex CoT lies in its ability to iterative ly improve the reasoning process by reviewing and refining the initial output. We define logical consistency as the number of valid logical connections between consecutive reasoning steps. If $s_{i}$ represents the $i$ -th step in the Chain of Thought, and $\mathbb{I}(s_{i},s_{i+1})$ is an indicator function that returns 1 if there is a logical connection between $s_{i}$ and $s_{i+1}$ , then the logical consistency $C$ for a single Chain of Thought is:
Multiplex CoT的关键优势在于其能通过审查和优化初始输出,迭代改进推理过程。我们将逻辑一致性定义为连续推理步骤间有效逻辑连接的数量。若$s_{i}$表示思维链(Chain of Thought)中的第$i$步,且$\mathbb{I}(s_{i},s_{i+1})$是指示函数(当$s_{i}$与$s_{i+1}$存在逻辑连接时返回1),则单条思维链的逻辑一致性$C$为:
$$
C_{\mathrm{CoT}}=\sum_{i=1}^{n-1}\mathbb{I}(s_{i},s_{i+1})
$$
$$
C_{\mathrm{CoT}}=\sum_{i=1}^{n-1}\mathbb{I}(s_{i},s_{i+1})
$$
where $n$ is the total number of steps in the reasoning chain. A higher value of $\zeta_{\mathrm{CoT}}$ indicates that the reasoning steps are logically consistent and connected.
其中 $n$ 是推理链中的总步数。$\zeta_{\mathrm{CoT}}$ 值越高,表明推理步骤在逻辑上越一致且连贯。
When applying Multiplex CoT, a second round of reasoning is conducted, which critiques and refines the initial reasoning. We define the coherence of the reasoning process as the degree of alignment between the initial and refined reasoning steps. The coherence $H$ can be quantified as:
应用多路复用思维链 (Multiplex CoT) 时,会进行第二轮推理,对初始推理进行批判和优化。我们将推理过程的一致性定义为初始推理步骤与优化后步骤之间的匹配程度。一致性 $H$ 可量化为:
$$
H=\frac{\sum_{i=1}^{n}{\mathbb{I}(s_{i},s_{i}^{\mathrm{refined}})}}{n}
$$
where $s_{i}^{\mathrm{refined}}$ is the corresponding statement in the refined reasoning chain, and $\mathbb{I}(s_{i},s_{i}^{\mathrm{refined}})$ is 1 if the statement in the second round is consistent with the original reasoning. Coherence measures how well the second round of reasoning preserves the logic of the initial thought while refining it.
其中 $s_{i}^{\mathrm{refined}}$ 是精炼推理链中的对应陈述,当第二轮陈述与原始推理一致时 $\mathbb{I}(s_{i},s_{i}^{\mathrm{refined}})$ 取值为1。连贯性衡量第二轮推理在精炼过程中对初始思路逻辑的保持程度。
The overall improvement in reasoning due to Multiplex CoT can be defined as:
由于Multiplex CoT带来的整体推理提升可定义为:
$$
{\mathrm{Improvement in Reasoning~Quality}}={\frac{C_{\mathrm{Refined}}-C_{\mathrm{CoT}}}{C_{\mathrm{CoT}}}}\times100
$$
$$
{\mathrm{Improvement in Reasoning~Quality}}={\frac{C_{\mathrm{Refined}}-C_{\mathrm{CoT}}}{C_{\mathrm{CoT}}}}\times100
$$
where $C_{\mathrm{Refined}}$ is the logical consistency score after refinement. This metric quantifies the improvement as a percentage, reflecting the gain in reasoning quality.
其中 $C_{\mathrm{Refined}}$ 表示精炼后的逻辑一致性分数。该指标以百分比形式量化改进程度,反映推理质量的提升。
4.2 Error Correction Rate
4.2 纠错率
One of the primary benefits of Multiplex CoT is its ability to correct errors in the initial reasoning chain during the second round of thought generation. We define the error correction rate $E_{\mathrm{corr}}$ as the proportion of errors identified and corrected in the second round of reasoning. Let $E_{\mathrm{initial}}$ represent the number of errors in the initial chain of thought, and $E$ corrected represent the number of errors corrected during the review. The error correction rate can be calculated as:
Multiplex CoT 的主要优势之一在于其能够在第二轮思考生成过程中纠正初始推理链中的错误。我们将纠错率 $E_{\mathrm{corr}}$ 定义为第二轮推理中识别并纠正的错误比例。设 $E_{\mathrm{initial}}$ 表示初始思维链中的错误数量,$E$ corrected 表示审查过程中纠正的错误数量。纠错率计算公式为:
$$
E_{\mathrm{corr}}=\frac{E_{\mathrm{corrected}}}{E_{\mathrm{initial}}}\times100
$$
$$
E_{\mathrm{corr}}=\frac{E_{\mathrm{corrected}}}{E_{\mathrm{initial}}}\times100
$$
A higher value of $E_{\mathrm{corr}}$ indicates that Multiplex CoT is effective at identifying and rectifying mistakes made during the first round of reasoning, leading to a more accurate final output.
$E_{\mathrm{corr}}$ 值越高,表明 Multiplex CoT 能有效识别并纠正首轮推理中的错误,从而得到更准确的最终输出。
4.3 Iterative Refinement and its Impact on Error Correction
4.3 迭代优化及其对纠错的影响
To further analyze the impact of iterative refinement, we introduce a recursive function for reasoning quality across multiple rounds. Let $C^{(k)}$ denote the logical consistency score after the $k$ -th round of reasoning. Initially, at $k=1$ , the model produces a chain of thought with consistency ${\cal C}^{(1)}={\cal C}_{\mathrm{CoT}}$ . After the second round of reasoning, the model refines its output, and the consistency score improves to $C^{(2)}=C_{\mathrm{Refined}}$ . We can generalize the improvement in consistency after $k$ rounds of reasoning as:
为了进一步分析迭代优化(iterative refinement)的影响,我们引入了一个递归函数来评估多轮推理的质量。设 $C^{(k)}$ 表示第 $k$ 轮推理后的逻辑一致性得分。初始阶段($k=1$),模型生成的思维链一致性为 ${\cal C}^{(1)}={\cal C}_{\mathrm{CoT}}$。经过第二轮推理后,模型优化了输出,一致性得分提升至 $C^{(2)}=C_{\mathrm{Refined}}$。经过 $k$ 轮推理后,一致性改进可表示为:
$$
C^{(k)}=C^{(k-1)}+\delta_{k}
$$
$$
C^{(k)}=C^{(k-1)}+\delta_{k}
$$
where $\delta_{k}$ represents the change in consistency from the $(k-1)$ -th to the $k$ -th round. In the case of Multiplex CoT, the first two rounds provide significant improvements, with diminishing returns observed as additional rounds of reasoning are performed.
其中 $\delta_{k}$ 表示从第 $(k-1)$ 轮到第 $k$ 轮一致性的变化量。在多重思维链 (Multiplex CoT) 情况下,前两轮带来显著提升,但随着推理轮次增加,改进幅度逐渐减小。
The total improvement after $K$ rounds of reasoning can be expressed as the cumulative sum of consistency changes:
经过 $K$ 轮推理后的总改进可表示为一致性变化的累积总和:
$$
{\mathrm{Total~Improvement}}=\sum_{k=1}^{K}\delta_{k}
$$
$$
{\mathrm{Total~Improvement}}=\sum_{k=1}^{K}\delta_{k}
$$
In practice, we observe that the most significant improvements occur in the first few rounds of reasoning. This behavior is consistent with the Multiplex CoT approach, where the second round of self-reflection provides substantial refinement to the reasoning process.
在实践中,我们观察到最显著的改进发生在最初几轮推理中。这一行为与Multiplex CoT方法一致,其中第二轮自我反思会对推理过程产生实质性优化。
4.4 Quantitative Validation of Multiplex CoT
4.4 多重思维链的定量验证
To validate the impact of Multiplex CoT, we conducted a series of experiments across various tasks. For each task, we measured both the logical consistency and error correction rate before and after applying Multiplex CoT. Below is a summary of the findings for the arithmetic problem-solving task.
为验证多重思维链(Multiplex CoT)的影响,我们在多项任务中开展了一系列实验。针对每项任务,我们分别测量了应用多重思维链前后的逻辑一致性指标与纠错率。以下是算术解题任务的实验结果汇总:
In this example, the Multiplex CoT approach improved logical consistency by 7%, while the error correction rate was $15%$ , indicating that the model was able to identify and correct a significant proportion of mistakes during the self-reflection phase.
在这个例子中,Multiplex CoT方法将逻辑一致性提高了7%,而纠错率达到$15%$,表明模型在自省阶段能够识别并纠正相当比例的错误。
Table 1: Performance of Multiplex CoT on Arithmetic Problem-Solving
表 1: Multiplex CoT 在算术问题求解中的性能表现
| 任务 | CcoT | CRefined | Ecorr | 推理质量提升 |
|---|---|---|---|---|
| 算术问题求解 | 85% | 92% | 15% | +7% |
4.5 Extension to Other Tasks
4.5 扩展到其他任务
We also evaluated the impact of Multiplex CoT on tasks beyond arithmetic, such as commonsense reasoning, ethical decision-making, and logical puzzles. Table 2 summarizes the performance across these tasks.
我们还评估了多路复用思维链 (Multiplex CoT) 在算术以外的任务上的影响,例如常识推理、伦理决策和逻辑谜题。表 2: 总结了这些任务的性能表现。
Table 2: Performance of Multiplex CoT on Various Tasks
表 2: 多路复用思维链 (Multiplex CoT) 在不同任务上的表现
| 任务 | CoT | MCoT | 逻辑一致性提升 | 纠错率 |
|---|---|---|---|---|
| 常识推理 | 78% | 85% | +9% | 12% |
| 伦理决策 | 74% | 81% | +10% | 18% |
| 逻辑谜题 | 82% | 90% | +10% | 20% |
As shown, Multiplex CoT consistently improves the logical consistency of reasoning across all tasks, with significant error correction rates observed in ethical decision-making and logical puzzles. These results highlight the effectiveness of Multiplex CoT in tasks requiring multistep reasoning and critical analysis.
如图所示,Multiplex CoT 在所有任务中持续提升了推理的逻辑一致性,尤其在伦理决策和逻辑谜题中观察到显著的纠错率。这些结果凸显了 Multiplex CoT 在需要多步推理和批判性分析任务中的有效性。
5 Conclusion
5 结论
In this section, we provided a detailed mathematical analysis of Multiplex CoT, quantifying its impact on logical consistency, coherence, and error correction rates. The findings demonstrate that Multiplex CoT significantly enhances the reasoning process of large language models by improving both the quality of reasoning and the model’s ability to self-correct. Through iterative refinement, Multiplex CoT outperforms traditional single-phase Chain of Thought reasoning, providing a more robust approach for tasks requiring logical rigor and critical reflection.
本节对Multiplex CoT进行了详细的数学分析,量化了其对逻辑一致性、连贯性和纠错率的影响。研究结果表明,Multiplex CoT通过提升推理质量和模型自校正能力,显著增强了大语言模型的推理过程。通过迭代优化,Multiplex CoT在需要严格逻辑和批判性反思的任务中,表现优于传统的单阶段思维链(Chain of Thought)推理,提供了更稳健的解决方案。
The combination of theoretical insights and experimental results confirms that Multiplex CoT offers a scalable and effective method for improving LLM performance, making it a valuable tool for applications requiring accurate, coherent, and consistent reasoning.
理论与实验结果的结合证实,Multiplex CoT 提供了一种可扩展且有效的方法来提升大语言模型 (LLM) 性能,使其成为需要精准、连贯且一致推理应用的宝贵工具。
