[论文翻译]TradExpert:利用专家混合大语言模型革新交易


原文地址:https://aiqianji.com/blog/user/931?tab=articles


TradExpert: Revolutionizing Trading with Mixture of Expert LLMs

TradExpert:利用专家混合大语言模型革新交易

Abstract

摘要

The integration of Artificial Intelligence (AI) in the financial domain has opened new avenues for quantitative trading, particularly through the use of Large Language Models (LLMs). However, the challenge of effectively synthesizing insights from diverse data sources and integrating both structured and unstructured data persists. This paper presents TradExpert, a novel framework that employs a Mixture of Experts (MoE) approach, using four specialized LLMs, each analyzing a distinct source of financial data: news articles, market data, alpha factors, and fundamental data. The insights of these expert LLMs are further synthesized by a General Expert LLM to make a final prediction or decision. With specific prompts, TradExpert can be switched between prediction mode and ranking mode for stock movement prediction and quantitative stock trading, respectively. In addition to existing benchmarks, we also release a large-scale financial dataset to comprehensively evaluate TradExpert’s effectiveness. Our experimental results demonstrate TradExpert’s superior performance across all trading scenarios.

人工智能 (AI) 在金融领域的集成为量化交易开辟了新的途径,特别是通过大语言模型 (LLMs) 的使用。然而,如何有效地从多样化的数据源中综合见解,并将结构化和非结构化数据整合起来,仍然是一个挑战。本文提出了 TradExpert,这是一个采用专家混合 (MoE) 方法的新框架,使用了四个专门的大语言模型,每个模型分析不同的金融数据源,包括新闻文章、市场数据、阿尔法因子和基本面数据。这些专家大语言模型的见解由通用专家大语言模型进一步综合,以做出最终的预测或决策。通过特定的提示,TradExpert 可以在预测模式和排名模式之间切换,分别用于股票走势预测和量化股票交易。除了现有的基准测试外,我们还发布了一个大规模的金融数据集,以全面评估 TradExpert 的有效性。我们的实验结果表明,TradExpert 在所有交易场景中均表现出色。

Introduction

引言

The fusion of artificial intelligence with financial analytics has spawned a new era of innovation, particularly with the introduction of Large Language Models (LLMs) into finance. These models, which first excelled in natural language processing (NLP) tasks, are now being tailored to decode the complex and cryptic narratives of financial data. This adaptation is driven by a crucial insight: financial markets are not just number-crunching engines but complicated information systems in which the subtleties of news articles, reports, and economic indicators interweave to influence market dynamics.

人工智能与金融分析的融合催生了一个创新的新时代,特别是随着大语言模型 (Large Language Models, LLMs) 在金融领域的应用。这些模型在自然语言处理 (Natural Language Processing, NLP) 任务中表现出色,现在正被定制用于解码金融数据中复杂且隐晦的叙述。这种适应源于一个关键的洞察:金融市场不仅仅是数字处理的引擎,而是复杂的信息系统,其中新闻文章、报告和经济指标的微妙交织影响着市场动态。

Before the advent of LLMs, traditional financial models (Zeng et al. 2023; Yang et al. 2020; Liu et al. 2020; Baek and Kim 2018) relied primarily on quantitative methods such as statistical analysis, time series forecasting, and econometric models. These models often struggled to incorporate unstructured data such as news articles or financial reports without manual intervention. As a result, the development of LLMs tailored for financial applications has progressed rapidly. Initial ventures into this domain repurposed general LLMs such as GPTs (Brown 2020; Achiam et al. 2023) and LLaMAs (Touvron et al. 2023a,b; AI@Meta 2024) to interpret financial texts. However, more specialized language models such as FinBERT (Araci 2019), BloombergGPT (Wu et al. 2023), and FinGPT (Yang, Liu, and Wang 2023) have since emerged, demonstrating enhanced proficiency in understanding and predicting market movements from unstructured data. These models were specifically fine-tuned or pre-trained on large financial corpora, which allows them to better capture typical patterns in financial text. Despite these advancements, effectively synthesizing insights from diverse data sources such as historical stock prices, alpha factors, fundamental data, and news articles remains a challenge. In addition, integrating the deluge of unstructured financial text with structured quantitative metrics using language models remains underexplored.

在大语言模型 (LLM) 出现之前,传统的金融模型 (Zeng et al. 2023; Yang et al. 2020; Liu et al. 2020; Baek and Kim 2018) 主要依赖于定量方法,如统计分析、时间序列预测和计量经济学模型。这些模型通常难以在没有人工干预的情况下整合非结构化数据,如新闻文章或财务报告。因此,针对金融应用定制的大语言模型发展迅速。该领域的初步尝试重新利用了通用的大语言模型,如 GPT (Brown 2020; Achiam et al. 2023) 和 LLaMA (Touvron et al. 2023a,b; AI@Meta 2024),来解释金融文本。然而,更专业的语言模型如 FinBERT (Araci 2019)、BloombergGPT (Wu et al. 2023) 和 FinGPT (Yang, Liu, and Wang 2023) 随后发展起来,展示了在从非结构化数据中理解和预测市场走势方面的增强能力。这些模型专门针对大量金融语料进行了微调或预训练,从而能够更好地捕捉金融语料中的典型模式。尽管取得了这些进展,但如何有效地从历史股价、阿尔法因子、基本面数据、新闻文章等多样化数据源中综合洞察仍然是一个挑战。此外,如何利用语言模型将大量非结构化金融文本与结构化定量指标整合,仍有待进一步研究。

Figure 1: Illustration of traditional, LLM-based, and MoE LLMs-based financial models with diverse financial data sources.

图 1: 传统、基于大语言模型 (LLM) 和基于混合专家 (MoE) 大语言模型的金融模型与多样化金融数据源的示意图。

To this end, we propose the TradExpert framework, which stands at the confluence of these challenges. It leverages a Mixture of Experts (MoE) approach (Eigen, Ranzato, and Sutskever 2013; Du et al. 2022; Shen et al. 2023), involving multiple LLMs, each specialized in a distinct facet of financial data: news articles, market data, alpha factors, and fundamental data. This not only enhances the model’s ability to process diverse data modalities but also allows for a more nuanced understanding of how different factors interact to influence market trends. Figure 1 illustrates the differences among traditional, LLM-based, and MoE LLMs-based financial models. In TradExpert, each expert works with a distinct focus and produces a specialized report; the reports are then summarized and analyzed by a general expert, mirroring the structured division of labor seen in the real world. Specifically, TradExpert employs specialized LLMs to first analyze different data sources independently, then integrates these analyses via another LLM that synthesizes insights to predict market movements and inform trading strategies. We utilize a reprogramming mechanism to convert time series data into embeddings aligned with LLMs. In addition, we propose two modes for the General Expert LLM, prediction mode and ranking mode, for stock movement prediction and stock trading strategies, respectively. In ranking mode, we let the LLM serve as a comparator within a relaxed sorting algorithm, enabling the selection of the Top-K ranked stocks for trading. To comprehensively evaluate our method, we have also collected a large-scale dataset, which will be publicly released.

为此,我们提出了TradExpert框架,该框架位于这些挑战的交汇点。它利用了专家混合(MoE)方法(Eigen, Ranzato, and Sutskever 2013; Du et al. 2022; Shen et al. 2023),涉及多个大语言模型,每个模型专门处理金融数据的不同方面——新闻文章、市场数据、阿尔法因子和基本面数据。这不仅增强了模型处理多种数据模态的能力,还允许更细致地理解不同因素如何相互作用以影响市场趋势。图1展示了传统、基于大语言模型和基于MoE的大语言模型金融模型之间的差异。在TradExpert中,每个专家都有不同的关注点并生成专门的报告,最终由一位通用专家进行总结和分析,就像现实世界中的结构化分工一样。具体来说,TradExpert使用专门的大语言模型首先独立分析不同的数据源,然后通过另一个大语言模型整合这些分析,以预测市场动向并为交易策略提供信息。创新地,我们利用重新编程机制将时间序列数据转换为与大语言模型对齐的嵌入。此外,我们为通用专家大语言模型提出了两种模式,预测模式和排名模式,分别用于股票动向预测和股票交易策略。在排名模式中,我们创新地让大语言模型在宽松排序算法中充当比较器,从而能够选择排名前K的股票进行交易。为了全面评估我们的方法,我们还收集了一个大规模的数据集,并将公开发布。

Our contributions can be summarized as follows.

我们的贡献可以总结如下。

Related Work

相关工作

Financial Language Models have advanced significantly in recent years, blending NLP techniques with financial analytics to extract meaningful insights from vast amounts of unstructured financial data. To begin with, FinBERT (Araci 2019) is a financial domain-specific variant of BERT, pre-trained on a large corpus of financial communications. In 2023, BloombergGPT (Wu et al. 2023) emerged as a 50-billion-parameter model trained on a vast financial dataset. FLANG (Shah et al. 2022) introduced a financial language model with specialized masking and objectives. Astock (Zou et al. 2022) provided a platform for studying NLP-aided stock auto-trading algorithms on the Chinese market. BBTFinT5 (Lu et al. 2023) advanced Chinese financial NLP with a large-scale pre-trained model. FinMA (Xie et al. 2023) showcased a model fine-tuned on multi-task instruction datasets. FinGPT (Yang, Liu, and Wang 2023) provided an open-source framework for financial LLMs. InvestLM (Yang, Tang, and Tam 2023) showed the effectiveness of instruction tuning for investment-related tasks. FinReport (Li et al. 2024b) introduced a system for automatic financial report generation. Lastly, AlphaFin (Li et al. 2024a) integrated retrieval-augmented generation techniques for financial analysis. Collectively, these works demonstrate the evolution of financial NLP models and benchmarks, advancing the capabilities of LLMs in financial applications.

近年来,金融语言模型取得了显著进展,将自然语言处理 (NLP) 技术与金融分析相结合,从大量非结构化金融数据中提取有意义的见解。首先,FinBert (Araci 2019) 是 BERT 的金融领域特定变体,预训练于大量金融通信语料库。2023 年,Bloomberg GPT (Wu et al. 2023) 作为一个拥有 500 亿参数的模型出现,训练于庞大的金融数据集。FLANG (Shah et al. 2022) 引入了一种具有专门掩码和目标的金融语言模型。Astock (Zou et al. 2022) 提供了一个平台,用于研究在中国市场上基于 NLP 的股票自动交易算法。BBTFinT5 (Lu et al. 2023) 通过大规模预训练模型推进了中文金融 NLP 的发展。FinMA (Xie et al. 2023) 展示了一个在多任务指令数据集上微调的模型。FinGPT (Yang, Liu, and Wang 2023) 提供了一个开源框架用于金融大语言模型。InvestLM (Yang, Tang, and Tam 2023) 展示了指令微调在投资相关任务中的有效性。FinReport (Li et al. 2024b) 引入了一个用于自动生成财务报告的系统。最后,AlphaFin (Li et al. 2024a) 集成了检索增强生成技术用于金融分析。总的来说,这些工作展示了金融 NLP 模型和基准的演变,推动了大语言模型在金融应用中的能力提升。

Integration of Text and Financial Data has also been rapidly developed for stock movement prediction. StockNet (Xu and Cohen 2018) developed a deep generative model that jointly exploits text and price signals for stock movement prediction. SLOT (Soun et al. 2022) improved upon this by using self-supervised learning to handle sparse and noisy tweet data, capturing multi-level price trends. CH-RNN (Wu et al. 2018) introduced a hybrid deep sequential modeling approach that leverages social text for stock prediction, incorporating cross-modal attention mechanisms. More recently, studies (Lopez-Lira and Tang 2023; Chen et al. 2023) have explored the use of ChatGPT for stock movement prediction, comparing its performance with traditional state-of-the-art models. These works collectively demonstrate the increasing sophistication of models that integrate text and financial data, highlighting the potential for improving trading scenarios.

文本与金融数据的整合在股票走势预测领域也得到了快速发展。StockNet (Xu and Cohen 2018) 开发了一种深度生成模型,该模型联合利用文本和价格信号进行股票走势预测。SLOT (Soun et al. 2022) 通过使用自监督学习来处理稀疏且嘈杂的推文数据,捕捉多层次的价格趋势,从而改进了这一方法。CH-RNN (Wu et al. 2018) 引入了一种混合深度序列建模方法,利用社交文本进行股票预测,并融入了跨模态注意力机制。最近,研究 (Lopez-Lira and Tang 2023; Chen et al. 2023) 探索了使用 ChatGPT 进行股票走势预测,并将其性能与传统的最先进模型进行了比较。这些工作共同展示了整合文本和金融数据的模型日益复杂化,突显了其在改善交易场景中的潜力。

Problem Definition

问题定义

In this study, we aim to trade stocks using a framework that incorporates Large Language Models (LLMs).

在本研究中,我们旨在利用一个结合了大语言模型 (LLM) 的框架进行股票交易。

Task 1: Stock movement prediction is a fundamental challenge in quantitative trading, which involves the prediction of future price trends based on multifaceted data sources. Formally, let $D=\{(x_{i},y_{i})\}_{i=1}^{N}$ denote our dataset, where $x_{i}$ represents the input vector for the $i$-th stock on day $t$, and $y_{i}\in\{\mathrm{Rise},\mathrm{Fall}\}$ is the corresponding label indicating whether the stock price will rise or fall on day $t+1$. The input $x_{i}$ can be expressed as:

任务 1: 股票走势预测是量化交易中的一个基本挑战,它涉及基于多方面数据源预测未来价格趋势。形式上,令 $D=\{(x_{i},y_{i})\}_{i=1}^{N}$ 表示我们的数据集,其中 $x_{i}$ 表示第 $i$ 只股票在第 $t$ 天的输入向量,$y_{i}\in\{\mathrm{Rise},\mathrm{Fall}\}$ 是对应的标签,表示股票价格在第 $t+1$ 天是上涨还是下跌。输入 $x_{i}$ 可以表示为:

$$x_{i}=\{\mathrm{News}_{i},\mathrm{Market}_{i},\mathrm{Factors}_{i},\mathrm{Fundamental}_{i}\}$$

Our objective is to learn a predictive function $f$ parameterized by $\theta$ such that $f_{\theta}(x_{i})\approx y_{i}$, where $f_{\theta}$ is modeled using LLMs. The model outputs a binary prediction, “Rise” or “Fall”, indicating the predicted stock price movement.

我们的目标是学习一个由 $\theta$ 参数化的预测函数 $f$,使得 $f_{\theta}(x_{i})\approx y_{i}$,其中 $f_{\theta}$ 使用大语言模型 (LLM) 进行建模。模型输出一个二元预测,“上涨”或“下跌”,表示预测的股票价格走势。

Task 2: Stock trading simulation involves evaluating the performance of a Buy-and-Hold strategy on the Top-K stocks ranked by TradExpert. This task simulates real-market trading scenarios to assess the profitability and risk of TradExpert using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).

任务 2: 股票交易模拟涉及基于 TradExpert 排序的 TopK 股票评估买入持有策略的表现。该任务模拟真实市场交易场景,使用年化收益率 (AR)、夏普比率 (SR)、年化波动率 (AV) 和最大回撤 (MD) 等指标评估 TradExpert 的盈利能力和风险。
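The four metrics named in Task 2 can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the paper: it assumes daily simple returns, 252 trading days per year, geometric annualization of returns, and a risk-free rate of zero by default, none of which are specified in the text.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: trading days per year


def annualized_return(daily_returns):
    """Geometric annualized return from a series of daily simple returns."""
    r = np.asarray(daily_returns, dtype=float)
    growth = np.prod(1.0 + r)
    return growth ** (TRADING_DAYS / len(r)) - 1.0


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled to one year."""
    r = np.asarray(daily_returns, dtype=float)
    return np.std(r, ddof=1) * np.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, risk_free_rate=0.0):
    """Excess annualized return per unit of annualized volatility."""
    excess = annualized_return(daily_returns) - risk_free_rate
    return excess / annualized_volatility(daily_returns)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the equity curve, as a fraction."""
    r = np.asarray(daily_returns, dtype=float)
    # Start the equity curve at 1.0 so a loss on the first day is counted.
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / running_peak))
```

Other conventions exist (for example, arithmetic annualization of the mean daily return), so absolute values are only comparable across methods evaluated with the same definitions.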


Figure 2: TradExpert processes distinct sources of financial data, such as news texts, market data, alpha factors, and fundamental data, through specialized expert LLMs. Their reports are then summarized and sent to a General Expert, which delivers the final output: (1) in prediction mode, a prediction of stock movement; (2) in ranking mode, a judgment of which of two stocks is better or worse.

图 2: TradExpert 通过专门的专家大语言模型处理不同的金融数据源,如新闻文本、市场数据、阿尔法因子和基本面数据。然后,它们的报告被汇总并发送给通用专家,通用专家提供最终输出:(1) 使用预测模式预测股票走势,(2) 使用排名模式判断两只股票中哪只更好或更差。

Datasets

数据集

In this study, we collected a comprehensive dataset comprising four primary components: News, Market Data, Alpha Factors, and Fundamental Data. All data sources cover the 4-year period from January 1, 2020, to December 31, 2023.

在本研究中,我们收集了一个包含多种数据源的综合数据集,涵盖四个主要组成部分:新闻、市场数据、Alpha因子和基本面数据。所有数据源的时间跨度为2020年1月1日至2023年12月31日,共4年。

Statistics

统计

News is collected from several reputable financial news sources, including Yahoo Finance, Reuters, InvestorPlace, GlobeNewswire, The Motley Fool, etc. This dataset comprises a total of 524,995 news articles for stocks on the S&P 500 list, with an average length of 596.4 words per article. Each news article is associated with a list of related stock tickers.

新闻从多个知名财经新闻来源收集,包括 Yahoo Finance、Reuters、Investor Place、Globe News wire、The Motley Fool 等。该数据集共包含 524,995 篇关于标普 500 指数成分股的新闻文章,每篇文章的平均字数为 596.4 字。每篇新闻文章都与一系列相关股票代码相关联。

Market Data consists of historical daily OHLCV records for stocks on the S&P 500 list. This dataset includes a total of 481,484 records, offering a detailed view of the stocks’ trading activity over the specified period.

市场数据由标普500指数成分股的历史日线OHLCV(开盘价、最高价、最低价、收盘价、成交量)记录组成。该数据集共包含481,484条记录,提供了指定时期内股票交易活动的详细视图。

Alpha Factors incorporates 108 technical indicators and factors with their expressions, which are believed to possess predictive power regarding stock price movements.

Alpha Factors 包含了 108 个技术指标和因子及其表达式,这些指标和因子被认为对股价走势具有预测能力。

Fundamental Data includes earnings call transcripts, financial statements, and fundamental metrics. The earnings call transcripts are sourced from Seeking Alpha, with 16 transcripts (4 years, updated quarterly) available for each stock. Fundamental metrics include Earnings Per Share (EPS), Price-to-Earnings Ratio (P/E Ratio), Book Value Per Share (BVPS), etc.

基础数据包括财报电话会议记录、财务报表和基本面指标。财报电话会议记录来源于Seeking Alpha,每只股票有16份记录(4年,每季度更新)。基本面指标包括每股收益(EPS)、市盈率(P/E Ratio)、每股账面价值(BVPS)等。

Data Split

数据分割

The datasets were split into training, validation, and test sets in chronological order to ensure that future data remains unseen during training. The split was performed as follows: Training set: January 1, 2020, to June 30, 2022. Validation set: July 1, 2022, to December 31, 2022. Test set: January 1, 2023, to December 31, 2023.

数据集按时间顺序分为训练集、验证集和测试集,以确保训练过程中未来的数据保持不可见。划分如下:训练集:2020年1月1日至2022年6月30日。验证集:2022年7月1日至2022年12月31日。测试集:2023年1月1日至2023年12月31日。

Table 1: Components in each data source. $\dagger$ and $\ddagger$ denote generated by GPT-4 and external models, respectively.

表 1: 每个数据源中的组成部分。$\dagger$ 和 $\ddagger$ 分别表示由 GPT-4 和外部模型生成。

| Data source | Instruction & Prompt | Response |
| --- | --- | --- |
| News | News articles | Movement, Reasoning |
| Market | OHLCV embeddings, Statistics | Movement |
| Alpha | Expressions, Descriptions, Comprehensive score | Movement |
| Fundamental | Earnings call transcripts, Fundamental metrics | Movement, Reasoning |
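The chronological split can be expressed as a simple date lookup. The sketch below uses the date boundaries stated in the text; the function name `assign_split` and the dictionary layout are illustrative choices, not part of the original pipeline.

```python
from datetime import date

# Inclusive date boundaries taken from the split described in the text.
SPLITS = {
    "train": (date(2020, 1, 1), date(2022, 6, 30)),
    "validation": (date(2022, 7, 1), date(2022, 12, 31)),
    "test": (date(2023, 1, 1), date(2023, 12, 31)),
}


def assign_split(day):
    """Return the split name for a calendar day, or None if out of range."""
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None
```

Because the boundaries are contiguous and non-overlapping, every sample dated within 2020 to 2023 falls into exactly one split, and no test-period sample can leak into training.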

Methodology

方法论

In this study, we propose TradExpert, a novel framework leveraging the MoE LLMs approach, where four LLMs serve as specialized experts for distinct sources of financial data. A General Expert LLM then synthesizes the summaries of the four Expert LLMs to produce the final output. The pipeline of TradExpert is shown in Figure 2.

在本研究中,我们提出了TradExpert,这是一个利用MoE(混合专家)大语言模型方法的新框架,其中四个大语言模型作为专门处理不同金融数据源的专家。一个通用专家大语言模型随后综合这四个专家大语言模型的摘要,生成最终输出。TradExpert的流程如图2所示。

In TradExpert, all expert LLMs are built on the LLaMA2-7B backbone LLM (Touvron et al. 2023b) and are supervised fine-tuned using the LoRA mechanism (Hu et al. 2022). Before fine-tuning, we preprocess the raw datasets to construct prompts, instructions, and ground-truth responses for each LLM. An overall description of the preprocessed datasets is given in Table 1. The details are introduced in the following subsections.

在TradExpert中,所有专家大语言模型都基于LLaMA2-7B主干大语言模型 (Touvron et al. 2023b) 构建,并使用LoRA机制 (Hu et al. 2022) 进行监督和微调。在训练和微调之前,我们对原始数据集进行预处理,为每个大语言模型构建提示、指令和真实响应。预处理数据集的总体描述如表1所示。细节将在下文介绍。

News Analyst

新闻分析师

The News Analyst LLM is designed to analyze texts of news articles to predict stock movements. The prompt and instruction for fine-tuning the LLM are shown in Figure 3. The outputs from the News Analyst LLM include not only a prediction of the stock movement but also a reasoning of how the news article relates to the predicted movement in order to employ a Chain-of-Thought (CoT) (Wei et al. 2022) reasoning approach. The ground-truth reasonings are pre-generated by the OpenAI GPT-4 API using instructions and prompts that incorporate the actual stock movements and the texts of news articles.

新闻分析大语言模型 (News Analyst LLM) 旨在分析新闻文章文本以预测股票走势。微调大语言模型的提示和指令如图 3 所示。新闻分析大语言模型的输出不仅包括对股票走势的预测,还包括新闻文章与预测走势之间关系的推理,以采用思维链 (Chain-of-Thought, CoT) 推理方法 (Wei et al. 2022)。真实推理由 OpenAI GPT-4 API 预先生成,使用的指令和提示结合了实际股票走势和新闻文章文本。

Market Analyst

市场分析师

The Market Analyst LLM focuses on analyzing historical OHLCV (Open, High, Low, Close, Volume) data to predict stock movements. However, time series data is inherently continuous and lacks the discrete token structure that LLMs are designed to process. This misalignment poses a significant challenge in effectively utilizing LLMs on time series. To this end, we utilize a reprogramming mechanism (Jin et al. 2024) to reprogram the input financial time series into text prototype representations.

市场分析师大语言模型专注于分析历史OHLCV(开盘价、最高价、最低价、收盘价、成交量)数据以预测股票走势。然而,时间序列数据本质上是连续的,缺乏大语言模型设计用来处理的离散Token结构。这种不匹配在有效利用大语言模型处理时间序列数据时构成了重大挑战。为此,我们采用了一种重新编程机制(Jin et al. 2024),将输入的金融时间序列重新编程为文本原型表示。

Formally, let an OHLCV data instance be $\mathbf{X}^{(i)}\in\mathbb{R}^{N\times T}$, which consists of $N$ variables across $T$ time steps. $\mathbf{X}^{(i)}$ is first divided and embedded into a sequence of patch embeddings $\mathbf{X}_{P}^{(i)}\in\mathbb{R}^{N\times L_{P}\times d_{m}}$, where $L_{P}$ and $d_{m}$ are the number of patches and the patch embedding dimension, respectively. The patches are then reprogrammed using a collection of text prototypes $\mathbf{E}^{\prime}\in\mathbb{R}^{V^{\prime}\times D}$, which is obtained by linearly probing the LLM’s pre-trained word embedding $\mathbf{E}\in\mathbb{R}^{V\times D}$, where $V$ and $V^{\prime}$ are the vocabulary sizes of the LLM and of the text prototypes ($V^{\prime}\ll V$), and $D$ is the embedding dimension. The reprogrammed patches are generated using a multi-head cross-attention mechanism: $\mathbf{Z}_{k}^{(i)}=\mathrm{Softmax}\left(\frac{\mathbf{Q}_{k}^{(i)}\mathbf{K}_{k}^{(i)\top}}{\sqrt{d_{k}}}\right)\mathbf{V}_{k}^{(i)}$, where query $\mathbf{Q}_{k}^{(i)}=\mathbf{X}_{P}^{(i)}\mathbf{W}_{k}^{Q}$, key $\mathbf{K}_{k}^{(i)}=\mathbf{E}^{\prime}\mathbf{W}_{k}^{K}$, and value $\mathbf{V}_{k}^{(i)}=\mathbf{E}^{\prime}\mathbf{W}_{k}^{V}$ for each head $k$. The reprogrammed embeddings $\mathbf{O}^{(i)}$ are obtained by aggregating the outputs from each attention head and projecting them to the hidden dimension of the backbone LLM. Finally, the reprogrammed embeddings are augmented with a language description of statistics extracted from TSFresh (Christ et al. 2018), serving as prompts for the Market Analyst. An example of instruction and prompt is shown in Figure 4.

形式上,设一个OHLCV数据实例为$\mathbf{X}^{(i)}\in\mathbb{R}^{N\times T}$,它由$N$个变量跨越$T$个时间步组成。$\mathbf{X}^{(i)}$首先被分割并嵌入到一系列补丁嵌入$\mathbf{X}_{P}^{(i)}\in\mathbb{R}^{N\times L_{P}\times d_{m}}$中,其中$L_{P}$和$d_{m}$分别是补丁的数量和补丁嵌入维度。然后使用一组文本原型$\mathbf{E}^{\prime}\in\mathbb{R}^{V^{\prime}\times D}$对这些补丁进行重新编程,这组原型是通过线性探测大语言模型的预训练词嵌入$\mathbf{E}\in\mathbb{R}^{V\times D}$得到的,其中$V$和$V^{\prime}$分别是大语言模型和文本原型的词汇表大小($V^{\prime}\ll V$),$D$是嵌入维度。重新编程的补丁通过多头交叉注意力机制生成:$\mathbf{Z}_{k}^{(i)}=\mathrm{Softmax}\left(\frac{\mathbf{Q}_{k}^{(i)}\mathbf{K}_{k}^{(i)\top}}{\sqrt{d_{k}}}\right)\mathbf{V}_{k}^{(i)}$,其中对于每个头$k$,查询$\mathbf{Q}_{k}^{(i)}=\mathbf{X}_{P}^{(i)}\mathbf{W}_{k}^{Q}$,键$\mathbf{K}_{k}^{(i)}=\mathbf{E}^{\prime}\mathbf{W}_{k}^{K}$,值$\mathbf{V}_{k}^{(i)}=\mathbf{E}^{\prime}\mathbf{W}_{k}^{V}$。重新编程的嵌入$\mathbf{O}^{(i)}$通过聚合每个注意力头的输出并将它们投影到骨干大语言模型的隐藏维度来获得。最后,重新编程的嵌入会与从 TSFresh (Christ et al. 2018) 提取的统计数据的语言描述相结合,作为市场分析师的提示。指令和提示的示例如图 4 所示。

Figure 4: Instruction and prompt for the Market Analyst.

图 4: 市场分析师的指令和提示
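The cross-attention step of the reprogramming mechanism can be illustrated with a small NumPy sketch for a single variable (one of the $N$ series). The weights here are random placeholders standing in for learned parameters, the output projection `w_o` and all dimension values are assumptions chosen for illustration, and patch embedding and prototype construction are assumed to have happened upstream.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def reprogram_patches(patches, prototypes, w_q, w_k, w_v, w_o):
    """Multi-head cross-attention from patch embeddings to text prototypes.

    patches:    (L_P, d_m)     patch embeddings of one variable
    prototypes: (V', D)        text prototypes E'
    w_q:        (H, d_m, d_k)  per-head query projections
    w_k, w_v:   (H, D, d_k)    per-head key/value projections
    w_o:        (H * d_k, d_llm) output projection (assumed form)
    """
    heads = []
    for k in range(w_q.shape[0]):
        q = patches @ w_q[k]           # (L_P, d_k)
        key = prototypes @ w_k[k]      # (V', d_k)
        val = prototypes @ w_v[k]      # (V', d_k)
        attn = softmax(q @ key.T / np.sqrt(q.shape[-1]))  # (L_P, V')
        heads.append(attn @ val)       # (L_P, d_k)
    # Aggregate heads and project to the backbone hidden dimension.
    return np.concatenate(heads, axis=-1) @ w_o


# Illustrative dimensions only.
rng = np.random.default_rng(0)
L_P, d_m, V_proto, D, H, d_k, d_llm = 6, 16, 10, 32, 4, 8, 32
patches = rng.normal(size=(L_P, d_m))
prototypes = rng.normal(size=(V_proto, D))
w_q = rng.normal(size=(H, d_m, d_k))
w_k = rng.normal(size=(H, D, d_k))
w_v = rng.normal(size=(H, D, d_k))
w_o = rng.normal(size=(H * d_k, d_llm))
reprogrammed = reprogram_patches(patches, prototypes, w_q, w_k, w_v, w_o)
```

Each patch ends up as a weighted mixture of prototype values, so the time series is expressed in the same embedding space the LLM already operates in.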

Alpha Expert

Alpha Expert

The Alpha Expert specializes in processing expression-based alpha factors, which are technical indicators and algorithm-generated factors believed to possess predictive power regarding stock price movements.

Alpha Expert 专注于处理基于表达式的 alpha 因子,这些因子是技术指标和算法生成的因子,被认为对股价走势具有预测能力。

We leverage GPT-4’s capability of understanding complex expressions to pre-generate a language description for each factor. In this way, we built our Alpha database, where an alpha record consists of:

我们利用 GPT-4 理解复杂表达式的能力,为每个因子预先生成语言描述。通过这种方式,我们构建了 Alpha 数据库,其中每个 Alpha 记录包含:

• Expression: The mathematical or logical formula used to compute the alpha factor based on OHLCV data, e.g. rank(ts_argmax(corr(ts_rank(close, 10), ts_rank(volume, 10), 10), 5))
• Description: Generated by GPT-4 with prompts that include the expression.

• 表达式:用于基于 OHLCV 数据计算 alpha 因子的数学或逻辑公式。例如:rank(ts_argmax(corr(ts_rank(close, 10), ts_rank(volume, 10), 10), 5))
• 描述:由 GPT-4 生成,提示中包含该表达式。
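To make the example expression concrete, the sketch below evaluates it with pandas on synthetic data. The operator semantics are assumptions based on common alpha-factor conventions (rolling percentile rank for `ts_rank`, position of the rolling maximum for `ts_argmax`, rolling Pearson correlation for `corr`, cross-sectional percentile rank for `rank`); the text does not define them, and the tickers and prices are made up.

```python
import numpy as np
import pandas as pd


def ts_rank(df, window):
    """Percentile rank of the latest value within a rolling window."""
    return df.rolling(window).rank(pct=True)


def ts_argmax(df, window):
    """Offset (0 = oldest) of the maximum within a rolling window."""
    return df.rolling(window).apply(np.argmax, raw=True)


def corr(a, b, window):
    """Rolling correlation between matching columns of two frames."""
    return a.rolling(window).corr(b)


def rank(df):
    """Cross-sectional percentile rank across stocks on each day."""
    return df.rank(axis=1, pct=True)


def example_alpha(close, volume):
    return rank(ts_argmax(corr(ts_rank(close, 10), ts_rank(volume, 10), 10), 5))


# Synthetic frames: rows are business days, columns are hypothetical tickers.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=60)
tickers = ["AAA", "BBB", "CCC", "DDD"]
close = pd.DataFrame(
    100 + rng.normal(0, 1, (60, 4)).cumsum(axis=0), index=dates, columns=tickers
)
volume = pd.DataFrame(
    rng.integers(1_000, 10_000, (60, 4)).astype(float), index=dates, columns=tickers
)
alpha = example_alpha(close, volume)
```

The nested rolling windows mean the first rows of `alpha` are NaN until enough history has accumulated, which is worth keeping in mind when aligning factor values with labels.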

For each stock, we first calculate the values of all alpha factors based on OHLCV data and then derive a comprehensive score via a LightGBM-based model (Ke et al. 2017). Subsequently, we select Top-K alphas that contribute most significantly to this comprehensive score. Descriptions of these Top-K alphas are retrieved from the database and, along with the calculated values, are included in the prompts and instructions for the Alpha Expert.

对于每只股票,我们首先基于OHLCV数据计算所有alpha因子的值,然后通过一个基于LightGBM的模型(Ke et al. 2017)得出综合评分。随后,我们选择对综合评分贡献最显著的Top-K alpha因子。这些Top-K alpha因子的描述从数据库中检索出来,并与计算出的值一起包含在Alpha Expert的提示和指令中。

Fundamental Analyst

基本面分析师

The Fundamental Analyst LLM specializes in analyzing fundamental data, including earnings call transcripts and financial metrics, to predict stock price movements on a quarterly basis. Its procedure is similar to that of the News Analyst LLM, the key difference being that fundamental data is updated quarterly, so movement predictions are made for the next quarter. The response should include a prediction in one of the following five categories: “Strong Rise”, “Moderate Rise”, “No Change”, “Moderate Fall”, or “Strong Fall”, followed by the reasoning behind it.

基本面分析师大语言模型专注于分析基本面数据,包括财报电话会议记录和财务指标,以预测季度股价走势。基本面分析师大语言模型的流程与新闻分析师大语言模型类似,主要区别在于基本面数据每季度更新一次,因此预测的是下一季度的走势。响应应包括以下五个类别之一的预测:“大幅上涨”、“温和上涨”、“无变化”、“温和下跌”或“大幅下跌”,并附上推理。

Figure 5: Instructions and prompts for the General Expert LLM: (Top) Prediction mode, (Bottom) Ranking mode.

图 5: 通用专家大语言模型的指令和提示:(上) 预测模式,(下) 排序模式。

General Expert

通用专家

The General Expert LLM can operate in two distinct modes: prediction mode and ranking mode. Both modes begin by summarizing the reports (historical conversation including instructions, prompts, and responses) from the four specialized experts due to the limitations on input context length of the backbone LLM.

通用专家大语言模型可以在两种不同的模式下运行:预测模式和排序模式。由于骨干大语言模型的输入上下文长度限制,这两种模式都从总结四位专业专家的报告(包括指令、提示和响应的历史对话)开始。

In prediction mode, used for stock movement prediction, the summarized reports are used to construct a prompt with a prediction prefix. Given the summarized reports, the General Expert LLM outputs a binary prediction indicating whether the stock will rise or fall.

在预测模式中,用于股票走势预测时,汇总的报告被用来构建一个带有预测前缀的提示。给定汇总的报告,通用专家大语言模型会输出一个二元预测,指示股票是会上涨还是下跌。

In ranking mode, used for stock trading, the General Expert LLM functions as a comparator that induces a ranking over stocks. Specifically, given the summarized reports of two stocks, the General Expert LLM determines which stock is likely to perform better in the future. To generate a Top-K ranking of stocks, we employ a relaxed comparison-based sorting similar to BubbleSort: we first compare every pair of stocks and count the number of wins for each stock, then sort these counts to establish the ranking. Although algorithms like QuickSort and vanilla BubbleSort need fewer comparisons for Top-K selection ($\mathcal{O}(N\log N)$ on average and $\mathcal{O}(N\cdot K)$, respectively), we propose to use this relaxed comparison-based sorting algorithm with $\mathcal{O}(N^{2})$ comparisons because the LLM-based comparator is non-transitive (Liu et al. 2024); in practice, more comparisons tend to yield more accurate rankings.

在用于股票交易的排名模式中,通用专家大语言模型 (General Expert LLM) 充当比较器以建立排名能力。具体来说,给定两只股票的总结报告,通用专家大语言模型将确定哪只股票在未来可能表现更好。为了生成股票的前 K 名排名,我们采用了一种类似于冒泡排序的宽松比较排序方法:我们首先比较每对股票,并计算每只股票的胜出次数。随后,我们根据这些计数对股票进行排序以确定排名。尽管像快速排序和普通冒泡排序这样的算法在平均情况下提供了更少的比较次数($\mathcal{O}(N\log N)$ 和 $\mathcal{O}(N\cdot K)$),但由于基于大语言模型的比较器的非传递性 (Liu et al. 2024),我们建议使用这种宽松的比较排序算法,其复杂度为 $\mathcal{O}(N^{2})$。因此,在实践中,更多的比较往往会产生更准确的排名。
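The win-counting procedure described above can be sketched in a few lines. The `better(a, b)` callable stands in for the General Expert LLM's pairwise judgment and is a hypothetical interface; the function names are illustrative as well.

```python
from itertools import combinations


def rank_by_pairwise_wins(items, better):
    """Rank items by the number of pairwise comparisons each one wins.

    `better(a, b)` should return True when `a` is judged to outperform `b`.
    It is called once per unordered pair, i.e. N * (N - 1) / 2 times.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        winner = a if better(a, b) else b
        wins[winner] += 1
    # sorted() is stable, so ties keep their original input order.
    return sorted(items, key=lambda item: wins[item], reverse=True)


def top_k(items, better, k):
    """Select the k items with the most pairwise wins."""
    return rank_by_pairwise_wins(items, better)[:k]
```

Because every pair is compared, a single inconsistent judgment shifts one win count by one rather than derailing the whole ordering, which is the motivation given for accepting the quadratic cost when the comparator is non-transitive.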

The General Expert LLM is fine-tuned on both tasks, stock movement prediction and stock comparison, simultaneously. The instructions and prompts are shown in Figure 5.

通用专家大语言模型同时在股票走势预测和股票比较两个任务上进行了微调。指令和提示如图 5 所示。

Experiments

实验

In this section, we conduct a comprehensive evaluation of the TradExpert framework on two main tasks: stock movement prediction and stock trading simulation. Our experiments aim to address the following research questions:
RQ1: How does TradExpert perform in stock movement prediction compared with state-of-the-art baselines?
RQ2: What are the potential profits and associated risks of TradExpert in backtesting on the real market?
RQ3: How effective is the reasoning capability of TradExpert for unstructured data?
RQ4: What is the significance of each expert within the TradExpert framework?
RQ5: Why do we choose the relaxed comparison-based sorting algorithm in TradExpert?

在本节中,我们对TradExpert框架在两个主要任务上进行了全面评估:股票走势预测和股票交易模拟。我们的实验旨在解决以下研究问题:
RQ1: 与最先进的基线方法相比,TradExpert在股票走势预测中的表现如何?
RQ2: 在真实市场的回测中,TradExpert的潜在利润和相关风险是什么?
RQ3: TradExpert对非结构化数据的推理能力有多有效?
RQ4: TradExpert框架中每个专家的意义是什么?
RQ5: 为什么我们在TradExpert中选择基于宽松比较的排序算法?

Datasets

数据集

We include two categories of datasets in our experiments:

我们在实验中包含了两类数据集:

• Benchmark Datasets: We use publicly available benchmark datasets from stock movement prediction research, including the CIKM18 (Wu et al. 2018), ACL18 (Xu and Cohen 2018), and BigData22 (Soun et al. 2022) datasets.
• Proprietary Datasets: We also utilize our proprietary datasets, which include extensive historical OHLCV data, news articles, alpha factors, and fundamental metrics for a comprehensive analysis.

• 基准数据集:我们使用了股票走势预测研究中公开可用的基准数据集,包括 CIKM18 (Wu et al. 2018)、ACL18 (Xu and Cohen 2018) 和 BigData22 (Soun et al. 2022) 数据集。
• 专有数据集:我们还利用了我们的专有数据集,其中包括广泛的历史 OHLCV 数据、新闻文章、alpha 因子和基本面指标,以进行全面分析。

Experimental Setup

实验设置

In our experiments, the four expert LLMs and the General Expert LLM are built on the LLaMA-2-7B backbone model (Touvron et al. 2023b) and are fine-tuned via the LoRA mechanism (Hu et al. 2022).

在我们的实验中,四个专家大语言模型和通用专家大语言模型均基于 LLaMA-2-7B 骨干模型 (Touvron et al. 2023b) 构建,并通过 LoRA (Hu et al. 2022) 机制进行微调。

Stock Movement Prediction: TradExpert works in prediction mode, that is, the General Expert LLM responds with a binary prediction indicating whether a stock will rise or fall the next day. Methods are evaluated using binary classification metrics, namely accuracy (Acc) and Matthews Correlation Coefficient (MCC).

股票走势预测:TradExpert 在预测模式下工作,即通用专家大语言模型会给出一个二元预测,指示某只股票次日是上涨还是下跌。评估方法采用二元分类指标,如准确率 (Acc) 和马修斯相关系数 (MCC)。
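For reference, both classification metrics can be computed directly from the confusion-matrix counts. The sketch below is a plain-Python illustration; treating "Rise" as the positive class and returning an MCC of 0 when the denominator vanishes are conventions assumed here, not details given in the text.

```python
import math


def accuracy_and_mcc(y_true, y_pred, positive="Rise"):
    """Return (accuracy, Matthews correlation coefficient) for binary labels."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

MCC complements accuracy because it stays near zero for a model that always predicts the majority class, even when the class balance makes accuracy look respectable.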

Stock Trading Simulation: TradExpert works in ranking mode, that is, the General Expert LLM acts as a comparator to sort the stocks. We simulate the real profit and risk of TradExpert by executing trades based on the Top-K ranked stocks. TradExpert and baselines are evaluated using metrics including Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD).

股票交易模拟:TradExpert 在排名模式下工作,即 General Expert 大语言模型作为比较器对股票进行排序。我们通过根据 Top-K 排名的股票执行交易来模拟 TradExpert 的实际利润和风险。TradExpert 和基线模型使用年化收益率 (AR)、夏普比率 (SR)、年化波动率 (AV) 和最大回撤 (MD) 等指标进行评估。

Baselines

基线

For stock movement prediction, the baselines include: (1) Hybrid Models: StockNet (Xu and Cohen 2018), ALSTM-W (Soun et al. 2022), ALSTM-D (Soun et al. 2022), SLOT (Soun et al. 2022). (2) Large Language Models: GPT-4 (Achiam et al. 2023), Gemini (Team et al. 2023), LLaMA2-70B (Touvron et al. 2023b), LLaMA3-8B (AI@Meta 2024), FinMA-7B (Xie et al. 2023), FinGPT-LLaMA2-7B (Yang, Liu, and Wang 2023), InternLM-7B (Cai et al. 2024), Falcon-7B (Almazrouei et al. 2023), Mixtral-7B (Jiang et al. 2023).

对于股票走势预测,基线模型包括:(1) 混合模型:StockNet (Xu and Cohen 2018)、ALSTM-W (Soun et al. 2022)、ALSTM-D (Soun et al. 2022)、SLOT (Soun et al. 2022)。(2) 大语言模型:GPT-4 (Achiam et al. 2023)、Gemini (Team et al. 2023)、LLaMA2-70B (Touvron et al. 2023b)、LLaMA3-8B (AI@Meta 2024)、FinMA-7B (Xie et al. 2023)、FinGPT-LLaMA2-7B (Yang, Liu, and Wang 2023)、InternLM-7B (Cai et al. 2024)、Falcon-7B (Almazrouei et al. 2023)、Mixtral-7B (Jiang et al. 2023)。

For stock trading simulation, the baselines include: (1) Traditional Models: Random Forest (Breiman 2001), Decision Tree (Loh 2011), SVM (Cortes and Vapnik 1995). (2) Deep Learning Models: A2C (Mnih et al. 2016), PPO (Schulman et al. 2017), SARL (Ye et al. 2020), EIIE (Jiang, Xu, and Liang 2017), and DeepTrader (Wang et al. 2021). To reduce computational costs in backtesting, we evaluated all methods on the stocks of the DOW 30 list, a subset of the S&P 500 with around 30 stocks.

股票交易模拟的基线包括:(1) 传统模型:随机森林 (Random Forest) (Breiman 2001)、决策树 (Decision Tree) (Loh 2011)、支持向量机 (SVM) (Cortes and Vapnik 1995)。(2) 深度学习模型:A2C (Mnih et al. 2016)、PPO (Schulman et al. 2017)、SARL (Ye et al. 2020)、EIIE (Jiang, Xu, and Liang 2017) 和 DeepTrader (Wang et al. 2021)。为了减少回测的计算成本,我们在 DOW 30 列表的股票数据集上评估了所有方法,该数据集是 S&P 500 的一个子集,包含大约 30 只股票。

Table 2: Comparison results on the stock movement prediction task. As this is a binary classification problem, methods are evaluated by Accuracy (Acc) and Matthews Correlation Coefficient (MCC). Higher is better for both metrics. The best and second-best results are in bold and underlined, respectively.

表 2: 股票走势预测任务的对比结果。作为一个二分类问题,方法通过准确率 (Acc) 和马修斯相关系数 (MCC) 进行评估。两个指标的值越高越好。最佳和次佳结果分别用加粗和下划线表示。

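| Model | BigData22 Acc ↑ | BigData22 MCC ↑ | ACL18 Acc ↑ | ACL18 MCC ↑ | CIKM18 Acc ↑ | CIKM18 MCC ↑ | S&P500 (Ours) Acc ↑ | S&P500 (Ours) MCC ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hybrid Models | | | | | | | | |
| ALSTM-W | 0.48 | -0.01 | 0.53 | 0.08 | | | | |
| ALSTM-D | 0.49 | 0.01 | 0.53 | 0.07 | | | | |
| StockNet | 0.53 | -0.02 | 0.54 | -0.02 | | | | |
| SLOT | 0.55 | 0.10 | 0.59 | 0.21 | | | | |
| Large Language Models | | | | | | | | |
| GPT-4 | 0.54 | 0.03 | 0.52 | 0.02 | | | | |
| Gemini | 0.55 | 0.04 | 0.52 | 0.04 | | | | |
| LLaMA2-7B-chat | 0.54 | 0.05 | 0.51 | 0.01 | | | | |
| LLaMA2-70B | 0.47 | 0.00 | 0.51 | 0.01 | | | | |
| LLaMA3-8B | 0.55 | 0.02 | 0.52 | 0.02 | | | | |
| FinMA-7B | 0.51 | 0.02 | 0.51 | 0.03 | | | | |
| FinGPT-7B-lora | 0.45 | 0.00 | 0.49 | 0.00 | | | | |
| InternLM | 0.56 | 0.08 | 0.51 | 0.02 | | | | |
| Falcon | 0.55 | 0.00 | 0.51 | 0.00 | | | | |
| Mixtral-7B | 0.46 | 0.02 | 0.49 | 0.00 | | | | |
| MoE LLMs | | | | | | | | |
| TradExpert-NM | 0.59 | 0.12 | 0.60 | 0.15 | | | | |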

Results

结果

Stock Movement Prediction. We implemented all baselines ourselves or used existing open-source code, except for the closed-source model SLOT, for which we refer to the metrics reported in the original paper. To ensure a fair comparison, we only included the News Analyst and the Market Analyst in TradExpert; this variant is named TradExpert-NM. The results are shown in Table 2. Among hybrid models, SLOT achieves outstanding accuracy and MCC on ACL18, benefiting from its proposed global market guidance. Among LLMs, InternLM shows remarkable performance, particularly on our proprietary S&P500 dataset. Our proposed TradExpert-NM, which uses a mixture of expert LLMs, consistently outperforms the other models across all datasets, except for MCC on ACL18. Note that BigData22, ACL18, and CIKM18 are relatively small datasets with texts from tweets, while our S&P500 dataset consists of news articles with many more words. This difference in text length contributes to the more significant improvements obtained by TradExpert-NM on the S&P500 dataset.

股票走势预测

我们自行实现了所有基线模型或利用了现有的开源代码,除了闭源模型 SLOT,我们参考了相关论文中报告的指标。为了确保公平比较,我们仅在 TradExpert 中包含了新闻分析师和市场分析师,命名为 TradExpert-NM。结果如表 2 所示。我们可以看到,在混合模型中,SLOT 在 ACL18 数据集上取得了出色的准确率和 MCC,这得益于其提出的全球市场指导。在大语言模型中,InternLM 表现出色,尤其是在我们专有的 S&P500 数据集上。我们提出的 TradExpert-NM 采用了专家大语言模型混合方法,在所有数据集上均优于其他模型,除了 ACL18 上的 MCC,展示了其卓越的性能。需要注意的是,BigData22、ACL18 和 CIKM18 是相对较小的数据集,文本来自推文,而我们的 S&P500 数据集由新闻文章组成,字数更多。这种文本长度的差异使得 TradExpert-NM 在 S&P500 数据集上获得了更显著的改进。
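As a reading aid for Table 2, the following is a minimal sketch of how Acc and MCC are computed for a binary rise/fall task. The function name `accuracy_and_mcc` and the 0/1 label encoding are assumptions made for illustration; the source does not show its evaluation code.

```python
import math

def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC) for binary labels, assumed encoded as 0 (fall) and 1 (rise)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(accuracy_and_mcc(y_true, y_pred))      # (0.75, 0.5)

# A model that always predicts "rise" reaches 50% accuracy here but an MCC of 0.
print(accuracy_and_mcc(y_true, [1] * 8))     # (0.5, 0.0)
```

The second call shows why MCC is reported alongside accuracy: a constant predictor can look acceptable on accuracy alone while carrying no information about direction.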

Stock Trading Simulation. We perform backtesting to evaluate TradExpert and the baselines. To reduce the computational cost of backtesting, we limit the stock pool to about 30 stocks in the DOW 30, a subset of the S&P 500. For TradExpert, we implement a Buy-and-Hold trading strategy on the Top-K stocks ranked by TradExpert. The backtesting period is the same as the testing period of our datasets, which runs from January 1, 2023, to December 31, 2023. The results summarized in Table 3 demonstrate TradExpert's superior performance across all metrics considered. Among traditional models, XGBoost achieved a relatively high return but also exhibited high volatility and drawdown, indicating greater risk. Deep learning models generally outperformed traditional models; among them, DeepTrader stood out with the highest return and Sharpe ratio. TradExpert, our proposed model, significantly outperformed all other models with an exceptional AR of 49.79% and the lowest AV of 9.95%. This combination yielded an outstanding Sharpe ratio of 5.01, indicating a high return per unit of risk. Figure 6 shows the cumulative returns over time for all methods.

股票交易模拟

我们通过回测来评估 TradExpert 和基线模型。为了降低回测的计算成本,我们将股票池限制在道琼斯 30 指数(DOW 30)中的约 30 只股票,这是标普 500 指数的一个子集。对于 TradExpert,我们在其排名前 K 的股票上实施买入并持有策略。回测周期与我们的数据集测试周期相同,即从 2023 年 1 月 1 日到 2023 年 12 月 31 日。表 3 中总结的结果表明,TradExpert 在所有考虑的指标上均表现出色。在传统模型中,XGBoost 实现了相对较高的回报,但也表现出较高的波动性和回撤,表明风险较大。深度学习模型通常优于传统模型。其中,DeepTrader 以最高的回报和夏普比率脱颖而出。我们提出的模型 TradExpert 显著优于所有其他模型,其卓越的年化回报率(AR)为 49.79%,最低的年化波动率(AV)为 9.95%。这一组合产生了出色的夏普比率 5.01,表明每单位风险的高回报。图 6 显示了所有方法随时间累积回报的趋势。
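The four backtesting metrics can be sketched from a series of daily portfolio returns as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function name `trading_metrics`, the 252-trading-day year, geometric annualization, and a zero risk-free rate are all assumptions. The zero risk-free rate is at least consistent with the reported figures, since 49.79% / 9.95% ≈ 5.00 and 13.92% / 11.41% ≈ 1.22.

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def trading_metrics(daily_returns, periods_per_year=TRADING_DAYS):
    """Compute AR, AV, SR and MD from simple per-period returns (e.g. 0.01 for +1%)."""
    n = len(daily_returns)

    # Wealth curve of 1 unit of capital compounded over the period.
    wealth, value = [], 1.0
    for r in daily_returns:
        value *= 1.0 + r
        wealth.append(value)

    # Annualized return (geometric) and annualized volatility (sample std).
    ar = wealth[-1] ** (periods_per_year / n) - 1.0
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    av = math.sqrt(var) * math.sqrt(periods_per_year)

    # Sharpe ratio with an assumed risk-free rate of zero.
    sr = ar / av if av > 0 else float("nan")

    # Maximum drawdown: largest peak-to-trough loss relative to the running peak.
    peak, md = 1.0, 0.0
    for v in wealth:
        peak = max(peak, v)
        md = max(md, (peak - v) / peak)

    return {"AR": ar, "AV": av, "SR": sr, "MD": md}

metrics = trading_metrics([0.01, -0.02, 0.03, -0.01])
print({name: round(x, 4) for name, x in metrics.items()})
```

Lower is better for AV and MD, and higher is better for AR and SR.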


Figure 6: Cumulative returns over time for all methods on the 30 stocks in the DOW 30. The DJI Index represents the market trend.

图 6: 所有方法在 DOW 30 列表中的 30 只股票上的累计收益随时间的变化。DJI 指数代表市场趋势。

Table 3: Comparison results on stock trading simulation task with stocks on the DOW 30. Annualized Return (AR), Sharpe Ratio (SR), Annualized Volatility (AV), and Maximum Drawdown (MD) are utilized to evaluate the profits and risks of methods. The best results are in bold.

表 3: DOW 30 股票交易模拟任务的比较结果。使用年化收益率 (AR)、夏普比率 (SR)、年化波动率 (AV) 和最大回撤 (MD) 来评估方法的收益和风险。最佳结果以粗体显示。


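| Model | AR ↑ | AV ↓ | SR ↑ | MD ↓ |
| --- | --- | --- | --- | --- |
| DJI Index | 13.92% | 11.41% | 1.22 | 9.70% |
| Traditional Models | | | | |
| SVM | 15.77% | 26.67% | 0.59 | 19.94% |
| XGBoost | 21.58% | 27.29% | 0.79 | 21.90% |
| LightGBM | 2.17% | 22.74% | 0.1 | 21.29% |
| Deep Learning Models | | | | |
| A2C | 19.16% | 11.29% | 1.7 | 9.09% |
| PPO | 16.62% | 11.51% | 1.44 | 9.45% |
| EIIE | 23.64% | 13.73% | 1.72 | 10.07% |
| SARL | 21.87% | 14.72% | 1.49 | 8.52% |
| DeepTrader | 32.45% | 17.86% | 1.82 | 15.32% |
| MoE Large Language Models | | | | |
| TradExpert | 49.79% | 9.95% | 5.01 | 6.56% |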

Table 4: Ablation study for the impacts of experts.

表 4: 专家影响的消融研究

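| Configuration | AR ↑ | AV ↓ | SR ↑ | MD ↓ |
| --- | --- | --- | --- | --- |
| TradExpert | 49.79% | 9.95% | 5.00 | 6.56% |
| w/o Market | 30.87% | 16.43% | 1.88 | 13.29% |
| w/o News | 31.92% | 18.36% | 1.74 | 13.04% |
| w/o Alpha | 41.65% | 11.38% | 3.66 | 8.94% |
| w/o Fundamental | 44.32% | 10.68% | 4.15 | 7.82% |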

Ablation Study

消融研究

The Impacts of Experts. To evaluate the effectiveness of each expert within the TradExpert framework, we created multiple versions of TradExpert, each with a specific expert removed. By comparing the performance of these modified frameworks, we can assess the impact of each expert on the overall functionality of TradExpert. The results in Table 4 reveal the varying degrees of impact of each expert. The Market Analyst and the News Analyst emerged as the most critical, significantly influencing profitability and risk management, respectively: removing the Market Analyst caused the largest drop in AR (from 49.79% to 30.87%), and removing the News Analyst caused the largest rise in AV (from 9.95% to 18.36%). The Alpha Expert is clearly less impactful than the Market Analyst and the News Analyst. The Fundamental Analyst had the smallest effect on daily trading metrics but provided essential long-term stability, as is evident from the modest changes in AR and MD upon its removal. This highlights a strategic balance in TradExpert, where each expert contributes uniquely to the final decision and prediction.

专家影响评估

为了评估TradExpert框架中每位专家的有效性,我们创建了多个版本的TradExpert,每个版本都移除了特定的专家。通过比较这些修改后的框架的性能,我们可以评估每位专家对TradExpert整体功能的影响。表4中的结果显示,每位专家的影响程度各不相同。市场分析师和新闻分析师最为关键,分别显著影响了盈利能力和风险管理,这从移除它们后AR的最大降幅和AV的最大升幅中可以看出。Alpha专家的影响显然小于市场分析师和新闻分析师。基本面分析师对日常交易指标的影响最小,但提供了必要的长期稳定性,这从移除它后AR和MD的微小变化中可以看出。这突显了TradExpert中的战略平衡,每位专家都对最终决策和预测做出了独特的贡献。

The Effectiveness of Structured Data Reasoning. We show the effectiveness by comparing TradExpert-MA with traditional models for structured data such as OHLCV data and alpha factors. We use a genetic programming-based symbolic regression model as our baseline, which mines alpha expressions aimed at predicting the RankIC of day $T+1$'s returns. TradExpert-MA is built on top of the same alphas, with the News and Fundamental experts removed to exclude effects from other sources. We compare TradExpert-MA with the combination of alphas using the RankIC and RankICIR metrics. The results are shown in Table 5. The improvements over the alpha combination demonstrate the reasoning ability of TradExpert for structured data.

结构化数据推理的有效性。我们通过将 TradExpert-MA 与面向结构化数据(如 OHLCV 数据和 alpha 因子)的传统模型进行比较,展示了其有效性。我们使用基于遗传编程的符号回归模型作为基线,该模型挖掘旨在预测第 $T+1$ 天收益 RankIC 的 alpha 表达式。TradExpert-MA 构建于相同的阿尔法因子之上,其中移除了新闻和基本面专家,以排除其他来源的影响。我们将 TradExpert-MA 与阿尔法因子的组合进行比较,使用 RankIC 和 RankICIR 作为评估指标。结果如表 5 所示。相较于阿尔法因子组合的提升,证明了 TradExpert 对结构化数据的推理能力。

Table 5: Ablation study for the effectiveness of structured data reasoning in predicting day $T+1$'s returns.

表 5: 结构化数据推理在预测第 $T+1$ 天收益中的有效性消融研究。

| Model | RankIC ↑ | RankICIR ↑ |
| --- | --- | --- |
| TradExpert-MA | 0.12 | 0.90 |
| Alpha Combination | 0.07 | 0.65 |

Table 6: Ablation study for the choices of ranking algorithm. $\dagger$ denotes being equipped in TradExpert.

表 6: 排序算法选择的消融研究。$\dagger$ 表示在 TradExpert 中配备。

| Algorithm | RankIC ↑ | RankICIR ↑ | Time |
| --- | --- | --- | --- |
| RelaxedSort $\dagger$ | 0.12 | 0.90 | |
| BubbleSort | 0.06 | 0.65 | |
| QuickSort | 0.03 | 0.38 | |
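For readers unfamiliar with the two metrics in Table 5, here is a minimal sketch using the common definitions: RankIC is the Spearman rank correlation between a day's predicted scores and the next day's realized returns across stocks, and RankICIR is the mean of the daily RankIC values divided by their standard deviation. The function names and the use of the sample standard deviation are assumptions; the source does not spell out its exact formulas.

```python
import math

def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_ic(scores, next_day_returns):
    """Spearman correlation between predicted scores and day T+1 returns."""
    return _pearson(_ranks(scores), _ranks(next_day_returns))

def rank_ic_summary(daily_scores, daily_returns):
    """Return (mean RankIC, RankICIR) over a sequence of trading days."""
    ics = [rank_ic(s, r) for s, r in zip(daily_scores, daily_returns)]
    mean_ic = sum(ics) / len(ics)
    var = sum((ic - mean_ic) ** 2 for ic in ics) / (len(ics) - 1)
    std = math.sqrt(var)
    return mean_ic, (mean_ic / std if std else float("nan"))

# Two toy days with three stocks each (made-up numbers).
scores = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
returns = [[0.01, 0.02, 0.03], [0.01, 0.03, 0.02]]
print(rank_ic_summary(scores, returns))
```

A high RankIC means the predicted ordering matches the realized ordering on average, while a high RankICIR means it does so consistently from day to day.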

The Choice of Ranking Algorithm. In TradExpert, we implement Top-K ranking by fully sorting all stocks with a relaxed comparison-based algorithm in which TradExpert serves as the comparator. To justify our choice of this seemingly cumbersome approach, we conducted comparison experiments with other, theoretically more efficient ranking algorithms. Specifically, the alternatives are QuickSort and BubbleSort, with time complexities of $\mathcal{O}(N\log N)$ and $\mathcal{O}(N\cdot K)$, respectively. The comparison results in Table 6 demonstrate that our approach outperforms the others despite its higher computational complexity. This is attributed to the non-transitive nature of the LLM-based comparator: because pairwise judgments may contradict one another, a greater number of comparisons yields more accurate rankings in TradExpert.

在 TradExpert 中,我们通过使用基于松弛比较的算法对所有股票进行完全排序来实现 Top-K 排名,其中 TradExpert 作为比较器。为了证明我们选择这种看似繁琐的方法的合理性,我们与其他理论上更高效的排名算法进行了对比实验。具体来说,我们的替代方案包括时间复杂度分别为 $\mathcal{O}(N\log N)$ 和 ${\mathcal{O}}(N{\cdot}K)$ 的快速排序 (QuickSort) 和冒泡排序 (BubbleSort)。表 6 中的比较结果表明,尽管我们的方法具有更高的计算复杂度,但其性能优于其他方法。这归因于基于大语言模型的比较器的非传递性。因此,在 TradExpert 中,更多的比较会产生更准确的排名。
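The effect of a non-transitive comparator can be illustrated with a toy example. The sketch below is not the paper's relaxed sorting algorithm, whose details are not given in this section; it contrasts an exhaustive round-robin ranking by win count with a K-pass bubble selection. The `BEATS` relation is a hand-made stand-in for the LLM comparator, and every name in the code is hypothetical.

```python
from itertools import combinations

# Hypothetical preference relation containing a cycle: A > B > C > D > A.
BEATS = {("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "A")}

def prefers(a, b):
    """Stand-in for the LLM comparator: True if `a` is judged better than `b`."""
    return (a, b) in BEATS

def top_k_round_robin(items, k, compare):
    """Compare every pair once and rank by number of wins: N*(N-1)/2 calls."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[a if compare(a, b) else b] += 1
    # sorted() is stable, so items with equal wins keep their input order.
    return sorted(items, key=lambda item: -wins[item])[:k]

def top_k_bubble(items, k, compare):
    """K bubble passes, each moving the current best item forward: about N*K calls."""
    order = list(items)
    for done in range(k):
        for j in range(len(order) - 1, done, -1):
            if compare(order[j], order[j - 1]):
                order[j], order[j - 1] = order[j - 1], order[j]
    return order[:k]

stocks = ["B", "C", "A", "D"]
print(top_k_round_robin(stocks, 2, prefers))  # ['B', 'A']
print(top_k_bubble(stocks, 2, prefers))       # ['B', 'C']
```

In this toy run the bubble passes compare A only against D, lose that single comparison, and never surface A, although A wins two of its three pairwise comparisons. Spending more comparisons makes the final order less sensitive to any single inconsistent judgment, which matches the explanation given for Table 6.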

Conclusion

结论

In this study, we introduced TradExpert, a novel framework that harnesses the power of LLMs to enhance stock trading strategies. By integrating multiple specialized LLMs, each focused on a distinct aspect of financial data, TradExpert provides a comprehensive and nuanced analysis that significantly outperforms traditional financial models in practice. Looking ahead, our goal is to explore how to employ TradExpert in high-frequency trading scenarios and to extend its capabilities to a wider range of global markets.

在本研究中,我们介绍了 TradExpert,这是一个利用大语言模型 (LLM) 来增强股票交易策略的新框架。通过整合多个专注于金融数据不同方面的专业大语言模型,TradExpert 提供了全面且细致的分析,在实践中显著优于传统的金融模型。展望未来,我们的目标是探索如何在高频交易场景中使用 TradExpert,并将其能力扩展到更广泛的全球市场。

Limitations. Although TradExpert has notable strengths, its processing time poses certain challenges. On average, it takes 4.7 seconds to process a single stock on an Nvidia A5000 GPU. For daily trading, this processing time is generally manageable. However, for scenarios that demand quicker decision-making, such as high-frequency trading, TradExpert's latency becomes a notable drawback.

尽管 TradExpert 具有显著的优势,但其处理时间带来了一定的挑战。在配备 Nvidia A5000 GPU 的情况下,单只股票的平均处理时间为 4.7 秒。对于日常交易来说,这个处理时间通常是可以接受的。然而,在需要更快决策的场景中,例如高频交易,TradExpert 的延迟成为一个明显的缺点。
