FingerFlex: Inferring Finger Trajectories from ECoG signals
FingerFlex: 从ECoG信号推断手指轨迹
Abstract
摘要
Motor brain-computer interface (BCI) development relies critically on neural time series decoding algorithms. Recent advances in deep learning architectures allow for automatic feature selection to approximate higher-order dependencies in data. This article presents the FingerFlex model - a convolutional encoder-decoder architecture adapted for finger movement regression on electro cor tico graphic (ECoG) brain data. State-of-the-art performance was achieved on a publicly available BCI competition IV dataset 4 with a correlation coefficient between true and predicted trajectories up to 0.74. The presented method provides the opportunity for developing fully-functional high-precision cortical motor braincomputer interfaces.
运动脑机接口(BCI)的开发高度依赖于神经时间序列解码算法。深度学习架构的最新进展使得自动特征选择能够近似数据中的高阶依赖关系。本文提出FingerFlex模型——一种适用于皮层脑电(ECoG)数据手指运动回归的卷积编码器-解码器架构。在公开可用的BCI竞赛IV数据集4上实现了最先进的性能,真实轨迹与预测轨迹之间的相关系数高达0.74。该方法为开发功能齐全的高精度皮层运动脑机接口提供了可能。
Introduction
引言
Brain-computer interface
脑机接口
One of the promising rehabilitation tools for people with disabilities is the brain-computer interface (BCI) (Lebedev and Nicolelis 2006; Bamdad, Zarshenas, and Auais 2015; Yuan and He 2014). In general, developing a functioning BCI involves interpreting neural signals to extract relevant information from the brain. The extracted features, i.e. control parameters can be used in the restoration of the lost functions and ability augmentation via controlling external devices. Specifically, motor BCIs aim to decode movementrelated information in the sensory-motor cortex into meaningful behavioral patterns, such as joint movement or finger flexion (Tam et al. 2019; Volkova et al. 2019).
脑机接口 (BCI) (Lebedev and Nicolelis 2006; Bamdad, Zarshenas, and Auais 2015; Yuan and He 2014) 是残障人士极具前景的康复工具之一。通常,开发功能性BCI需要解析神经信号以提取大脑中的相关信息。这些提取的特征(即控制参数)可通过控制外部设备,用于恢复丧失的功能或增强能力。具体而言,运动型BCI旨在将感觉运动皮层中与运动相关的信息解码为有意义的动作模式,例如关节运动或手指屈伸 (Tam et al. 2019; Volkova et al. 2019)。
Recording of the neural activity
神经活动记录
Brain activity is analyzed using electro physiological methods measuring single neuron or population level electric potentials. Most methods allowing to capture spiking activity, such as op to genetics, single-unit and multi-unit recordings are highly-invasive. At the same time, population level recording tools, such as electroencephalogram (EEG) and magneto e ncep halo gram (MEG) require no surgical intervention, albeit lack spacial resolution due to a voltage leakage in the human scalp and movement-related artifacts (Neil Cuffin and Cohen 1979; Hamalainen et al. 1993; Jackson and Bolger 2014). The electro cor tico gram (ECoG) used in this study lies in between these methods, estimated to register localized activity of $10^{5}$ neurons on the cortical surface (Engel et al. 2005). ECoG electrodes are placed over the surface of the cortex with approximately $1~\mathrm{cm}$ inter-electrode distance. Electro cor tico gram provides higher spatial resolution and superior signal quality compared to EEG and MEG with a tradeoff of portability and invasive ness.
脑活动通过测量单个神经元或群体水平电位的电生理方法进行分析。大多数能捕捉尖峰活动的方法(如光遗传学、单单元和多单元记录)都具有高度侵入性。而群体水平记录工具(如脑电图(EEG)和脑磁图(MEG))虽无需手术干预,但由于头皮电压泄漏和运动伪影导致空间分辨率不足 (Neil Cuffin和Cohen 1979; Hamalainen等 1993; Jackson和Bolger 2014)。本研究使用的皮层脑电图(ECoG)介于这些方法之间,估计能记录皮层表面约$10^{5}$个神经元的局部活动 (Engel等 2005)。ECoG电极以约$1~\mathrm{cm}$的间距放置在皮层表面,与EEG和MEG相比具有更高的空间分辨率和更优的信号质量,但牺牲了便携性和侵入性。
Movement correlates in the human brain
人类大脑中的运动相关性
The primary brain area involved in the movement processing is primary motor cortex (PMC) (Kakei, Hoffman, and Strick 1999). The adjacent brain areas - supplementary motor area (SMA) and somatosensory cortex take part into motor planning and receive peripheral inputs on the body position and environment (Tam et al. 2019). Planning and exe- cuting motor programs change the activity of single neurons in the cortical sheet of the motor areas (Kakei, Hoffman, and Strick 1999). Spiking activity in a synchronized neuronal population, in turn, contributes to the change in local field potentials registered via surface electrodes (Buzsaki, Anastassiou, and Koch 2012). Despite the lack of an exact areato-muscle correspondence in the motor cortex, overlapping regions associated with different muscle groups and even fingers can be distinguished (Schieber 2001; Mar janine j ad, Taherian, and Valero-Cuevas 2017; Georg o poul os and Carpenter 2015).
参与运动处理的主要脑区是初级运动皮层 (PMC) (Kakei, Hoffman, and Strick 1999)。相邻脑区——辅助运动区 (SMA) 和体感皮层参与运动规划,并接收关于身体位置和环境的周围输入 (Tam et al. 2019)。规划和执行运动程序会改变运动区皮层单神经元的活动 (Kakei, Hoffman, and Strick 1999)。同步神经元群中的尖峰活动反过来又会导致通过表面电极记录的局部场电位变化 (Buzsaki, Anastassiou, and Koch 2012)。尽管运动皮层中没有精确的脑区与肌肉对应关系,但可以区分与不同肌肉群甚至手指相关的重叠区域 (Schieber 2001; Marjaninejad, Taherian, and Valero-Cuevas 2017; Georgopoulos and Carpenter 2015)。
Another feature of population activity in the cortex is that its local field potential comprises a set of independent oscillatory processes (Buzsaki and Draguhn 2004). It was shown that changes in alpha band $(8-13{\mathrm{ Hz}})$ ) power, also known as mu-rhythm, beta $(13{-}35~\mathrm{Hz})$ ), gamma $(35–100~\mathrm{Hz})$ and high-gamma $\mathrm{(100-250~Hz)}$ correlate significantly with ongoing movement patterns (Tam et al. 2019; Mar janine j ad, Taherian, and Valero-Cuevas 2017). It is thus possible to use frequency-decomposed or time-frequency represented signal instead of raw time series data.
大脑皮层群体活动的另一个特征是,其局部场电位由一组独立的振荡过程组成 (Buzsaki and Draguhn 2004)。研究表明,阿尔法频段 $(8-13{\mathrm{ Hz}})$ 功率(也称为μ节律)、贝塔频段 $(13{-}35~\mathrm{Hz})$ 、伽马频段 $(35–100~\mathrm{Hz})$ 以及高伽马频段 $\mathrm{(100-250~Hz)}$ 的变化与当前运动模式显著相关 (Tam et al. 2019; Marjanine Jad, Taherian, and Valero-Cuevas 2017)。因此,可以使用频率分解或时频表示的信号代替原始时间序列数据。
The primary purpose of this study is to improve existing solutions for decoding a continuous finger trajectory based solely on ECoG-registered brain oscillator y patterns in the
本研究的主要目的是改进现有解决方案,仅基于ECoG记录的大脑振荡模式来解码连续手指轨迹。
sensor i motor cortex.
传感器 i 运动皮层。
Related work
相关工作
While some studies concentrated on classifying finger activations (Hotson et al. 2016; Onaran, Ince, and Cetin 2011; Yao and Shoaran 2019), the others predicted hand translations (Pistohl et al. 2008; Nakanishi et al. 2013; Bundy et al. 2016; Sliwowski et al. 2022) and gestures (Pan et al. 2018; Branco et al. 2017), numerous approaches were presented for solving finger trajectory prediction task.
一些研究专注于手指动作分类 (Hotson et al. 2016; Onaran, Ince, and Cetin 2011; Yao and Shoaran 2019),另一些则预测手部平移 (Pistohl et al. 2008; Nakanishi et al. 2013; Bundy et al. 2016; Sliwowski et al. 2022) 和手势 (Pan et al. 2018; Branco et al. 2017)。针对手指轨迹预测任务,目前已提出多种解决方案。
Liang and Bougrain (2012) and Flamary (2012) presented the first attempts to decode one-dimensional finger flexions on the publicly available BCI Competition dataset IV (Schalk et al. 2007). The former study employed switching linear models, while the latter - linear regression with rigorous feature selection, which allowed to win the competition. Nakanishi et al. (2014) used sparsed linear regression on frequency band decomposed signal achieving comparable results both on the BCI competition dataset and data gathered by their team. With the advancement of deep learning (LeCun, Bengio, and Hinton 2015) techniques over the following years, neural network approaches were more frequently used in consequent studies. For example, Xie, Schwartz, and Prasad (2018) used long short-term memory based (Hochreiter and Schmid huber 1997) architecture to decode finger trajectory. Jubien et al. (2019) compared different conventional machine learning methods with artificial neural network approaches on the same task to conclude that deep learning solves the task with higher accuracy. The method developed by Frey et al. (2021) presented the opportunity for developing a generalized decoder for neural data yielding high performance over a variety of tasks including finger trajectory regression over the ECoG data. The attempt to build an interpret able convolutional neural network was made by Petrosyan et al. (2021), which is highly relevant for neuroscience research on details of motor encoding in the human brain. Lastly, a new type of features based on Riemann geometry (Congedo, Barachant, and Bhatia 2017) was introduced in the finger movement decoding task by Yao, Zhu, and Shoaran (2022) used consequently with linear discriminators and support vector machine (SVM) regressors.
Liang和Bougrain (2012) 以及Flamary (2012) 首次尝试在公开的BCI Competition数据集IV (Schalk等, 2007) 上解码一维手指弯曲信号。前者采用切换线性模型,后者通过严格特征选择的线性回归方法赢得了比赛。Nakanishi等 (2014) 对频带分解信号使用稀疏线性回归,在BCI竞赛数据集和团队自采数据上都取得了可比结果。
随着深度学习 (LeCun, Bengio和Hinton 2015) 技术的发展,神经网络方法在后续研究中得到更广泛应用。例如Xie, Schwartz和Prasad (2018) 采用基于长短期记忆网络 (Hochreiter和Schmidhuber 1997) 的架构解码手指轨迹。Jubien等 (2019) 对比了传统机器学习方法与人工神经网络在该任务上的表现,证实深度学习能获得更高精度。
Frey等 (2021) 开发的神经数据通用解码器在包括ECoG数据手指轨迹回归在内的多任务中表现优异。Petrosyan等 (2021) 尝试构建可解释卷积神经网络,这对研究人脑运动编码的神经机制具有重要意义。Yao, Zhu和Shoaran (2022) 最新引入基于黎曼几何 (Congedo, Barachant和Bhatia 2017) 的特征表示方法,配合线性判别器和支持向量机 (SVM) 回归器进行手指运动解码。
The novelty of this work is primarily addressed by the use of different decoding paradigm based on the encoderdecoder architecture, which transforms feature time series into a target time series through a set of convolutions, deconvolutions, and skip-connections. Our research aims to prove this approach can outperform existing conventional machine learning solutions based on thorough feature extraction, as well as deep learning models implemented by the other research teams in a finger flexion decoding task.
这项工作的创新性主要体现在采用了基于编码器-解码器架构的全新解码范式,通过一系列卷积、反卷积和跳跃连接将特征时间序列转换为目标时间序列。我们的研究旨在证明,在手指弯曲解码任务中,该方法能够超越基于精细特征提取的传统机器学习方案,以及其它研究团队实现的深度学习模型。
Methods
方法
Dataset
数据集
The primary dataset for training the model was the BCI competition IV dataset 4 (Schalk et al. 2007). The dataset contains electro cor tico graphic recordings of three anonymous patients with corresponding finger movement information, represented as a one-dimensional coordinate denoting the finger fold. The recording grids consist of 48-64 electrodes evenly distributed in grids of 6x8, 8x8, and other shapes. The authors of a dataset imposed a limitation on ECoG electrode distribution information. The channels were shuffled beforehand and the electrode positions relative to the brain were not provided either. ECoG was recorded at a sampling frequency of 1000 Hertz, while finger movements were sampled at $25\mathrm{Hz}$ with a data glove.
训练模型的主要数据集是BCI竞赛IV数据集4 (Schalk et al. 2007)。该数据集包含三名匿名患者的皮层脑电图记录及对应的手指运动信息,后者以一维坐标表示手指弯曲程度。记录网格由48-64个电极组成,均匀分布在6x8、8x8等不同形状的网格中。数据集作者对ECoG电极分布信息进行了限制:通道数据已预先打乱,且未提供电极在大脑中的相对位置。皮层脑电图采样频率为1000赫兹,而手指运动数据通过数据手套以$25\mathrm{Hz}$采样。
The subjects were instructed to move a particular finger after a cue, which was presented as a corresponding word (e.g., ”thumb”) on a computer monitor. Each cue lasted for two seconds and was followed by a two-second resting period during which the screen was blank. More details on the experimental procedure and recording setup can be found in the dataset description (Miller and Schalk 2008).
受试者被要求在提示后移动特定手指,提示以相应单词(如"拇指")形式呈现在电脑显示器上。每个提示持续两秒,随后是两秒的空白屏幕休息期。有关实验程序和记录设置的更多细节可在数据集描述中查阅 (Miller and Schalk 2008)。
Data preprocessing and feature extraction
数据预处理与特征提取
Data preprocessing can be divided into two parts: preprocessing of ECoG signals and preprocessing of the finger movement data (Figure 1). The ECoG preprocessing pipeline consisted of the following steps. Firstly, data standar diz ation and subtraction of the median from each channel was performed. Then, band-pass and band-stop filters were used to filter frequencies in the range of 40 to $300\mathrm{Hz}$ and remove the power grid frequency of $50\mathrm{Hz}$ and its harmonics. The frequency range from 40 to $300\mathrm{Hz}$ constitutes gamma and high-gamma components of the local field potential signal, which were chosen as the main predictors for the task.
数据预处理可分为两部分:ECoG信号预处理和手指运动数据预处理 (图 1)。ECoG预处理流程包括以下步骤:首先对每个通道进行数据标准化并减去中值,接着使用带通和带阻滤波器过滤40至$300\mathrm{Hz}$频率范围,并去除$50\mathrm{Hz}$工频及其谐波。40至$300\mathrm{Hz}$频段构成局部场电位信号的伽马与高伽马成分,这些成分被选为该任务的主要预测指标。

Figure 1: ECoG and finger motion data processing pipeline
图 1: ECoG和手指运动数据处理流程
As a method of explicit extraction of time-frequency represent ation of the signal, the convolution with complex Morlet wavelets was chosen. Wavelet decomposition allows capturing specific frequency components of a signal change in time. Forty wavelet kernels were created with base frequencies ranging from 40 to $300\mathrm{Hz}$ evenly spaced on a logarithmic scale. The convolution operation of selected wavelets with each channel was performed and the amplitudes of resulting analytical signals were extracted as an absolute value of a complex-valued vector. As a result of this operation for each ECoG channel, a spec tr ogram is obtained of shape: (number of wavelets, number of time points in a time window). Then, the sampling rate was then lowered from $1000~\mathrm{Hz}$ to $100~\mathrm{Hz}$ by decimation for the correspondence to a motion capture time series. Next, the obtained timefrequency envelopes were standardized with respect to 0.1 and 0.9 quantiles, and the values not falling within this range were then taken equal to the corresponding scaler boundary values to combat outliers. This operation was done using the Robust Scale r class from the scikit-learn library, which was fitted on the training set and applied to both the training and validation sets. Lastly, to account for the delay between brain activity and actual movements, the forward shift of ECoG data with respect to the finger movement time series was introduced. The exact value of the shift can be regarded as a tunable hyper parameter within the relevant physiological range: $0-200\mathrm{ms}$ .
作为信号时频表征的显式提取方法,我们选择了与复Morlet小波的卷积运算。小波分解能够捕捉信号随时间变化的特定频率成分。我们创建了40个小波核,其基频在40至$300\mathrm{Hz}$范围内按对数尺度均匀分布。对每个通道执行选定小波的卷积运算后,提取所得解析信号的振幅作为复值向量的绝对值。该运算为每个ECoG通道生成一个形状为(小波数量,时间窗口内时间点数)的频谱图。随后通过降采样将采样率从$1000~\mathrm{Hz}$降至$100~\mathrm{Hz}$,以匹配动作捕捉时间序列。接着,使用0.1和0.9分位数对获得的时频包络进行标准化处理,超出该范围的值被截断至相应边界以消除异常值。此操作通过scikit-learn库的RobustScaler类实现,该缩放器在训练集上拟合后同时应用于训练集和验证集。最后,为补偿大脑活动与实际运动之间的延迟,对ECoG数据相对于手指运动时间序列进行了前向位移,位移量可作为可调超参数在$0-200\mathrm{ms}$生理范围内调整。
The preprocessing of finger movement data consisted of two steps. First, to keep the sampling frequencies of finger movements and time-frequency time series matched, upsampling from $25\mathrm{Hz}$ to $100\mathrm{Hz}$ using bicubic interpolation was conducted. Second, MinMax scaling was applied.
手指运动数据的预处理包含两个步骤。首先,为了使手指运动和时频时间序列的采样频率匹配,使用双三次插值法将采样频率从 $25\mathrm{Hz}$ 上采样至 $100\mathrm{Hz}$。其次,应用了最小-最大缩放 (MinMax scaling)。
The model architecture
模型架构
The final model is designed in the same way as the convolutional auto encoder with a set of additions and improvements. The architecture of the model is depicted on Figure 2
最终模型的设计方式与卷积自编码器类似,但增加了一系列改进。模型架构如图 2 所示
As the input, the model takes a window of ECoG features of arbitrary length and returns the corresponding window of predicted movements; however, during the training process, the length of such a window is fixed and equal to 256 time points (2.56 secs). Initially, the model performs dimensionality reduction using a regular convolutional layer. Next comes a set of convolutional encoder layers. Each encoder layer consists of a 1D convolution layer, layer normalization, GeLU activation function, dropout layer, and maximum pooling with stride equals 2. This part of the model is responsible for encoding the features of the time window fragments. Encoder layer architecture is inspired by wav2vec architecture (Schneider et al. 2019).
模型输入任意长度的ECoG特征窗口,输出对应的预测运动窗口;但在训练过程中,该窗口长度固定为256个时间点(2.56秒)。模型首先通过常规卷积层进行降维,随后接一组卷积编码层。每个编码层包含一维卷积层、层归一化、GeLU激活函数、Dropout层以及步长为2的最大池化。这部分结构负责对时间窗口片段特征进行编码,其设计灵感源自wav2vec架构 (Schneider et al. 2019)。
This is followed by a symmetrical set of convolutional decoder layers, each of which uses for further prediction not only information from the previous layer, but also from the symmetrical encoder layer. In other words, skip connections like those of the UNet model (Ronne berger, Fischer, and Brox 2015) were added. Skip connections make it possible to use for prediction not only the generalized representation in the latent space obtained by the encoder, but also the original local features, which ultimately increases the accuracy of the final model. The model is completed by a $1\times1$ convolution layer that uses the resulting set of features from each point in time to predict the coordinates of each of the five fingers at that point.
随后是一组对称的卷积解码器层,每层不仅利用前一层的特征进行预测,还结合了对称编码器层的信息。换言之,该模型添加了类似UNet (Ronneberger, Fischer, and Brox 2015) 的跳跃连接结构。跳跃连接使得预测过程既能利用编码器生成的潜在空间全局表征,又能保留原始局部特征,从而提升最终模型的精度。模型末端通过 $1\times1$ 卷积层整合每个时间点的特征集,预测该时刻五根手指的坐标位置。
Training details
训练细节
The BCI competition IV dataset 4 was already divided into training and validation sets, the recording of each of the three patients was divided into 2 parts, where the first part of 6.5 minutes duration constitutes training subset and the second part of 3.5 minutes long constitutes validation data subset. The model was trained and optimized using only the training subset, while the validation set was used for the performance measurement and hyper parameter adjustment. Due to lack of alignment of electrode grid shapes between subjects and intentional mixing of electrodes done by the dataset creators, the model was trained for each subject individually. Note that over fitting due to a hyper parameter search using the validation set was addressed by performing ablation study using only one subject provided in the dataset. The same model was trained on the other patients without any additional hyper parameter adjustment.
BCI竞赛IV数据集4已预先划分为训练集和验证集,每位患者的记录分为两部分:前6.5分钟构成训练子集,后3.5分钟构成验证数据子集。模型仅使用训练子集进行训练和优化,验证集则用于性能评估和超参数调整。由于受试者间电极网格形状未对齐且数据集创建者故意混排电极,模型需针对每位受试者单独训练。需注意的是,通过仅使用数据集中一名受试者进行消融研究,解决了因验证集超参数搜索导致的过拟合问题。相同模型在其他患者数据上训练时未进行额外超参数调整。
Loss function The loss function was chosen as the half sum of the mean squared error (MSE) and the mean cosine distance. Since the metric of the final score is correlation, it is essential to include it in the loss for training process. However, the correlation itself is not an optimal optimization criterion when working with small windows, since it is too sensitive to insignificant noise within this window in those cases when there is no movenet. Therefore, instead of a correlation loss, it was decided to use the cosine distance. The ablation study was conducted to test whether the combination of MSE and cosine similarity metrics performed better than any of those metrics alone. Nevertheless, due to the evaluation criteria at BCI competition IV, correlation coefficient (CC) metric was used to evaluate and compare model performance.
损失函数
损失函数选择为均方误差 (MSE) 和平均余弦距离之和的一半。由于最终评分的指标是相关性,因此必须将其纳入训练过程的损失中。然而,在处理小窗口时,相关性本身并不是一个最优的优化标准,因为在没有运动的情况下,它对窗口内不显著的噪声过于敏感。因此,决定使用余弦距离代替相关性损失。通过消融实验验证了 MSE 和余弦相似度指标的组合是否比单独使用任一指标表现更好。尽管如此,根据 BCI competition IV 的评估标准,仍采用相关系数 (CC) 指标来评估和比较模型性能。
Hyper parameters The number of features on each of the layers of the encoder as a result of the experiments was taken equal to (32, 32, 64, 64, 128, 128). The number of features in the decoder layers is symmetrical with respect to the bottleneck. The stride size was taken equal to 2 to reduce the sampling rate of the original window as gradually as possible, which allows better extraction of common features.
超参数
编码器各层的特征数根据实验结果设定为 (32, 32, 64, 64, 128, 128)。解码器各层的特征数与瓶颈层呈对称分布。步长设为2以尽可能渐进地降低原始窗口的采样率,这有助于更好地提取共性特征。
The learning rate was picked according to the Pytorch Lightning library’s automatic learning rate finder recommendation and was set fixed to $8.4\cdot10^{-5}$ .
学习率根据Pytorch Lightning库的自动学习率查找器推荐选取,并固定设置为 $8.4\cdot10^{-5}$ 。
As was previously mentioned, wavelet number, loss function combination coefficients and the delay between ECoG and hand movement time series can be considered as the tunable hyper parameters.
如前所述,小波数量、损失函数组合系数以及ECoG信号与手部运动时间序列之间的延迟可视为可调超参数。
Results
结果
Model development
模型开发
Every subsequently improved model was trained for 30 epochs. Model weights trained during the epoch preceding over fitting of the model were chosen for inference. The model refinement took several important steps to achieve the final performance.
每个后续改进的模型都训练了30个周期。选择模型过拟合前一个周期训练的权重进行推理。模型优化经历了几个重要步骤以实现最终性能。
Architecture refinement Baseline encoder-decoder architecture, trained on unscaled absolute values of waveletconvoluted signals, yielded average correlation value of 0.2 on the first subject (Table 1). However, adding data scaling both for ECoG features and finger movements improved results using the same architecture by more than two-fold, to 0.45 on average on the first subject. It can be speculated that this win in performance is due to highly variable absolute values of the time-frequency representation of the signal. The electro physiological recordings consistently show a $1/f$ relationship between signal spectral power and a corresponding frequency (Buzsaki and Draguhn 2004), leading to a multiple-fold difference between signal power in highly distinct frequency bands. A modification of the encoderdecoder architecture parameters, which are mainly a number of features in each layer and the number of layers allowed to increase the average correlation coefficient further to 0.55. The final model architecture is shown in Figure 2. Then the use of wavelets only from the gamma frequency band and an increase in their number up to 40 also led to an improvement in the results to an average correlation value of 0.6. Finally, skip connections were added, which increased the mean correlation coefficient to 0.66.
架构优化
基线编码器-解码器架构在小波卷积信号未缩放的绝对值上训练时,在第一位受试者上获得的平均相关系数为0.2 (表1)。然而,在相同架构下,对ECoG特征和手指运动数据进行缩放后,结果提升了两倍以上,第一位受试者的平均相关系数达到0.45。可以推测,这种性能提升源于信号时频表示的绝对值高度可变性。电生理记录始终显示信号频谱功率与对应频率之间存在 $1/f$ 关系 (Buzsaki and Draguhn 2004),导致不同频段间的信号功率存在多倍差异。
通过调整编码器-解码器架构参数(主要是每层特征数和层数),平均相关系数进一步提升至0.55。最终模型架构如图2所示。随后,仅使用伽马频段的小波并将其数量增加至40个,结果进一步改善,平均相关系数达到0.6。最后,添加跳跃连接使平均相关系数增至0.66。

Figure 2: FingerFlex model architecture
图 2: FingerFlex 模型架构
Table 1: Model development steps with corresponding performance obtained on the validation set.
| Modelimprovementstep | Meancorrelation |
| Baselinemodel | 0.2 |
| +Feature scaling | 0.45 |
| +Architectureimprovement | 0.55 |
| + Frequency range adjustment | 0.59 |
| + Optimizing number of wavelets | 0.6 |
| + Adding skip-connections | 0.66 |
表 1: 模型开发步骤及在验证集上获得的对应性能
| 模型改进步骤 | 平均相关性 |
|---|---|
| 基线模型 | 0.2 |
| +特征缩放 | 0.45 |
| +架构改进 | 0.55 |
| +频率范围调整 | 0.59 |
| +优化小波数量 | 0.6 |
| +添加跳跃连接 | 0.66 |
Hyper parameter tuning Then an ablation study was performed to calculate the optimal value of the delay hyperparameter between the ECoG signal and the predicted motion. The baseline delay chosen as the initial one was $20\mathrm{ms}$ , and as the consequent experiments showed that for the built model the value of delay is not crucial, as long as it is within
超参数调优
随后进行消融实验以计算ECoG信号与预测运动之间延迟超参数的最优值。初始选择的基线延迟为$20\mathrm{ms}$,后续实验表明,对于所构建的模型而言,只要延迟值在一定范围内,其具体数值并不关键。
$0\mathrm{ -~}200\mathrm{ms}$ range. Therefore, the final shift value between ECoG and finger motion time series was kept at $20\mathrm{ms}$ .
$0\mathrm{ -~}200\mathrm{ms}$ 范围内。因此,ECoG与手指运动时间序列间的最终偏移值保持为 $20\mathrm{ms}$ 。
Another ablation study was conducted to test whether the use of a combined loss function - the sum of cosine distance and MSE is empirically justified. Cosine distance, when used alone yielded a correlation value of 0.44, which is a worse result than that of MSE (0.64). However, using a combination of cosine distance and MSE increases the correlation by an average of 0.02 compared to the regular MSE, summing up to 0.66.
另一项消融研究旨在验证结合使用余弦距离和均方误差(MSE)的损失函数是否具有实证依据。单独使用余弦距离时,相关性值为0.44,低于MSE的0.64。然而,将余弦距离与MSE结合使用后,相关性平均比单独使用MSE提高了0.02,最终达到0.66。
Decoding performance
解码性能
Finger movement inference, carried out on three subjects presented in the BCI Competition IV dataset 4 showed the following results: 0.66 average correlation on the first subject, 0.62 on the second and 0.74 on the third (see Table 3). Recall that each patient had its own model trained, however, hyper parameter tuning was carried out only using the first subject. The mean correlation coefficient across all subjects is 0.67. Correlation values for different fingers are in the range of $0.6-0.75$ for the first subject, $0.53-0.73$ for the second and $0.68-0.78$ for the third.
在BCI Competition IV数据集4中对三位受试者进行的手指运动推断结果显示:第一位受试者平均相关性为0.66,第二位为0.62,第三位为0.74 (见表3)。需注意的是,每位患者使用独立训练的模型,但超参数调优仅基于第一位受试者完成。全体受试者的平均相关系数为0.67。各手指相关性数值范围分别为:第一位受试者$0.6-0.75$,第二位$0.53-0.73$,第三位$0.68-0.78$。
Model verification on a different dataset
在不同数据集上进行模型验证
To check model stability across datasets, the built model was also tested on Stanford dataset (Miller 2019). Since the data representation in this dataset was the same as in BCI Competition IV dataset 4, the entire data processing pipeline (Figure 1) remained unchanged.
为了验证模型在不同数据集上的稳定性,构建的模型还在Stanford数据集 (Miller 2019) 上进行了测试。由于该数据集的数据表示与BCI Competition IV数据集4相同,整个数据处理流程 (图1) 保持不变。
The feature processing, hyper parameters and model architecture were fixed. The obtained results are represented in the Table 4. While there are subjects, which the model fails to recognize finger movements with correlations of 0.2 0.4, there are patients, who have mean correlation coefficients between predicted and actual finger movements in the range of $0.5-0.7.$ . The low performance on some patients could be attributed to irrelevant positioning of the ECoG grid with regards to brain areas mostly involved during movement planning and execution.
特征处理、超参数和模型架构均已固定。所得结果如表4所示。虽然存在模型无法识别的相关性在0.2-0.4之间的手指运动受试者,但也有患者的预测手指运动与实际手指运动之间的平均相关系数在$0.5-0.7$范围内。部分患者表现不佳的原因可能是ECoG网格的定位与运动和计划执行过程中主要涉及的大脑区域无关。
Table 2: Performance comparison on works solving BCI Competition IV dataset 4 finger trajectory decoding task
表 2: BCI竞赛IV数据集4指轨迹解码任务解决方案性能对比
| 方法 | S1 | S2 | S3 | 平均值 |
|---|---|---|---|---|
| Flamary (2012)-BCI竞赛第二名 | 0.48 | 0.24 | 0.56 | 0.43 |
| Liang和Bougrain (2012)-BCI竞赛冠军 | 0.45 | 0.39 | 0.59 | 0.48 |
| Xie等人 (2018)-LSTM | 0.56 | 0.41 | 0.58 | 0.52 |
| Frey等人 (2021)-多功能CNN | N/A | N/A | N/A | 0.52 |
| Petrosyan等人 (2021)-可解释CNN | 0.45 | 0.34 | 0.56 | 0.45 |
| Yao等人 (2022)-lightGBM | 0.52 | 0.47 | 0.61 | 0.53 |
| 我们的解决方案(FingerFlex) | 0.66 | 0.62 | 0.74 | 0.67 |
Table 3: FingerFlex model performance on the BCI competition IV dataset 4. Correlation coefficients are calculated separately for each subject and finger using the provided validation set.
表 3: FingerFlex 模型在 BCI competition IV 数据集 4 上的性能表现。相关系数是使用提供的验证集分别针对每个受试者和手指计算得出的。
| 受试者 | 拇指 | 食指 | 中指 | 无名指 | 小指 | 平均值 |
|---|---|---|---|---|---|---|
| S1 | 0.66 | 0.75 | 0.64 | 0.66 | 0.6 | 0.66 |
| S2 | 0.66 | 0.73 | 0.53 | 0.63 | 0.54 | 0.62 |
| S3 | 0.78 | 0.68 | 0.68 | 0.78 | 0.76 | 0.74 |
| 平均值 | 0.7 | 0.72 | 0.62 | 0.69 | 0.63 | 0.67 |
Table 4: FingerFlex model performance (correlation coefficients) on the ECoG finger motion dataset provided by Miller (2019).
表 4: FingerFlex模型在Miller (2019)提供的ECoG手指运动数据集上的性能表现(相关系数)。
| 受试者 | 拇指食指 | 中指无名指 | 小指 | 平均值 |
|---|---|---|---|---|
| bs | 0.65 0.66 | 0.45 0.53 | 0.44 | 0.55 |
| cC | 0.76 0.70 | 0.61 0.76 | 0.73 | 0.71 |
| zt | 0.59 | 0.73 0.68 0.62 | 0.53 | 0.63 |
| jp | 0.68 | 0.67 0.31 0.65 | 0.53 | 0.57 |
| ht | 0.31 | 0.32 0.34 0.36 | 0.36 | 0.34 |
| mv | 0.42 | 0.85 0.42 0.40 | 0.71 | 0.56 |
| wc | 0.45 | 0.63 0.42 0.27 | 0.31 | 0.42 |
| wm | 0.33 | 0.08 0.25 0.32 | 0.06 | 0.22 |
| jc | 0.65 | 0.37 0.32 | 0.43 0.37 | 0.43 |
Discussion
讨论
Contribution
贡献
Our approach shows great promise in developing highprecision decoders of neural signals related to movements for use in the brain-computer interface. The proposed model significantly outperforms the winner of the BCI competition IV with a mean correlation value of approximately 0.67. It also improves greatly the current state-of-the-art results of other research groups solving the finger trajectory decoding task (Table 2). Moreover, FingerFlex successfully decoded finger trajectories using the other dataset without any additional hyper parameter search (Table 4)
我们的方法在开发用于脑机接口的高精度神经信号解码器方面展现出巨大潜力。该模型以约0.67的平均相关系数显著超越了BCI竞赛IV的冠军成绩,同时大幅提升了其他研究团队在手指轨迹解码任务上的当前最优结果 (表2)。此外,FingerFlex无需额外超参数搜索就成功解码了另一数据集的手指轨迹 (表4)。
The method developed in this study stands on several important ideas and improvements. First of all, the timefrequency feature selection choice was crucial to capture fine changes in different frequency components of the brain signals responsible for finger movements. Secondly, the hybrid encoder-decoder architecture with skip connections was employed for the first time for the ECoG decoding task. In addition, the introduction of a new loss function to a conventional MSE further improved regression results. Lastly, the physiologically relevant hyper parameters, such as time delay between finger trajectory and ECoG time series and frequency range for wavelet convolution alongside with conventional hyper parameters, such as learning rate and model depth, are stable across participants and datasets. These hyper parameters can be fixed once providing sufficient performance for the task.
本研究开发的方法基于几个重要理念和改进。首先,时频特征选择对捕捉负责手指运动的脑信号在不同频率成分中的细微变化至关重要。其次,首次在ECoG解码任务中采用了带有跳跃连接的混合编码器-解码器架构。此外,在传统均方误差(MSE)基础上引入的新损失函数进一步提升了回归结果。最后,生理相关的超参数(如手指轨迹与ECoG时间序列之间的时延、小波卷积的频率范围)与传统超参数(如学习率和模型深度)在参与者和数据集间保持稳定。这些超参数一旦确定,就能为任务提供足够性能。
There are also several advantages inherent in the suggested approach. Firstly, our method does not require training separate decoders for resting and moving states, as that was done in El ghara b awy and Wahed (2016) and Flamary (2012). It captures both changes in finger coordinate during movement periods and during resting periods without switching and adding any additional information. Secondly, the model is relatively lightweight, having only about 600000 parameters, which potentially allows to use it in realtime inference applications. Furthermore, the method does not rely on ECoG data, which follows the movement in time, i.e. it makes predictions based only on a current time window, not taking into account future and preceding ones. This feature further facilitates real-time deployment. Lastly, the length of the time window can be arbitrary due to the nature of convolution operation, thus providing further flexibility for other decoding solutions.
所提出的方法还具有几项固有优势。首先,我们的方法无需像El gharabawy和Wahed (2016) 与Flamary (2012) 那样为静止和运动状态分别训练解码器,它能同时捕捉运动期和静止期的手指坐标变化,无需切换或添加额外信息。其次,该模型相对轻量,仅包含约60万个参数,这使得它有望应用于实时推理场景。此外,该方法不依赖随时间追踪运动的ECoG数据,即仅基于当前时间窗口进行预测,无需考虑前后时间窗,这一特性进一步提升了实时部署的可行性。最后,由于卷积运算的特性,时间窗口长度可任意设定,这为其他解码方案提供了更大的灵活性。
Limitations and potential improvements
局限性与潜在改进空间
The unequal electrode placement between subjects and uncertainty about electrode positioning relative to the brain limit the possible generalization approaches. However, the proposed method is highly suitable for individualized training. One of the strongest arguments for that is that only 6.5 minutes of recording is sufficient to achieve a significant performance on decoding fine movements.
受试者间电极放置位置的不对称性以及电极相对于大脑定位的不确定性,限制了可能的泛化方法。然而,所提出的方法非常适合个性化训练。最有力的论据之一是仅需6.5分钟的记录即可在解码精细动作方面取得显著性能。

Figure 3: An example of true and decoded finger trajectories time series. The data segment for visualization was taken from the validation set
图 3: 真实与解码手指轨迹时间序列示例。可视化数据段取自验证集
One of the further research directions is implementing transfer learning across participants (Wan et al. 2021). If electrode positions and subjects’ magnetic-resonance imaging scans are available, it becomes possible to use source reconstruction approaches (Hamalainen et al. 1993; Chen, Yao, and Hudson 2002), i.e. solving the ”inverse problem”. This allows making a transition from individualized brain surface spatial patterns (electrode space) to standardized cortical activation s (source space). This is essential for developing cross-subjects solutions and training a network on the larger amounts of data gathered in different experimental settings and across participants. Another possible way of solving the lack of correspondence of electrode location between subjects is to use separate feature extractors for each subject inferring generalized features suitable for transfer learning like was done in the work of Peterson et al. (2021).
进一步研究方向之一是实现跨参与者的迁移学习 (Wan et al. 2021) 。若电极位置和受试者的磁共振成像扫描数据可用,便可采用源重建方法 (Hamalainen et al. 1993; Chen, Yao, and Hudson 2002) ,即求解"逆问题"。这使得从个体化脑表面空间模式 (电极空间) 向标准化皮层激活 (源空间) 的转换成为可能。这对于开发跨受试者解决方案、在不同实验设置和参与者群体采集的海量数据上训练网络至关重要。另一种解决受试者间电极位置不对应问题的方法是采用类似Peterson等人 (2021) 研究的方案,为每个受试者配备独立特征提取器来推断适用于迁移学习的通用特征。
Besides transfer learning, a possible advancement could be adding interpret able blocks to enhance understanding of the data. An important step in that direction was made by Petrosyan et al. (2021) and Sturm et al. (2016), and while adding interpret able blocks might worsen model performance, research in this direction could provide useful insights on motor encoding in the human brain and what features of ECoG signals really matter for the movement prediction.
除了迁移学习,一个可能的改进方向是添加可解释模块以增强对数据的理解。Petrosyan等人 (2021) 和Sturm等人 (2016) 已在该方向迈出重要一步。虽然添加可解释模块可能降低模型性能,但该方向的研究能为人类大脑运动编码提供有益洞见,并揭示ECoG信号中哪些特征真正影响运动预测。
Finally, an important step would be to omit any manual feature extraction of the time series data allowing the neural network to learn to extract relevant features for the task itself. Time-frequency methods have their limitations and their use imply increasing a number hyper parameters, which is not always favorable. Temporal convolutions are by far the most relevant way to implement automatic feature extraction and this approach also works well, as was shown by Li et al. (2021b), Li et al. (2021a) and Peterson et al. (2021). The possible downside of this approach would be the increased demand for lengthy data due to the increased number of parameters in the network, and gathering more data is often impossible in clinical settings of neuro physiological research.
最后,关键的一步是省略时间序列数据的手动特征提取,让神经网络自行学习提取与任务相关的特征。时频分析方法存在局限性,其使用会增加超参数数量,这并不总是有利的。目前,时间卷积是实现自动特征提取最有效的方式,Li等人(2021b)、Li等人(2021a)和Peterson等人(2021)的研究已证实该方法的有效性。这种方法的潜在缺点是由于网络参数增加,对长序列数据的需求会增大,而在神经生理学临床研究中,通常无法获取更多数据。
Conclusion
结论
The present study developed a novel deep learning approach for decoding finger flexions based on oscillator y activity on the ECoG channels. With the use of convolutional encoderdecoder architecture with skip-connections, the performance on the trajectory regression task was significantly improved compared to the existing solutions using the BCI competition IV dataset 4. The proposed model can potentially be used for real-time inference of finger flexions. This research shows promise for developing fast, functional, and precise neuro prosthetic devices.
本研究开发了一种基于ECoG通道振荡活动解码手指弯曲的新型深度学习方法。通过采用带跳跃连接的卷积编码器-解码器架构,在BCI竞赛IV数据集4上的轨迹回归任务性能较现有解决方案显著提升。该模型有望用于手指弯曲的实时推理,为开发快速、功能性强且精确的神经假肢设备提供了新思路。
