A Novel Bi-hemispheric Discrepancy Model for EEG Emotion Recognition
Abstract—The neuroscience study [1] has revealed a discrepancy in emotion expression between the left and right hemispheres of the human brain. Inspired by this finding, in this paper we propose a novel bi-hemispheric discrepancy model (BiHDM) to learn the asymmetric differences between the two hemispheres for electroencephalograph (EEG) emotion recognition. Concretely, we first employ four directed recurrent neural networks (RNNs) based on two spatial orientations to traverse the electrode signals on two separate brain regions, which enables the model to obtain deep representations of all the EEG electrodes' signals while keeping the intrinsic spatial dependence. We then design a pairwise subnetwork to capture the discrepancy information between the two hemispheres and extract higher-level features for the final classification. In addition, to reduce the domain shift between training and testing data, we use a domain discriminator that adversarially induces the overall feature learning module to generate emotion-related but domain-invariant features, which further promotes EEG emotion recognition. We conduct experiments on three public EEG emotional datasets, and the results show that our method achieves new state-of-the-art performance.
Index Terms—EEG emotion recognition, bi-hemispheric discrepancy, spatial-temporal network
I. INTRODUCTION
Emotion, as a common mental phenomenon, is closely related to our daily life. Although it is easy for people to sense each other's emotions in human-human interaction, it is still difficult for machines to understand the complicated emotions of human beings [2]. As the first step toward enabling machines to capture human emotions, emotion recognition has received substantial attention from the human-machine-interaction (HMI) and pattern recognition research communities in recent years [3], [4], [5].
Human emotional expression is mostly conveyed through verbal behaviors (e.g., speech) and nonverbal behaviors (e.g., facial expressions). Thus, a large body of literature concentrates on learning the emotional components
Yang Li and Tengfei Song are with the Key Laboratory of Child Development and Learning Science (Ministry of Education), and the Department of Information Science and Engineering, Southeast University, Nanjing, Jiangsu, 210096, China. Wenming Zheng and Yuan Zong are with the Key Laboratory of Child Development and Learning Science (Ministry of Education), School of Biological Sciences and Medical Engineering, Southeast University, Nanjing, Jiangsu, 210096, China. (∗Corresponding author: Wenming Zheng (E-mail: wenming_zheng@seu.edu.cn).) Lei Wang is with the School of Computing and Information Technology, University of Wollongong, NSW, 2500, Australia. Lei Qi is with the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, 210096, China. Tong Zhang and Zhen Cui are with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, 210096, China.
from speech and facial expression data. However, from the viewpoint of neuroscience, human emotion originates from a variety of brain cortex regions, such as the orbital frontal cortex, ventral medial prefrontal cortex, and amygdala [6], which provides a potential approach to decoding emotion by recording continuous brain activity signals over these regions. For example, by placing EEG electrodes on the scalp, we can record the neural activities of the brain, which can then be used to recognize human emotions.
Most existing EEG emotion recognition methods focus on two fundamental challenges. One is how to extract discriminative features related to emotions. Typically, EEG features can be extracted from the time domain, frequency domain, and time-frequency domain. In [7], Jenke et al. evaluated all the existing features by using machine learning techniques on a self-recorded dataset. The other challenge is how to classify the features correctly. Many EEG emotion recognition models and methods have been proposed over the past years [8], [9]. For example, Zheng et al. [10] proposed a group sparse canonical correlation analysis method for simultaneous EEG channel selection and emotion recognition. Li et al. [11] fused the information propagation patterns and activation difference in the brain to improve the performance of emotional recognition. These techniques have shown excellent performance on some EEG emotional datasets.
Recently, many researchers have attempted to use neuroscience findings on emotion as prior knowledge to extract features or develop models, effectively enhancing the performance of EEG emotion recognition. For example, Hinrikus et al. [12] used an EEG spectral asymmetry index for depression detection. Neuroscience studies have established that although the anatomy of the human brain appears symmetric, the left and right hemispheres respond differently to emotions. For example, from the view of neuroscience, Dimond et al. [1], Davidson et al. [13], and Herrington et al. [14] have studied the asymmetry of emotion expression, and Schwartz et al. [15], Wager et al. [16], and Costanzo et al. [17] have discussed emotion lateralization. Furthermore, the literature on EEG emotion recognition has seen the use of asymmetry to classify EEG emotional signals. Lin et al. [4] investigated the relationships between emotional states and brain activities, and extracted power spectrum density, differential asymmetry power, and rational asymmetry power as features. Motivated by their previous findings of critical brain areas for emotion recognition, Zheng et al. [5] selected six symmetrical temporal lobe electrodes as the critical channels for EEG emotion recognition. Li et al. [18] separately extracted features from the two brain hemispheres and achieved state-of-the-art classification performance. These studies demonstrate that integrating the unique characteristics of EEG signals into machine learning algorithms is a promising and fruitful direction. How to utilize this discrepancy property of the two brain hemispheres to improve EEG emotion recognition is therefore an interesting and meaningful topic.
Thus, in this paper, we propose a novel neural network model, BiHDM, to learn the bi-hemispheric discrepancy for EEG emotion recognition. BiHDM aims to obtain the deep discrepant features between the left and right hemispheres, which are expected to contain more discriminative information for recognizing EEG emotion signals. To achieve this goal, we need to solve two major problems, i.e., how to extract the features of each hemisphere's EEG data and, meanwhile, how to measure the difference between them. Unlike other data structures such as skeletal action data, in which the position of each node varies with time, EEG data consist of several electrodes set at predefined coordinates on the scalp. Hence, to avoid losing this intrinsic graph structural information of EEG data, we can simplify the graph structure learning process by using horizontally and vertically traversing RNNs, which construct a complete relationship graph and generate discriminative deep features for all the EEG electrodes. After obtaining these deep features for each electrode, we can extract the asymmetric discrepancy information between the two hemispheres by performing specific pairwise operations on any pair of symmetric electrodes. The concrete process is as follows:
(1) Firstly, we employ two individual RNN modules to separately scan all spatial electrodes' data on the left and right hemispheres and generate deep feature representations for all the EEG electrodes. Herein, when an RNN module traverses the spatial regions, it walks under two predefined stack strategies determined with respect to the horizontal and vertical direction streams;
To the best of our knowledge, this is the first work to integrate the electrodes' discrepancy relation between the two hemispheres into deep learning models to improve EEG emotion recognition. The experimental results verify the discriminability and effectiveness of this differential information between the left and right hemispheres for EEG emotion recognition.
The remainder of this paper is organized as follows: In Section II, we specify the BiHDM method as well as its application to EEG emotion recognition. In Section III, we conduct extensive experiments to evaluate the proposed method. In Sections IV and V, we discuss and conclude the paper.
II. THE PROPOSED MODEL FOR EEG EMOTION RECOGNITION
A. The BiHDM model
To specify the proposed method clearly, we illustrate the framework of the BiHDM model in Fig. 1. Its goal is to capture the asymmetric differential information between the two hemispheres, which we achieve in three steps. First, we obtain deep representations of all the electrodes' data. Second, we characterize the relationship between the identified paired electrodes on the two hemispheres and generate a more discriminative, higher-level discrepancy feature for the final classification. Third, we leverage a classifier and a discriminator that cooperatively induce the above process to generate emotion-related but domain-invariant features. The overall process is described as follows.
- Obtaining the deep representation for each electrode: In BiHDM, we attempt to separately extract the EEG electrodes' deep features on the left and right brain hemispheres by using two independent RNN modules. To avoid losing the intrinsic graph structural information of EEG data, for each hemisphere's EEG data we build the RNN module traversing the spatial regions under two predefined stacks, which are determined with respect to the horizontal and vertical directions. These two directional RNNs are complementary and simplify the construction of a complete relationship graph of the electrodes' locations. Concretely, an EEG sample $\mathbf{X}_{t}\in\mathbb{R}^{d\times N}$ is split as $\mathbf{X}_{t}=[\mathbf{X}_{t}^{l},\mathbf{X}_{t}^{r}]=[\mathbf{x}_{1}^{l},\cdots,\mathbf{x}_{N/2}^{l},\mathbf{x}_{1}^{r},\cdots,\mathbf{x}_{N/2}^{r}]$, where $\mathbf{X}_{t}^{l}$ and $\mathbf{X}_{t}^{r}$ denote the data on the left and right hemispheres, $d$ is the dimension of each EEG electrode's data, and $N$ is the number of electrodes. When modeling spatial dependencies, two graphs, i.e., $\mathrm{G}^{l}=\{\mathrm{N}^{l},\mathrm{E}^{l}\}$ and $\mathrm{G}^{r}=\{\mathrm{N}^{r},\mathrm{E}^{r}\}$, are used to separately represent the electrodes' spatial relations on the left and right hemispheres, where $\mathrm{N}^{l}=\{\mathbf{x}_{i}^{l}\}$ and $\mathrm{N}^{r}=\{\mathbf{x}_{i}^{r}\},(i=1,2,\cdots,\frac{N}{2})$ denote the electrode sets, while $\mathrm{E}^{l}=\{e_{ij}^{l}\}$ and $\mathrm{E}^{r}=\{e_{ij}^{r}\}$ represent the edges between spatially neighboring electrodes. Then we traverse $\mathrm{G}^{l}$ and $\mathrm{G}^{r}$ separately with a predefined forward evolution sequence so that the input state and the previous states can be defined for an RNN unit. This process can be formulated as
$$
\begin{array}{r l}
&{\mathbf{s}_{i}^{l}=\sigma(\mathbf{U}^{l}\mathbf{x}_{i}^{l}+\sum_{j=1}^{N/2}e_{ij}^{l}\mathbf{V}^{l}\mathbf{s}_{j}^{l}+\mathbf{b}^{l})\in\mathbb{R}^{d_{l}\times1},}\\
&{\mathbf{s}_{i}^{r}=\sigma(\mathbf{U}^{r}\mathbf{x}_{i}^{r}+\sum_{j=1}^{N/2}e_{ij}^{r}\mathbf{V}^{r}\mathbf{s}_{j}^{r}+\mathbf{b}^{r})\in\mathbb{R}^{d_{r}\times1},}
\end{array}
$$
Fig. 1: The framework of BiHDM. BiHDM consists of four RNN modules that capture each hemisphere's EEG electrode information from the horizontal and vertical streams. All the electrodes' data representations then interact to construct the final vector for the classifier and discriminator.
where $\mathbf{s}_{i}^{l},\mathbf{s}_{i}^{r}$ and $d_{l},d_{r}$ are the hidden states and their dimensions for the RNN modules on the left and right hemispheres, respectively; $\sigma(\cdot)$ denotes a nonlinear operation such as the Sigmoid function; $\{\mathbf{U}^{l}\in\mathbb{R}^{d_{l}\times d},\mathbf{V}^{l}\in\mathbb{R}^{d_{l}\times d_{l}},\mathbf{b}^{l}\in\mathbb{R}^{d_{l}\times1}\}$ and $\{\mathbf{U}^{r}\in\mathbb{R}^{d_{r}\times d},\mathbf{V}^{r}\in\mathbb{R}^{d_{r}\times d_{r}},\mathbf{b}^{r}\in\mathbb{R}^{d_{r}\times1}\}$ are the learnable transformation matrices of the two hemispheric RNN modules; and $e_{ij}^{\cdot}$ is nonzero only if $\mathbf{x}_{j}^{\cdot}$ belongs to $\mathcal{N}(\mathbf{x}_{i}^{\cdot})$, the set of predecessors of node $\mathbf{x}_{i}^{\cdot}$ in the traversal order. Here $d_{l}=d_{r}$. As the RNN modules traverse all the nodes in $\mathrm{N}^{l}$ and $\mathrm{N}^{r}$, the obtained hidden states $\mathbf{s}_{i}^{l}$ and $\mathbf{s}_{i}^{r}$ serve as the deep features representing the $i$-th electrode's data on the two hemispheres.
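As a concrete illustration, one directed scan can be sketched in NumPy. The scan order, strictly lower-triangular edge weights, and all dimensions below are illustrative assumptions rather than the paper's actual configuration; `hemispheric_scan` is a hypothetical helper implementing the recurrence above for a single hemisphere and a single direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hemispheric_scan(X, E, U, V, b):
    """One directed scan over a hemisphere's electrodes.

    X : (d, n) electrode features, columns ordered by the traversal.
    E : (n, n) edge weights; E[i, j] weights predecessor j of node i.
    Returns S : (d_hid, n) hidden states, one column per electrode.
    """
    d_hid, n = U.shape[0], X.shape[1]
    S = np.zeros((d_hid, n))
    for i in range(n):
        # accumulate context from already-visited predecessors only
        context = np.zeros(d_hid)
        for j in range(i):
            context += E[i, j] * (V @ S[:, j])
        S[:, i] = sigmoid(U @ X[:, i] + context + b)
    return S

rng = np.random.default_rng(0)
d, n, d_l = 5, 31, 8          # per-electrode dim, electrodes per hemisphere, hidden dim
X_l = rng.standard_normal((d, n))
E_l = np.tril(rng.random((n, n)), -1)   # strictly lower-triangular: earlier nodes feed later ones
U_l = 0.1 * rng.standard_normal((d_l, d))
V_l = 0.1 * rng.standard_normal((d_l, d_l))
b_l = np.zeros(d_l)
S_l = hemispheric_scan(X_l, E_l, U_l, V_l, b_l)
print(S_l.shape)              # (8, 31)
```

In the full model, four such scans run (left/right hemisphere times horizontal/vertical stack), each with its own parameters.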
Particularly, the left and right hemispheric RNN modules traverse the spatial regions under the two predefined horizontal and vertical stacks. Therefore, we obtain two paired deep feature sets, i.e., $(\mathbf{S}_{t}^{lh},\mathbf{S}_{t}^{rh})$ and $(\mathbf{S}_{t}^{lv},\mathbf{S}_{t}^{rv})$, where $\mathbf{S}_{t}^{lh}=\{\mathbf{s}_{i}^{lh}\}\in\mathbb{R}^{d_{l}\times(N/2)}$ and $\mathbf{S}_{t}^{rh}=\{\mathbf{s}_{i}^{rh}\}\in\mathbb{R}^{d_{r}\times(N/2)}$ represent the left and right hemispheric electrodes' deep features under the horizontal direction, while $\mathbf{S}_{t}^{lv}=\{\mathbf{s}_{i}^{lv}\}\in\mathbb{R}^{d_{l}\times(N/2)}$ and $\mathbf{S}_{t}^{rv}=\{\mathbf{s}_{i}^{rv}\}\in\mathbb{R}^{d_{r}\times(N/2)}$ represent those under the vertical direction. So far, we have obtained a deep representation of each electrode that carries emotional discriminative information while preserving the locational structural relations.
- Interaction between the paired electrodes on two hemispheres: After obtaining the deep features of every electrode above, i.e., $(\mathbf{S}_{t}^{lh},\mathbf{S}_{t}^{rh})$ and $(\mathbf{S}_{t}^{lv},\mathbf{S}_{t}^{rv})$, we perform a specific pairwise operation on the electrodes paired by their symmetric locations on the scalp to identify the asymmetric differential information between the two hemispheres. This operation can be expressed as
$$
\begin{array}{r l}
{\hat{\mathbf{S}}_{t}^{h}=\mathcal{F}(\mathbf{S}_{t}^{lh},\mathbf{S}_{t}^{rh})=\mathcal{F}(\{\mathbf{s}_{i}^{lh}\},\{\mathbf{s}_{i}^{rh}\})\in\mathbb{R}^{d_{p}\times(N/2)},}\\
{\hat{\mathbf{S}}_{t}^{v}=\mathcal{F}(\mathbf{S}_{t}^{lv},\mathbf{S}_{t}^{rv})=\mathcal{F}(\{\mathbf{s}_{i}^{lv}\},\{\mathbf{s}_{i}^{rv}\})\in\mathbb{R}^{d_{p}\times(N/2)},}
\end{array}
$$
where $\hat{\mathbf{S}}_{t}^{h}=\{\hat{\mathbf{s}}_{i}^{h}\}$ and $\hat{\mathbf{S}}_{t}^{v}=\{\hat{\mathbf{s}}_{i}^{v}\}$ are the deep asymmetric differential features, and $\mathcal{F}(\cdot)$ denotes the pairwise operation between any two paired electrodes' data representations.
where $d_{p_{1}}=d_{p_{2}}=d_{l}$ and $d_{p_{3}}=1$ for the respective pairwise operations.
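The three dimensions $d_{p_{1}}$, $d_{p_{2}}$, and $d_{p_{3}}$ suggest two feature-preserving operations and one scalar-valued operation per electrode pair. The sketch below assumes elementwise subtraction, division, and an inner product as candidate forms of $\mathcal{F}$; this is our reading of the dimensions, not the paper's definitive list, and `pairwise_ops` is a hypothetical helper.

```python
import numpy as np

def pairwise_ops(S_left, S_right, op):
    """Candidate pairwise operations F on symmetric electrode pairs.

    S_left, S_right : (d_l, N/2) hemispheric deep features; column i of
    each matrix corresponds to one symmetric electrode pair.
    """
    if op == "subtract":   # d_p1 = d_l
        return S_left - S_right
    if op == "divide":     # d_p2 = d_l; eps guards near-zero activations
        return S_left / (S_right + 1e-8)
    if op == "inner":      # d_p3 = 1, one scalar per electrode pair
        return np.sum(S_left * S_right, axis=0, keepdims=True)
    raise ValueError(f"unknown op: {op}")

rng = np.random.default_rng(1)
S_left = rng.random((8, 31)) + 0.5    # e.g. sigmoid-range activations
S_right = rng.random((8, 31)) + 0.5
print(pairwise_ops(S_left, S_right, "subtract").shape)  # (8, 31)
print(pairwise_ops(S_left, S_right, "inner").shape)     # (1, 31)
```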
To further capture higher-level discrepancy discriminative features, we utilize another RNN module that operates on the differential asymmetric features $\{\hat{\mathbf{s}}_{i}^{h}\}$ and $\{\hat{\mathbf{s}}_{i}^{v}\}$ obtained from the horizontal and vertical streams. Formally, the operations on them can be written as
$$
\begin{array}{r l}
{\tilde{\mathbf{s}}_{i}^{h}=\sigma(\mathbf{U}^{h}\hat{\mathbf{s}}_{i}^{h}+\mathbf{V}^{h}\tilde{\mathbf{s}}_{i-1}^{h}+\mathbf{b}^{h})\in\mathbb{R}^{d_{g}\times1},}\\
{\tilde{\mathbf{s}}_{i}^{v}=\sigma(\mathbf{U}^{v}\hat{\mathbf{s}}_{i}^{v}+\mathbf{V}^{v}\tilde{\mathbf{s}}_{i-1}^{v}+\mathbf{b}^{v})\in\mathbb{R}^{d_{g}\times1},}
\end{array}
$$
where $\{\mathbf{U}^{h}\in\mathbb{R}^{d_{g}\times d_{p}},\mathbf{V}^{h}\in\mathbb{R}^{d_{g}\times d_{g}},\mathbf{b}^{h}\in\mathbb{R}^{d_{g}\times1}\}$ and $\{\mathbf{U}^{v}\in\mathbb{R}^{d_{g}\times d_{p}},\mathbf{V}^{v}\in\mathbb{R}^{d_{g}\times d_{g}},\mathbf{b}^{v}\in\mathbb{R}^{d_{g}\times1}\}$ are the learnable parameter matrices, and $d_{g}$ is the hidden unit dimension of the high-level RNN module. Moreover, to automatically detect the salient emotion-related information among these paired differential features, projection matrices are applied to the higher-level discrepancy discriminative features $\{\tilde{\mathbf{s}}_{i}^{h}\}$ and $\{\tilde{\mathbf{s}}_{i}^{v}\}$ obtained by Eq. (7) and (8). Denoting the projection matrices for the horizontal and vertical traversing directions by $\mathbf{W}^{h}=[w_{ik}^{h}]_{(N/2)\times K}$ and $\mathbf{W}^{v}=[w_{ik}^{v}]_{(N/2)\times K}$, the projection can be written as
$$
\begin{array}{r l}
&{\bar{\mathbf{s}}_{k}^{h}=\sigma(\sum_{i=1}^{N/2}w_{ik}^{h}\tilde{\mathbf{s}}_{i}^{h}+\hat{\mathbf{b}}^{h})\in\mathbb{R}^{d_{g}\times1},\ k=1,2,\cdots,K,}\\
&{\bar{\mathbf{s}}_{k}^{v}=\sigma(\sum_{i=1}^{N/2}w_{ik}^{v}\tilde{\mathbf{s}}_{i}^{v}+\hat{\mathbf{b}}^{v})\in\mathbb{R}^{d_{g}\times1},\ k=1,2,\cdots,K.}
\end{array}
$$
Finally, we use two learnable mapping matrices $\mathbf{G}^{h}\in\mathbb{R}^{d_{o}\times d_{g}}$ and $\mathbf{G}^{v}\in\mathbb{R}^{d_{o}\times d_{g}}$ to summarize the stimuli $\bar{\mathbf{S}}_{t}^{h}=\{\bar{\mathbf{s}}_{k}^{h}\}\in\mathbb{R}^{d_{g}\times K}$ and $\bar{\mathbf{S}}_{t}^{v}=\{\bar{\mathbf{s}}_{k}^{v}\}\in\mathbb{R}^{d_{g}\times K}$ from the two directional streams, namely,
$$
\mathbf{S}_{t}^{hv}=\mathbf{G}^{h}\bar{\mathbf{S}}_{t}^{h}+\mathbf{G}^{v}\bar{\mathbf{S}}_{t}^{v}\in\mathbb{R}^{d_{o}\times K}.
$$
At this point, for an input EEG sample $\mathbf{X}_{t}$, the output feature $\mathbf{S}_{t}^{hv}$ is obtained.
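The projection and summary steps amount to two matrix products per stream. A minimal NumPy sketch, with all shapes chosen purely for illustration:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(2)

# illustrative shapes only
N2, K, d_g, d_o = 31, 10, 16, 12
S_h = rng.standard_normal((d_g, N2))   # higher-level features, horizontal stream
S_v = rng.standard_normal((d_g, N2))   # vertical stream
W_h = rng.standard_normal((N2, K))     # projection matrices
W_v = rng.standard_normal((N2, K))
b_h = np.zeros((d_g, 1))
b_v = np.zeros((d_g, 1))
G_h = rng.standard_normal((d_o, d_g))  # mapping matrices
G_v = rng.standard_normal((d_o, d_g))

# projection: each of the K columns is a weighted mix of the N/2 electrode features
Sbar_h = sigmoid(S_h @ W_h + b_h)      # (d_g, K)
Sbar_v = sigmoid(S_v @ W_v + b_v)      # (d_g, K)
# summary: fuse the two directional streams into the output feature
S_hv = G_h @ Sbar_h + G_v @ Sbar_v     # (d_o, K)
print(S_hv.shape)                      # (12, 10)
```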
- Discriminative prediction and domain adversarial strategy: Like most supervised models, we add a supervision term into the network by applying the softmax function to the output feature $\mathbf{S}_{t}^{hv}=\{\mathbf{s}_{k}^{hv}\}$, $(k=1,\cdots,K)$ to predict the class label.
Let $\mathbf{o}=[(\mathbf{s}_{1}^{hv})^{\mathrm{T}},(\mathbf{s}_{2}^{hv})^{\mathrm{T}},\cdots,(\mathbf{s}_{K}^{hv})^{\mathrm{T}}]\in\mathbb{R}^{1\times Kd_{o}}$ denote the output feature vector; then
$$
\mathbf{y}=\mathbf{o}\mathbf{P}+\mathbf{b}^{c}=\{y_{1},y_{2},\cdots,y_{C}\}\in\mathbb{R}^{1\times C},
$$
where $\mathbf{P}\in\mathbb{R}^{Kd_{o}\times C}$ and $\mathbf{b}^{c}\in\mathbb{R}^{1\times C}$ are the learnable transformation matrix and bias, and $C$ is the number of emotion types.
Finally, the output vector of BiHDM is fed into the softmax layer for emotion classification, which can be written as
$$
P(c|\mathbf{X}_{t})=\exp(y_{c})/\sum_{i=1}^{C}\exp(y_{i}),
$$
where $P(c|\mathbf{X}_{t})$ denotes the predicted probability that the input sample $\mathbf{X}_{t}$ belongs to the $c$-th class. As a result, the label $\tilde{l}_{t}$ of sample $\mathbf{X}_{t}$ is predicted as
$$
\tilde{l}_{t}=\arg\operatorname*{max}_{c}P(c|\mathbf{X}_{t}).
$$
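The classification head is thus a linear map followed by a softmax and an argmax. A minimal sketch with a numerically stable softmax; `predict` and all shapes are illustrative assumptions:

```python
import numpy as np

def predict(o, P, b_c):
    """Linear layer plus softmax over C emotion classes.

    o : (1, K*d_o) output feature vector; P, b_c : learnable parameters.
    """
    y = o @ P + b_c                      # (1, C) logits
    y = y - y.max()                      # subtract max for numerical stability
    probs = np.exp(y) / np.exp(y).sum()  # softmax probabilities
    return probs, int(np.argmax(probs))

rng = np.random.default_rng(3)
K, d_o, C = 10, 12, 3
o = rng.standard_normal((1, K * d_o))
P = 0.1 * rng.standard_normal((K * d_o, C))
b_c = np.zeros((1, C))
probs, label = predict(o, P, b_c)
print(round(float(probs.sum()), 6))      # 1.0
```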
Specifically, the classification loss can be written as $L_{c}(\mathbf{X}^{S};\theta_{f},\theta_{c})=-\sum_{t=1}^{M_{1}}\log P(l_{t}|\mathbf{X}_{t}^{S})$, where $\theta_{f}$ and $\theta_{c}$ denote the learnable parameters of the feature extraction module and the classifier, while $l_{t}$ and $M_{1}$ are the ground-truth label of sample $\mathbf{X}_{t}$ and the number of training samples. By minimizing this loss function, discriminative features can be extracted for emotion recognition.
To align the feature distributions between the source and target domains, we adopt a domain adversarial strategy by adding a discriminator into the network. It works cooperatively with the classifier to induce the feature extraction process to generate emotion-distinguishable but domain-invariant features.
Concretely, we predefine the source domain label set $D_{S}=\{0,0,\cdots,0\}\in\mathbb{Z}^{M_{1}\times1}$ and the target domain label set $D_{T}=\{1,1,\cdots,1\}\in\mathbb{Z}^{M_{2}\times1}$, where $M_{2}$ is the number of testing samples. Then, by maximizing the loss function of the discriminator, which can be denoted as
$$
\begin{array}{r l}
&{L_{d}(\mathbf{X}_{t}^{S},\mathbf{X}_{t^{\prime}}^{T};\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{d})}\\
&{\quad=-\sum_{t=1}^{M_{1}}\log P(0|\mathbf{X}_{t}^{S})-\sum_{t^{\prime}=1}^{M_{2}}\log P(1|\mathbf{X}_{t^{\prime}}^{T}),}
\end{array}
$$
the feature extraction module is induced to generate data representations that confuse the discriminator so that it cannot distinguish which domain the input comes from (i.e., domain-invariant features). Here $\mathbf{X}_{t}^{S}$ and $\mathbf{X}_{t^{\prime}}^{T}$ denote the $t$-th and $t^{\prime}$-th samples in the source and target data sets, respectively, and $\theta_{d}$ represents the learnable parameters of the discriminator.
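Assuming a binary discriminator that outputs a sigmoid probability per sample (an assumption; the paper does not spell out its architecture here), $L_{d}$ reduces to a standard negative log-likelihood over the two domain labels:

```python
import numpy as np

def domain_loss(p_src, p_tgt):
    """L_d: negative log-likelihood of the true domain labels.

    p_src : discriminator outputs P(domain = 0 | x) on source samples.
    p_tgt : discriminator outputs P(domain = 1 | x) on target samples.
    """
    return -np.sum(np.log(p_src)) - np.sum(np.log(p_tgt))

# with domain-invariant features, the optimal discriminator outputs 0.5
# for every sample, giving L_d = (M1 + M2) * log(2)
p_src = np.full(4, 0.5)
p_tgt = np.full(4, 0.5)
print(np.isclose(domain_loss(p_src, p_tgt), 8 * np.log(2)))  # True
```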
B. The optimization of BiHDM
The overall optimization of BiHDM can be expressed as
$$
\begin{array}{r l}
&{\operatorname*{min}L(\mathbf{X};\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{c},\boldsymbol{\theta}_{d})}\\
&{\quad=\operatorname*{min}L_{c}(\mathbf{X}^{S};\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{c})+\operatorname*{max}L_{d}(\mathbf{X}^{S},\mathbf{X}^{T};\boldsymbol{\theta}_{f},\boldsymbol{\theta}_{d}),}
\end{array}
$$
where $L(\cdot)$ is the loss function of the overall model, and $\mathbf{X}$ denotes the entire data set consisting of the source data set $\mathbf{X}^{S}$ and the target data set $\mathbf{X}^{T}$, i.e., $\mathbf{X}=[\mathbf{X}^{S},\mathbf{X}^{T}]\in\mathbb{R}^{d\times N\times(M_{1}+M_{2})}$.
This min-max loss function forces the parameters of the feature extraction module to generate emotion-related but domain-invariant data representations, which benefits EEG emotion recognition because of the tremendous distribution shift of EEG emotional signals, especially in the subject-independent task where the source and target data come from different subjects.
Specifically, the maximizing problem can be converted into a minimizing problem by inserting a gradient reversal layer (GRL) [19] before the discriminator, which can then be optimized easily with the stochastic gradient descent (SGD) algorithm [20]. The GRL acts as an identity transform in the forward propagation but reverses the gradient sign during the back-propagation operation. The overall optimization process follows the rules below
具体而言,最大化问题可通过在判别器前加入梯度反转层 (GRL) [19] 转换为最小化问题,这能方便地通过随机梯度下降 (SGD) 算法 [20] 进行优化。GRL 在前向传播中保持恒等变换,但在反向传播时反转梯度符号。整体优化过程遵循以下规则:
$$
\theta_{c}\leftarrow\theta_{c}-\alpha\frac{\partial L_{c}}{\partial\theta_{c}},\quad\theta_{d}\leftarrow\theta_{d}-\alpha\frac{\partial L_{d}}{\partial\theta_{d}},\quad\theta_{f}\leftarrow\theta_{f}-\alpha\left(\frac{\partial L_{c}}{\partial\theta_{f}}-\frac{\partial L_{d}}{\partial\theta_{f}}\right),
$$
where $\alpha$ is the learning rate. In this way, we can iteratively train the classifier and the discriminator, updating the parameters via the chain rule in the same manner as standard deep learning methods.
其中 $\alpha$ 是学习率。通过这种方式,我们可以迭代训练分类器和判别器,并使用链式法则以类似标准深度学习方法的方式更新参数。
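The GRL trick and the update rules above can be illustrated with a toy scalar example. The quadratic losses below are hypothetical stand-ins for $L_c$ and $L_d$, chosen only so the gradients are easy to verify by hand; the sign flip on $\partial L_d/\partial\theta_f$ is exactly what the GRL provides during back-propagation:

```python
# Toy illustration of the update rules with hypothetical scalar losses:
#   L_c = (theta_f + theta_c)^2  (classification loss, minimized)
#   L_d = (theta_f - theta_d)^2  (discriminator loss; theta_f maximizes it)
alpha = 0.1
theta_f, theta_c, theta_d = 1.0, 1.0, 1.0

# Analytic gradients of the toy losses at the current parameters.
dLc_dtc = 2 * (theta_f + theta_c)
dLc_dtf = 2 * (theta_f + theta_c)
dLd_dtd = -2 * (theta_f - theta_d)
dLd_dtf = 2 * (theta_f - theta_d)

# One SGD step. The GRL flips the sign of dL_d/dtheta_f when it reaches the
# feature extractor, turning the min-max game into a single minimization.
theta_c -= alpha * dLc_dtc
theta_d -= alpha * dLd_dtd
theta_f -= alpha * (dLc_dtf - dLd_dtf)
print(theta_c, theta_d, theta_f)  # approximately 0.6, 1.0, 0.6
```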
III. EXPERIMENTS
III. 实验
A. Setting up
A. 设置
To evaluate the proposed BiHDM model, in this section we conduct experiments on three public EEG emotional datasets. All three datasets were collected while the participants sat comfortably in front of a monitor and watched emotional video clips. The EEG signals were recorded from 62 electrode channels using an ESI NeuroScan system at a sampling rate of $1000~\mathrm{Hz}$. The electrode locations follow the international 10-20 system. Accordingly, in the experiments we perform the pairwise operation on the 31 electrode pairs located symmetrically on the left and right hemispheric scalps. The detailed information of these datasets is described as follows:
为评估所提出的BiHDM模型,本节将在三个公开的脑电图(EEG)情感数据集上进行实验。所有数据集均在参与者舒适地坐在显示器前观看情感视频片段时采集。EEG信号通过ESI NeuroScan系统从62个电极通道记录,采样率为$1000~\mathrm{Hz}$。电极位置基于国际10-20系统布置。因此实验中,我们根据左右脑半球头皮的对称位置,对31组成对电极执行配对操作。各数据集详细信息如下:
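The symmetric-pair operation can be sketched as follows. The short pair list and channel index here are illustrative placeholders (the actual 62-channel montage yields 31 such pairs), and subtraction is used as the pairwise operation, matching the choice made in the experiments:

```python
# Illustrative subset of symmetric 10-20 electrode pairs; the full 62-channel
# montage used in the paper yields 31 left-right pairs.
SYMMETRIC_PAIRS = [("FP1", "FP2"), ("F3", "F4"), ("C3", "C4"), ("O1", "O2")]

def pairwise_subtract(features, channel_index):
    """Apply subtraction between each mirrored left/right electrode pair.

    features: per-channel feature values, indexed by channel position.
    channel_index: maps electrode name -> position in `features`.
    """
    return [features[channel_index[left]] - features[channel_index[right]]
            for left, right in SYMMETRIC_PAIRS]

# Toy usage with scalar per-channel features.
idx = {"FP1": 0, "FP2": 1, "F3": 2, "F4": 3, "C3": 4, "C4": 5, "O1": 6, "O2": 7}
feats = [9.0, 5.0, 8.0, 6.0, 7.0, 7.0, 4.0, 1.0]
print(pairwise_subtract(feats, idx))  # [4.0, 2.0, 0.0, 3.0]
```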
(1) SEED [21]. The SEED dataset contains 15 subjects, and each subject has three sessions. During the experiment, the participants watched three kinds of emotional film clips, i.e., happy, neutral and sad, with 5 film clips per emotion. Consequently, there are 15 trials in total, and each trial contains 185-238 samples per session of each subject, giving about 3400 samples in one session;
(1) SEED [21]。SEED数据集包含15名受试者,每位受试者进行三次实验。实验中,参与者观看三种情绪类型的电影片段(快乐、中性、悲伤),每种情绪包含5段影片。每次实验共进行15轮 trials,每位受试者单次实验的每个trial包含185-238个样本,因此单次实验总样本量约为3400个;
(2) SEED-IV [5]. The SEED-IV dataset also contains 15 subjects, and each subject has three sessions. Compared with SEED, it includes four emotion types with the extra emotion fear, with 6 film clips per emotion. Thus there are 24 trials in total, and each trial contains 12-64 samples per session of each subject, giving about 830 samples in one session;
(2) SEED $\mathbf{IV}^{2}$ [5]。SEED-IV数据集同样包含15名被试对象,每位被试进行三次实验。与SEED相比,该数据集新增恐惧情绪,共涵盖四种情绪类型,每种情绪包含6段电影片段。因此每个实验会话包含24条轨迹,每条轨迹对应单名被试单次会话的12-64个样本。单个会话总样本量约为830个;
(3) MPED [22]. The MPED dataset contains 30 subjects, and each subject has one session. It includes seven refined emotion types, i.e., joy, funny, neutral, sad, fear, disgust and anger, with 4 film clips per emotion. There are 28 trials in total, and each trial contains 120 samples, giving 3360 samples per subject.
(3) $\mathbf{MPED^{2}}$ [22]。MPED数据集包含30名受试者,每位受试者进行一次实验。该数据集包含七种精细化情绪类型,即喜悦、滑稽、中性、悲伤、恐惧、厌恶和愤怒,每种情绪对应4段影片片段。每名受试者共进行28次试验,每次试验包含120个样本,单个受试者样本总量为3360个。
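The reported totals can be sanity-checked arithmetically; for example, for MPED and SEED:

```python
# MPED: 28 trials x 120 samples per trial = 3360 samples per subject,
# matching the total reported in the dataset description.
mped_total = 28 * 120
print(mped_total)  # 3360

# SEED: 15 trials of 185-238 samples each give a per-session range that
# brackets the reported "about 3400" figure.
seed_min, seed_max = 15 * 185, 15 * 238
print(seed_min, seed_max)  # 2775 3570
```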
To evaluate the proposed BiHDM model adequately, we design two kinds of experiments: subject-dependent and subject-independent. We use the released handcrafted features, i.e., the differential entropy (DE) features for SEED and SEED-IV, and the Short-Time Fourier Transform (STFT) features for MPED, as the input to our model. Thus the size $d\times N$ of the input sample $\mathbf{X}_{t}$ is $5\times62$, $5\times62$ and $1\times62$ for the three datasets, respectively. Moreover, in the experiments we set the dimension $d_{l}$ of each electrode's deep representation to 32; the parameters $d_{g}$ and $K$ of the global high-level feature to 32 and 6; and the dimension $d_{o}$ of the output feature to 16, without an elaborate hyperparameter search. We implemented BiHDM using TensorFlow on one Nvidia 1080Ti GPU. The learning rate, momentum and weight decay rate are set to 0.003, 0.9 and 0.95, respectively. The network is trained using SGD with a batch size of 200. In addition, we adopt subtraction as the pairwise operation of the BiHDM model in this section, and discuss the other two types of operations in Section III-D.
为充分评估提出的BiHDM模型,我们设计了两类实验:被试相关(subject-dependent)和被试无关(subject-independent)实验。采用公开的手工特征作为模型输入:SEED和SEEDIV数据集使用差分熵(DE),MPED数据集使用短时傅里叶变换(STFT)。因此输入样本$\mathbf{X}{t}$的尺寸$d\times N$在这三个数据集中分别为$5\times62$、$5\times62$和$1\times62$。实验中将每个电极的深层表示维度$d_{l}$设为32,全局高层特征的参数$d_{g}$和$K$分别设为32和6,输出特征维度$d_{o}$设为16(未进行精细遍历)。具体实现采用TensorFlow框架和Nvidia 1080Ti GPU,学习率、动量和权重衰减率分别设置为0.003、0.9和0.95,使用批量为200的随机梯度下降(SGD)进行训练。实验部分采用减法作为BiHDM模型的成对运算,其他两种运算方式将在第III-D节讨论。
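For reference, the settings above can be collected into a single configuration; the dictionary below is just a restatement of the reported values, with key names chosen for illustration:

```python
# Hyperparameters as reported in the text; key names are illustrative.
BIHDM_CONFIG = {
    "d_l": 32,                # dim. of each electrode's deep representation
    "d_g": 32,                # global high-level feature parameter
    "K": 6,                   # global high-level feature parameter
    "d_o": 16,                # output feature dimension
    "learning_rate": 0.003,
    "momentum": 0.9,
    "weight_decay": 0.95,
    "batch_size": 200,
    # Input sample size d x N per dataset (DE for SEED/SEED-IV, STFT for MPED).
    "input_shape": {"SEED": (5, 62), "SEED-IV": (5, 62), "MPED": (1, 62)},
}
print(BIHDM_CONFIG["input_shape"]["MPED"])  # (1, 62)
```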
B. The EEG emotion recognition experiments
B. EEG情绪识别实验
- The subject-dependent experiment: In this experiment, we adopt the same protocols as [21], [5] and [22]. Namely, for SEED, we use the first nine trials of EEG data per session of each subject as source (training) domain data and the remaining six trials per session as target (testing) domain data; for SEED-IV, we use the first sixteen trials per session of each subject as the training data, and the last eight trials containing all emotions (each emotion with two trials) as the testing data; for MPED, we use twenty-one trials of