[论文翻译]基于深度神经网络的脑电情绪识别关键频段与通道研究


原文地址:https://bcmi.sjtu.edu.cn/home/zhengweilong/pdf/TAMD2014_zwl_final_Submitted.pdf


Investigating Critical Frequency Bands and Channels for EEG-based Emotion Recognition with Deep Neural Networks

基于深度神经网络的脑电情绪识别关键频段与通道研究

Abstract—To investigate critical frequency bands and channels, this paper introduces deep belief networks (DBNs) to construct EEG-based emotion recognition models for three emotions: positive, neutral and negative. We develop an EEG dataset acquired from 15 subjects. Each subject performs the experiments twice, at an interval of a few days. DBNs are trained with differential entropy features extracted from multichannel EEG data. We examine the weights of the trained DBNs and investigate the critical frequency bands and channels. Four different profiles of 4, 6, 9 and 12 channels are selected. The recognition accuracies of these four profiles are relatively stable, with the best accuracy of $86.65\%$, which is even better than that of the original 62 channels. The critical frequency bands and channels determined by using the weights of trained DBNs are consistent with the existing observations. In addition, our experiment results show that neural signatures associated with different emotions do exist and that they share commonality across sessions and individuals. We compare the performance of deep models with shallow models. The average accuracies of DBN, SVM, LR and KNN are $86.08\%$, $83.99\%$, $82.70\%$ and $72.60\%$, respectively.

摘要—为探究关键频段与通道,本文采用深度信念网络 (DBNs) 构建基于脑电图 (EEG) 的三种情绪识别模型(积极、中性、消极)。我们建立了来自15名被试的EEG数据集,每位被试间隔数日进行两次实验。DBNs训练采用多通道EEG数据提取的微分熵特征,通过分析训练后DBNs的权重来研究关键频段与通道。实验选取了4、6、9和12通道的四种配置方案,识别准确率保持相对稳定(最高达86.65%),甚至优于原始62通道方案。基于DBNs权重确定的关键频段与通道与现有研究发现一致。实验结果证实不同情绪确实存在对应的神经特征标记,且这些标记在不同实验阶段和个体间具有共同特性。深度模型与浅层模型对比显示:DBN、SVM、LR和KNN的平均准确率分别为86.08%、83.99%、82.70%和72.60%。

Index Terms—Affective computing, emotion recognition, EEG, deep belief networks.

索引关键词—情感计算 (Affective computing),情绪识别 (emotion recognition),脑电图 (EEG),深度信念网络 (deep belief networks)。

I. INTRODUCTION

I. 引言

EMOTION research is an interdisciplinary field that encompasses research in computer science, psychology, neuroscience, and cognitive science. For neuroscience, researchers aim to find out the neural circuits and brain mechanisms of emotion processing. For psychology, there exist many basic theories of emotion from different researchers and it is important to build up computational models of emotion. For computer science, we focus on developing practical applications such as estimation of task workload [1] and driving fatigue detection [2].

情绪研究是一个跨学科领域,涉及计算机科学 (computer science)、心理学 (psychology)、神经科学 (neuroscience) 和认知科学 (cognitive science)。在神经科学方面,研究人员致力于揭示情绪处理的神经回路和大脑机制。心理学领域存在许多来自不同学者的基础情绪理论,建立情绪的计算模型至关重要。对于计算机科学而言,我们专注于开发实际应用,例如任务工作量评估 [1] 和驾驶疲劳检测 [2]。

In multimedia context analysis, for example, there is a large semantic gap between the high-level cognition in the human brain and the low-level features in raw digital data. As

在多媒体内容分析中,例如人类大脑的高级认知与原始数字数据的低级特征之间存在巨大的语义鸿沟。

This work was supported in part by the grants from the National Natural Science Foundation of China (Grant No. 61272248), the National Basic Research Program of China (Grant No. 2013CB329401), the Science and Technology Commission of Shanghai Municipality (Grant No. 13511500200), the Open Funding Project of National Key Laboratory of Human Factors Engineering (Grant No. HF2012-K-01), and the European Union Seventh Framework Program (Grant No. 247619).

本工作部分受到以下项目资助:国家自然科学基金 (Grant No. 61272248)、国家重点基础研究发展计划 (Grant No. 2013CB329401)、上海市科学技术委员会 (Grant No. 13511500200)、人因工程国家重点实验室开放基金 (Grant No. HF2012-K-01),以及欧盟第七框架计划 (Grant No. 247619)。

Wei-Long Zheng and Bao-Liang Lu are with the Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University and the Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China. (e-mail: weilonglive@gmail.com, bllu@sjtu.edu.cn).

Wei-Long Zheng和Bao-Liang Lu就职于上海交通大学计算机科学与工程系类脑计算与机器智能中心,以及上海交通大学智能交互与认知工程上海市重点实验室,地址:中国上海市东川路800号,邮编200240 (e-mail: weilonglive@gmail.com, bllu@sjtu.edu.cn)。

∗Corresponding author

*通讯作者

the big data of social media emerges, it is difficult to tag the contents reliably, especially for affective factors, which are hard to describe across different cultures and language backgrounds. So it is necessary to build an emotion model to automatically recognize the affective tags implicitly [3]. The field of Affective Computing (AC) aspires to narrow the communicative gap between the highly emotional human and the emotionally challenged computer by developing computational systems that recognize and respond to human emotions [4]. The detection and modeling of human emotions are the primary studies of affective computing using pattern recognition and machine learning techniques. Although affective computing has achieved rapid development in recent years, there are still many open problems to be solved [5], [6].

社交媒体大数据的兴起使得内容难以被可靠地标记,尤其是情感因素这类难以跨文化和语言背景描述的元素。因此,有必要构建一个情感模型来自动识别隐含的情感标签 [3]。情感计算 (Affective Computing, AC) 领域旨在通过开发能识别并响应人类情感的计算系统,缩小高情感化人类与情感能力欠缺的计算机之间的沟通鸿沟 [4]。人类情感的检测与建模是情感计算利用模式识别和机器学习技术开展的核心研究。尽管近年来情感计算发展迅速,仍存在许多待解决的开放性问题 [5] [6]。

Among various approaches to emotion recognition, the method based on electroencephalography (EEG) signals is more reliable because of its high accuracy and objective evaluation in comparison with other external appearance clues like facial expression and gesture [7]. Deeply understanding the brain response under different emotional states can fundamentally advance the computational models for emotion recognition. Various psychophysiological studies have demonstrated the correlations between human emotions and EEG signals [8]–[10]. Moreover, with the quick development of wearable devices and dry electrode techniques [11]–[14], it is now possible to move EEG-based emotion recognition from laboratories to real-world applications, such as driving fatigue detection and mental state monitoring [15]–[19].

在多种情绪识别方法中,基于脑电图 (EEG) 信号的方法因其高准确性和客观评估特性,相比面部表情、手势等外部表现线索更为可靠 [7]。深入理解不同情绪状态下的大脑反应,能够从根本上推进情绪识别计算模型的发展。多项心理生理学研究已证实人类情绪与脑电信号之间的关联性 [8]–[10]。此外,随着可穿戴设备与干电极技术的快速发展 [11]–[14],基于脑电的情绪识别技术正从实验室走向实际应用场景,如驾驶疲劳检测与精神状态监测 [15]–[19]。

However, EEG signals have a low signal-to-noise ratio (SNR) and are often mixed with much noise when collected. A more challenging problem is that, unlike image or speech signals, EEG signals are temporally asymmetric and non-stationary [20], so analyzing EEG signals is a hard task. Traditional manual feature extraction and feature selection for EEG are crucial to affective modeling and require specific domain knowledge. The popular feature selection methods for EEG signal analysis are principal component analysis (PCA) and Fisher projection. In general, the cost of these traditional feature selection methods increases quadratically with respect to the number of features considered [21]. What's more, these methods cannot preserve the original domain information such as channels and frequency bands, which are very important for understanding brain responses. Recently developed deep learning techniques in the machine learning community allow automatic feature extraction and feature selection and can eliminate the limitation of hand-crafted features [5]. Deep learning performs feature selection automatically while training classification models, bypassing the computational cost of a separate feature selection phase.

然而,脑电图 (EEG) 信号的信噪比 (SNR) 较低,采集时往往混杂大量噪声。更具挑战性的是,与图像或语音信号不同,EEG信号具有时间不对称性和非平稳性 [20],因此分析EEG信号是一项艰巨任务。传统EEG人工特征提取与选择对情感建模至关重要,且需要特定领域知识。EEG信号分析中常用的特征选择方法包括主成分分析 (PCA) 和Fisher投影。通常,这些传统特征选择方法的计算成本会随特征数量呈平方级增长 [21]。更重要的是,这些方法无法保留通道、频段等对理解大脑反应至关重要的原始域信息。机器学习领域最新发展的深度学习技术能实现自动特征提取与选择,突破手工设计特征的局限 [5]。深度学习通过在训练分类模型时同步完成特征选择,规避了特征选择阶段的计算成本。

In the past few years, researchers have focused on finding the critical frequency bands and channels for EEG-based emotion recognition with different methods. Li and Lu [22] proposed a frequency band searching method to choose an optimal band for emotion recognition, and their results showed that the gamma band (roughly $30{-}100~\mathrm{Hz}$) is suitable for EEG-based emotion classification with emotional still images as stimuli. It is also interesting to ask what would be good positions to place electrodes for emotion recognition when using only a few electrodes. Bos [23] chose the following montage: $Fpz$/right mastoid for arousal recognition, $F3/F4$ for valence recognition, and the left mastoid as ground. Her results indicated that $F3$ and $F4$ are the most suitable electrode positions to detect emotional valence. Combining the existing results, Valenzi et al. [24] obtained a pool of eight electrodes: $AF3$, $AF4$, $F3$, $F4$, $F7$, $F8$, $T7$ and $T8$, and achieved an average classification rate of $87.5\%$ with these eight electrodes. However, how to select the critical channels and frequency bands and how to evaluate selected pools of electrodes have not been fully investigated yet.

过去几年,研究人员致力于通过不同方法寻找基于脑电图(EEG)情绪识别的关键频段与通道。Li和Lu[22]提出一种频段搜索方法选择情绪识别最优频段,其研究表明伽马波段(约$30{-}100~\mathrm{Hz}$)适用于以静态情绪图片为刺激的EEG情绪分类。另一个有趣的问题是:当仅使用少量电极时,哪些位置最适合情绪识别?Bos[23]采用以下导联组合:$Fpz$/右乳突用于唤醒度识别,$F3/F4$用于效价识别,左乳突作为接地电极。其结果表明$F3$和$F4$是检测情绪效价最合适的电极位置。Valenzi等[24]综合现有研究成果,筛选出8个电极组合:$AF3$、$AF4$、$F3$、$F4$、$F7$、$F8$、$T7$和$T8$,使用这些电极获得了$87.5\%$的平均分类准确率。然而,如何选择关键通道与频段,以及如何评估电极组合的有效性,仍有待深入研究。

Since 2006, deep learning has emerged in the machine learning community [25] and has generated a great impact in signal and information processing. Many deep architecture models have been proposed, such as the deep auto-encoder [26], convolutional neural network [27], [28] and deep belief network [29]. Deep architecture models achieve successful results and outperform shallow models (e.g., MLPs, SVMs, CRFs) in many challenging tasks, especially in the speech and image domains [29]–[31]. Recently, deep learning methods have also been successfully applied to physiological signal processing such as EEG, electromyogram (EMG), electrocardiogram (ECG), and skin conductance (SC), and achieve comparable results in comparison with other conventional methods [5], [32]–[34].

自2006年起,深度学习在机器学习领域崭露头角[25],并对信号与信息处理产生了深远影响。研究者们提出了多种深度架构模型,如深度自编码器(deep auto-encoder)[26]、卷积神经网络(convolution neural network)[27][28]和深度信念网络(deep belief network)[29]。这些深度架构模型在多项挑战性任务中取得突破性成果,尤其在语音与图像领域表现显著优于浅层模型(如MLP、SVMs、CRFs)[29]–[31]。近年来,深度学习方法也成功应用于脑电图(EEG)、肌电图(EMG)、心电图(ECG)、皮肤电阻(SC)等生理信号处理,相较传统方法取得了可比拟的效果[5][32]–[34]。

In this paper, we focus on investigating critical frequency bands and critical channels for efficient EEG-based emotion recognition. Here, we introduce deep learning methodologies to deal with these two problems. First, to shed light on the relationship between emotional states and changes of EEG signals, we devise a protocol in which subjects are asked to elicit their own emotions while watching three types of emotional movies (positive, neutral and negative). After that, we extract efficient features called differential entropy [35], [36] from multichannel EEG data, and then we train deep belief networks with differential entropy features as inputs. By analyzing the weight distributions learned by the trained deep belief networks, we choose different setups for frequency bands and channels and compare the performance of different feature subsets. We also compare the deep learning methods with feature-based shallow models like kNN, logistic regression and SVM, in order to explore the advantages of deep learning and the feasibility of applying unsupervised feature learning to EEG-based emotion recognition.

本文重点研究基于脑电图(EEG)高效情绪识别的关键频段与关键通道。我们引入深度学习方法来解决这两个问题:首先,为揭示情绪状态与脑电信号变化之间的关系,我们设计了一项实验方案,要求受试者在观看三类情绪影片(积极、中性、消极)时自主诱发情绪。随后从多通道脑电数据中提取差分熵(differential entropy) [35][36]作为有效特征,并以这些特征作为输入训练深度信念网络。通过分析训练后深度信念网络学得的权重分布,我们选择不同的频段和通道配置,比较不同特征子集的性能表现。同时将深度学习方法与k近邻算法(kNN)、逻辑回归和支持向量机(SVM)等基于特征的浅层模型进行对比,以探索深度学习优势及无监督特征学习在脑电情绪识别中的应用可行性。

The main contributions of this paper can be described as follows. First, considering the feature learning and feature selection properties of deep neural networks, we introduce deep learning methodologies to emotion recognition based on multichannel EEG data. By analyzing the weight distributions learned by the trained deep belief networks, we investigate different electrode set reductions and define the optimal electrode placement, which outperforms the original full set of channels with less computational cost and more feasibility in real-world applications. We also show the superior performance of deep models over shallow models like kNN, logistic regression and SVM. The experimental results also indicate that the differential entropy features extracted from EEG data possess accurate and stable information for emotion recognition. We find that neural signatures associated with positive, neutral and negative emotions in channels and frequency bands do exist.

本文的主要贡献可概括为以下方面。首先,考虑到深度神经网络的特征学习和特征选择特性,我们引入了基于多通道EEG数据的深度学习情感识别方法。通过分析训练好的深度信念网络所学习的权重分布,我们研究了不同电极集的精简方案,并定义了优于原始全通道的最优电极布局方案,该方案具有更低计算成本和更高实际应用可行性。实验结果表明,深度模型在性能上显著优于kNN、逻辑回归和SVM等浅层模型。从EEG数据提取的微分熵特征被证实具有准确稳定的情感识别信息。我们发现,与积极、中性和消极情绪相关的神经特征确实存在于特定通道和频段中。

The layout of the paper is as follows. In Section II, we give a brief overview of related research on emotion recognition using EEG, as well as the use of deep learning methodologies for physiological signals. A systematic description of signal analysis methods and classification procedure for feature extraction and construction of deep belief networks is given in Section III. Section IV gives the motivation and rationale for our emotion experimental setting. A detailed description of all the materials and protocol we used is presented. In Section V, the detailed parameters for different classifiers are given and we systematically compare the performance of deep belief networks with other shallow models. Then we investigate different electrode set reductions and neural signatures associated with different emotions according to the weight distributions obtained from the trained deep neural networks. In Section VI, we discuss the problems in emotion recognition studies. Finally, in Section VII, we present conclusions.

论文结构如下。第 II 节简要概述了利用脑电图 (EEG) 进行情绪识别的相关研究,以及深度学习在生理信号处理中的应用。第 III 节系统描述了信号分析方法、分类流程、特征提取及深度信念网络的构建。第 IV 节阐述了本实验设计的动机与理论基础,并详细说明了所用材料与实验协议。第 V 节给出了不同分类器的详细参数,系统比较了深度信念网络与浅层模型的性能,进而根据训练好的深度神经网络权重分布,研究了不同电极组精简方案及与情绪相关的神经特征。第 VI 节探讨了情绪识别研究中的现存问题。最后,第 VII 节给出结论。

II. RELATED WORK

II. 相关工作

With the fast development of wearable devices and dry electrode techniques [11]–[14], it is now possible to record and analyze brain activity in natural settings. This development is leading to a new trend that integrates brain-computer interfaces (BCIs) with emotional factors. Emotional brain-computer interfaces are closed-loop affective computing systems, which build interactive environments [37]. Figure 1 shows the emotional brain-computer interface cycle, which consists of the following six main phases. First, users are exposed to designed or real-world stimuli according to the protocol. The brain activities are recorded as EEG simultaneously. Then the raw data are preprocessed to remove noise and artifacts. Relevant features are extracted and a classifier is trained based on the extracted features. After identifying the user's current emotional state, feedback can be given in response to the user.

随着可穿戴设备和干电极技术的快速发展 [11]–[14],我们得以在自然环境中记录并分析大脑活动。这一发展正引领着将脑机接口 (BCI) 与情感因素相融合的新趋势。情感脑机接口是闭环情感计算系统,能够构建交互环境 [37]。图 1: 展示了情感脑机接口的循环流程,该系统包含以下六个主要阶段:首先,用户根据实验协议接触设计或真实世界刺激物,同时通过脑电图 (EEG) 记录大脑活动;随后对原始数据进行预处理以消除噪声和伪影;提取相关特征后,基于这些特征训练分类器;在识别用户当前情绪状态后,可实施反馈机制响应用户。

One of the goals of affective neuroscience is to examine whether patterns of brain activity for specific emotions exist, and whether these patterns are to some extent common across individuals. Various studies have examined the neural correlates of emotions. Davidson et al. [38], [39] showed that frontal EEG asymmetry is related to approach and withdrawal emotions, with approach tendencies reflected in left frontal activity and withdrawal tendencies reflected in relative right-frontal activity. Sammler et al. [8] investigated the EEG correlates of the processing of pleasant and unpleasant music. They found that pleasant music is associated with an increase of frontal midline theta power. Knyazev et al. [9] reported gender differences in implicit and explicit processing of emotional facial expressions based on event-related theta synchronization. Mathersul et al. [10] investigated the relationships among nonclinical depression/anxiety and lateralized frontal/parietotemporal activity on the basis of both negative mood and alpha EEG. Their findings supported predictions for frontal but not posterior regions. Wang et al. [40] indicated that for positive and negative emotions, the subject-independent features are mainly on the right occipital lobe and parietal lobe in the alpha band, the parietal lobe and temporal lobe in the beta band, and the left frontal lobe and right temporal lobe in the gamma band. Martini et al. [41] found an increase in the P300 and late positive potential, as well as an increase in gamma activity, during viewing of unpleasant pictures as compared to neutral ones. They suggested that the full elaboration of unpleasant stimuli requires a tight interhemispheric communication between temporal and frontal regions, which is realized by means of phase synchronization at about $40~\mathrm{Hz}$. However, most of the existing experiments on passive BCI use a very controlled approach with time-locked stimuli using ERP analysis, especially in psychology. This ideal experimental setting limits the range of real-world conditions and is hard to generalize to natural settings in a real environment.

情感神经科学的目标之一是探究特定情绪是否存在对应的大脑活动模式,以及这些模式是否在个体间具有一定共性。多项研究已探讨情绪的神经关联机制:Davidson等人[38][39]发现脑电图(EEG)前额叶不对称性与趋近/回避情绪相关——左前额叶活动反映趋近倾向,右前额叶活动反映回避倾向;Sammler团队[8]研究愉悦/不悦音乐处理时的EEG特征,发现愉悦音乐会增强前额中线θ波能量;Knyazev等[9]通过事件相关θ同步现象,提出情绪面部表情隐性与显性加工存在性别差异;Mathersul等人[10]基于负面情绪和α波EEG,发现非临床抑郁/焦虑与侧化前额叶-颞顶叶活动的关系仅在前额区域符合预测;Wang团队[40]指出正负情绪在α波的右枕叶/顶叶、β波的顶叶/颞叶、γ波的左前额叶/右颞叶存在被试无关特征;Martini等[41]观测到不悦图片相比中性图片会诱发更强的P300成分、晚正电位及γ波活动,认为不悦刺激的完整加工需要颞叶与前额叶通过40Hz相位同步实现半球间紧密协作。然而现有被动式脑机接口研究多采用严格受控的ERP时间锁定范式(尤其心理学领域),这种理想实验条件难以推广到真实环境中的自然场景。


Fig. 1. Emotional brain-computer interface cycle

图 1: 情感脑机接口循环

Various studies in the affective computing community try to build computational models to estimate emotional states using machine learning techniques. Lin et al. [42] applied machine learning algorithms to categorize EEG signals according to subjects' self-reported emotional states during music listening. They obtained an average classification accuracy of $82.29\%$ for four emotions (joy, anger, sadness and pleasure) across 26 subjects. Soleymani et al. [3] proposed a user-independent emotion recognition method using EEG, pupillary response and gaze distance, which achieved the best classification accuracies of $68.5\%$ for three labels of valence and $76.4\%$ for three labels of arousal using modality fusion across 24 participants. Hadjidimitriou et al. [43] employed three time-frequency distributions (spectrogram, Hilbert-Huang spectrum, and Zhao-Atlas-Marks transform) as features to classify ratings of liking and familiarity. They also investigated the time course of music-induced affect responses and the role of familiarity. Li and Lu [22] proposed a frequency band searching method to choose an optimal band, into which the recorded EEG signal is filtered. They used common spatial patterns (CSP) and a linear SVM to classify two emotions (happiness and sadness). Their experimental results indicated that the gamma band (roughly $30{-}100~\mathrm{Hz}$) is suitable for EEG-based emotion classification. Wang et al. [40] systematically compared three kinds of EEG features (power spectrum features, wavelet features and nonlinear dynamical features) for emotion classification. They proposed an approach to track the trajectory of emotion changes with manifold learning.

情感计算领域的多项研究尝试利用机器学习技术构建计算模型来估计情绪状态。Lin等人[42]应用机器学习算法根据受试者在音乐聆听过程中自我报告的情绪状态对脑电图(EEG)信号进行分类,在26名受试者对四种情绪(喜悦、愤怒、悲伤和愉悦)的分类中获得了平均82.29%的准确率。Soleymani等人[3]提出了一种基于EEG、瞳孔反应和注视距离的用户无关情绪识别方法,通过多模态融合在24名参与者中对效价三分类和唤醒度三分类分别取得了68.5%和76.4%的最佳分类准确率。Hadjidimitriou等人[43]采用三种时频分布(频谱图、Hilbert-Huang谱和Zhao-Atlas-Marks变换)作为特征来分类喜好度和熟悉度评分,并研究了音乐诱发情绪反应的时间进程及熟悉度的作用。Li和Lu[22]提出了一种频带搜索方法来选择最优频带,将记录的EEG信号滤波至该频带,使用共同空间模式(CSP)和线性支持向量机(linearSVM)对两种情绪(快乐和悲伤)进行分类,实验结果表明伽马频带(约30-100Hz)适合基于EEG的情绪分类。Wang等人[40]系统比较了三种EEG特征(功率谱特征、小波特征和非线性动力学特征)在情绪分类中的表现,并提出了一种用流形学习追踪情绪变化轨迹的方法。

Recently, deep learning methods have been applied to processing physiological signals such as EEG, EMG, ECG, and SC. Martinez et al. [5] trained an efficient deep convolutional neural network to classify four cognitive states (relaxation, anxiety, excitement and fun) using skin conductance and blood volume pulse signals. They indicated that the proposed deep learning approach can outperform traditional feature extraction and selection methods and yield a more accurate affective model. Martin et al. [44] applied deep belief nets and hidden Markov models to detect sleep stages using multimodal clinical sleep datasets. Their results of using raw data with a deep model were comparable to a handmade feature approach. To address the two challenges of the small sample problem and irrelevant channels, Li et al. [34] proposed a DBN-based model for affective state recognition from EEG signals and compared it with five baselines, with improvements of $11.5\%$ to $24.4\%$. Zheng et al. [33] trained a deep belief network with differential entropy features extracted from multichannel EEG as input and achieved the best classification accuracy of $87.62\%$ for two emotional categories in comparison with the state-of-the-art methods. In our previous work [32], we proposed a deep belief network based method to select the critical channels and frequency bands for three emotions (positive, neutral and negative). The experimental results showed that the selected channels and frequency bands could achieve comparable accuracies in comparison with those of the total features. In this paper, we extend our previous work to multichannel EEG processing and further investigate the weight distributions of trained deep neural networks, which reflect crucial neural signatures for emotion recognition.

近年来,深度学习技术被应用于处理EEG(脑电图)、EMG(肌电图)、ECG(心电图)和SC(皮肤电导)等生理信号。Martinez等人[5]通过皮肤电导和血容量脉冲信号训练了一个高效的深度卷积神经网络,用于分类四种认知状态(放松、焦虑、兴奋和愉悦)。研究表明,所提出的深度学习方法优于传统特征提取与选择方法,能构建更精准的情感模型。Martin团队[44]采用深度信念网络和隐马尔可夫模型处理多模态临床睡眠数据集进行睡眠分期,其直接使用原始数据的深度模型效果与人工特征方法相当。针对小样本问题和无关通道两大挑战,Li等人[34]提出基于DBN(深度信念网络)的EEG信号情感状态识别模型,在五种基线方法对比中实现了11.5%至24.4%的性能提升。Zheng等[33]采用多通道EEG微分熵特征作为输入训练深度信念网络,在两个情感类别分类任务中达到87.62%的最优准确率,优于当时最先进方法。我们前期的研究[32]提出基于深度信念网络的关键通道与频段选择方法,针对三种情绪(积极、中性和消极)的实验表明,所选特征能达到与全特征相当的分类精度。本文在先前工作基础上扩展至多通道EEG处理,并深入分析训练后深度神经网络的权重分布,这些权重反映了情绪识别的关键神经特征。

The problem of electrode set reduction is commonly studied to reduce computational complexity and ignore irrelevant noise. The optimal electrode placement is usually defined according to some statistical factors like the correlation coefficient, F-score and accuracy rate. Some studies shared the same pool of electrodes due to the restrictions of commercial EEG devices like Emotiv. In [42], Lin et al. identified 30 subject-independent features that were most relevant to emotional processing across subjects according to the F-score criterion and explored the feasibility of using fewer electrodes to characterize the EEG dynamics during music listening. The identified features were primarily derived from electrodes placed near the frontal and the parietal lobes. Valenzi et al. [24] selected a set of eight electrodes: $AF3$, $AF4$, $F3$, $F4$, $F7$, $F8$, $T7$ and $T8$, and achieved a promising result of $87.5\%$ for four emotions. A similar study was proposed by Li et al. [34], who applied a DBN-based model for affective state recognition from EEG signals to deal with two problems: a small number of samples and noisy channels. They proposed a DBN-based channel selection method. Their interesting observation is that data in irrelevant channels randomly update the parameters in the DBN model, while data in critical channels update the parameters in the DBN model according to the related patterns. However, they did not explore the performance of these critical channels. In this paper, we propose a novel electrode selection method based on the weight distributions obtained from the trained deep neural networks instead of statistical parameters, and show its superior performance over the original full pool of electrodes.

电极集缩减问题常被研究用于降低计算复杂度并忽略无关噪声。最优电极布局通常根据相关系数、F值和准确率等统计因素确定。部分研究因受限于Emotiv等商用EEG设备而共享相同电极池。Lin等[42]根据F值标准确定了30个与跨被试情绪处理最相关的被试无关特征,并探索了使用更少电极表征音乐聆听期间EEG动态的可行性。这些特征主要源自额叶和顶叶附近的电极。Valenzi等[24]选取了AF3、AF4、F3、F4、F7、F8、T7和$T8$八个电极,在四种情绪分类中取得了$87.5%$的优异结果。Li等[34]提出了类似研究,他们采用基于DBN的模型从EEG信号中识别情感状态,以解决样本量少和噪声通道两大问题。其提出的基于DBN的通道选择方法有个有趣发现:无关通道的数据会随机更新DBN模型参数,而关键通道的数据会依据相关模式更新参数。但该研究未探讨这些关键通道的性能表现。本文提出了一种基于训练后深度神经网络权重分布(而非统计参数)的新型电极选择方法,并证明了其性能优于原始全电极池。

Although various approaches have been proposed for EEG-based emotion recognition, most of the experimental results cannot be compared directly because of the different experimental setups. There is still a lack of publicly available emotional EEG datasets. To the best of our knowledge, the popular publicly available emotional EEG datasets are MAHNOB-HCI [3] and DEAP [45]. The first one includes the EEG, physiological signals, eye gaze, audio, and facial expressions of 30 people watching 20 emotional videos. The subjects self-reported their felt emotions using arousal, valence, dominance, and predictability as well as emotional keywords. The DEAP dataset includes the EEG and peripheral physiological signals of 32 participants watching 40 one-minute music videos. It also contains each participant's rating of each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For reproducing the results in this paper and enhancing cooperation in related research fields, the dataset used in this study is freely available to the academic community.

虽然已经提出了多种基于脑电图 (EEG) 的情绪识别方法,但由于实验设置不同,大多数实验结果无法直接比较。目前仍缺乏公开可用的情绪脑电数据集。据我们所知,流行的公开情绪脑电数据集包括 MAHNOB HCI [3] 和 DEAP [45]。前者包含 30 人在观看 20 段情绪视频时的脑电信号、生理信号、视线追踪、音频和面部表情数据。受试者使用唤醒度、效价、支配度和可预测性以及情绪关键词来自评感受情绪。DEAP 数据集记录了 32 名参与者观看 40 段一分钟音乐视频时的脑电信号和外周生理信号,还包含参与者对每段视频在唤醒度、效价、喜好度、支配度和熟悉度方面的评分。为复现本文结果并促进相关研究领域的合作,本研究所用数据集已向学术界公开2。

III. METHODS

III. 方法

A. Preprocessing

A. 预处理

According to the responses of the subjects, only the experiment epochs during which the target emotions were elicited were chosen for further analysis. The raw EEG data were downsampled to a $200~\mathrm{Hz}$ sampling rate. The EEG signals were visually checked, and recordings seriously contaminated by EMG and EOG were removed manually. EOG was also recorded in the experiments and later used to identify blink artifacts in the recorded EEG data. In order to filter out noise and remove artifacts, the EEG data were processed with a bandpass filter between $0.3~\mathrm{Hz}$ and $50~\mathrm{Hz}$. After the preprocessing, we extracted the EEG segments corresponding to the duration of each movie. Each channel of the EEG data was divided into same-length epochs of 1 s without overlapping. There were about 3300 clean epochs for one experiment. Features were further computed on each epoch of the EEG data. All signal processing was performed in Matlab.

根据被试反馈,仅选取成功诱发目标情绪的实验时段进行后续分析。原始脑电数据降采样至 $200\mathrm{Hz}$ 采样率,通过目视检查剔除肌电(EMG)和眼电(EOG)严重污染的记录段。实验同步记录的眼电数据用于识别脑电记录中的眨眼伪迹。采用 $0.3\mathrm{Hz}$ 至 $50\mathrm{Hz}$ 带通滤波器进行噪声过滤和伪迹去除。预处理后,提取每段影片持续时间内对应的脑电片段,将各通道数据按1秒时长非重叠分割为等长时段,单次实验约获得3300个洁净时段。所有脑电时段均进行特征计算,信号处理全程使用Matlab软件完成。
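The preprocessing pipeline above (downsample to 200 Hz, 0.3-50 Hz band-pass filtering, then cutting each channel into non-overlapping 1 s epochs) can be sketched as follows. This is an illustrative Python/SciPy reimplementation, not the authors' Matlab code; the function name `preprocess_eeg` and the filter order are our assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess_eeg(raw, fs_in, fs_out=200, band=(0.3, 50.0), epoch_sec=1.0):
    """Downsample multichannel EEG, band-pass filter, and cut into
    non-overlapping fixed-length epochs, mirroring the pipeline above.

    raw: array of shape (n_channels, n_samples) at fs_in Hz.
    Returns an array of shape (n_epochs, n_channels, epoch_len)."""
    # Downsample to the target rate (assumes fs_in is an integer multiple).
    factor = int(fs_in // fs_out)
    if factor > 1:
        raw = decimate(raw, factor, axis=1, zero_phase=True)
    # Zero-phase Butterworth band-pass between 0.3 and 50 Hz (order is our choice).
    b, a = butter(4, [band[0] / (fs_out / 2), band[1] / (fs_out / 2)], btype="band")
    filtered = filtfilt(b, a, raw, axis=1)
    # Split each channel into 1 s epochs without overlap.
    epoch_len = int(epoch_sec * fs_out)
    n_epochs = filtered.shape[1] // epoch_len
    trimmed = filtered[:, : n_epochs * epoch_len]
    return trimmed.reshape(filtered.shape[0], n_epochs, epoch_len).swapaxes(0, 1)

# Example: 62 channels, 10 s of synthetic data recorded at 1000 Hz.
rng = np.random.default_rng(0)
epochs = preprocess_eeg(rng.standard_normal((62, 10_000)), fs_in=1000)
print(epochs.shape)  # (10, 62, 200)
```

Artifact rejection (the manual EMG/EOG screening described above) is deliberately omitted, since it was done by visual inspection rather than by an algorithm.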

B. Feature Extraction

B. 特征提取

An efficient feature called differential entropy (DE) [35], [36] extends the idea of Shannon entropy and is used to measure the complexity of a continuous random variable [46]. Since EEG data have higher energy at low frequencies than at high frequencies, DE can discriminate EEG patterns between low- and high-frequency energy in a balanced way. It was first introduced to EEG-based emotion recognition by Duan et al. [36].

一种称为差分熵 (differential entropy, DE) [35][36] 的高效特征扩展了香农熵的概念,用于衡量连续随机变量的复杂度 [46]。由于脑电图 (EEG) 数据在低频能量上高于高频能量,差分熵具备区分低频与高频能量下EEG模式的能力,该特征由Duan等人 [36] 首次引入基于EEG的情绪识别研究。

The original calculation formula of differential entropy is defined as

微分熵的原始计算公式定义为

$$
h(X)=-\int_{X}f(x)\log f(x)\,dx.
$$

$$
h(X)=-\int_{X}f(x)\log f(x)\,dx.
$$

If a random variable obeys the Gaussian distribution $N(\mu,\sigma^{2})$ , the differential entropy can simply be calculated by the following formulation,

如果一个随机变量服从高斯分布 $N(\mu,\sigma^{2})$ ,其微分熵可以通过以下公式简单计算得到,

$$
h(X)=-\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\log\!\left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\right)dx=\frac{1}{2}\log 2\pi e\sigma^{2}.
$$

$$
h(X)=-\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\log\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\,\text{(见上式)}
$$
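As a quick sanity check on this closed form, the differential entropy of a Gaussian can be evaluated numerically and compared against $\frac{1}{2}\log 2\pi e\sigma^{2}$. The snippet below is illustrative; the function names are ours.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_de_closed_form(sigma):
    # h(X) = 1/2 * log(2*pi*e*sigma^2), in nats.
    return 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

def gaussian_de_numeric(mu, sigma):
    # Numerically evaluate -∫ f(x) log f(x) dx for the Gaussian density f.
    f = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    integrand = lambda x: -f(x) * np.log(f(x))
    val, _ = quad(integrand, mu - 12 * sigma, mu + 12 * sigma)
    return val

# The numerical integral matches the closed form for several variances.
for sigma in (0.5, 1.0, 3.0):
    assert abs(gaussian_de_numeric(0.0, sigma) - gaussian_de_closed_form(sigma)) < 1e-6
print("closed form matches numerical integration")
```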

It has been proven that, for a fixed-length EEG segment, differential entropy is equivalent to the logarithmic energy spectrum in a certain frequency band [35]. So differential entropy can be calculated in five frequency bands (delta: $1{-}3~\mathrm{Hz}$, theta: $4{-}7~\mathrm{Hz}$, alpha: $8{-}13~\mathrm{Hz}$, beta: $14{-}30~\mathrm{Hz}$, gamma: $31{-}50~\mathrm{Hz}$) with time complexity $O(KN\log N)$, where $K$ is the number of electrodes and $N$ is the number of samples.

已证实,对于固定长度的脑电图 (EEG) 片段,微分熵 (differential entropy) 等同于特定频带的对数能量谱 [35]。因此可在五个频带 (delta: $1{-}3~\mathrm{Hz}$, theta: $4{-}7~\mathrm{Hz}$, alpha: $8{-}13~\mathrm{Hz}$, beta: $14{-}30~\mathrm{Hz}$, gamma: $31{-}50~\mathrm{Hz}$) 计算微分熵,时间复杂度为 $O(KN\log N)$,其中 $K$ 为电极数量, $N$ 为样本数量。

For a specified EEG sequence, we used a 256-point Short-Time Fourier Transform with a non-overlapping Hanning window of 1 s to extract the five frequency bands of the EEG signals. Then we calculated the differential entropy for each frequency band. Since each frequency band signal has 62 channels, we extracted differential entropy features with 310 dimensions for each sample.

对于指定的脑电图 (EEG) 序列,我们采用256点短时傅里叶变换和1秒无重叠汉宁窗来提取EEG信号的五个频段。随后计算每个频段的微分熵。由于每个频段信号包含62个通道,因此每个样本提取的微分熵特征维度为310。
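Using the equivalence cited above (DE of a band equals the log energy spectrum up to an additive constant), band-wise DE can be sketched as the log of the average band power in a 1 s Hanning-windowed, 256-point FFT frame. This is an illustrative reimplementation with NumPy, not the authors' code; the helper name `de_features` is ours.

```python
import numpy as np

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def de_features(epoch, fs=200, nfft=256):
    """Differential entropy per frequency band for one 1 s epoch of one
    channel, computed as 1/2 * log of the average band power (equal to DE
    up to an additive constant, per the equivalence cited above)."""
    windowed = epoch * np.hanning(len(epoch))
    spectrum = np.abs(np.fft.rfft(windowed, n=nfft)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        idx = (freqs >= lo) & (freqs <= hi)
        feats[name] = 0.5 * np.log(spectrum[idx].mean())
    return feats

# 62 channels x 5 bands -> a 310-dimensional feature vector per sample.
rng = np.random.default_rng(1)
sample = np.concatenate([list(de_features(rng.standard_normal(200)).values())
                         for _ in range(62)])
print(sample.shape)  # (310,)
```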

As previous studies suggested [38], [47], asymmetrical brain activity (lateralization in the left-right direction and caudality in the frontal-posterior direction) seems to be effective in emotion processing. So we also computed differential asymmetry (DASM) and rational asymmetry (RASM) features [36] as the differences and ratios between the DE features of 27 pairs of hemispheric asymmetry electrodes ($Fp1$, $F7$, $F3$, $FT7$, $FC3$, $T7$, $P7$, $C3$, $TP7$, $CP3$, $P3$, $O1$, $AF3$, $F5$, $F7$, $FC5$, $FC1$, $C5$, $C1$, $CP5$, $CP1$, $P5$, $P1$, $PO7$, $PO5$, $PO3$, and $CB1$ of the left hemisphere, and $Fp2$, $F8$, $F4$, $FT8$, $FC4$, $T8$, $P8$, $C4$, $TP8$, $CP4$, $P4$, $O2$, $AF4$, $F6$, $F8$, $FC6$, $FC2$, $C6$, $C2$, $CP6$, $CP2$, $P6$, $P2$, $PO8$, $PO6$, $PO4$, and $CB2$ of the right hemisphere). DASM and RASM are, respectively, defined as

如先前研究[38]、[47]所示,大脑活动的不对称性(左右方向的偏侧化和前后方向的尾向性)在情绪处理中似乎具有重要作用。因此,我们同样计算了27对半球不对称电极(左半球的Fp1、F7、F3、FT7、FC3、T7、P7、C3、TP7、CP3、P3、O1、AF3、F5、F7、FC5、FC1、C5、C1、CP5、CP1、P5、P1、PO7、PO5、PO3、CB1,以及右半球的Fp2、F8、F4、FT8、FC4、T8、P8、C4、TP8、CP4、P4、O2、AF4、F6、F8、FC6、FC2、C6、C2、CP6、CP2、P6、P2、PO8、PO6、PO4、CB2)的DE特征之间的差异和比率,作为差分不对称性(DASM)和比率不对称性(RASM)特征[36]。DASM和RASM分别定义为

$$
DASM=DE(X_{left})-DE(X_{right})
$$

$$
DASM=DE(X_{left})-DE(X_{right})
$$

and

$$
RASM = DE(X_{left}) / DE(X_{right}),
$$

where $X_{left}$ and $X_{right}$ represent the pairs of electrodes on the left and right hemispheres. We define DCAU features as the differences between the DE features of 23 pairs of frontal-posterior electrodes (FT7-TP7, FC5-CP5, FC3-CP3, FC1-CP1, FCZ-CPZ, FC2-CP2, FC4-CP4, FC6-CP6, FT8-TP8, F7-P7, F5-P5, F3-P3, F1-P1, FZ-PZ, F2-P2, F4-P4,

其中 $X_{left}$ 和 $X_{right}$ 分别代表左右半球的电极对。我们将DCAU特征定义为23对前-后电极 (FT7-TP7, FC5-CP5, FC3-CP3, FC1-CP1, FCZ-CPZ, FC2-CP2, FC4-CP4, FC6-CP6, FT8-TP8, F7-P7, F5-P5, F3-P3, F1-P1, FZ-PZ, F2-P2, F4-P4,


Fig. 2. (a) A RBM contains the hidden layer neurons connected to the visible layer neurons with weights W. (b) A DBN using supervised fine-tuning of all layers with back propagation. (c) The graphical depiction of unrolled DBN using unsupervised fine-tuning of all layers with back propagation.

图 2: (a) RBM包含隐藏层神经元通过权重W与可见层神经元连接。(b) 使用反向传播对所有层进行监督微调的DBN。(c) 使用反向传播对所有层进行无监督微调的展开DBN图示。

F6-P6, F8-P8, FP1-O1, FP2-O2, FPZ-OZ, AF3-CB1, and AF4-CB2). DCAU is defined as

F6-P6, F8-P8, FP1-O1, FP2-O2, FPZ-OZ, AF3-CB1, 以及 AF4-CB2) 的DE特征差值。DCAU定义为

$$
DCAU = DE(X_{frontal}) - DE(X_{posterior}),
$$

where $X_{frontal}$ and $X_{posterior}$ represent the pairs of frontal-posterior electrodes.

其中 $X_{frontal}$ 和 $X_{posterior}$ 分别代表前额-后部电极对。

For comparison, we also extracted the conventional power spectral density (PSD) as a baseline. The dimensions of the PSD, DE, DASM, RASM and DCAU features are 310, 310, 135, 135 and 115, respectively. We applied the linear dynamic system (LDS) approach to further filter out irrelevant components and take the temporal dynamics of emotional states into account [48].

为便于比较,我们还提取了传统功率谱密度 (PSD) 作为基线。PSD、DE、DASM、RASM 和 DCAU 特征的维度分别为 310、310、135、135 和 115。我们采用线性动态系统 (LDS) 方法进一步滤除无关成分,并考虑情绪状态的时间动态特性 [48]。
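Given per-pair DE arrays, the three asymmetry feature types reduce to elementwise operations. A minimal sketch (the function name and the assumption that electrodes are already paired and ordered are ours, not the paper's):

```python
import numpy as np

def asymmetry_features(de_left, de_right, de_frontal, de_posterior):
    """DASM, RASM and DCAU from DE features of paired electrodes.
    Each argument is an array of shape (n_pairs, n_samples); the pairing
    follows the electrode lists given in the text."""
    dasm = de_left - de_right          # differential asymmetry (27 pairs)
    rasm = de_left / de_right          # rational asymmetry (27 pairs)
    dcau = de_frontal - de_posterior   # differential caudality (23 pairs)
    return dasm, rasm, dcau
```

With 5 bands, this gives 27 x 5 = 135 dimensions for DASM and RASM and 23 x 5 = 115 for DCAU, matching the dimensions quoted above.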

C. Classification with Deep Belief Networks

C. 基于深度信念网络的分类

Deep Belief Network is a probabilistic generative model with a deep architecture, which characterizes the input data distribution using hidden variables [25], [29]. Each layer of the DBN consists of a restricted Boltzmann machine (RBM) with visible units and hidden units, as shown in Fig. 2(a). There are no visible-visible connections and no hidden-hidden connections. The visible and hidden units have bias vectors $b$ and $a$, respectively.

深度信念网络 (Deep Belief Network) 是一种具有深层架构的概率生成模型,它利用隐变量来表征输入数据的分布 [25][29]。DBN 的每一层都由一个具有可见单元和隐藏单元的受限玻尔兹曼机 (RBM) 组成,如图 2(a) 所示。其中不存在可见单元间连接和隐藏单元间连接。可见单元与隐藏单元分别具有偏置向量 $b$ 和 $a$。

A DBN is constructed by stacking a predefined number of RBMs on top of each other, where the output of a lower-level RBM is the input to a higher-level RBM, as shown in Fig. 2(b). An efficient greedy layer-wise algorithm is used to pre-train each layer of the network.

深度信念网络 (DBN) 由预定义数量的受限玻尔兹曼机 (RBM) 堆叠而成,其中低层 RBM 的输出作为高层 RBM 的输入,如图 2(b) 所示。采用高效的逐层贪婪算法对网络各层进行预训练。

In an RBM, the joint distribution $P(v,h;\theta)$ over the visible units $v$ and hidden units $h$ , given the model parameters $\theta$ , is defined in terms of an energy function $E(v,h;\theta)$ as

在RBM中,给定模型参数$\theta$,可见单元$v$和隐藏单元$h$的联合分布$P(v,h;\theta)$通过能量函数$E(v,h;\theta)$定义为

$$
P(v,h;\theta)=\frac{\exp(-E(v,h;\theta))}{Z},
$$

where $Z=\sum_{v}\sum_{h}\exp(-E(v,h;\theta))$ is a normalization factor, and the marginal probability that the model assigns to a visible vector $v$ is

其中 $Z=\sum_{v}\sum_{h}\exp(-E(v,h;\theta))$ 是归一化因子,模型分配给可见向量 $v$ 的边缘概率为

$$
P(v;\theta)=\frac{\sum_{h}\exp(-E(v,h;\theta))}{Z}.
$$

For a Gaussian (visible)-Bernoulli (hidden) RBM, the energy function is defined as

对于高斯(可见)-伯努利(隐藏)RBM,其能量函数定义为

$$
E(v,h;\theta)=-\sum_{i=1}^{I}\sum_{j=1}^{J}w_{ij}v_{i}h_{j}-\frac{1}{2}\sum_{i=1}^{I}(v_{i}-b_{i})^{2}-\sum_{j=1}^{J}a_{j}h_{j},
$$

where $w_{ij}$ is the symmetric interaction term between visible unit $v_{i}$ and hidden unit $h_{j}$, $b_{i}$ and $a_{j}$ are the bias terms, and $I$ and $J$ are the numbers of visible and hidden units. The conditional probabilities can be efficiently calculated as

其中 $w_{ij}$ 是可见单元 $v_{i}$ 和隐藏单元 $h_{j}$ 之间的对称交互项,$b_{i}$ 和 $a_{j}$ 是偏置项,$I$ 和 $J$ 分别是可见单元和隐藏单元的数量。条件概率可以高效计算为

$$
P(h_{j}=1 \mid v;\theta)=\sigma\Big(\sum_{i=1}^{I}w_{ij}v_{i}+a_{j}\Big),
$$

$$
p(v_{i} \mid h;\theta)=\mathcal{N}\Big(v_{i};\ \sum_{j=1}^{J}w_{ij}h_{j}+b_{i},\ 1\Big),
$$

where $\sigma(x)=1/(1+\exp(-x))$, and $v_{i}$ takes real values and follows a Gaussian distribution with mean $\sum_{j=1}^{J}w_{ij}h_{j}+b_{i}$ and variance one.

其中 $\sigma(x)=1/(1+\exp(-x))$,且 $v_{i}$ 取实数值并服从均值为 $\sum_{j=1}^{J}w_{ij}h_{j}+b_{i}$、方差为1的高斯分布。

Taking the gradient of the log likelihood $\log p(v;\theta)$ , we can derive the update rule for adjusting RBM weights as

对对数似然 $\log p(v;\theta)$ 求梯度,可以推导出调整RBM权重的更新规则为

$$
\Delta w_{ij}=E_{data}(v_{i}h_{j})-E_{model}(v_{i}h_{j}),
$$

where $E_{data}(v_{i}h_{j})$ is the expectation observed in the training set and $E_{model}(v_{i}h_{j})$ is the same expectation under the distribution defined by the model. Since $E_{model}(v_{i}h_{j})$ is intractable to compute, the contrastive divergence approximation to the gradient is used, where $E_{model}(v_{i}h_{j})$ is replaced by running the Gibbs sampler, initialized at the data, for one full step. Momentum is sometimes used in the weight update to prevent getting stuck in local minima, and regularization prevents the weights from getting too large [49].

其中 $E_{d a t a}(v_{i}h_{j})$ 是训练集中观察到的期望值,$E_{m o d e l}(v_{i}h_{j})$ 是模型定义分布下的相同期望值。但由于 $E_{m o d e l}(v_{i}h_{j})$ 难以计算,因此采用对比散度近似梯度的方法,即用从数据初始化并运行一步Gibbs采样的结果来替代 $E_{m o d e l}(v_{i}h_{j})$。有时会在权重更新中使用动量(momentum)来防止陷入局部极小值,并通过正则化(regularization)防止权重过大 [49]。
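The CD-1 update above can be sketched in a few lines. This is an illustrative minimal version (not the DBNToolbox code the authors used; the function name, batch handling, and the mean-field choice for the visible reconstruction are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.01):
    """One CD-1 weight update for a Gaussian(visible)-Bernoulli(hidden) RBM.
    v0: (batch, I) real-valued data; W: (I, J); a: hidden biases (J,);
    b: visible biases (I,). Returns the updated W."""
    # Positive phase: P(h=1|v) = sigmoid(v W + a)
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    # Negative phase: Gaussian visibles have mean h W^T + b (variance 1);
    # we use the mean-field value for stability instead of sampling.
    v1 = h0 @ W.T + b
    ph1 = sigmoid(v1 @ W + a)
    # Gradient approximation: E_data[v h] - E_model[v h]
    dW = (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    return W + lr * dW
```

Momentum and weight decay, mentioned above, would be added on top of `dW` in the same way as for any gradient step.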

In this work, training is performed in three steps: 1) unsupervised pre-training of each layer, 2) unsupervised fine-tuning of all layers with back propagation, and 3) supervised fine-tuning of all layers with back propagation. For unsupervised fine-tuning, the $n$ RBMs are unrolled to form a $(2n-1)$-layer directed encoder-decoder network that can be fine-tuned with back propagation [25], [49]. Figure 2(c) shows the graphical depiction of the unrolled DBN. The goal of training this deep autoencoder is to learn the weights and biases between the layers such that the reconstruction is as close to the input as possible. For supervised fine-tuning, a label layer is added on top of the pre-trained DBN and the weights are updated through error back propagation.

在本工作中,训练分为三个步骤:1) 各层的无监督预训练,2) 通过反向传播对所有层进行无监督微调,以及3) 通过反向传播对所有层进行有监督微调。对于无监督微调,将 $n$ 个 RBM 展开形成一个 $2n-1$ 层的有向编码器和解码器网络,可通过反向传播进行微调 [25][49]。图 2(c) 展示了展开后 DBN 的图示。训练这一深度自编码器的目标是学习各层之间的权重和偏置,使得重构结果与输入尽可能接近。对于有监督微调,在预训练好的 DBN 顶部添加一个标签层,并通过误差反向传播更新权重。
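Step 1, the greedy layer-wise pre-training, can be expressed as a short loop. A sketch under our own assumptions (the `train_rbm` callback and its `.transform()` method are hypothetical stand-ins for an RBM trainer such as the one in the previous code block):

```python
def pretrain_dbn(X, layer_sizes, train_rbm):
    """Greedy layer-wise pre-training: train an RBM on the current
    representation, then feed its hidden activations to the next RBM.
    `train_rbm(inputs, n_hidden)` is a hypothetical callback returning an
    object with a `.transform()` method."""
    rbms, inputs = [], X
    for n_hidden in layer_sizes:
        rbm = train_rbm(inputs, n_hidden)
        rbms.append(rbm)
        inputs = rbm.transform(inputs)   # hidden activations become the next input
    return rbms
```

The unrolling and back-propagation fine-tuning stages (steps 2 and 3) then operate on the stacked weights returned here.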

IV. EXPERIMENTS

IV. 实验

A. Stimuli

A. 刺激物

It is important to design efficient and reliable emotion elicitation stimuli for emotion experiments. Nowadays, various kinds of stimuli are used in emotion research, such as images, music, mental imagery, and films. Compared with other stimuli, emotional films have several advantages. Existing studies have already evaluated the reliability and efficiency of film clips for emotion elicitation [50], [51]. Emotional films contain both scenes and audio, which can expose subjects to more real-life scenarios and elicit strong subjective and physiological changes. Therefore, in our experiment, we chose emotional movie clips to help subjects elicit their own emotions. There are fifteen clips in total in one experiment, and each of them lasts for about 4 minutes. Three categories of emotion (positive, neutral and negative) are evaluated in this study, and each emotion has five corresponding clips. All the movie clips were carefully chosen as stimuli in a preliminary study to help elicit the target emotions. Since all of the subjects are native Chinese, we selected the emotional clips from Chinese films. The details of the film clips used in this study are listed in Table I.

设计高效可靠的情绪诱发刺激对情绪实验至关重要。目前情绪研究中使用的刺激类型多样,如图像、音乐、心理意象和影片等。相比其他刺激形式,情绪影片具有多重优势。现有研究已对影片片段诱发情绪的可靠性和有效性进行了评估[50][51]。情绪影片同时包含场景和音频,能让受试者接触更真实的生活场景,引发强烈的主观感受和生理变化。因此本实验选用情绪电影片段作为诱发材料。每次实验共使用15段时长约4分钟的影片片段,涵盖积极、中性和消极三类情绪(每类各5段)。所有影片片段均通过预实验精心筛选,以确保能有效诱发目标情绪。由于受试者均为中文母语者,影片素材均选自中国电影。具体影片信息见表1。

TABLE I DETAILS OF FILM CLIPS USED IN OUR EMOTION EXPERIMENT

表 1: 情绪实验所用电影片段详情

| No. | Emotion label | Film clip source |
| --- | --- | --- |
| 1 | negative | 《唐山大地震》 |
| 2 | negative | 《1942》 |
| 3 | positive | 《人在囧途之泰囧》 |
| 4 | positive | 《唐伯虎点秋香》 |
| 5 | positive | 《越光宝盒》 |
| 6 | neutral | 《世界遗产在中国》 |

B. Subjects

B. 受试者

Fifteen subjects (7 males and 8 females; MEAN: 23.27, STD: 2.37) with self-reported normal or corrected-to-normal vision and normal hearing participated in the experiments. All participants were right-handed and were students from Shanghai Jiao Tong University. We selected the subjects using the Eysenck Personality Questionnaire (EPQ), a questionnaire devised by Eysenck et al. [52] to assess personality traits. They initially conceptualized personality as three biologically-based independent dimensions of temperament measured on a continuum: Extraversion/Introversion, Neuroticism/Stability and Psychoticism/Socialisation. It seems that not every subject can elicit specific emotions immediately, even with the stimuli. Subjects who are extraverted and have stable moods tend to elicit the target emotions throughout the emotion experiments. So, based on the feedback from the EPQ questionnaires, we selected these subjects to participate in the emotion experiments. The subjects were informed about the procedure in advance. They were instructed to sit comfortably, watch the forthcoming movie clips attentively, and refrain as much as possible from overt movements. Figure 3 shows the experiment scene. The subjects were paid for their participation in the experiments. Each subject participated in the experiment twice at an interval of one week or longer.

15名受试者(7男8女;平均年龄:23.27岁,标准差:2.37岁)参与了实验,均自述视力正常或矫正至正常,听力正常。所有参与者均为右利手,来自上海交通大学。我们采用艾森克人格问卷(EPQ)筛选受试者。EPQ是由Eysenck等人[52]设计的人格特质评估问卷,最初将人格概念化为三个基于生物学的独立气质维度(连续测量):外向/内向、神经质/稳定性和精神质/社会化。研究发现并非所有受试者都能在刺激下立即诱发特定情绪,外向且情绪稳定的受试者在情绪实验中更易诱发目标情绪。因此根据EPQ问卷反馈,我们筛选出这类受试者参与情绪实验。实验前已向受试者说明流程,要求他们保持舒适坐姿、专注观看即将播放的电影片段,并尽量避免明显肢体动作。图3展示了实验场景,受试者会获得实验报酬。每位受试者间隔一周或更长时间参与两次实验。


Fig. 3. The experiment scene

图 3: 实验场景

C. Protocol

C. 协议

We performed the experiments in a quiet environment in the morning or early in the afternoon. EEG was recorded using an ESI NeuroScan System at a sampling rate of $1000~\mathrm{Hz}$ from a 62-channel electrode cap according to the international 10-20 system. The layout of the EEG electrodes on the cap is shown in Fig. 4. To remove eye-movement artifacts, we recorded the electrooculogram. Frontal face videos were also recorded by a camera mounted in front of the subjects. There are fifteen sessions in total in one experiment. In each session, there is a 5 s hint before the clip, and 45 s for self-assessment and 15 s for rest after the clip. For self-assessment, the questions follow Philippot [53]: 1) what they had actually felt in response to viewing the film clip; 2) whether they had watched this movie before; 3) whether they had understood the film clip. Figure 5 shows the detailed protocol.

我们在上午或午后早些时候于安静环境中进行实验。采用ESI NeuroScan系统以 $1000~\mathrm{Hz}$ 采样率,通过按国际10-20系统布置的62通道电极帽记录脑电图 (EEG)。电极帽的布局如图4所示。为消除眼动伪迹,我们同步记录了眼电图 (EOG),并通过安装在受试者前方的摄像头采集其正面面部视频。每次实验共包含15个会话:每段影片播放前有5秒提示,影片结束后有45秒自评和15秒休息。自评问题遵循Philippot [53] 的设计:1) 观看影片时的实际感受;2) 是否曾观看过该影片;3) 是否理解影片内容。具体流程见图5。

V. EXPERIMENT RESULTS

五、实验结果

A. Neural Patterns

A. 神经模式

After extracting differential entropy features from the five frequency bands (delta, theta, alpha, beta and gamma), we further investigate the neural patterns associated with different emotions. The DE feature map of one experiment is shown in Fig. 6. Through time-frequency analysis, we find that specific neural patterns exist in the high frequency bands for positive, neutral and negative emotions. For positive emotion, the energy of the beta and gamma frequency bands increases, whereas neutral and negative emotions have lower energy in the beta and gamma bands. While neutral and negative emotions show similar patterns in the beta and gamma bands, neutral emotion has higher energy in the alpha oscillations. These findings provide fundamental evidence for understanding the mechanism of emotion processing in the brain.

从五个频段(Delta、Theta、Alpha、Beta和Gamma)提取微分熵特征后,我们进一步研究了与不同情绪相关的神经模式。图6展示了一次实验的DE特征图。通过时频分析发现,积极、中性和消极情绪在高频段存在特定神经模式。积极情绪表现为Beta和Gamma频段能量升高,而中性和消极情绪的Beta和Gamma频段能量较低。虽然中性与消极情绪在Beta和Gamma频段的神经模式相似,但中性情绪的Alpha振荡能量更高。这些发现为理解大脑情绪处理机制提供了基础证据。


Fig. 4. The EEG cap layout for 62 electrodes

图 4: 62个电极的EEG帽布局


Fig. 5. Protocol of the EEG experiment

图 5: EEG实验流程


Fig. 6. The DE feature map in one experiment, where the time frames are on the horizontal axis, and the DE features are on the vertical axis.

图 6: 某次实验中的微分熵(DE)特征图,时间帧位于横轴,微分熵特征位于纵轴。

The observed frequencies have been divided into specific groups, as specific frequency ranges are more prominent in certain states of mind. Previous neuroscience studies [54], [55] have shown that EEG alpha bands reflect attentional processing and beta bands reflect emotional and cognitive processing in the brain. Li and Lu [22] also showed that EEG gamma bands are suitable for emotion classification with emotional images as stimuli. Our findings are consistent with these existing results. When participants watch neutral stimuli, they tend to be more relaxed and less attentive, which evokes alpha responses; and when processing positive emotion, the energy of the beta and gamma responses increases.

观测频率被划分为特定组别,因为特定频率范围在不同心理状态下更为显著。现有神经科学研究 [54][55] 表明,脑电图(EEG)的α波段反映注意处理,β波段反映大脑中的情绪与认知处理。Li和Lu [22] 也证实,以情绪图片为刺激物时,EEG的γ波段适用于情绪分类。我们的发现与现有结果一致:当参与者观看中性刺激时,他们往往更放松且注意力降低,这会诱发α波响应;而在处理积极情绪时,β波和γ波响应的能量会增强。

B. Classifier Training

B. 分类器训练

In this study, we systematically compare the classification performance of four classifiers, $k$-nearest neighbor (kNN), logistic regression (LR), support vector machine (SVM) and deep belief networks (DBNs), for EEG-based emotion recognition. These classifiers use the DE features mentioned above as inputs. In the emotion experiments, we collected EEG data from fifteen subjects, and each subject performed the experiments twice at an interval of about one week. In total, 30 experiments are evaluated here. The training data and the test data are from different sessions of the same experiment: the training data contains 9 sessions, while the test data contains the other 6 sessions of the same experiment.

在本研究中,我们系统比较了四种分类器(K近邻(kNN)、逻辑回归(LR)、支持向量机(SVM)和深度信念网络(DBNs))在基于脑电(EEG)情绪识别中的分类性能。这些分类器使用上述微分熵(DE)特征作为输入。在情绪实验中,我们采集了15名受试者的脑电数据,每位受试者间隔约一周进行两次实验。本研究共评估了30次实验。训练数据和测试数据来自同次实验的不同时段:训练数据包含9个时段的数据,而测试数据包含同次实验另外6个时段的数据。

TABLE II THE DETAILS OF PARAMETERS USED IN DIFFERENT CLASSIFIERS

表 II 不同分类器使用的参数详情

| 分类器 | 参数详情 |
| --- | --- |
| kNN | k=5 |
| LR | L2正则化,在[1.5:10]范围内以0.5为步长调整正则化系数 |
| SVM | 线性核,C的搜索空间为 $2^{[-10:10]}$,步长为1 |
| DBN | 包含2个隐藏层:分别在[200:500]和[150:500]范围内以50为步长搜索第一和第二隐藏层的最佳神经元数量;小批量大小:201;无监督和有监督学习率:0.5、0.6;动量参数:0.1;激活函数:sigmoid函数 |

Table II shows the details of the parameters used in the different classifiers. For kNN, we use $k=5$ as a baseline in comparison with the other classifiers. For LR, we employ $L2$-regularized LR and tune the regularization parameter in [1.5:10] with a step of 0.5. We also use SVM to classify the emotional states for each EEG segment. The basic idea of SVM is to project the input data onto a higher-dimensional feature space via a kernel function, in which the data are easier to separate than in the original feature space. We use the LIBSVM software [56] to implement the SVM classifier and employ a linear kernel. We search the parameter space $2^{[-10:10]}$ with a step of one for $C$ to find the optimal value.

表 II 展示了不同分类器使用的参数细节。对于 $k\mathbf{NN}$ ,我们采用 $k=5$ 作为基线与其他分类器比较。对于逻辑回归 (LR) ,我们使用 $L2$ 正则化逻辑回归,并在 [1.5:10] 范围内以 0.5 为步长调整正则化参数。我们还使用支持向量机 (SVM) 对每个脑电图 (EEG) 片段进行情绪状态分类。SVM 的基本思想是通过核转换函数将输入数据映射到更高维的特征空间,使其比原始特征空间更易分离。我们使用 LIBSVM 软件 [56] 实现 SVM 分类器并采用线性核函数。我们在 $2^{[-10:10]}$ 范围内以 1 为步长搜索参数 $C$ 以寻找最优值。
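The $C$ search above is a plain grid search over powers of two. A minimal sketch (the function name `search_C` and the `train_eval` callback, which would train a linear-kernel SVM and return test accuracy, are hypothetical; the paper itself uses LIBSVM):

```python
def search_C(train_eval):
    """Grid search for the SVM penalty C over 2^[-10:10] with an exponent
    step of one. `train_eval(C)` is a hypothetical callback that trains a
    linear-kernel SVM with penalty C and returns test accuracy."""
    best_C, best_acc = None, -1.0
    for p in range(-10, 11):
        C = 2.0 ** p
        acc = train_eval(C)
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc
```

The same pattern applies to the LR regularization coefficient and the DBN layer-size search described in Table II.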

For deep neural networks, we construct a DBN with two hidden layers. We search for the optimal numbers of neurons in the first and second hidden layers with a step of 50 in the ranges [200:500] and [150:500], respectively. We set the unsupervised learning rate and the supervised learning rate to 0.5 and 0.6, respectively, in the experiment. We also use momentum in the weight update to prevent getting stuck in local minima. Before feeding the DE features into the DBN, the feature values are standardized by subtracting the mean and dividing by the standard deviation, and finally shifted by adding 0.5. We implement the DBN with the DBNToolbox Matlab code [44] in this study.

对于深度神经网络,我们构建了一个包含两个隐藏层的DBN(深度信念网络)。我们以50为步长,分别在[200:500]和[150:500]范围内搜索第一和第二隐藏层的最佳神经元数量。实验中,无监督学习率和有监督学习率分别设为0.5和0.6。我们还使用了动量项进行权重更新,以避免陷入局部极小值。在将DE特征输入DBN之前,通过减去均值、除以标准差并最终加0.5,将这些特征值缩放到0到1之间。本研究采用DBNToolbox Matlab代码[44]实现DBN。
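The feature scaling step is a per-column standardization plus a 0.5 shift. A one-line sketch (the function name is ours; note that this centers each feature at 0.5 with unit variance rather than strictly bounding it to [0, 1]):

```python
import numpy as np

def scale_de_features(X):
    """Standardize each DE feature column (subtract mean, divide by std),
    then shift by 0.5, as described above. X: (n_samples, n_features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0) + 0.5
```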

C. Classification Performance

C. 分类性能

The mean accuracies (standard deviations) of DBN and SVM with the DE features from different frequency bands in thirty experiments of fifteen subjects are shown in Table III. It should be noted that ‘Total’ in Table III represents the direct concatenation of five frequency bands of EEG data in this paper. First,