Understanding Politics via Contextualized Discourse Processing


Download PDF: https://arxiv.org/pdf/2012.15784v1.pdf


Abstract

Politicians often have underlying agendas when reacting to events. Arguments in the contexts of various events reflect a fairly consistent set of agendas for a given entity. In spite of recent advances in Pretrained Language Models (PLMs), those text representations are not designed to capture such nuanced patterns. In this paper, we propose a Compositional Reader model, consisting of Encoder and Composer modules, that attempts to capture and leverage such information to generate more effective representations for entities, issues, and events. These representations are contextualized by tweets, press releases, issues, news articles, and participating entities. Our model can process several documents at once and generate composed representations for multiple entities over several issues or events. Via qualitative and quantitative empirical analysis, we show that these representations are meaningful and effective.

Introduction

Often in political discourse, politicians and political caucuses repeat the same argument trajectories across events. Knowing and understanding the trajectories that are regularly used is pivotal for contextualizing the comments they make when a new event occurs. Furthermore, it helps us understand their perspectives and predict their likely reactions to new events and participating entities. In political text, bias towards a political perspective is often subtle rather than explicitly stated. Choices of mentioning or omitting certain entities or certain attributes can reveal the author's agenda. For example, when a politician tweets in reaction to a new shooting event, it is likely that they oppose gun control and support free gun rights, despite not mentioning their stance explicitly. Our main insight in this paper is that effectively detecting such bias from text requires modeling the broader context of the document. This can include understanding relevant facts related to the event addressed in the text, the ideological leanings and perspectives expressed by the author in the past, and the sentiment/attitude of the author towards the entities referenced in the text. We suggest that this holistic view can be obtained by combining information from multiple sources of varying types, such as news articles, social media posts, quotes from press releases and historical beliefs expressed by politicians. Recent advances in Pretrained Language Models (PLMs) in NLP have greatly improved word representations via contextualized embeddings and powerful transformer units. Despite this, such representations alone are not enough to capture nuanced biases in political discourse. Two of the key reasons are: (i) they do not directly focus on entity/issue-centric data and (ii) they contextualize only based on surrounding text but not on relevant issue/event knowledge.
A computational setting for this approach requires two attributes: (i) an input representation that meaningfully combines all the different types of information and (ii) the ability to process all the information together in one shot. We address the first challenge by introducing a graph structure that ties together several kinds of discourse. These are first-person informal discourse (tweets), first-person formal discourse (press releases and perspectives), third-person current discourse (news) and third-person consolidated discourse (Wikipedia). These documents are connected via their authors, the issues/events they discuss and the entities mentioned in them. As a clarifying example, consider the partial tweet by President Trump. This tweet is represented in our graph by connecting the text node to the author node (President Trump) and to the referenced entity node (New York Gov. Cuomo). These settings are shown in Fig. test_graph. Then, we propose a novel neural architecture that can process all the information in the graph together in one shot. The architecture generates a distributed representation for each item in the graph that is contextualized by the representations of the others. In our example, this results in modified representations for the tweet and the entities. These help us characterize President Trump's opinion about Governor Cuomo in the context of the NRA or in general. Our architecture builds upon the text representations obtained from BERT. It has two parts. The first is an Encoder, which combines all the documents related to a given node to generate an initial node representation. The second is a Composer, a Graph Attention Network (GAT) that composes over the graph structure to generate contextualized node embeddings.

We design two self-supervised learning tasks to train the model and capture structural dependencies over the rich discourse representation, namely predicting authorship links and referenced-entity links over the graph structure. The intuition behind the tasks is that solving them requires the model to understand subtle language usage. Authorship prediction requires the model to differentiate between (i) the language of one author and that of another and (ii) the language of an author in the context of one issue vs. another. Referenced entity prediction requires the model to understand the language an author uses when discussing a particular entity, given the author's historical discourse. We evaluate the resulting discourse representation via several empirical tasks that identify political perspectives at both the article and the author level. Our evaluation is designed to demonstrate the importance of each component of our model and the usefulness of the learning tasks. The NRA Grade Paraphrase task evaluates our model's ability to consolidate multiple documents of different types from a single author into a coherent perspective about an issue. It frames the problem as paraphrasing: the model's composed representation of an author is compared with a short text that expresses the stance directly. The comparison relies only on the model's pre-training process. The Grade Prediction and Bias Prediction tasks show that our representations capture meaningful information that makes them highly effective for political prediction tasks. Both tasks build classifiers on top of the model. Grade Prediction evaluates author and issue representations, while Bias Prediction evaluates graph-contextualized document representations. We perform Grade Prediction for two domains, using politician grades from two different organizations: the National Rifle Association (NRA) and the League of Conservation Voters (LCV). We compare our model to three competitive baselines: BERT, an adaptation of BERT to our data, and our Encoder architecture. This helps us evaluate different aspects of our model as well as our learning tasks.
We also analyse the relative usefulness of various types of documents via an ablation study. The BERT adaptation baseline is trained on our learning tasks without the Composer architecture. It therefore helps demonstrate both the effectiveness of our learning tasks and the importance of the Composer architecture. Our model outperforms the baselines on all three evaluation tasks. Finally, we perform a qualitative analysis that visualizes entities' stances, demonstrating that our representations effectively capture nuanced political information. To summarise, our research contributions include:

Related Work

Recent advances in text representations have made it possible to create very rich textual representations that are effective in many nuanced NLP tasks. Although these models capture semantic contextual information, they are not explicitly designed to capture entity/event-centric information. Hence, tasks that demand a better understanding of such information require more focused representations. Recently, several works have attempted to solve such tasks. However, the representations used are usually limited in scope to specific tasks and are not rich enough to capture information that is useful across several tasks. Our Compositional Reader model builds upon BERT embeddings and consists of a transformer-based Graph Attention Network. It aims to address these limitations via a generic entity-issue-event-document graph that is used to learn highly effective representations.

Data

Data Type                         Count
News Events                         367
Authoring Entities                  455
Referenced Entities              10,506
Wikipedia Articles                  455
Tweets                           86,409
Press Releases                   62,257
Perspectives                     30,446
News Articles                     8,244
Total documents                 187,811
Average sentences per document    14.18

We collected US political text data related to 8 broad topics. The data used in this paper focus on 455 US senators and congressmen. We collected political text data relevant to these topics from 5 sources:

- press statements by political entities, from the ProPublica Congress API (https://projects.propublica.org/api-docs/congress-api/);
- Wikipedia articles describing political entities;
- tweets by political entities;
- perspectives of the senators and congressmen on various political issues;
- news articles and background descriptions of those political issues.

A total of 187,811 documents were used to train our model. Summary statistics are shown in the table above.

Event Identification

Event-based categorization of documents is performed as follows. News articles related to each issue are ordered by their date of publication. For each issue, we compute the mean ($\mu$) and standard deviation ($\sigma$) of the number of articles published per day. If more than $\mu+\sigma$ articles are published on a single day for a given issue, we flag that day as the beginning of an event. We then skip 7 days and look for a new event. The current event window continues until a new event window begins. We use the event windows obtained in this way to mark events. In our setting, events within a given issue are non-overlapping. Because we divide events for each issue separately, events for different issues can overlap. Events last 7-10 days on average, so the non-overlapping assumption within an issue is a reasonable approximation of reality. For example, coronavirus and civil-rights are separate issues and hence have overlapping events. An event related to coronavirus could be "First case of COVID-19 outside of China reported". An event about civil-rights could be "Officer who was part of George Floyd killing suspended". We inspected the events manually and found that they are meaningful in a high percentage of inspected cases ($\geq 85\%$ of events). Examples of identified events are shown in the appendix.
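
The windowing rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code:

- The function name `detect_event_windows` is hypothetical.
- The input is assumed to be a list of daily article counts for one issue over consecutive days, including zero-count days.
- We use the population standard deviation (`pstdev`). The text does not say which variant is used.

```python
from statistics import mean, pstdev


def detect_event_windows(daily_counts, skip_days=7):
    """Split one issue's timeline into non-overlapping event windows.

    daily_counts: article counts for consecutive days (assumed input format).
    Returns (start, end) day indices, with `end` exclusive.
    """
    if not daily_counts:
        return []
    # Threshold: mean + one (population) standard deviation of articles per day.
    threshold = mean(daily_counts) + pstdev(daily_counts)

    starts = []
    day = 0
    while day < len(daily_counts):
        if daily_counts[day] > threshold:
            starts.append(day)   # a burst of coverage starts a new event
            day += skip_days     # skip 7 days before looking for the next event
        else:
            day += 1

    # Each window lasts until the next event begins (or the timeline ends),
    # so events within one issue never overlap.
    ends = starts[1:] + [len(daily_counts)]
    return list(zip(starts, ends))


counts = [1, 1, 10, 1, 1, 1, 1, 1, 1, 9, 1, 1]
print(detect_event_windows(counts))  # [(2, 9), (9, 12)]
```

The strict `>` comparison mirrors "more than $\mu+\sigma$ articles". Any second spike that falls inside the 7-day skip is absorbed into the current event.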

Data Pre-processing

We use the Stanford CoreNLP toolkit, Wikifier and the BERT-base-uncased implementation to preprocess the data for our experiments. We tokenize the documents, apply coreference resolution and extract the referenced entities from each document. The referenced entities are then wikified using the Wikifier tool. The documents are then categorized by issues and events. The news articles and perspectives are already classified by issue at their sources. We use keyword-based querying to extract issue-wise press releases from the ProPublica API. For tweets, we use hashtag-based classification: a set of gold hashtags was created for each issue, and the tweets were classified accordingly. Data collection is detailed in the appendix. Finally, sentence-wise BERT-base embeddings of all documents are computed.

Query Mechanism

We implemented a query mechanism to obtain relevant subsets of data from the corpus. Each query is a triplet of (entities, issues, events). Given a query triplet, we retrieve the following:

- news articles related to the events, for each of the issues;
- Wikipedia articles for each of the entities;
- background descriptions of the issues;
- perspectives of each entity on each of the issues;
- tweets and press releases by each of the entities related to the events in the query.

The referenced entities for each sentence in the documents and the sentence-wise BERT embeddings of the documents are also retrieved.
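
A query triplet and the retrieval rules listed above can be sketched as a simple filter over a document table. The schema here is hypothetical and only meant to make the retrieval rules explicit. It covers `Document`, `Query`, the `doc_type` strings, and the `entity` field, which holds the author or the Wikipedia subject. Referenced entities and precomputed embeddings are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    doc_type: str                 # "news", "wiki", "background", "perspective", "tweet", "press_release"
    entity: Optional[str] = None  # author, or the subject of a Wikipedia article
    issue: Optional[str] = None
    event: Optional[str] = None


@dataclass(frozen=True)
class Query:
    entities: FrozenSet[str]
    issues: FrozenSet[str]
    events: FrozenSet[str]


def run_query(query: Query, corpus: List[Document]) -> List[str]:
    """Return ids of documents relevant to an (entities, issues, events) query."""
    hits = []
    for doc in corpus:
        if doc.doc_type == "news":
            keep = doc.issue in query.issues and doc.event in query.events
        elif doc.doc_type == "wiki":
            keep = doc.entity in query.entities
        elif doc.doc_type == "background":
            keep = doc.issue in query.issues
        elif doc.doc_type == "perspective":
            keep = doc.entity in query.entities and doc.issue in query.issues
        elif doc.doc_type in ("tweet", "press_release"):
            keep = doc.entity in query.entities and doc.event in query.events
        else:
            keep = False
        if keep:
            hits.append(doc.doc_id)
    return hits
```

Using frozen dataclasses keeps queries hashable. That would let retrieved subsets be cached per triplet.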

Compositional Reader

In this section, we describe the architecture of the proposed Compositional Reader model in detail. It contains 3 key components: the Graph Generator, the Encoder and the Composer. Given the output of the query mechanism described in the Query Mechanism section, the Graph Generator creates a directed graph with entities, issues, events and documents as nodes. The Encoder generates an initial embedding for each of the nodes. The Composer is a transformer-based Graph Attention Network (GAT) followed by a pooling layer. It generates the final node embeddings and a single summary embedding for the query graph. Each component is described below.

Graph Generator

Given the output of the query mechanism for a query, the Graph Generator creates a directed graph with 5 types of nodes: authoring entities, referenced entities, issues, events and documents. The Composer uses directed edges to update the representation of each source node using its destination nodes. We design the topology with the main goal of capturing representations of events, issues and referenced entities that reflect the author's opinion about them. We therefore add edges from issues/events to the author's documents but omit the other direction, since our main goal is to contextualize issues/events using the author's opinions. The remaining edges are added as follows:

- authoring entity → its Wikipedia article;
- authoring entity ↔ each document it authored (tweets, press releases and perspectives);
- relevant event → tweet and press release document nodes (uni-directional);
- issue ↔ event;
- issue → its background description documents;
- event ↔ the news articles describing it;
- issue → author perspective nodes (uni-directional);
- referenced entity → all the document nodes (uni-directional).

An example graph is shown in Fig. test_graph.
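
The edge rules above can be summarized as a function that returns the set of directed (source, destination) pairs. The Composer's adjacency matrix would then be built from these pairs. This is a minimal sketch for a single author and issue, and the function name and argument layout are hypothetical. We read "from referenced entities to all the document nodes" as edges to the documents that mention each entity.

```python
def build_query_graph(author, issue, wiki_doc, background_doc, perspectives,
                      event_news, event_posts, doc_entities):
    """Return the directed (source, destination) edges of one query graph.

    The Composer updates each source node using its destination nodes.
    perspectives: perspective doc ids by `author` on `issue`
    event_news:   {event id: [news article ids]}
    event_posts:  {event id: [tweet / press-release ids by `author`]}
    doc_entities: {doc id: [referenced entity ids]}
    """
    edges = set()

    def both_ways(a, b):
        edges.add((a, b))
        edges.add((b, a))

    edges.add((author, wiki_doc))         # author -> Wikipedia article only
    edges.add((issue, background_doc))    # issue -> background description
    for doc in perspectives:
        both_ways(author, doc)            # author <-> authored perspective
        edges.add((issue, doc))           # issue -> perspective (one way)
    for event in set(event_news) | set(event_posts):
        both_ways(issue, event)           # issue <-> event
    for event, news in event_news.items():
        for doc in news:
            both_ways(event, doc)         # event <-> news article
    for event, posts in event_posts.items():
        for doc in posts:
            both_ways(author, doc)        # author <-> authored tweet / press release
            edges.add((event, doc))       # event -> tweet / press release (one way)
    for doc, entities in doc_entities.items():
        for ent in entities:
            edges.add((ent, doc))         # referenced entity -> document (one way)
    return edges
```

Keeping the edges as a set of pairs makes the asymmetric rules easy to check. For example, `("issue", "perspective")` is present while its reverse is not.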

Encoder

The Encoder computes the initial node embeddings. It consists of BERT followed by a Bi-LSTM. For each node, it takes a temporally ordered sequence of documents as input and outputs a single embedding of dimension $d_m$.

Consider a node $\mathcal{N} = \{D_1, D_2, \ldots, D_d\}$ consisting of $d$ documents. For each document $D_i$, BERT computes contextualized embeddings of all its tokens. Token embeddings are computed sentence-wise to avoid truncating long documents. The token embeddings of each document are then mean-pooled to obtain the document embeddings $\vec{\mathcal{N}}^{bert} = \{\vec{D_1}^{bert}, \vec{D_2}^{bert}, \ldots, \vec{D_d}^{bert}\}$. Here $\vec{D_i}^{bert} \in \mathbb{R}^{1 \times d_m}$, and $d_m$ is the dimension of a BERT token embedding.

The sequence $\vec{\mathcal{N}}^{bert}$ is passed through a Bi-LSTM to obtain an output sequence $\vec{E} = \{\vec{e_1}, \vec{e_2}, \ldots, \vec{e_d}\}$ with $\vec{e_i} \in \mathbb{R}^{1 \times h}$. Here $h/2$ is the hidden dimension of the Bi-LSTM, and we set $h = d_m$ in our model. Finally, the output of the Encoder is computed by mean-pooling the sequence $\vec{E}$. We use the BERT-base-uncased model in our experiments, where $d_m = h = 768$.

The documents fed to the Encoder depend on the node type:

- Document nodes: the Encoder output of the document itself.
- Authoring entity nodes: their Wikipedia descriptions, tweets, press releases and perspective documents.
- Issue nodes: the background description of the issue.
- Event nodes: all the news articles related to the event.
- Referenced entity nodes: all documents referring to the entity.
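
The pooling steps of the Encoder can be sketched without any deep-learning dependency. This is a sketch under several assumptions:

- Token vectors are taken as precomputed sentence-wise BERT outputs, as in the pre-processing step.
- Documents carry sortable timestamps, such as ISO date strings.
- `sequence_model` is a placeholder for the trained Bi-LSTM and defaults to the identity.

The code therefore shows the data flow, not the trained model, and all function names are hypothetical.

```python
def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def document_embedding(sentences):
    """sentences: per-sentence lists of token vectors (BERT run sentence-wise)."""
    tokens = [token for sentence in sentences for token in sentence]
    return mean_pool(tokens)  # mean over all tokens of the document


def encode_node(documents, sequence_model=lambda seq: seq):
    """documents: list of (timestamp, sentences) pairs belonging to one node.

    `sequence_model` stands in for the Bi-LSTM; the identity default is a
    placeholder, not the trained model.
    """
    ordered = sorted(documents, key=lambda doc: doc[0])   # temporal order
    doc_vectors = [document_embedding(s) for _, s in ordered]
    outputs = sequence_model(doc_vectors)                 # e_1 ... e_d
    return mean_pool(outputs)                             # node embedding
```

In a real implementation, `sequence_model` would be a bidirectional LSTM with $h/2 = 384$ units per direction. Its concatenated outputs then keep the 768-dimensional size of BERT-base.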

Composer

The Composer is a transformer-based graph attention network (GAT) followed by a pooling layer. As a graph attention layer, we use the transformer encoder layer with the position-wise feed-forward layer removed. We remove the position-wise feed-forward layer because the transformer unit was originally proposed for sequence-to-sequence prediction tasks, whereas the nodes in a graph usually have no ordering relationship between them. The adjacency matrix of the graph is used as the attention mask. Self-loops are added for all nodes, so that the updated representation of a node also depends on its previous representation. The Composer uses $l = 2$ graph attention layers in our experiments. It outputs updated node embeddings $\mathbb{U} \in \mathbb{R}^{n \times d_m}$ and a summary embedding $\mathbb{S} \in \mathbb{R}^{1 \times d_m}$. The output dimension of the node embeddings is 768, the same as BERT-base.
$$
\begin{aligned}
& \mathbb{E} \in \mathbb{R}^{d_m \times n}, \quad \mathcal{A} \in \{0, 1\}^{n \times n} \\
& \mathbb{G} = \text{layer-norm}(\mathbb{E}) \\
& Q = W_q^{T} \mathbb{G}, \quad Q \in \mathbb{R}^{n_h \times d_k \times n} \\
& K = W_k^{T} \mathbb{G}, \quad K \in \mathbb{R}^{n_h \times d_k \times n} \\
& V = W_v^{T} \mathbb{G}, \quad V \in \mathbb{R}^{n_h \times d_v \times n} \\
& M = \frac{Q^{T} K}{\sqrt{d_k}}, \quad M \in \mathbb{R}^{n_h \times n \times n} \\
& M = \text{mask}(M, \mathcal{A}) \\
& \mathbb{O} = M V^{T}, \quad \mathbb{O} \in \mathbb{R}^{n_h d_v \times n} \\
& \mathbb{U} = W_o^{T} \mathbb{O} + \mathbb{E} \\
& \mathbb{S} = \text{mean-pool}(\mathbb{U})
\end{aligned}
$$
The symbols are defined as follows:

- $n$ is the number of nodes in the graph.
- $d_m$ is the dimension of a BERT token embedding.
- $d_k$ and $d_v$ are projection dimensions.
- $n_h$ is the number of attention heads.
- $W_q \in \mathbb{R}^{d_m \times n_h d_k}$, $W_k \in \mathbb{R}^{d_m \times n_h d_k}$, $W_v \in \mathbb{R}^{d_m \times n_h d_v}$ and $W_o \in \mathbb{R}^{n_h d_v \times d_m}$ are learned weight parameters.
- $\mathbb{E} \in \mathbb{R}^{d_m \times n}$ is the output of the Encoder.
- $\mathcal{A} \in \{0, 1\}^{n \times n}$ is the adjacency matrix.

We set $n_h = 12$ and $d_k = d_v = 64$ in our experiments.
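
To make the masking concrete, here is a stripped-down, single-head version of the attention step in pure Python. It is an illustrative sketch only, under several assumptions:

- Node vectors are stored as rows, the transpose of $\mathbb{E}$.
- The projections $W_q, W_k, W_v, W_o$ are taken as identity.
- Layer normalization is omitted.
- A row-wise softmax over the unmasked scores is assumed, as in the standard transformer, because the equations above do not spell out the normalization.

Row $i$ attends to node $j$ only when $\mathcal{A}_{ij} = 1$, so each source node is updated from its destination nodes. Self-loops are added as described in the text. The function name is hypothetical.

```python
import math


def masked_graph_attention(E, A):
    """Single-head masked dot-product attention over a graph.

    E: n node vectors (rows), each of length d.
    A: n x n adjacency; A[i][j] == 1 lets node i attend to node j.
    Returns (U, S): residual-updated node vectors and their mean-pooled summary.
    """
    n, d = len(E), len(E[0])
    # Self-loops: every node may always attend to itself.
    allowed = [[A[i][j] == 1 or i == j for j in range(n)] for i in range(n)]
    U = []
    for i in range(n):
        # Scaled dot-product scores; masked positions get -inf.
        scores = [
            sum(E[i][k] * E[j][k] for k in range(d)) / math.sqrt(d)
            if allowed[i][j] else -math.inf
            for j in range(n)
        ]
        top = max(scores)                              # finite thanks to the self-loop
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        weights = [w / total for w in weights]
        attended = [sum(weights[j] * E[j][k] for j in range(n)) for k in range(d)]
        U.append([attended[k] + E[i][k] for k in range(d)])  # residual: U = O + E
    S = [sum(U[i][k] for i in range(n)) / n for k in range(d)]  # mean-pool summary
    return U, S
```

With an all-zero adjacency matrix, each node attends only to itself, so the residual simply doubles its vector. This is a quick sanity check that the mask and self-loops behave as described.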

Learning Tasks

We design 2 learning tasks to train the Compositional Reader model: Authorship Prediction and Referenced Entity Prediction. Both tasks are variants of link prediction over graphs. In Authorship Prediction, we are given a graph, an author node and a document node with no link between them. The task is to predict whether the document was authored by the author. In Referenced Entity Prediction, we are given a graph, a document node and a referenced entity node. The task is to predict whether the entity is referenced in the document. For this task, all occurrences of one entity in the document text are replaced with a generic <ent> token before the document embedding is computed. Both tasks are detailed below.

Authorship Prediction

Authorship Prediction is designed as a binary classification task. We are given a graph $\mathcal{G}$ generated by the Graph Generator, an author node $n_a$ and a document node $n_d$ with no edges between them. The task is to predict whether or not the author represented by node $n_a$ authored the document represented by node $n_d$. The intuition behind this learning task is to teach our model to distinguish an author's language on an issue from documents by other entities and from documents on other issues. The model sees documents by the same author on the same issue in the graph. It learns to decide whether the input document uses similar language. This is a fairly simple learning task and hence an ideal task to start pre-training our model. The input to the fine-tuning layers concatenates the initial and final node embeddings of the author and the document, along with the summary embedding of the graph. We add one hidden layer of dimension 384 before the classification layer.

Data samples for the task were created as follows. For each of the 455 entities and each of the 8 issues, we fire a query covering all events related to that issue. The Graph Generator then produces a data graph (Fig. test_graph). Hence, we fire 3,640 queries in total (455 × 8) and obtain the respective data graphs. To create a positive data sample, we sample a document $d_i$ authored by the entity $a_i$ and remove the edges between the nodes that represent $a_i$ and $d_i$. Negative samples were designed carefully in 3 batches, so that the model learns different aspects of the language used by the author:

1. News article nodes sampled from the same graph.
2. Tweets, press releases and perspectives of the same author, but from a different issue.
3. Documents related to the same issue, but from other authors.

We generate 421,284 samples in total: 252,575 positive samples and 168,709 negative samples. We randomly split the data into a training set of 272,159 samples, a validation set of 73,410 samples and a test set of 75,715 samples. We also perform out-of-sample experiments to evaluate the model's ability to generalize to unseen politicians' data. For these, we train the model on training data from two-thirds of the politicians and test on the test sets of the others. Results are shown in Tab. auth_pred_res. We perform graph trimming to make the computation tractable on a single GPU. We randomly drop 80% of the news articles, tweets and press releases that are not related to the event to which $d_i$ belongs. We use graphs with 200-500 nodes and a batch size of 1.
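
A minimal sketch of how one positive sample and the three batches of negatives could be drawn is shown below. It assumes a hypothetical document schema: dicts with `id`, `type`, `author` and `issue`, where `docs` is the candidate pool. The sketch draws one negative per batch. The function name and this one-per-batch ratio are our simplifications and do not reproduce the paper's sample counts.

```python
import random


def authorship_samples(author, issue, docs, edges, rng=None):
    """Draw one positive and up to three negative Authorship Prediction samples.

    docs:  candidate documents as dicts with keys "id", "type", "author", "issue"
    edges: set of directed (source, destination) pairs of the query graph
    Returns (samples, trimmed_edges); each sample is (author, doc_id, label).
    """
    if rng is None:
        rng = random.Random()
    first_person = ("tweet", "press_release", "perspective")

    own = [d for d in docs if d["type"] in first_person
           and d["author"] == author and d["issue"] == issue]
    positive = rng.choice(own)
    samples = [(author, positive["id"], 1)]
    # Hide the true authorship link so the model must predict it.
    trimmed = edges - {(author, positive["id"]), (positive["id"], author)}

    negative_batches = [
        # batch 1: news articles from the same graph
        [d for d in docs if d["type"] == "news" and d["issue"] == issue],
        # batch 2: same author, different issue
        [d for d in docs if d["type"] in first_person
         and d["author"] == author and d["issue"] != issue],
        # batch 3: same issue, other authors
        [d for d in docs if d["type"] in first_person
         and d["author"] != author and d["issue"] == issue],
    ]
    for batch in negative_batches:
        if batch:
            samples.append((author, rng.choice(batch)["id"], 0))
    return samples, trimmed
```

Passing a seeded `random.Random` makes the sampling reproducible across runs.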

Referenced Entity Prediction

This task is also designed as binary classification. Given a graph $\mathcal{G}$, a document node $d_i$ and a referenced entity node $r_i$ from $\mathcal{G}$, the task is to predict whether or not $r_i$ is referenced in $d_i$. To create data samples for this task, we sample a document from the data graph. We then replace all occurrences of the document's most frequent referenced entity with a generic <ent> token. We remove the link between $r_i$ and $d_i$ in $\mathcal{G}$. The triplet ($\mathcal{G}$, $d_i$, $r_i$) is used as a positive data sample. To generate a negative sample, we sample another referenced entity $r_j$ from the graph that is not referenced in $d_i$. The intuition behind this learning task is to teach our model the correlation between the author, the language in the document and the referenced entity. For example, in the context of the recent impeachment hearing of Donald Trump, consider the sentence "X needs to face the consequences of their actions". Depending upon the author, X could be either of two quite different entities. Learning to understand such correlations by looking at other documents from the same author is a useful training task for our model. It is also a harder learning problem than Authorship Prediction. For this task, we use a fine-tuning architecture similar to that of Authorship Prediction on top of the Compositional Reader. The Compositional Reader is shared, but we keep separate fine-tuning parameters for each task because they are fundamentally different prediction problems. We generated 252,578 samples for this task, half of them positive. They were split into 180,578 training samples and validation and test sets of 36,400 samples each. We apply graph trimming for this task as well, and we also perform out-of-sample evaluation for this learning task.
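
The <ent> masking step can be sketched as follows. The sketch assumes that each document comes with character-level mention spans `(start, end, entity_id)` from the coreference and wikification steps described under Data Pre-processing. The function name and span format are hypothetical. Ties between equally frequent entities are broken by first appearance, which is how `Counter.most_common` orders equal counts. This tie-breaking rule is our choice, not one stated in the text.

```python
from collections import Counter


def mask_most_frequent_entity(text, mentions, mask_token="<ent>"):
    """Replace every mention of the document's most frequent referenced entity.

    mentions: (start, end, entity_id) character spans, end exclusive.
    Returns (masked_text, entity_id); entity_id is None if there are no mentions.
    """
    if not mentions:
        return text, None
    target, _ = Counter(ent for _, _, ent in mentions).most_common(1)[0]
    spans = sorted((start, end) for start, end, ent in mentions if ent == target)
    # Replace right-to-left so earlier character offsets stay valid.
    for start, end in reversed(spans):
        text = text[:start] + mask_token + text[end:]
    return text, target


masked, entity = mask_most_frequent_entity(
    "Trump met Cuomo. Trump left.",
    [(0, 5, "Donald_Trump"), (10, 15, "Andrew_Cuomo"), (17, 22, "Donald_Trump")],
)
print(masked, entity)  # <ent> met Cuomo. <ent> left. Donald_Trump
```

The returned entity id would serve as $r_i$ for the positive triplet. Any other entity in the graph that never appears in `mentions` is a candidate $r_j$ for the negative sample.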

Evaluation

We evaluate our model and pre-training tasks systematically using several quantitative tasks and a qualitative analysis. Quantitative evaluation includes the NRA Grade Paraphrase task, Grade Prediction on NRA and LCV grades data, and the Bias Prediction task on AllSides news articles. Qualitative evaluation includes entity-stance visualization for issues. We compare our model's performance to BERT representations, the BERT adaptation baseline and representations from the Encoder module. The baselines and evaluation tasks are detailed below.

Baselines

BERT: For each of the quantitative tasks, we compute results using pooled BERT representations of the relevant documents. Details of the chosen documents and the pooling procedure are described in the relevant task subsections.

Encoder: We compare the performance of our model to the results obtained using the initial node embeddings generated by the Encoder for each of the quantitative tasks.

BERT Adaptation: We design a BERT adaptation baseline for the learning tasks. BERT adaptation is equivalent to using only the Encoder's initial node embeddings of the Compositional Reader model. BERT adaptation and the Encoder share exactly the same architecture but are trained differently. Encoder parameters are trained via back-propagation through the Composer, whereas BERT adaptation parameters are trained directly on our learning tasks. In BERT adaptation, we first generate the data graph. We then pass the mean-pooled sentence-wise BERT embeddings of the node documents through a Bi-LSTM and mean-pool the Bi-LSTM output to get node embeddings. We use fine-tuning layers on top of the resulting node embeddings for the Authorship Prediction and Referenced Entity Prediction tasks. The BERT adaptation baseline lets us make two comparisons. Comparing it with BERT representations shows the importance of our proposed training tasks. Comparing it with the full Compositional Reader model shows the effectiveness of our Composer architecture.

NRA Grades Evaluation

The National Rifle Association (NRA) assigns letter grades (A+, A, …, F) to politicians based on a candidate questionnaire and their gun-related voting. We evaluate our representations on their ability to predict these grades. Our intuition behind this evaluation is that the language in a politician's tweets, press releases and perspectives should help predict their NRA grade. We evaluate our model on 2 tasks, namely the Paraphrase Task and the Grade Prediction Task. In the Paraphrase Task, we evaluate the representations from our model directly, without training on NRA grades data. In the Grade Prediction Task, we use the representations from our model and fine-tune on grades data. We collected the historical data of politicians' NRA grades from . Grade data is available for 349 out of the 455 politicians in focus.

For each politician $ p_i $, we obtain data for the query ($ p_i $, guns, all guns-related events). We input the data to the Compositional Reader and take the final embeddings of the nodes representing the politician ($ \vec{n}_{auth} $), the issue ($ \vec{n}_{guns} $) and the referenced entity ($ \vec{n}_{NRA} $). For some politicians, $ \vec{n}_{NRA} $ is not available, depending on whether or not they referred to the NRA in their discourse. These embeddings are used for both the paraphrase and the prediction tasks. We repeat the Grade Prediction Task with grades from League of Conservation Voters (LCV) data for the environment issue. The tasks are detailed below.

NRA Grade Paraphrase Task

In this task, we evaluate our representations directly, without training on the NRA grade data. Grades are divided into two classes: grades of B+ and higher are in the positive class, and all grades from C+ to F are in the negative class. We formulate a representative sentence for each class:

  • POSITIVE: I strongly support the NRA
  • NEGATIVE: I vehemently oppose the NRA

We compute BERT embeddings for the representative sentences to obtain $ \vec{pos}_{NRA} $ and $ \vec{neg}_{NRA} $. We mean-pool the three embeddings $ \vec{n}_{auth} $, $ \vec{n}_{guns} $ and $ \vec{n}_{NRA} $ to obtain $ \vec{n}_{stance} $, and compute the cosine similarity of $ \vec{n}_{stance} $ with both $ \vec{pos}_{NRA} $ and $ \vec{neg}_{NRA} $. Each politician is assigned the class with the higher similarity. We compare our model's results to BERT, BERT adaptation and Encoder embeddings. For BERT, we compute $ \vec{n}_{stance} $ by mean-pooling the sentence-wise BERT embeddings of the author's tweets, press releases and perspectives on all events related to the guns issue.
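
A minimal sketch of the Paraphrase decision rule, assuming the node embeddings and the BERT embeddings of the two representative sentences are already available as equal-length lists. The function names are hypothetical; treating B and B- as unlabeled, mean-pooling only the available embeddings when $ \vec{n}_{NRA} $ is missing, and breaking ties toward the positive class are assumptions not stated in the text.

```python
import math
from typing import List, Optional, Sequence

# Split described in the text: B+ and above -> positive, C+ to F -> negative.
# The text does not place B and B-; here they are left unlabeled (assumption).
POSITIVE_GRADES = {"A+", "A", "A-", "B+"}
NEGATIVE_GRADES = {"C+", "C", "C-", "D+", "D", "D-", "F"}


def binarize_grade(grade: str) -> Optional[str]:
    if grade in POSITIVE_GRADES:
        return "positive"
    if grade in NEGATIVE_GRADES:
        return "negative"
    return None


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def paraphrase_predict(n_auth, n_guns, n_nra, pos_nra, neg_nra) -> str:
    """Zero-shot stance: mean-pool the available node embeddings into n_stance,
    then pick the class whose sentence embedding is more cosine-similar."""
    nodes = [n_auth, n_guns] + ([n_nra] if n_nra is not None else [])
    n_stance = mean_pool(nodes)
    if cosine(n_stance, pos_nra) >= cosine(n_stance, neg_nra):
        return "positive"
    return "negative"
```

Accuracy on this task is then simply the fraction of politicians whose predicted class matches `binarize_grade` of their actual grade.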

Results are shown in Tab. quant_eval.

NRA Grade Prediction Task

This is a 5-class classification task, with one class for each letter grade: {A, B, C, D, F}. We train a simple feed-forward network with 1 hidden layer of dimension 1000. The network is given 2 inputs, $ \vec{n}_{auth} $ and $ \vec{n}_{guns} $. When $ \vec{n}_{NRA} $ is available for an entity, we set $ \vec{n}_{guns} = ( \vec{n}_{NRA}, \vec{n}_{guns} ) $. The network's output is a classification prediction. We randomly divide the NRA grades data into $ k=10 $ folds; we train the model on 8 folds, use 1 fold for validation and evaluate on the remaining test fold. We repeat this experiment with each fold as the test fold, and repeat the entire process for 5 random seeds. We perform this evaluation for BERT, BERT adaptation, Encoder and Compositional Reader. To compute $ \vec{n}_{auth} $ for BERT, we mean-pool the sentence-wise embeddings of all the author's documents on guns. For $ \vec{n}_{guns} $, we use the background description document of the guns issue. Results on the test set are in Tab. quant_eval. We also train the model on fractions of the training data and monitor validation and test performance as the training-data percentage changes. In general, the gap between the Compositional Reader model and the BERT baseline widens as training data increases, which hints that our representations likely capture more information relevant to this task. These results are included in the Appendix.
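
The label mapping and the cross-validation protocol above can be sketched as follows. This assumes letter grades are strings such as 'A+' or 'C-', and that the validation fold is the fold following the test fold; the text only says that 1 fold is used for validation. The feed-forward classifier itself (1 hidden layer of 1000 units) is left out, and all names are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

LETTER_CLASSES = ["A", "B", "C", "D", "F"]


def letter_class(grade: str) -> int:
    """Collapse an NRA letter grade such as 'A+' or 'C-' to one of 5 classes."""
    return LETTER_CLASSES.index(grade.strip()[0].upper())


def k_fold_splits(
    n_items: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists, one split per test fold.

    Items are shuffled with the given seed and dealt into k folds. Each fold is
    the test fold once; the next fold (cyclically) is the validation fold and the
    remaining k - 2 folds are used for training (validation choice assumed).
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for test_fold in range(k):
        val_fold = (test_fold + 1) % k
        train = [i for f in range(k) if f not in (test_fold, val_fold) for i in folds[f]]
        yield train, folds[val_fold], folds[test_fold]


# 10 folds x 5 seeds = 50 train/validate/test runs over the 349 graded politicians.
runs = [split for seed in range(5) for split in k_fold_splits(349, k=10, seed=seed)]
print(len(runs))
```

Reported test accuracy would then be aggregated over these 50 runs.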

LCV Grade Prediction Task

This task is similar to the NRA Grade Prediction task. It is a 4-way classification task. LCV assigns each politician a score between 0 and 100 based on their environmental voting activity. We segregate politicians into 4 classes (0-25, 25-50, 50-75, 75-100). We obtain the input to the prediction model by concatenating $ \vec{n}_{auth} $ and $ \vec{n}_{environment} $. We use the same fine-tuning architecture as in the NRA Grade Prediction task, with a fresh set of parameters. Results are shown in Tab. quant_eval.
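
A small sketch of the LCV label bucketing and of the classifier input. The hypothetical helpers assume the listed ranges are half-open except for the last one, so that a score of exactly 25 falls into the second class; the text does not specify boundary handling.

```python
from typing import List, Sequence


def lcv_class(score: float) -> int:
    """Bucket an LCV score in [0, 100] into 4 classes.

    Ranges are treated as [0, 25), [25, 50), [50, 75), [75, 100] (assumption).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"LCV score out of range: {score}")
    return min(int(score // 25), 3)


def lcv_input(n_auth: Sequence[float], n_environment: Sequence[float]) -> List[float]:
    """Classifier input: concatenation of the author and environment node embeddings."""
    return list(n_auth) + list(n_environment)
```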

Bias Prediction in News Articles

In this task, we evaluate the ability of the graph-contextualized representations of documents to predict bias in news articles. This task evaluates the usefulness of the Composer architecture in enriching document representations by propagating information via the referenced entity nodes. We use news articles collected from AllSides for this task; these articles are different from the ones used in our learning tasks. Each news article displayed on AllSides is labeled by the website as left-, right- or center-leaning. We create an issue node, nodes for all news articles related to the issue, and nodes for all the entities referenced in those articles. We initialize the article embeddings with the mean-pooled sentence-wise BERT embeddings of the articles, and we use the description from for the issue node. Then, we compute updated representations for the articles by running the encoder-composer architecture on the graph, and use these updated representations for the 3-way bias prediction task. We do not train the encoder-composer parameters in this task. We use 5,828 training, 979 validation and 354 test examples. Results for this task are shown in Tab. quant_eval.
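
To make the graph construction concrete, here is a hedged sketch. It assumes undirected issue-article and article-entity edges, and the `propagate` step, which just averages each node with its neighbours, is a hypothetical stand-in for the encoder-composer rather than the paper's architecture. It only illustrates how an article can be influenced by other articles through a shared referenced entity.

```python
from collections import defaultdict
from typing import Dict, List, Set

Node = tuple  # ("issue", name), ("article", id) or ("entity", name)


def build_issue_graph(issue: str, articles: Dict[str, List[str]]) -> Dict[Node, Set[Node]]:
    """articles maps an article id to the entities that article references."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    issue_node = ("issue", issue)
    graph.setdefault(issue_node, set())  # keep the issue node even with no articles
    for article_id, entities in articles.items():
        article_node = ("article", article_id)
        graph[issue_node].add(article_node)
        graph[article_node].add(issue_node)
        for entity in entities:
            entity_node = ("entity", entity)
            graph[article_node].add(entity_node)
            graph[entity_node].add(article_node)
    return dict(graph)


def propagate(graph, embeddings, steps=2):
    """Stand-in for the Composer: each step replaces every node's vector with the
    mean of itself and its neighbours (not the paper's architecture)."""
    current = dict(embeddings)
    for _ in range(steps):
        updated = {}
        for node, neighbours in graph.items():
            group = [current[node]] + [current[n] for n in neighbours]
            updated[node] = [sum(column) / len(group) for column in zip(*group)]
        current = updated
    return current


# Toy graph: a1 and a2 share an entity, a3 references a different one.
articles = {"a1": ["Donald Trump"], "a2": ["Donald Trump"], "a3": ["Nancy Pelosi"]}
graph = build_issue_graph("immigration", articles)
init = {node: [0.0, 0.0] for node in graph}
init[("article", "a1")] = [1.0, 0.0]
init[("article", "a3")] = [0.0, 1.0]
```

After one averaging step `a1` depends only on its direct neighbours; after two steps it also reflects `a2`, which reaches it through the shared entity node (and the issue node).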

Tab. quant_eval: Results of the quantitative evaluation tasks (Bias Pred Test F1 is macro-F1).

| Model | Paraphrase (All Grades) | Paraphrase (A/F Grades) | NRA Test Acc | LCV Test Acc | Bias Pred Test Acc | Bias Pred Test F1 |
|---|---|---|---|---|---|---|
| BERT | 41.55% | 38.52% | $ 54.83 \pm 1.79 $ | $ 52.63 \pm 1.21 $ | $ 48.31 \pm 0.04 $ | $ 31.47 \pm 0.04 $ |
| BERT Adap. | 37.54% | 42.62% | $ 69.95 \pm 3.33 $ | $ 59.09 \pm 1.77 $ | $ 50.11 \pm 0.01 $ | $ 34.25 \pm 0.00 $ |
| Encoder | 56.16% | 48.36% | $ 81.34 \pm 0.86 $ | $ 63.42 \pm 0.35 $ | $ 44.80 \pm 0.05 $ | $ 30.47 \pm 0.04 $ |
| Comp. Reader | 63.32% | 63.93% | $ 81.62 \pm 1.23 $ | $ 62.24 \pm 0.56 $ | $ 56.95 \pm 0.03 $ | $ 41.52 \pm 0.02 $ |

Opinion Descriptor Generation

This task demonstrates a simple way to interpret our contextualized representations as natural language descriptors. It is an unsupervised qualitative evaluation task. We generate opinion descriptors of authoring entities for specific issues, using the final node embedding of the issue node ($ \vec{n}_{issue} $) for each politician. Inspired by , we define our candidate space for descriptors as the set of adjectives used by the entity in their tweets, press releases and perspectives related to an issue. Although uses verbs as relationship descriptor candidates, we argue that adjectives describe opinions better. We compute the representative embedding for each descriptor by mean-pooling the contextualized embeddings of that descriptor over all its occurrences in the politician's discourse. This is one of the key differences from prior descriptor generation works such as and . They work in a static word embedding space, whereas our embeddings are contextualized and reside in a higher-dimensional space. In an unsupervised setting, this makes it more challenging to translate from the distributional space to natural language tokens. Hence, we restrict the candidate descriptor space more than and . We rank all candidate descriptors by the cosine similarity of their representative embeddings with the vector $ \vec{n}_{issue} $. We present some of the results in Tab. opinion_descs.

Prior works and both take a set of documents and entity pairs as inputs and generate relationship descriptors for the entity pairs in an unsupervised setting. Both are trained in an unsupervised manner with an encoder-decoder style training process. Given new text with an entity pair, they generate $ d $ descriptor embeddings that are used to rank candidate descriptors. uses the entire vocabulary space, while uses the 500 most frequent verbs. In contrast, our model does not need both entities to be present in the text to generate opinion descriptors. This is often the case in tweets and press releases, as they are written directly by the author (first-person discourse). Our model is also capable of summarizing over multiple documents and generating descriptors for several referenced entities and issues at once, whereas they deal with one entity pair at a time.
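
The ranking step can be sketched as below, assuming the contextualized embedding of every adjective occurrence has already been extracted from the politician's issue-related tweets, press releases and perspectives. `rank_descriptors` is a hypothetical name, and breaking similarity ties alphabetically is an arbitrary choice.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_descriptors(
    occurrences: List[Tuple[str, Sequence[float]]],
    n_issue: Sequence[float],
    top_k: int = 5,
) -> List[str]:
    """occurrences: (adjective, contextual embedding) pairs, one per occurrence."""
    grouped = defaultdict(list)
    for adjective, vector in occurrences:
        grouped[adjective].append(vector)
    scored = []
    for adjective, vectors in grouped.items():
        # Representative embedding: mean over all occurrences of the adjective.
        representative = [sum(column) / len(vectors) for column in zip(*vectors)]
        scored.append((cosine(representative, n_issue), adjective))
    # Highest similarity first; alphabetical order breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [adjective for _, adjective in scored[:top_k]]
```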

Results

In this section, we present the results of the learning tasks, followed by the quantitative and qualitative evaluation results. Results in Tab. quant_eval show the usefulness of the various components of the architecture. BERT adaptation shows the effectiveness of our learning tasks, while the Encoder results show that the same architecture generates better representations when trained along with the Composer. The Compositional Reader results show the effectiveness of our entire model. Further, the qualitative evaluation shows that our embeddings capture meaningful information about both entities and issues.

Learning Tasks

First, we present the results of the Authorship Prediction and Referenced Entity Prediction tasks in Tabs. auth_pred_res and ref_ent_pred_res, respectively. The Compositional Reader outperforms the BERT adaptation baseline on all metrics. On Authorship Prediction, out-of-sample performance does not drop for either model, which supports our graph formulation: it allows the model to learn linguistic nuances rather than over-fit. On Referenced Entity Prediction, our model's F1 score improves from 77.51 in-sample to 78.62 out-of-sample, while the BERT adaptation baseline's F1 drops slightly from 75.21 to 73.67.

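Tab. auth_pred_res: Authorship Prediction results (IS = in-sample, OS = out-of-sample).

| Model | IS Acc | IS F1 | OS Acc | OS F1 |
|---|---|---|---|---|
| BERT Adap. | 93.01 | 92.31 | 95.56 | 95.20 |
| Comp. Reader | 99.49 | 99.47 | 99.42 | 99.39 |

Tab. ref_ent_pred_res: Referenced Entity Prediction results (IS = in-sample, OS = out-of-sample).

| Model | IS Acc | IS F1 | OS Acc | OS F1 |
|---|---|---|---|---|
| BERT Adap. | 76.57 | 75.21 | 76.26 | 73.67 |
| Comp. Reader | 78.52 | 77.51 | 78.98 | 78.62 |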

Tab. opinion_descs: Opinion descriptors generated for selected politicians.

| Issue | Opinion Descriptors | Issue | Opinion Descriptors |
|---|---|---|---|
| Mitch McConnell (Republican) | | Nancy Pelosi (Democrat) | |
| abortion | fundamental, hard, eligible, embryonic, unborn | abortion | future, recent, scientific, technological, low |
| environment | achievable, more, unobjectionable, favorable, federal | environment | forest, critical, endangered, large, clear |
| guns | substantive, meaningful, outdone, foreign, several | guns | constitutional, ironclad, deductible, unlawful, fair |
| immigration | federal, sanctuary, imminent, address, comprehensive | immigration | immigrant, skilled, modest, overall, enhanced |
| Donald Trump (Republican) | | Joe Biden (Democrat) | |
| guns | terrorist, public, ineffective, huge, inevitable, dangerous | guns | banning, prohibiting, ban, maintaining, sold |
| immigration | early, dumb, birthright, legal, difficult | taxes | progressive, economic, across-the-board, annual, top |

Quantitative Evaluation

Grade Paraphrase

Further, we present the results of the NRA Grade Paraphrase Task in Tab. quant_eval. Representations from the Compositional Reader achieve 63.32% accuracy. Using only the Encoder output yields 56.16%, mean-pooled BERT-base embeddings yield 41.55%, and node embeddings from the BERT adaptation model yield 37.54%. When we evaluate using only A or F grades, we obtain 63.93% accuracy for the Compositional Reader, 48.36% for the Encoder, 42.62% for BERT adaptation and 38.52% for mean-pooled BERT.

Grade Prediction

Results of the Grade Prediction tasks are shown in Tab. quant_eval. On NRA Grade Prediction, which is a 5-way classification task, our model achieves an accuracy of $ 81.62 \pm 1.23 $ on the test set, outperforming BERT representations by $ 26.79 \pm 3.02 $ absolute points. On LCV Grade Prediction, which is a 4-way classification task, our model achieves a $ 9.61 \pm 1.77 $ point improvement over BERT representations.

Bias Prediction

The results are shown in Tab. quant_eval. The Compositional Reader achieves an $ 8.64 \pm 0.07 $ point improvement in test accuracy over BERT embeddings of the documents on this task. The task is a 3-way classification task. The classes are imbalanced, with fewer examples of center articles, hence we report macro-F1 scores.
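
Because the center class is under-represented, accuracy alone can hide a model that rarely predicts it. The sketch below computes accuracy and macro-F1 (the unweighted mean of per-class F1) in plain Python; it should agree with standard implementations such as scikit-learn's `f1_score(..., average="macro")` when classes with no predictions are scored as 0. The toy labels are invented for illustration.

```python
from typing import Hashable, Optional, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Optional[Sequence[Hashable]] = None,
) -> float:
    """Unweighted mean of per-class F1; a class with no true positives
    contributes 0 (one common convention)."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


# Toy 3-way example in which the minority "center" class is never predicted.
y_true = ["left", "left", "right", "right", "center"]
y_pred = ["left", "left", "right", "right", "left"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case accuracy is 0.8 while macro-F1 is 0.6, because the center class contributes an F1 of 0.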

Qualitative Evaluation

We perform Principal Component Analysis (PCA) on the issue embeddings ($ \vec{n}_{issue} $) of politicians, obtained using the same method as in NRA Grade Prediction. We show one such visualization in Fig. rooney_mcconnell. Mitch McConnell is a Republican who expressed right-wing views on both and . Bernie Sanders is a Democrat who expressed left-wing views on both. Francis Rooney is a Republican who expressed right-wing views on but left-wing views on . Fig. rooney_mcconnell demonstrates that this information is captured by our representations. Further examples are in the appendix. We present a visualization of politicians on the issue in Fig. pol_vis. We observe that tends to be a polarizing issue. This suggests that our representations are able to capture the relative stances of politicians. We include such visualizations for the other 7 issues in the appendix. We observe that issues that have traditionally had clear conservative vs. liberal boundaries, such as &, are more polarized than issues that evolve with time, such as &. We show the results of opinion descriptor generation for a few politicians in Tab. opinion_descs. These results show the most representative adjectives used by the politicians in the context of each issue. These descriptors appear to give a fair reflection of these politicians' views on the issues in focus.
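
For reference, a 2-D PCA projection of issue embeddings can be sketched without external libraries using power iteration on the sample covariance matrix; in practice a library routine such as scikit-learn's `PCA(n_components=2).fit_transform` would likely be used instead. The function name and the toy points are illustrative assumptions, and the sign of each component is arbitrary.

```python
import math
import random
from typing import List, Optional, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _orthonormalize(v: List[float], basis: List[List[float]], eps: float = 1e-12) -> Optional[List[float]]:
    """Remove the components of v along the unit basis vectors and normalize.
    Returns None when (numerically) nothing is left."""
    for b in basis:
        proj = _dot(v, b)
        v = [x - proj * y for x, y in zip(v, b)]
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > eps else None


def pca_2d(points: Sequence[Sequence[float]], iterations: int = 200, seed: int = 0) -> List[List[float]]:
    """Project points onto their top-2 principal components via power iteration."""
    n, d = len(points), len(points[0])
    if n < 2 or d < 2:
        raise ValueError("need at least 2 points with at least 2 dimensions")
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(2):
        v = None
        while v is None:  # random unit start vector orthogonal to earlier components
            v = _orthonormalize([rng.gauss(0.0, 1.0) for _ in range(d)], components)
        for _ in range(iterations):
            w = _orthonormalize([_dot(row, v) for row in cov], components)
            if w is None:  # no variance left outside the earlier components
                break
            v = w
        components.append(v)
    return [[_dot(x, c) for c in components] for x in centered]


# Toy 3-D "embeddings" lying on a line: all variance is on the first component.
points = [[t, 2 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
coords = pca_2d(points)
```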

Ablation Analysis

Further, we investigate the importance of various components of our model by performing an ablation study over document types on the NRA Grades Paraphrase task. The results are shown in Tab. ablation_study.

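Tab. ablation_study: Ablation over document types on the NRA Grades Paraphrase task (accuracy, all grades).

| Model | All Grades |
|---|---|
| Comp. Reader | 63.32% |
| without Tweets | 63.32% |
| without Press Releases | 63.04% |
| without Perspectives | 59.31% |
| Only Tweets | 40.11% |
| Only Press Releases | 55.87% |
| Only Perspectives | 60.74% |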

Results in Tab. ablation_study indicate that perspectives are the most useful and tweets the least useful documents for this task. As perspectives are summarized ideological leanings of politicians, it is intuitive that they are more effective for this task. Tweets are informal discourse and tend to be very specific to a current event, hence they are less useful for this task.

Conclusion

We propose a Compositional Reader model that builds upon representations from BERT and generates more effective representations. We design learning tasks and train our model on large amounts of political data. We evaluate our model on several qualitative and quantitative tasks. Our model outperforms BERT-based baselines on both the learning tasks and the quantitative evaluation tasks. Results from our qualitative evaluation demonstrate that our representations effectively capture nuanced political information.
