[论文翻译]基于记忆网络的大规模简单问答


原文地址:https://arxiv.org/pdf/1506.02075v1


Large-scale Simple Question Answering with Memory Networks

基于记忆网络的大规模简单问答

Antoine Bordes


Abstract

摘要

Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering; a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions. To this end, we introduce a new dataset of $100\mathrm{k}$ questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more complex reasoning, and show that Memory Networks can be successfully trained to achieve excellent performance.

训练大规模问答系统十分复杂,因为训练数据通常只覆盖了潜在问题范围的很小一部分。本文研究了多任务学习和迁移学习在简单问答任务中的影响——在这种设定中,只要能够根据问题检索到正确证据(这在大规模场景下可能很困难),所需的推理过程其实相当简单。为此,我们引入了一个包含 $100\mathrm{k}$ 问题的新数据集,并将其与现有基准结合使用。我们在记忆网络 (Memory Networks) 框架 (Weston et al., 2015) 下开展研究,因为该框架最终能扩展到更复杂的推理场景。实验表明,通过适当训练,记忆网络能够取得卓越性能。

1 Introduction

1 引言

Open-domain Question Answering (QA) systems aim at providing the exact answer(s) to questions formulated in natural language, without restriction of domain. While there is a long history of QA systems that search for textual documents or on the Web and extract answers from them (see e.g. (Voorhees and Tice, 2000; Dumais et al., 2002)), recent progress has been made with the release of large Knowledge Bases (KBs) such as Freebase, which contain consolidated knowledge stored as atomic facts, and extracted from different sources, such as free text, tables in webpages or collaborative input. Existing approaches for QA from KBs use learnable components to either transform the question into a structured KB query (Berant et al., 2013) or learn to embed questions and facts in a low dimensional vector space and retrieve the answer by computing similarities in this embedding space (Bordes et al., 2014a). However, while most recent efforts have focused on designing systems with higher reasoning capabilities, that could jointly retrieve and use multiple facts to answer, the simpler problem of answering questions that refer to a single fact of the KB, which we call Simple Question Answering in this paper, is still far from solved.

开放域问答 (Open-domain QA) 系统旨在为自然语言表述的问题提供精确答案,且不受领域限制。虽然基于文本检索或网络搜索的问答系统已有较长发展历史 (例如 Voorhees 和 Tice,2000;Dumais 等,2002),但随着 Freebase 等大型知识库 (Knowledge Base) 的发布,该领域取得了新进展——这些知识库以原子事实形式存储从自由文本、网页表格或众包输入等多源提取的整合知识。现有基于知识库的问答方法主要采用可学习组件:或将问题转换为结构化查询 (Berant 等,2013),或将问题与事实嵌入低维向量空间并通过相似度计算检索答案 (Bordes 等,2014a)。然而,尽管近期研究多聚焦于设计具备多事实联合推理能力的系统,但针对仅涉及知识库单一事实的简单问答 (本文称为 Simple Question Answering) 这一基础问题,其解决程度仍远未完善。

Hence, existing benchmarks are small; they mostly cover the head of the distributions of facts, and are restricted in their question types and their syntactic and lexical variations. As such, it is still unknown how much the existing systems perform outside the range of the specific question templates of a few, small benchmark datasets, and it is also unknown whether learning on a single dataset transfers well on other ones, and whether such systems can learn from different training sources, which we believe is necessary to capture the whole range of possible questions.

因此,现有基准测试规模较小,主要覆盖事实分布的头部,且在问题类型、句法和词汇变化方面受限。如此一来,现有系统在少数小型基准数据集特定问题模板范围外的表现仍属未知,也无法确定单一数据集的学习能否良好迁移到其他数据集,以及此类系统能否从不同训练源中学习(我们认为这对覆盖所有可能问题范围至关重要)。

Besides, the actual need for reasoning, i.e. constructing the answer from more than a single fact from the KB, depends on the actual structure of the KB. As we shall see, for instance, a simple preprocessing of Freebase tremendously increases the coverage of simple QA in terms of possible questions that can be answered with a single fact, including list questions that expect more than a single answer. In fact, the task of simple QA itself might already cover a wide range of practical usages, if the KB is properly organized.

此外,实际对推理的需求(即从知识库(KB)中多个事实构建答案)取决于知识库的实际结构。例如,我们将看到,对Freebase进行简单的预处理能极大提高简单问答的覆盖率,即仅需单个事实即可回答的问题数量,包括需要多个答案的列表问题。事实上,如果知识库组织得当,简单问答任务本身可能已涵盖广泛的实用场景。

This paper presents two contributions. First, as an effort to study the coverage of existing systems and the possibility to train jointly on different data sources via multitasking, we collected the first large-scale dataset of questions and answers based on a KB, called Simple Questions. This dataset, which is presented in Section 2, contains more than 100k questions written by human annotators and associated to Freebase facts, while the largest existing benchmark, WebQuestions, contains less than 6k questions created automatically using the Google suggest API.

本文提出了两项贡献。首先,为了研究现有系统的覆盖范围以及通过多任务处理在不同数据源上联合训练的可能性,我们收集了首个基于知识库 (KB) 的大规模问答数据集,称为 Simple Questions。该数据集在第2节中介绍,包含超过 100k 条由人工标注者撰写并与 Freebase 事实相关联的问题;而现有最大的基准测试 WebQuestions 仅包含不到 6k 个问题,且这些问题是通过 Google suggest API 自动生成的。

What American cartoonist is the creator of Andy Lippincott? (andy lippincott, character created by, garry trudeau)
Which forest is Fires Creek in? (fires creek, contained by, nantahala national forest)
What is an active ingredient in childrens earache relief? (childrens earache relief, active ingredients, capsicum)
What does Jimmy Neutron do? (jimmy neutron, fictional character occupation, inventor)
What dietary restriction is incompatible with kimchi? (kimchi, incompatible with dietary restrictions, veganism)

哪位美国漫画家创造了安迪·利平科特? (andy lippincott, character created by, garry trudeau)
Fires Creek位于哪片森林? (fires creek, contained by, nantahala national forest)
儿童耳痛缓解药的有效成分是什么? (childrens earache relief, active ingredients, capsicum)
吉米·中子从事什么职业? (jimmy neutron, fictional character occupation, inventor)
哪种饮食限制与韩国泡菜不兼容? (kimchi, incompatible with dietary restrictions, veganism)

Table 1: Examples of simple QA. Questions and corresponding facts have been extracted from the new dataset Simple Questions introduced in this paper. Actual answers are underlined (here, the last element of each fact).

表 1: 简单问答示例。问题及对应事实均提取自本文提出的新数据集 Simple Questions,正确答案以下划线标注(即每条事实的最后一个元素)。

Second, in sections 3 and 4, we present an embedding-based QA system developed under the framework of Memory Networks (MemNNs) (Weston et al., 2015; Sukhbaatar et al., 2015). Memory Networks are learning systems centered around a memory component that can be read and written to, with a particular focus on cases where the relationship between the input and response languages (here natural language) and the storage language (here, the facts from KBs) is performed by embedding all of them in the same vector space. The setting of simple QA corresponds to the elementary operation of performing a single lookup in the memory. While our model bears similarity with previous embedding models for QA (Bordes et al., 2014b; Bordes et al., 2014a), using the framework of MemNNs opens the perspective to more involved inference schemes in future work, since MemNNs were shown to perform well on complex reasoning toy QA tasks (Weston et al., 2015). We discuss related work in Section 5.

其次,在第3和第4节中,我们介绍了一个基于嵌入的问答系统,该系统是在记忆网络(MemNNs)框架下开发的(Weston等人,2015;Sukhbaatar等人,2015)。记忆网络是一种以记忆组件为核心的学习系统,该组件可读写,特别关注输入和响应语言(此处为自然语言)与存储语言(此处为知识库中的事实)之间的关系,通过将它们全部嵌入同一向量空间来实现。简单问答的设置对应于在记忆中进行单次查找的基本操作。尽管我们的模型与之前的问答嵌入模型(Bordes等人,2014b;Bordes等人,2014a)有相似之处,但使用记忆网络框架为未来工作中更复杂的推理方案开辟了前景,因为记忆网络已被证明在复杂的推理玩具问答任务中表现良好(Weston等人,2015)。我们将在第5节讨论相关工作。

We report experimental results in Section 6, where we show that our model achieves excellent results on the benchmark Web Questions. We also show that it can learn from two different QA datasets to improve its performance on both. We also present the first successful application of transfer learning for QA. Using the Reverb KB and QA datasets, we show that Reverb facts can be added to the memory and used to answer without retraining, and that MemNNs achieve better results than some systems designed on this dataset.

我们在第6节报告了实验结果,表明我们的模型在基准测试Web Questions上取得了优异表现。同时证明该模型能够通过两个不同问答数据集进行联合学习,从而提升双方性能。我们还首次成功实现了问答任务的迁移学习应用:基于Reverb知识库和问答数据集,证实无需重新训练即可将Reverb事实添加到记忆模块并用于回答,且记忆神经网络(MemNNs)在该数据集上的表现优于部分专用系统。

2 Simple Question Answering

2 简单问答

Knowledge Bases contain facts expressed as triples (subject, relationship, object), where subject and object are entities and relationship describes the type of (directed) link between these entities. The simple QA problem we address here consists in finding the answer to questions that can be rephrased as queries of the form (subject, relationship, ?), asking for all objects linked to subject by relationship. The question What do Jamaican people speak?, for instance, could be rephrased as the Freebase query (jamaica, language spoken, ?). In other words, fetching a single fact from a KB is sufficient to answer correctly.

知识库 (Knowledge Bases) 包含以三元组 (subject, relationship, object) 形式表示的事实,其中 subject 和 object 是实体,relationship 描述这些实体之间的 (有向) 链接类型。我们在此处理的简单问答 (QA) 问题,涉及寻找可以重新表述为 (subject, relationship, ?) 形式查询的问题答案,即询问通过 relationship 与 subject 关联的所有 objects。例如,问题 "What do Jamaican people speak?" 可以重新表述为 Freebase 查询 (jamaica, language spoken, ?)。换句话说,从知识库中获取一个事实就足以正确回答问题。
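To make the query form concrete, here is a toy sketch of simple QA as a single KB lookup. The mini-KB below is a made-up example, not an interface to actual Freebase data:

```python
# A toy KB mapping (subject, relationship) to the set of linked objects.
# Contents are illustrative examples, not real Freebase facts.
kb = {
    ("jamaica", "language_spoken"): {"jamaican english", "jamaican creole"},
    ("fires creek", "contained_by"): {"nantahala national forest"},
}

def answer(subject, relationship):
    """Answer a (subject, relationship, ?) query: a single lookup suffices."""
    return kb.get((subject, relationship), set())

print(answer("fires creek", "contained_by"))  # {'nantahala national forest'}
```

The lookup itself is trivial; the difficulty in practice lies in mapping a natural-language question to the right (subject, relationship) pair among millions of alternatives.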

The term simple QA refers to the simplicity of the reasoning process needed to answer questions, since it involves a single fact. However, this does not mean that the QA problem is easy per se, since retrieving this single supporting fact can be very challenging, as it involves searching over millions of alternatives given a query expressed in natural language. Table 1 shows that, with a KB with many types of relationships like Freebase, the range of questions that can be answered with a single fact is already very broad. Besides, as we shall see, modifying slightly the structure of the KB can make some QA problems simpler by adding direct connections between entities, and hence allows bypassing the need for more complex reasoning.

术语简单问答 (simple QA) 指的是回答问题所需的推理过程简单,因为它只涉及单一事实。但这并不意味着问答问题本身容易,因为在自然语言表达的查询中,从数百万种候选项中检索出这一单一支持事实可能极具挑战性。表1显示,在拥有多种关系类型(如Freebase)的知识库(KB)中,仅凭单一事实就能回答的问题范围已经非常广泛。此外,我们将看到,通过添加实体间的直接连接来略微修改知识库结构,可以使某些问答问题变得更简单,从而绕过更复杂推理的需求。

2.1 Knowledge Bases

2.1 知识库

We use the KB Freebase as the basis of our QA system, our source of facts and answers. All Freebase entities and relationships are typed and the lexicon for types and relationships is closed. Freebase data is collaboratively collected and curated, to ensure a high reliability of the facts. Each entity has an internal identifier and a set of strings that are usually used to refer to that entity in text, termed aliases. We consider two extracts of Freebase, whose statistics are given in Table 2. FB2M, which was used in (Bordes et al., 2014a), contains about 2M entities and 5k relationships. FB5M is much larger, with about 5M entities and more than 7.5k relationships.

我们以知识库 Freebase 作为问答系统的基础,作为事实和答案的来源。所有Freebase实体和关系都经过类型标注,且类型和关系的词汇表是封闭的。Freebase数据通过协作收集和整理,以确保事实的高可靠性。每个实体都有一个内部标识符和一组通常在文本中用于指代该实体的字符串,称为别名。我们考虑了两个Freebase的子集,其统计信息如表2所示。FB2M(曾在Bordes等人[2014a]中使用)包含约200万个实体和5000种关系。FB5M规模更大,包含约500万个实体和超过7500种关系。

We also use the KB Reverb as a secondary source of facts to study how well a model trained to answer questions using Freebase facts could be used to answer using Reverb's as well, without being trained on Reverb data. This is a pure setting of transfer learning. Reverb is interesting for this experiment because it differs a lot from Freebase. Its data was extracted automatically from text with minimal human intervention and is highly unstructured: entities are unique strings and the lexicon for relationships is open. This leads to many more relationships, but entities with multiple references are not deduplicated, ambiguous referents are not resolved, and the reliability of the stored facts is much lower than in Freebase. We used the full extraction from (Fader et al., 2011), which contains 2M entities and 600k relationships.

我们还使用 KB Reverb 作为次要事实来源,研究一个基于 Freebase 事实训练的问题回答模型在未经 Reverb 数据训练的情况下,能否同样有效地利用 Reverb 数据进行回答。这是一个纯粹的迁移学习场景。Reverb 之所以适合本实验,是因为它与 Freebase 存在显著差异:其数据通过自动化文本抽取获得(人工干预极少),且高度非结构化——实体以唯一字符串形式存在,关系词汇表完全开放。这导致关系数量大幅增加,但多重指代实体未去重、歧义指称未消解,且存储事实的可靠性远低于 Freebase。我们采用 (Fader et al., 2011) 的完整抽取结果,包含 200 万实体和 $600\mathrm{k}$ 种关系。

Table 2: Knowledge Bases used in this paper. FB2M and FB5M are two versions of Freebase.

表 2: 本文使用的知识库。FB2M 和 FB5M 是 Freebase 的两个版本。

                              FB2M         FB5M         Reverb
实体 (ENTITIES)               2,150,604    4,904,397    2,044,752
关系 (RELATIONSHIPS)          6,701        7,523        601,360
原子事实 (ATOMIC FACTS)       14,180,937   22,441,880   14,338,214
事实(分组) (FACTS (grouped))  10,843,106   12,010,500

2.2 The Simple Questions dataset

2.2 Simple Questions数据集

Existing resources for QA such as WebQuestions (Berant et al., 2013) are rather small (a few thousand questions) and hence do not provide a very thorough coverage of the variety of questions that could be answered using a KB like Freebase, even in the context of simple QA. Hence, in this paper, we introduce a new dataset of much larger scale for the task of simple QA called Simple Questions. This dataset consists of a total of 108,442 questions written in natural language by human English-speaking annotators, each paired with a corresponding fact from FB2M that provides the answer and explains it. We randomly shuffle these questions and use 70% of them (75,910) as training set, 10% (10,845) as validation set, and the remaining 20% as test set. Examples of questions and facts are given in Table 1.

现有问答资源如WebQuestions (Berant等人,2013) 规模较小(仅数千问题),即便在简单问答场景下,也无法全面覆盖像Freebase这类知识库能解答的各类问题。为此,本文推出了一个规模更大的简单问答任务新数据集Simple Questions。该数据集包含108,442个由英语母语标注者撰写的自然语言问题,每个问题都对应FB2M知识库中提供答案并解释的事实。我们将这些问题随机打乱,分配70%(75,910条)作为训练集,10%(10,845条)作为验证集,剩余20%作为测试集。具体问题与事实示例见表1。

We collected Simple Questions in two phases. The first phase consisted of shortlisting the set of facts from Freebase to be annotated with questions. We used FB2M as background KB and removed all facts with undefined relationship type, i.e. containing the word freebase. We also removed all facts for which the (subject, relationship) pair had more than a threshold number of objects. This filtering step is crucial to remove facts which would result in trivial, uninformative questions, such as Name a person who is an actor. The threshold was set to 10.

我们分两个阶段收集简单问题。第一阶段包括从Freebase中筛选出待标注问题的事实集合。我们使用FB2M作为背景知识库,并移除了所有关系类型未定义的事实(即包含freebase一词的条目)。同时,我们还移除了那些(主语,关系)对拥有超过阈值数量宾语的所有事实。这一过滤步骤对于剔除会导致无意义简单问题的事实至关重要,例如"说出一个演员的名字?"。该阈值设定为10。

In the second phase, these selected facts were sampled and delivered to human annotators to generate questions from them. For the sampling, each fact was associated with a probability defined as a function of its relationship frequency in the KB: to favor variability, facts whose relationship appears more frequently were given lower probabilities. For each sampled fact, annotators were shown the fact along with hyperlinks to freebase.com to provide some context while framing the question. Given this information, annotators were asked to phrase a question involving the subject and the relationship of the fact, with the answer being the object. The annotators were explicitly instructed to phrase their questions as differently as possible if they encountered multiple facts with similar relationships. They were also given the option of skipping facts if they wished to do so. This was very important to prevent the annotators from writing boilerplate questions when they had no background knowledge about some facts.

在第二阶段,这些筛选出的事实经过抽样后被交付给人工标注员用于生成问题。抽样过程中,每条事实根据其在知识库(KB)中的关系频率被赋予相应概率:为增加多样性,关系出现频率较高的事实会被分配较低抽样概率。对于每条被抽样的事实,标注员会看到该事实及其在freebase.com上的超链接以提供上下文参考。基于这些信息,标注员需构建涉及事实主语和关系的问题,并以事实宾语作为答案。标注员被明确要求:当遇到具有相似关系的多个事实时,需尽可能采用不同的提问方式。他们也可选择跳过某些事实,这对避免标注员在缺乏背景知识时编写模板化问题至关重要。

3 Memory Networks for Simple QA

3 用于简单问答的记忆网络

A Memory Network consists of a memory (an indexed array of objects) and a neural network that is trained to query it given some inputs (usually questions). It has four components: Input map $(I)$, Generalization $(G)$, Output map $(O)$ and Response $(R)$, which we detail below. But first, we describe the MemNN workflow used to set up a model for simple QA. This proceeds in three steps:

记忆网络 (Memory Network) 由记忆体 (一个带索引的对象数组) 和一个经过训练可根据输入 (通常是问题) 进行查询的神经网络组成。它包含四个组件:输入映射 $(I)$ 、泛化 $(G)$ 、输出映射 $(O)$ 和响应 $(R)$ ,我们将在下文详细说明。但首先,我们描述用于构建简单问答 (QA) 模型的记忆神经网络 (MemNNs) 工作流程,该流程分为三个步骤:

  1. Storing Freebase: this first phase parses Freebase (either FB2M or FB5M depending on the setting) and stores it in memory. It uses the Input module to preprocess the data.
     存储Freebase:第一阶段解析Freebase(根据设置使用FB2M或FB5M)并将其存储在记忆中。该阶段使用输入模块对数据进行预处理。
  2. Training: this second phase trains the MemNN to answer questions. This uses the Input, Output and Response modules; the training mainly concerns the parameters of the embedding model at the core of the Output module.
     训练:第二阶段训练MemNN以回答问题。这涉及输入 (Input)、输出 (Output) 和响应 (Response) 模块,训练主要关注输出模块核心嵌入模型的参数。
  3. Connecting Reverb: this third phase adds new facts coming from Reverb to the memory. This is done after training, to test the ability of MemNNs to handle new facts without having to be re-trained. It uses the Input module to preprocess Reverb facts and the Generalization module to connect them to the facts already stored.
     连接Reverb:第三阶段将来自Reverb的新事实添加到记忆中。这一步骤在训练后进行,用于测试记忆神经网络 (MemNNs) 处理新事实而无需重新训练的能力。它使用输入模块预处理Reverb事实,并通过泛化模块将这些事实与已存储的事实连接起来。

After these three stages, the MemNN is ready to answer any question by running the $I,O$ and $R$ modules in turn. We now detail the implementation of the four modules.

经过这三个阶段后,MemNN 就可以通过依次运行 $I$、$O$ 和 $R$ 模块来回答任何问题。我们现在详细介绍这四个模块的实现。

3.1 Input module

3.1 输入模块

This module preprocesses the three types of data that are input to the network: Freebase facts that are used to populate the memory, questions that the system needs to answer, and Reverb facts that we use, in a second phase, to extend the memory.

该模块对输入网络的三种数据类型进行预处理:用于填充记忆的Freebase事实、系统需要回答的问题,以及我们在第二阶段用于扩展记忆的Reverb事实。

Preprocessing Freebase The Freebase data is initially stored as atomic facts involving single entities as subject and object, plus a relationship between them. However, this storage needs to be adapted to the QA task in two aspects.

预处理Freebase
Freebase数据最初以原子事实的形式存储,包含作为主语和宾语的单个实体以及它们之间的关系。然而,这种存储方式需要在两个方面进行调整以适应问答任务。

First, in order to answer list questions, which expect more than one answer, we redefine a fact as being a triple containing a subject, a relationship, and the set of all objects linked to the subject by the relationship. This grouping process transforms atomic facts into grouped facts, which we simply refer to as facts in the following. Table 2 shows the impact of this grouping: on FB2M, this decreases the number of facts from 14M to 11M and, on FB5M, from 22M to 12M.

首先,为了回答需要多个答案的列表问题,我们将事实重新定义为包含主语、关系以及通过该关系与主语相关联的所有对象集合的三元组。这一分组过程将原子事实转化为分组事实,下文简称为事实。表 2 展示了这种分组的影响:在 FB2M 上,事实数量从 1400 万减少到 1100 万;在 FB5M 上,则从 2200 万降至 1200 万。
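The grouping step described above can be sketched as follows (the triples are made-up examples):

```python
from collections import defaultdict

def group_facts(atomic_facts):
    """Turn atomic (s, r, o) triples into grouped facts (s, r, {o1, ..., ok})."""
    grouped = defaultdict(set)
    for s, r, o in atomic_facts:
        grouped[(s, r)].add(o)
    return [(s, r, objs) for (s, r), objs in grouped.items()]

atomic = [
    ("jamaica", "language_spoken", "jamaican english"),
    ("jamaica", "language_spoken", "jamaican creole"),
    ("kimchi", "incompatible_with", "veganism"),
]
facts = group_facts(atomic)  # 3 atomic facts become 2 grouped facts
```

A grouped fact with several objects is exactly what list questions need: the whole object set is returned as the answer.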

Second, the underlying structure of Freebase is a hypergraph, in which more than two entities can be linked. For instance, dates can be linked together with two entities to specify the time period over which the link was valid. The underlying triple storage involves mediator nodes for each such fact, effectively making entities linked through paths of length 2, instead of 1. To obtain direct links between entities in such cases, we created a single fact for these facts by removing the intermediate node and using the second relationship as the relationship for the new condensed fact. This step reduces the need for searching for the answer outside the immediate neighborhood of the subject referred to in the question, widely increasing the scope of the simple QA task on Freebase. On WebQuestions, a benchmark not primarily designed for simple QA, removing mediator nodes raises the proportion of questions that can be answered with a single fact from around 65% to 86%.

其次,Freebase 的基础结构是一种超图 (hypergraph),其中可以链接两个以上的实体。例如,日期可以与两个实体链接,以指定链接有效的时间段。底层三元组存储为每个此类事实包含中介节点 (mediator nodes),实际上使得实体通过长度为 2 而非 1 的路径链接。为了在这些情况下获得实体之间的直接链接,我们通过移除中间节点并将第二个关系作为新压缩事实的关系,为这些事实创建了一个单一事实。这一步骤减少了在问题所指主语的直接邻域之外搜索答案的需求,大大扩展了 Freebase 上简单问答 (QA) 任务的范围。在 WebQuestions 这一并非主要为简单问答设计的基准测试中,移除中介节点使得可以用单一事实回答的问题比例从约 65% 跃升至 86%。
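A minimal sketch of this mediator-removal step. The node ids and the way mediators are identified are hypothetical; in real Freebase the mediator (CVT) nodes are identified via the schema:

```python
from collections import defaultdict

def remove_mediators(triples, mediator_ids):
    """Collapse s -r1-> m -r2-> o paths through a mediator node m into a
    direct fact (s, r2, o), keeping the second relationship."""
    outgoing = defaultdict(list)  # edges leaving each mediator node
    for s, r, o in triples:
        if s in mediator_ids:
            outgoing[s].append((r, o))
    condensed = []
    for s, r, o in triples:
        if o in mediator_ids:            # edge into a mediator: follow it
            for r2, o2 in outgoing[o]:
                condensed.append((s, r2, o2))
        elif s not in mediator_ids:      # ordinary direct edge: keep as-is
            condensed.append((s, r, o))
    return condensed

triples = [
    ("film_x", "performance", "m1"),   # "m1" plays the role of a mediator node
    ("m1", "actor", "actor_y"),
    ("kimchi", "incompatible_with", "veganism"),
]
direct = remove_mediators(triples, {"m1"})
# [('film_x', 'actor', 'actor_y'), ('kimchi', 'incompatible_with', 'veganism')]
```

After this pass, the answer entity sits one hop from the subject, so a single memory lookup can reach it.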

Preprocessing Freebase facts A fact with $k$ objects $y = (s, r, \{o_1, \ldots, o_k\})$ is represented by a bag-of-symbols vector $f(y)$ in $\mathbb{R}^{N_S}$, where $N_S$ is the number of entities and relationships. Each dimension of $f(y)$ corresponds to a relationship or an entity (independent of whether it appears as subject or object). The entries of the subject and of the relationship have value 1, and the entries of the objects are set to $1/k$. All other entries are 0.

预处理Freebase事实
一个包含 $k$ 个对象的事实 $y = (s, r, \{o_1, \ldots, o_k\})$ 由词袋向量 $f(y) \in \mathbb{R}^{N_S}$ 表示,其中 $N_S$ 是实体和关系的数量。$f(y)$ 的每个维度对应一个关系或实体(不论其作为主语还是宾语出现)。主语和关系的条目值为1,对象的条目值设为 $1/k$。其余条目均为0。
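Under this definition, $f(y)$ can be sketched as follows. A dense vector is used for readability (in practice one would use a sparse representation), and the symbol index is a made-up example:

```python
import numpy as np

def fact_vector(fact, symbol_index):
    """Bag-of-symbols f(y): entries 1 for subject and relationship,
    1/k for each of the k objects, 0 elsewhere."""
    s, r, objects = fact
    f = np.zeros(len(symbol_index))
    f[symbol_index[s]] = 1.0
    f[symbol_index[r]] = 1.0
    for o in objects:
        f[symbol_index[o]] = 1.0 / len(objects)
    return f

idx = {"jamaica": 0, "language_spoken": 1, "english": 2, "creole": 3}
f = fact_vector(("jamaica", "language_spoken", {"english", "creole"}), idx)
# f is [1.0, 1.0, 0.5, 0.5]
```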

Preprocessing questions A question $q$ is mapped to a bag-of-ngrams representation $g(q)$ in $\mathbb{R}^{N_V}$, where $N_V$ is the size of the vocabulary. The vocabulary contains all individual words that appear in the questions of our datasets, together with the aliases of Freebase entities, each alias being a single n-gram. The entries of $g(q)$ that correspond to words and n-grams of $q$ are equal to 1, all other ones are set to 0.

预处理问题
问题 $q$ 被映射到词袋表示 $g(q) \in \mathbb{R}^{N_V}$,其中 $N_V$ 是词汇表的大小。词汇表包含数据集中所有问题出现的独立单词,以及 Freebase 实体的别名(每个别名是一个单独的 n-gram)。$g(q)$ 中对应 $q$ 的单词和 n-gram 的条目值为 1,其余条目设为 0。
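Similarly, a toy sketch of $g(q)$. The vocabulary and alias list are illustrative, and the substring-based alias match is a simplification of real alias handling:

```python
import numpy as np

def question_vector(question, vocab_index, aliases):
    """Bag-of-ngrams g(q): 1 for each word of q and for each entity alias
    (treated as a single n-gram) found in q, 0 elsewhere."""
    q = question.lower().rstrip("?").strip()
    g = np.zeros(len(vocab_index))
    for w in q.split():
        if w in vocab_index:
            g[vocab_index[w]] = 1.0
    for alias in aliases:
        if alias in q:  # crude n-gram match, sufficient for this sketch
            g[vocab_index[alias]] = 1.0
    return g

vocab = {"where": 0, "is": 1, "fires": 2, "creek": 3, "fires creek": 4}
g = question_vector("where is fires creek?", vocab, ["fires creek"])
# g is [1.0, 1.0, 1.0, 1.0, 1.0]
```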

Preprocessing Reverb facts In our experiments with Reverb, each fact $y=(s,r,o)$ is represented as a vector $h(y)\in\mathbb{R}^{N_{S}+N_{V}}$. This vector is a bag-of-symbols for the subject $s$ and the object $o$, and a bag-of-words for the relationship $r$. The exact composition of $h$ is provided by the Generalization module, which we describe now.

预处理 Reverb 事实
在我们的 Reverb 实验中,每个事实 $y=(s,r,o)$ 被表示为向量 $h(y)\in\mathbb{R}^{N_{S}+N_{V}}$。该向量是主语 $s$ 和宾语 $o$ 的符号袋 (bag-of-symbols),以及关系 $r$ 的词袋 (bag-of-words)。$h$ 的具体组成由泛化模块 (Generalization module) 提供,我们将在下文详述。

3.2 Generalization module

3.2 泛化模块

This module is responsible for adding new elements to the memory. In our case, the memory has a multigraph structure where each node is a Freebase entity and labeled arcs in the multigraph are Freebase relationships: after their preprocessing, all Freebase facts are stored using this structure.

该模块负责向记忆中添加新元素。在我们的案例中,记忆采用多重图结构,其中每个节点都是Freebase实体,带标签的多重图边表示Freebase关系:经过预处理后,所有Freebase事实都使用这种结构存储。

We also consider the case where new facts, with a different structure (i.e. new kinds of relationships), are provided to the MemNN by using Reverb. In this case, the Generalization module is then used to connect Reverb facts to the Freebase-based memory structure, in order to make them usable and searchable by the MemNN.

我们还考虑了通过使用Reverb向记忆神经网络(MemNNs)提供具有不同结构(即新型关系)的新事实的情况。在这种情况下,泛化模块用于将Reverb事实与基于Freebase的记忆结构相连接,以便MemNN能够使用和搜索这些事实。

To link the subject and the object of a Reverb fact to Freebase entities, we use precomputed entity links (Lin et al., 2012). If such links do not give any result for an entity, we search for Freebase entities with at least one alias that matches the Reverb entity string. These two processes allowed us to match 17% of Reverb entities to Freebase ones. The remaining entities were encoded using a bag-of-words representation of their strings, since we had no other way of matching them to Freebase entities. All Reverb relationships were encoded using bags-of-words of their strings. Using this approximate process, we are able to store each Reverb fact as a bag of symbols (words or Freebase entities) all already seen by the MemNN during its training phase based on Freebase. We can then hope that what has been learned there can also be successfully used to query Reverb facts.

为了将Reverb事实的主语和宾语链接到Freebase实体,我们使用了预先计算的实体链接 (Lin et al., 2012)。如果这些链接无法为某个实体提供结果,我们会搜索具有至少一个别名与Reverb实体字符串匹配的Freebase实体。这两个过程使得17%的Reverb实体能够与Freebase实体匹配。剩余的实体则使用其字符串的词袋 (bag-of-words) 表示进行编码,因为我们没有其他方法将它们与Freebase实体匹配。所有Reverb关系均使用其字符串的词袋表示进行编码。通过这一近似过程,我们能够将每个Reverb事实存储为一个符号袋 (bag-of-symbols)(单词或Freebase实体),这些符号在MemNN基于Freebase的训练阶段均已见过。我们可以期待在那里学到的知识也能成功用于查询Reverb事实。

3.3 Output module

3.3 输出模块

The Output module performs memory lookups given the input, returning the supporting facts that will eventually provide the answer to a question. In our case of simple QA, this module only returns a single supporting fact. To avoid scoring all the stored facts, we first perform an approximate entity linking step to generate a small set of candidate facts. The supporting fact is the candidate fact that is most similar to the question according to an embedding model.

输出模块根据输入执行记忆查找,返回支持事实,最终为给定问题提供答案。在我们的简单问答场景中,该模块仅返回单个支持事实。为避免对所有存储事实进行评分,我们首先执行近似实体链接步骤以生成一小部分候选事实。支持事实是嵌入模型中与问题最相似的候选事实。

Candidate generation To generate candidate facts, we match $n$-grams of words of the question to aliases of Freebase entities and select a few matching entities. All facts having one of these entities as subject are scored in a second step.

候选生成
为了生成候选事实,我们将问题中的$n$元词组与Freebase实体的别名进行匹配,并筛选出若干匹配实体。在第二步中,所有以这些实体为主语的事实将被评分。

We first generate all possible $n$-grams from the question, removing those that contain an interrogative pronoun or 1-grams that belong to a list of stopwords. We only keep the $n$-grams which are an alias of an entity, and then discard all $n$-grams that are a subsequence of another $n$-gram, except if the longer $n$-gram only differs by in, of, for or the at the beginning. We finally keep the two entities with the most links in Freebase retrieved for each of the five longest matched $n$-grams.

我们首先从问题中生成所有可能的 $n$ -gram,去除包含疑问代词或属于停用词列表的1-gram。仅保留作为实体别名的 $n$ -gram,然后舍弃所有作为另一个 $n$ -gram 子序列的 $n$ -gram,除非较长的 $n$ -gram 仅在开头多出 in、of、for 或 the。最后,我们为五个最长匹配的 $n$ -gram 各保留在 Freebase 中检索到链接最多的两个实体。
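A simplified sketch of this candidate-generation heuristic. The alias table and stopwords are toy data, and the interrogative-pronoun filter and the "top two entities by link count" selection are omitted:

```python
def ngrams(words, max_n=3):
    """All word n-grams of length 1..max_n."""
    return {" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + max_n, len(words)) + 1)}

def candidate_entities(question, alias_to_entities, stopwords):
    words = question.lower().rstrip("?").split()
    grams = {g for g in ngrams(words)
             if g in alias_to_entities and g not in stopwords}
    # discard n-grams that are subsequences of a longer matched n-gram
    # (the substring test is an approximation of the paper's rule)
    kept = {g for g in grams if not any(g != h and g in h for h in grams)}
    entities = set()
    for g in kept:
        entities |= alias_to_entities[g]
    return entities

aliases = {"fires creek": {"m.fires_creek"}, "creek": {"m.creek"}}
cands = candidate_entities("which forest is fires creek in?", aliases, {"is", "in"})
# {'m.fires_creek'}: 'creek' is dropped as a subsequence of 'fires creek'
```

All facts whose subject is one of the returned entities then go on to the scoring step.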

Scoring Scoring is performed using an embedding model. Given two embedding matrices $\mathbf{W}_{V}\in\mathbb{R}^{d\times N_{V}}$ and $\mathbf{W}_{S}\in\mathbb{R}^{d\times N_{S}}$, which respectively contain, in columns, the $d$-dimensional embeddings of the words/n-grams of the vocabulary and the embeddings of the Freebase entities and relationships, the similarity between question $q$ and a Freebase candidate fact $y$ is computed as:

评分
评分使用嵌入 (embedding) 模型进行。给定两个嵌入矩阵 $\mathbf{W}_{V}\in\mathbb{R}^{d\times N_{V}}$ 和 $\mathbf{W}_{S}\in\mathbb{R}^{d\times N_{S}}$,它们分别按列包含词汇表单词/n-gram 的 $d$ 维嵌入以及 Freebase 实体和关系的嵌入,问题 $q$ 与 Freebase 候选事实 $y$ 之间的相似度计算如下:

$$
S_{QA}(q,y)=\cos(\mathbf{W}_{V}\,g(q),\,\mathbf{W}_{S}\,f(y)),
$$

$$
S_{QA}(q,y)=\cos(\mathbf{W}_{V}\,g(q),\,\mathbf{W}_{S}\,f(y)),
$$

with $\cos()$ the cosine similarity. When scoring a fact $y$ from Reverb, we use the same embeddings and build the matrix $\mathbf{W}_{VS}\in\mathbb{R}^{d\times(N_{V}+N_{S})}$, which contains the concatenation in columns of $\mathbf{W}_{V}$ and $\mathbf{W}_{S}$, and also compute the cosine similarity:

其中 $\cos()$ 表示余弦相似度。当对来自 Reverb 的事实 $y$ 进行评分时,我们使用相同的嵌入并构建矩阵 $\mathbf{W}_{VS}\in\mathbb{R}^{d\times(N_{V}+N_{S})}$,该矩阵包含 $\mathbf{W}_{V}$ 和 $\mathbf{W}_{S}$ 的列拼接,同样计算余弦相似度:

$$
S_{RVB}(q,y)=\cos(\mathbf{W}_{V}\,g(q),\,\mathbf{W}_{VS}\,h(y)).
$$

$$
S_{RVB}(q,y)=\cos(\mathbf{W}_{V}\,g(q),\,\mathbf{W}_{VS}\,h(y)).
$$

The dimension $d$ is a hyperparameter, and the embedding matrices $\mathbf{W}_{V}$ and $\mathbf{W}_{S}$ are the parameters learned with the training algorithm of Section 4.

维度 $d$ 是一个超参数,嵌入矩阵 $\mathbf{W}_{V}$ 和 $\mathbf{W}_{S}$ 是通过第4节的训练算法学习得到的参数。
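A NumPy sketch of $S_{QA}$; random matrices stand in for the learned embeddings $\mathbf{W}_V$ and $\mathbf{W}_S$, and the bag vectors are toy examples:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_vocab, n_symbols = 8, 50, 40
W_V = rng.normal(size=(d, n_vocab))    # columns: word/n-gram embeddings
W_S = rng.normal(size=(d, n_symbols))  # columns: entity/relationship embeddings

def s_qa(g_q, f_y):
    """S_QA(q, y) = cos(W_V g(q), W_S f(y))."""
    a, b = W_V @ g_q, W_S @ f_y
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

g_q = np.zeros(n_vocab); g_q[[3, 7, 11]] = 1.0   # toy bag-of-ngrams g(q)
f_y = np.zeros(n_symbols); f_y[[2, 5]] = 1.0     # toy bag-of-symbols f(y)
score = s_qa(g_q, f_y)  # a value in [-1, 1]
```

The selected supporting fact is simply the candidate with the highest such score.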

3.4 Response module

3.4 响应模块

In Memory Networks, the Response module postprocesses the result of the Output module to compute the intended answer. In our case, it returns the set of objects of the selected supporting fact.

在记忆网络(Memory Networks)中,响应模块(Response module)会对输出模块(Output module)的结果进行后处理,以计算出预期的答案。在我们的案例中,它会返回所选支持事实的对象集合。

4 Training

4 训练

This section details how we trained the scoring function of the Output module using a multitask training process on four different sources of data.

本节详述了我们如何利用四种不同数据源的多任务训练过程来训练输出模块的评分函数。

First, in addition to the new Simple Questions dataset described in Section 2, we also used WebQuestions, a benchmark for QA introduced in (Berant et al., 2013): questions are labeled with answer strings from aliases of Freebase entities, and many questions expect multiple answers. Table 3 details the statistics of both datasets.

首先,除了第2节中描述的新Simple Questions数据集外,我们还使用了WebQuestions(由Berant等人于2013年提出的问答基准):问题标注了来自Freebase实体别名的答案字符串,许多问题需要多个答案。表3详细列出了这两个数据集的统计信息。

We also train on automatic questions generated from the KB, that is FB2M or FB5M depending on the setting, which are essential to learn embeddings for the entities not appearing in either Web Questions or Simple Questions. Statistics of FB2M or FB5M are given in Table 2; we generated one training question per fact following the same process as that used in (Bordes et al., 2014a).

我们还基于知识库(KB,即FB2M或FB5M,具体取决于设置)自动生成的问题进行训练,这对学习未出现在Web Questions或Simple Questions中的实体嵌入至关重要。FB2M和FB5M的统计信息如表2所示;我们按照(Bordes et al., 2014a)中使用的相同流程,为每个事实生成一个训练问题。

Following previous work such as (Fader et al., 2013), we also use the indirect supervision signal of pairs of question paraphrases. We used a subset of the large set of paraphrases extracted from WikiAnswers and introduced in (Fader et al., 2014). Our Paraphrases dataset is made of 15M clusters containing 2 or more paraphrases each.

遵循 (Fader et al., 2013) 等先前工作,我们也采用问题复述对的间接监督信号。我们使用了从 WikiAnswers 提取并在 (Fader et al., 2014) 中引入的大规模复述数据集的子集。我们的复述数据集包含 1500 万个簇,每个簇由 2 个或更多复述组成。

4.1 Multitask training

4.1 多任务训练

As in previous work on embedding models and Memory Networks (Bordes et al., 2014a; Bordes et al., 2014b; Weston et al., 2015), the embeddings are trained with a ranking criterion. For the QA datasets, the goal is that in the embedding space, a supporting fact is more similar to the question than any other non-supporting fact. For the paraphrase dataset, a question should be more similar to one of its paraphrases than to any other question.

与之前关于嵌入模型和记忆网络 (Memory Networks) 的研究 (Bordes et al., 2014a; Bordes et al., 2014b; Weston et al., 2015) 类似,嵌入训练采用了排序准则。对于问答数据集,目标是在嵌入空间中,支持事实与问题的相似度应高于任何非支持事实。对于复述数据集,问题应与其某个复述版本的相似度高于其他任何问题。

The multitask learning of the embedding matrices $\mathbf{W}_{V}$ and $\mathbf{W}_{S}$ is performed by alternating stochastic gradient descent (SGD) steps over the loss functions on the different datasets. For the QA datasets, given a question/supporting fact pair $(q,y)$ and a non-supporting fact $y^{\prime}$, we perform a step to minimize the loss function

嵌入矩阵 $\mathbf{W}_{V}$ 和 $\mathbf{W}_{S}$ 的多任务学习通过在不同数据集的损失函数上交替执行随机梯度下降 (SGD) 步骤来完成。对于问答数据集,给定一个问题/支持事实对 $(q,y)$ 和一个非支持事实 $y^{\prime}$,我们执行一步以最小化损失函数

$$
\ell_{QA}(q,y,y^{\prime})=\left[\gamma-S_{QA}(q,y)+S_{QA}(q,y^{\prime})\right]_{+},
$$

where $[\cdot]_{+}$ is the positive part and $\gamma$ is a margin hyperparameter. For the paraphrase dataset, the similarity score between two questions $q$ and $q^{\prime}$ is also the cosine between their embeddings, i.e. $S_{QQ}(q,q^{\prime})=\cos(\mathbf{W}_{V}g(q),\mathbf{W}_{V}g(q^{\prime}))$, and given a paraphrase pair $(q,q^{\prime})$ and another question $q^{\prime\prime}$, the loss is:

$$
\ell_{QA}(q,y,y^{\prime})=\left[\gamma-S_{QA}(q,y)+S_{QA}(q,y^{\prime})\right]_{+},
$$

其中 $[\cdot]_{+}$ 表示正值部分,$\gamma$ 是边界超参数。对于释义数据集,两个问题 $q$ 和 $q^{\prime}$ 之间的相似度得分也是它们嵌入向量的余弦值,即 $S_{QQ}(q,q^{\prime})=\cos(\mathbf{W}_{V}g(q),\mathbf{W}_{V}g(q^{\prime}))$。给定一个释义对 $(q,q^{\prime})$ 和另一个问题 $q^{\prime\prime}$,损失函数为:

$$
\ell_{Q Q}(q,q^{\prime},q^{\prime\prime})=\left[\gamma-S_{Q Q}(q,q^{\prime})+S_{Q Q}(q,q^{\prime\prime})\right]_{+}.
$$

$$
\ell_{Q Q}(q,q^{\prime},q^{\prime\prime})=\left[\gamma-S_{Q Q}(q,q^{\prime})+S_{Q Q}(q,q^{\prime\prime})\right]_{+}.
$$
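To make the ranking criterion concrete, here is a minimal NumPy sketch of the margin loss $\ell_{Q A}$ above with cosine scores. The embedding vectors and the margin value are toy stand-ins for $\mathbf{W}_{V}g(q)$ and $\mathbf{W}_{S}f(y)$; this is illustrative only, not the paper's implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def margin_ranking_loss(s_pos, s_neg, gamma=0.1):
    # [gamma - S(q, y) + S(q, y')]_+ : the loss is zero once the supporting
    # fact outscores the non-supporting one by at least the margin gamma.
    return max(0.0, gamma - s_pos + s_neg)

# Toy embeddings for one question, its supporting fact, and a negative.
q  = np.array([1.0, 0.0, 0.0])
y  = np.array([0.9, 0.1, 0.0])   # supporting fact, close to q
y2 = np.array([0.0, 1.0, 0.0])   # non-supporting fact, far from q

loss = margin_ranking_loss(cosine(q, y), cosine(q, y2), gamma=0.1)
```

A violated ranking, e.g. `margin_ranking_loss(0.2, 0.15, 0.1)`, yields a positive loss of 0.05 and thus a gradient step.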

The embeddings (i.e. the columns of $\mathbf{W}_{V}$ and $\mathbf{W}_{S}$) are projected onto the $L_{2}$ unit ball after each update. At each time step, a sample from the paraphrase dataset is drawn with probability 0.2 (this probability is arbitrary). Otherwise, a sample from one of the three QA datasets, chosen uniformly at random, is taken. We use the WARP loss (Weston et al., 2010) to speed up training, and Adagrad (Duchi et al., 2011) as the SGD algorithm, multi-threaded with HogWild! (Recht et al., 2011). Training takes 2-3 hours on 20 threads.

嵌入 (即 $\mathbf{W}_{V}$ 和 $\mathbf{W}_{S}$ 的列向量) 在每次更新后都会被投影到 $L_{2}$ 单位球面上。每个时间步以 0.2 的概率 (该概率为任意设定值) 从释义数据集中采样,否则从三个问答数据集中均匀随机选取一个进行采样。我们采用 WARP 损失函数 (Weston et al., 2010) 加速训练,并使用 Adagrad (Duchi et al., 2011) 作为 SGD 算法,通过 HogWild! (Recht et al., 2011) 实现多线程。20 线程训练耗时 2-3 小时。
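The unit-ball projection and the dataset-sampling schedule described above can be sketched as follows. This is illustrative only: `pick_dataset` is a hypothetical helper showing the 0.2/uniform schedule, and the actual per-example SGD updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_columns_to_unit_ball(W):
    # After each SGD update, rescale any embedding column whose L2 norm
    # exceeds 1 back onto the unit ball; columns already inside are kept.
    norms = np.linalg.norm(W, axis=0)
    W /= np.maximum(norms, 1.0)
    return W

def pick_dataset(n_qa=3, p_paraphrase=0.2):
    # Sampling schedule: paraphrase data with probability 0.2, otherwise
    # one of the n_qa QA datasets chosen uniformly at random.
    if rng.random() < p_paraphrase:
        return "paraphrase"
    return f"qa_{rng.integers(n_qa)}"

# Toy embedding matrix: first column has norm 5, second has norm 0.5.
W = np.array([[3.0, 0.3],
              [4.0, 0.4]])
project_columns_to_unit_ball(W)  # first column rescaled, second unchanged
```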

4.2 Distant supervision

4.2 远程监督

Unlike for Simple Questions or the synthetic QA data generated from Freebase, for WebQuestions only answer strings are provided for questions: the supporting facts are unknown.

与简单问题或从Freebase生成的合成QA数据不同,WebQuestions仅提供问题的答案字符串:支持事实未知。

In order to generate the supervision, we use the candidate fact generation algorithm of Section 3.3. For each candidate fact, the aliases of its objects are compared to the set of provided answer strings. The fact(s) which can generate the maximum number of answer strings from their objects’ aliases are then kept. If multiple facts are obtained for the same question, the ones with the minimal number of objects are considered as supervision facts. This last selection avoids favoring irrelevant relationships that would be kept only because they point to many objects but would not be specific enough. If no answer string could be found from the objects of the initial candidates, the question is discarded from the training set.

为了生成监督信号,我们采用第3.3节的候选事实生成算法。针对每个候选事实,将其对象的别名与提供的答案字符串集进行比对。保留那些能通过对象别名生成最多答案字符串的事实。若同一问题对应多个事实,则选择对象数量最少的事实作为监督事实。这一最终筛选步骤避免了偏向无关关系——这些关系可能仅因指向多个对象而被保留,但特异性不足。若初始候选事实的对象无法匹配任何答案字符串,则将该问题从训练集中剔除。
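The selection heuristic above can be sketched as follows. `ALIASES`, `alias` and `select_supervision_facts` are hypothetical names and toy data standing in for Freebase entity aliases; only the selection logic follows the description.

```python
# Toy alias table; a real system would use Freebase entity aliases.
ALIASES = {
    "m.nyc":    {"new york", "nyc"},
    "m.albany": {"albany"},
    "m.usa":    {"usa", "united states"},
}

def alias(obj):
    return ALIASES.get(obj, set())

def select_supervision_facts(candidate_facts, answer_strings):
    # Keep the fact(s) whose objects' aliases cover the most answer strings;
    # break ties by preferring facts with the fewest objects (most specific).
    answers = set(answer_strings)
    scored = [(len({a for o in f["objects"] for a in alias(o) if a in answers}), f)
              for f in candidate_facts]
    best = max(score for score, _ in scored)
    if best == 0:
        return []  # no object matches any answer string: discard the question
    top = [f for score, f in scored if score == best]
    fewest = min(len(f["objects"]) for f in top)
    return [f for f in top if len(f["objects"]) == fewest]

facts = [
    {"relationship": "location.containedby", "objects": ["m.usa"]},
    {"relationship": "location.capital",     "objects": ["m.albany"]},
    {"relationship": "location.cities",      "objects": ["m.nyc", "m.albany"]},
]
# "location.capital" wins: same answer coverage as "location.cities",
# but fewer objects, so it is the more specific supervision fact.
picked = select_supervision_facts(facts, ["albany"])
```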

Future work should investigate the weakly supervised training of MemNNs recently introduced in (Sukhbaatar et al., 2015), which allows training them without any supervision from supporting facts.

未来的工作应研究最近由 (Sukhbaatar et al., 2015) 提出的记忆神经网络 (MemNN) 弱监督训练过程,该方法无需任何来自支持事实的监督即可完成训练。

Table 3: Training and evaluation datasets. Questions automatically generated from the KB and paraphrases can also be used in training.

表 3: 训练和评估数据集。从知识库(KB)自动生成的问题和改写版本也可用于训练。

           WebQuestions   SimpleQuestions   Reverb
TRAIN      3,000          75,910            -
VALID.     778            10,845            -
TEST       2,032          21,687            691

4.3 Generating negative examples

4.3 生成负样本

As in (Bordes et al., 2014a; Bordes et al., 2014b), learning is performed with gradient descent, so that negative examples (non-supporting facts or non-paraphrases) are generated according to a randomized policy during training.

如 (Bordes et al., 2014a; Bordes et al., 2014b) 所述,学习过程采用梯度下降法,因此在训练期间会根据随机策略生成负样本 (非支持事实或非释义)。

For paraphrases, given a pair $(q,q^{\prime})$, a non-paraphrase pair is generated as $(q,q^{\prime\prime})$, where $q^{\prime\prime}$ is a random question of the dataset not belonging to the cluster of $q$. For question/supporting fact pairs, we use two policies. The default policy to obtain a non-supporting fact is to corrupt the answer fact by exchanging its subject, its relationship or its object(s) with that of another fact chosen uniformly at random from the KB. In this policy, the element of the fact to corrupt is chosen randomly, with a small probability (0.3) of corrupting more than one element of the answer fact. The second policy we propose, called candidates as negatives, is to take as non-supporting fact a randomly chosen fact from the set of candidate facts. While the first policy is standard in learning embeddings, the second is more original and, as we see in the experiments, gives slightly better performance.

对于释义对,给定一个配对 $(q,q^{\prime})$ ,非释义对则生成为 $(q,q^{\prime\prime})$ ,其中 $q^{\prime\prime}$ 是数据集中随机选取的一个问题,且不属于 $q$ 的聚类。对于问题/支持事实对,我们采用两种策略。默认策略是通过随机交换其主语、关系或宾语来破坏答案事实,从而获得非支持事实,这些元素是从知识库(KB)中均匀随机选取的另一个事实中选取的。在此策略中,破坏事实的元素是随机选择的,且有较小概率(0.3)会破坏答案事实的多个元素。我们提出的第二种策略称为候选负例,即从候选事实集合中随机选取一个事实作为非支持事实。虽然第一种策略在学习嵌入中是标准做法,但第二种策略更具原创性,并且实验结果表明其性能略优。
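A minimal sketch of the two negative-sampling policies, with hypothetical fact dictionaries standing in for Freebase triples; the function names are ours, not the paper's.

```python
import random

random.seed(7)

def corrupt_fact(fact, kb_facts, p_multi=0.3):
    # Default policy: swap the subject, relationship or object(s) of the
    # answer fact with those of facts drawn uniformly from the KB; with
    # probability p_multi, corrupt two elements instead of one.
    fields = ["subject", "relationship", "objects"]
    k = 2 if random.random() < p_multi else 1
    corrupted = dict(fact)
    for field in random.sample(fields, k):
        corrupted[field] = random.choice(kb_facts)[field]
    return corrupted

def candidate_negative(candidate_facts, supporting_fact):
    # "Candidates as negatives" policy: draw the non-supporting fact from
    # the question's own candidate set instead of the whole KB.
    negatives = [f for f in candidate_facts if f != supporting_fact]
    return random.choice(negatives)

kb = [{"subject": "a", "relationship": "r1", "objects": ["x"]},
      {"subject": "b", "relationship": "r2", "objects": ["y"]}]
neg = candidate_negative(kb, kb[0])
```

The second policy yields harder negatives, since candidate facts already share an entity alias with the question.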

5 Related Work

5 相关工作

The first approaches to open-domain QA were search engine-based systems, where keywords extracted from the question are sent to a search engine, and the answer is extracted from the top results (Yahya et al., 2012; Unger et al., 2012). This method has been adapted to KB-based QA (Yahya et al., 2012; Unger et al., 2012), and obtained competitive results with respect to semantic parsing and embedding-based approaches.

开放域问答的最初方法是基于搜索引擎的系统,即从问题中提取关键词发送给搜索引擎,并从返回的顶部结果中提取答案 (Yahya et al., 2012; Unger et al., 2012)。该方法后来被适配到基于知识库的问答场景 (Yahya et al., 2012; Unger et al., 2012),并相对于语义解析和基于嵌入的方法取得了具有竞争力的结果。

Semantic parsing approaches (Cai and Yates, 2013; Berant et al., 2013; Kwiatkowski et al., 2013; Berant and Liang, 2014; Fader et al., 2014) perform a functional parse of the sentence that can be interpreted as a KB query. Even though these approaches are difficult to train at scale because of the complexity of their inference, their advantage is to provide a deep interpretation of the question. Some of these approaches require little to no question-answer pairs (Fader et al., 2013; Reddy et al., 2014), relying on simple rules to transform the semantic interpretation into a KB query.

语义解析方法 (Cai and Yates, 2013; Berant et al., 2013; Kwiatkowski et al., 2013; Berant and Liang, 2014; Fader et al., 2014) 对句子进行功能性解析,可将其解释为知识库 (KB) 查询。尽管这些方法由于推理复杂性而难以大规模训练,但其优势在于能对问题提供深度解释。部分方法 (Fader et al., 2013; Reddy et al., 2014) 几乎不需要问答对,仅依靠简单规则将语义解释转换为知识库查询。

Like our work, embedding-based methods for QA can be seen as simple MemNNs. The algorithms of (Bordes et al., 2014b; Weston et al., 2015) use an approach similar to ours but are based on Reverb rather than Freebase, and rely purely on bag-of-words representations for both questions and facts. The approach of (Yang et al., 2014) uses a different representation of questions, in which recognized entities are replaced by an entity token, and different training data using entity mentions from WIKIPEDIA. Our model is closest to the one presented in (Bordes et al., 2014a), which is discussed in more detail in the experiments.

与我们的工作类似,基于嵌入的问答方法可视为简单的记忆神经网络(MemNNs)。 (Bordes et al., 2014b; Weston et al., 2015) 提出的算法采用了与我们相似的方法,但其基于 Reverb 而非 Freebase,且问题和事实均仅依赖词袋模型表示。 (Yang et al., 2014) 的方法采用了不同的问题表示形式(将识别出的实体替换为实体token)以及基于维基百科实体提及的训练数据。我们的模型与 (Bordes et al., 2014a) 提出的方案最为接近,实验部分将对此展开详细讨论。

6 Experiments

6 实验

This section provides an extensive evaluation of our MemNNs implementation against state-of-the-art QA methods, as well as an empirical study of the impact of using multiple training sources on prediction performance.

本节对我们的记忆神经网络 (MemNNs) 实现进行了全面评估,包括与最先进问答方法的对比,以及多训练数据源对预测性能影响的实证研究。

6.1 Evaluation and baselines

6.1 评估与基线

Table 3 details the dimensions of the test sets of Web Questions, Simple Questions and Reverb, which we used for evaluation. On Web Questions, we evaluate against previous results on this benchmark (Berant et al., 2013; Yao and Van Durme, 2014; Berant and Liang, 2014; Bordes et al., 2014a; Yang et al., 2014) in terms of the F1-score defined in (Berant and Liang, 2014), i.e. the average, over all test questions, of the F1-score of the sets of predicted answers. Since no previous results have been published on Simple Questions, we only compare different versions of MemNNs. Simple Questions examples are labeled with their entire Freebase fact, so we evaluate in terms of path-level accuracy, in which a prediction is correct if the subject and the relationship were correctly retrieved by the system.

表 3 详细列出了我们用于评估的 Web Questions、Simple Questions 和 Reverb 测试集的规模。在 Web Questions 上,我们根据 (Berant and Liang, 2014) 中定义的 F1 分数,与之前在该基准上的结果 (Berant et al., 2013; Yao and Van Durme, 2014; Berant and Liang, 2014; Bordes et al., 2014a; Yang et al., 2014) 进行了对比,该分数是所有测试问题中预测答案集的 F1 分数的平均值。由于 Simple Questions 上尚未发表过先前结果,我们仅比较了不同版本的 MemNNs。Simple Questions 的问题标注了完整的 Freebase 事实,因此我们通过路径级准确率进行评估,即当系统正确检索到主语和关系时,预测即为正确。
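The two metrics above can be sketched as follows. This is a simplified reading of the evaluation protocol, not the official scoring scripts; the fact dictionaries are hypothetical.

```python
def f1(pred, gold):
    # Set-level F1 between predicted and gold answer sets for one question.
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def average_f1(predictions, golds):
    # Web Questions score: mean per-question F1 over all test questions.
    return sum(f1(p, g) for p, g in zip(predictions, golds)) / len(golds)

def path_accuracy(pred_facts, gold_facts):
    # Simple Questions metric: a prediction is correct iff both the subject
    # and the relationship of the supporting fact were retrieved.
    hits = sum(p["subject"] == g["subject"] and
               p["relationship"] == g["relationship"]
               for p, g in zip(pred_facts, gold_facts))
    return hits / len(gold_facts)
```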

The Reverb test set, based on the KB of the same name and introduced in (Fader et al., 2013), is used for evaluation only. It contains 691 questions. We consider the task of re-ranking a small set of candidate answers, which are Reverb facts labeled as correct or incorrect. We compare our approach to the original system (Fader et al., 2013), to (Bordes et al., 2014b) and to the original MemNNs (Weston et al., 2015), in terms of accuracy, i.e. the percentage of questions for which the top-ranked candidate fact is correct.

Reverb测试集基于同名知识库(KB),并在(Fader et al., 2013)中提出,仅用于评估。该测试集包含691个问题。我们考虑对少量候选答案进行重排序的任务,这些候选答案均为Reverb事实并标注了正确或错误。我们在准确率(即排名最高的候选事实正确的问题百分比)方面,将我们的方法与原始系统(Fader et al., 2013)、(Bordes et al., 2014b)以及原始记忆神经网络(MemNNs)(Weston et al., 2015)进行了比较。

6.2 Experimental setup

6.2 实验设置

All models were trained with at least the dataset made of synthetic questions created from the KB. The hyperparameters were chosen to maximize the F1-score on the Web Questions validation set, independently of the testing dataset. The embedding dimension and the learning rate were chosen among $\{64,128,256\}$ and $\{1.0, 0.1, \dots, 10^{-4}\}$ respectively, and the margin $\gamma$ was set to 0.1. For each configuration of hyperparameters, the F1-score on the validation set was computed regularly during learning to perform early stopping.

所有模型至少使用基于知识库(KB)生成的合成问题数据集进行训练。超参数选择以 Web Questions 验证集的 F1-score 最大化为目标,与测试数据集无关。嵌入维度和学习率分别从 $\{64,128,256\}$ 和 $\{1.0, 0.1, \dots, 10^{-4}\}$ 中选取,边界值 $\gamma$ 设为 0.1。针对每组超参数配置,在学习过程中定期计算验证集的 F1-score 以实现早停机制。
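The hyperparameter search described above can be sketched as a simple grid search over the two grids; `train_and_eval` is a hypothetical callable returning the validation F1-score for a configuration (early stopping would live inside it).

```python
from itertools import product

def grid_search(train_and_eval,
                dims=(64, 128, 256),
                lrs=(1.0, 0.1, 0.01, 1e-3, 1e-4)):
    # Return the (embedding dimension, learning rate) pair that maximizes
    # the validation F1-score reported by train_and_eval.
    return max(product(dims, lrs), key=lambda cfg: train_and_eval(*cfg))

# Dummy objective peaking at dim=128, lr=0.1, for illustration only.
best = grid_search(lambda d, lr: (d == 128) + (lr == 0.1))
```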

We tested additional configurations for our algorithm. First, in the Candidates as Negatives setting (negative facts are sampled from the candidate set, see Section 4), abbreviated CANDS AS NEGS, the experimental protocol is the same as in the default setting but the embeddings are initialized with the best configuration of the default setup. Second, our model shares some similarities with an approach studied in (Bordes et al., 2014a), in which the authors noticed important gains using a subgraph representation of answers. For completeness, we also added such a subgraph representation of objects. In that setting, called Subgraph, each object $o$ of a fact is itself represented as a bag-of-entities that encodes the immediate neighborhood of $o$ . This Subgraph model is trained similarly as our main approach and only the results of a post-hoc ensemble combination of the two models (where the scores are added) are presented. We also report the results obtained by an ensemble of the 5 best models on validation (subgraph excepted); this is denoted 5 models.

我们测试了算法的其他配置。首先,在候选集作为负样本 (Candidates as Negatives) 设置中 (负样本事实从候选集中采样,参见第4节) ,简称为 CANDS AS NEGS,实验协议与默认设置相同,但嵌入使用默认设置的最佳配置进行初始化。其次,我们的模型与 (Bordes et al., 2014a) 中研究的方法有一些相似之处,作者注意到使用答案的子图表示可以带来重要提升。为了完整性,我们还添加了对象的子图表示。在该设置中,称为 Subgraph,事实的每个对象 $o$ 本身表示为一个实体包,编码 $o$ 的直接邻域。该 Subgraph 模型的训练方式与我们的主要方法类似,仅展示两种模型的后验集成组合 (分数相加) 的结果。我们还报告了验证集上5个最佳模型 (除 Subgraph 外) 集成获得的结果,记为 5 models。

6.3 Results

6.3 结果

Comparative results The results of the comparative experiments are given in Table 4. On the main benchmark Web Questions, our best results use all data sources, the bigger extract from Freebase and the CANDS AS NEGS setting. The two ensembles achieve excellent results, with F1-scores of 41.9% and 42.2% respectively. The best published competing approach (Yang et al., 2014) has an F1-score of 41.3%, which is comparable to a single run of our model (41.2%). On the new Simple Questions dataset, the best models achieve 62-63% accuracy, while the supporting fact is in the candidate set for about 86% of Simple Questions questions. This shows that MemNNs are effective at re-ranking the candidates, but also that simple QA is still not solved.

对比实验结果
对比实验结果如表 4 所示。在主要基准测试 Web Questions 上,我们的最佳结果使用了所有数据源、从 Freebase 提取的更大规模数据以及 CANDS AS NEGS 设置。两个集成模型均取得了优异表现,F1 分数分别达到 41.9% 和 42.2%。目前公开的最佳竞争方法 (Yang et al., 2014) 的 F1 分数为 41.3%,与我们单次模型运行的结果 (41.2%) 相当。在新的 Simple Questions 数据集上,最佳模型的准确率达到 62-63%,而约 86% 的问题能在候选集中找到支持事实。这表明记忆神经网络 (MemNNs) 能有效对候选答案进行重排序,但也说明简单问答任务尚未完全解决。


Table 4: Experimental results for previous models of the literature and variants of Memory Networks. All results are on the test sets. WQ, SIQ and PRP stand for Web Questions, Simple Questions and Paraphrases respectively. More details in the text.

表 4: 文献中先前模型及记忆网络变体的实验结果。所有结果均基于测试集。WQ、SIQ和PRP分别代表Web Questions、Simple Questions和Paraphrases。更多细节见正文。

                                        WQ F1 (%)   SIQ 准确率 (%)   Reverb 准确率 (%)
基线模型
(Berant et al., 2013)                   -           n/a             n/a
(Fader et al., 2014)                    -           n/a             54
(Bordes et al., 2014b)                  -           n/a             73
(Bordes et al., 2014a) - 使用路径       -           n/a             n/a
(Bordes et al., 2014a) - 路径+子图      -           n/a             n/a
(Berant and Liang, 2014)                -           n/a             n/a
(Yang et al., 2014)                     -           n/a             n/a
(Weston et al., 2015) - 原始 MemNN      n/a         n/a             72
记忆网络(未在 Reverb 上训练,仅迁移;训练源为 WQ / SIQ / PRP 的不同组合,可选设置:候选作为负样本、集成)
FB2M                                    36.2        62.7            n/a
FB5M                                    18.7        44.5            52
FB5M                                    22.0        48.1            62
FB5M                                    22.7        61.6            52
FB5M                                    28.2        61.2            64
FB5M                                    40.1        46.6            58
FB5M                                    40.4        47.4            61
FB5M                                    41.0        61.7            52
FB5M                                    41.0        62.1            67
FB5M                                    41.2        62.2            65
FB5M - 5 模型集成                       41.9        63.9            68
FB5M - 子图集成                         42.2        62.9            62

Our approach bears similarity to (Bordes et al., 2014a) - using path. They use FB2M, and so their result (35.3% F1-score on Web Questions) should be compared to our 36.2%. The models are slightly different in that they replace the entity string with the subject entity in the question representation and that we use the cosine similarity instead of the dot product, which gave consistent improvements. Still, the major differences come from how we use Freebase. First, the removal of the mediator nodes allows us to restrict ourselves to single supporting facts, while they search in paths of length 2 with a heuristic to select the paths to follow (otherwise, inference is too costly), which makes our inference simpler and more efficient. Second, using grouped facts, we integrate multiple answers during learning (through the distant supervision), while they use a grouping heuristic at test time. Grouping facts also allows us to scale much better and to train on FB5M. On Web Questions, not specifically designed as a simple QA dataset, 86% of the questions can now be answered with a single supporting fact, and performance increases significantly (from 36.2% to 41.0% F1-score). Using the bigger FB5M as KB does not change performance on Simple Questions because it was based on FB2M, but the results show that our model is robust to the addition of more entities than necessary.

我们的方法与 (Bordes et al., 2014a) 存在相似性——均采用路径策略。他们使用 FB2M,因此其 Web Questions 数据集上 35.3% 的 F1 值应与我们的 36.2% 对比。模型差异主要体现在:他们将问题表示中的实体字符串替换为主题实体,而我们采用余弦相似度而非点积运算,这一改进带来了稳定提升。但核心差异源于 Freebase 的使用方式:首先,移除中介节点使我们仅需单条支持事实,而他们需通过启发式方法在长度为2的路径中搜索(否则推理成本过高),这使得我们的推理更简单高效;其次,通过分组事实(grouped facts),我们在学习阶段整合多重答案(通过远程监督),而他们仅在测试阶段使用分组启发式方法。分组事实还使我们能更好地扩展规模并在 FB5M 上训练。在非专为简单问答设计的 Web Questions 数据集上,86% 的问题现在仅需单条支持事实即可解答,性能显著提升(F1 值从 36.2% 增至 41.0%)。由于 Simple Questions 基于 FB2M,使用更大的 FB5M 作为知识库未改变其性能,但结果表明我们的模型对冗余实体添加具有鲁棒性。

Transfer learning on Reverb In this set of experiments, all Reverb facts are added to the memory, without any retraining, and we test our ability to re-rank answers on the companion QA set. Thus, Table 4 (last column) presents the results of our model, without training on Reverb, against methods specifically developed on that dataset. Our best results are 67% accuracy (and 68% for the ensemble of 5 models), which is better than the 54% of the original paper and close to the state-of-the-art 73% of (Bordes et al., 2014b). These results show that the Memory Network approach can integrate and use new entities and links.

Reverb 上的迁移学习
在这组实验中,所有Reverb事实都被添加到记忆模块中,无需任何重新训练,我们在配套的QA数据集上测试了答案重排序的能力。因此,表4(最后一列)展示了我们的模型在未针对Reverb进行训练的情况下,与专门针对该数据集开发的方法的对比结果。我们的最佳结果达到了67%的准确率(5个模型集成时达到68%),优于原论文的54%,并接近(Bordes et al., 2014b)中最先进的73%。这些结果表明,记忆网络方法能够整合并利用新的实体和关系。

Importance of data sources The bottom half of Table 4 presents the results on the three datasets when our model is trained with different data sources. We first notice that models trained on a single QA dataset perform poorly on the other datasets (e.g. 46.6% accuracy on Simple Questions for the model trained on Web Questions only), which shows that performance on Web Questions does not necessarily guarantee high coverage for simple QA. On the other hand, training on both datasets only improves performance; in particular, the model is able to capture all question patterns of the two datasets; there is no “negative interaction”.

数据源的重要性
表 4 的下半部分展示了我们的模型在不同数据源训练下在三个数据集上的结果。我们首先注意到,仅在单个问答数据集上训练的模型在其他数据集上表现不佳(例如,仅在 Web Questions 上训练的模型在 SimpleQuestions 上准确率为 $46.6%$),这表明在 WebQuestions 上的表现并不一定能保证对简单问答的高覆盖率。另一方面,同时在两个数据集上训练仅会提升性能;特别是,该模型能够捕捉两个数据集的所有问题模式,不存在“负交互”现象。

While paraphrases do not seem to help much on Web Questions and Simple Questions, except when training only with synthetic questions, they have a dramatic impact on performance on Reverb. This is because Web Questions and Simple Questions questions follow simple patterns and are well formed, while Reverb questions have more syntactic and lexical variability. Thus, paraphrases are important to avoid overfitting on the specific question patterns of the training sets.

虽然改写对Web Questions和Simple Questions数据集效果不大(仅在使用合成问题训练时例外),但对Reverb数据集的性能产生了显著影响。这是因为Web Questions和Simple Questions的问题遵循简单模式且结构规范,而Reverb问题具有更高的句法和词汇多样性。因此,改写对于避免过拟合训练集的特定问题模式至关重要。

7 Conclusion

7 结论

This paper presents an implementation of MemNNs for the task of large-scale simple QA. Our results demonstrate that, if properly trained, MemNNs are able to handle natural language and a very large memory (millions of entries), and hence can reach state-of-the-art on the popular benchmark Web Questions.

本文提出了一种用于大规模简单问答任务的记忆神经网络 (MemNNs) 实现。结果表明,经过适当训练后,记忆神经网络能够处理自然语言和超大规模记忆体 (数百万条条目) ,从而在主流基准测试 Web Questions 上达到最先进水平。

We want to emphasize that many of our findings, especially those regarding how to format the KB, do not only concern MemNNs but potentially any QA system. This paper also introduced the new dataset Simple Questions, which, with $100\mathrm{k}$ examples, is one order of magnitude bigger than Web Questions: we hope that it will foster interesting new research in QA, simple or not.

我们要强调的是,我们的许多发现(特别是关于如何格式化知识库(KB)的方法)不仅适用于记忆神经网络(MemNNs),还可能适用于任何问答系统。本文还引入了新的数据集Simple Questions,该数据集包含$100\mathrm{k}$个示例,规模比Web Questions大一个数量级:我们希望它能促进问答领域(无论简单与否)开展有趣的新研究。
