Posted by Julian Eisenschlos, AI Resident, Google Research, Zürich

The task of recognizing textual entailment, also known as natural language inference, consists of determining whether a piece of text (a premise) can be implied or contradicted (or neither) by another piece of text (the hypothesis). While this problem is often considered an important test for the reasoning skills of machine learning (ML) systems and has been studied in depth for plain text inputs, much less effort has been put into applying such models to structured data, such as websites, tables, databases, etc. Yet, recognizing textual entailment is especially relevant whenever the contents of a table need to be accurately summarized and presented to a user, and is essential for high-fidelity question answering systems and virtual assistants.
In "Understanding tables with intermediate pre-training", published in Findings of EMNLP 2020, we introduce the first pre-training tasks customized for table parsing, enabling models to learn better, faster and from less data. We build upon our earlier TAPAS model, which was an extension of the BERT bi-directional Transformer model with special embeddings to find answers in tables. Applying our new pre-training objectives to TAPAS yields a new state of the art on multiple datasets involving tables. On TabFact, for example, it reduces the gap between model and human performance by ~50%. We also systematically benchmark methods of selecting relevant input for higher efficiency, achieving 4x gains in speed and memory, while retaining 92% of the results. All the models for different tasks and sizes are released on GitHub repo, where you can try them out yourself in a colab Notebook.
Textual Entailment
The task of textual entailment is more challenging when applied to tabular data than plain text. Consider, for example, a table from Wikipedia with some sentences derived from its associated table content. Assessing if the content of the table entails or contradicts the sentence may require looking over multiple columns and rows, and possibly performing simple numeric computations, like averaging, summing, differencing, etc.
A table together with some statements from TabFact. The content of the table can be used to support or contradict the statements.
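To make this kind of reasoning concrete, the toy Python sketch below (not part of the released code; the table values are made up) checks a few TabFact-style claims against a small table with pandas, using exactly the operations mentioned above: comparing cells, summing and averaging columns.

```python
import pandas as pd

# A made-up toy table in the spirit of the TabFact examples.
table = pd.DataFrame({
    "player": ["Greg Norman", "Billy Mayfair", "Steve Elkington"],
    "rank": [1, 1, 3],
    "earnings": [1654959, 1543192, 1254352],
})

by_player = table.set_index("player")

# "Greg Norman and Billy Mayfair tie in rank" -> compare two cells.
tie_in_rank = by_player.loc["Greg Norman", "rank"] == by_player.loc["Billy Mayfair", "rank"]

# "the total earnings are above 4 million" -> sum a column, then compare.
total_over_4m = table["earnings"].sum() > 4_000_000

# "the average rank is below 2" -> average a column, then compare.
avg_rank_below_2 = table["rank"].mean() < 2

print(tie_in_rank, total_over_4m, avg_rank_below_2)  # True True True
```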
Following the methods used by TAPAS, we encode the content of a statement and a table together, pass them through a Transformer model, and obtain a single number with the probability that the statement is entailed or refuted by the table.
The TAPAS model architecture uses a BERT model to encode the statement and the flattened table, read row by row. Special embeddings are used to encode the table structure. The vector output of the first token is used to predict the probability of entailment.
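The released checkpoints can be tried directly in the Colab notebook. As an illustration only, the sketch below assumes the Hugging Face Transformers port of TAPAS (TapasTokenizer, TapasForSequenceClassification) and its TabFact-fine-tuned checkpoint name, neither of which is named in this post, so treat it as one possible way to score a statement against a table rather than the official recipe.

```python
import pandas as pd
import torch
from transformers import TapasTokenizer, TapasForSequenceClassification

# Assumed checkpoint name from the Hugging Face port of TAPAS (not named in this post).
MODEL_NAME = "google/tapas-base-finetuned-tabfact"

tokenizer = TapasTokenizer.from_pretrained(MODEL_NAME)
model = TapasForSequenceClassification.from_pretrained(MODEL_NAME)

# The tokenizer flattens the table row by row and emits the special row/column
# index ids that encode the table structure; cell values must be strings.
table = pd.DataFrame({
    "player": ["Greg Norman", "Billy Mayfair"],
    "rank": ["1", "1"],
})
statement = "Greg Norman and Billy Mayfair tie in rank"

inputs = tokenizer(table=table, queries=[statement],
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The output of the first ([CLS]) token feeds a binary classifier; index 1 is
# assumed to be the "entailed" class for this checkpoint.
prob_entailed = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(statement is entailed by the table) = {prob_entailed:.2f}")
```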
Because the only information in the training examples is a binary value (i.e., "correct" or "incorrect"), training a model to understand whether a statement is entailed or not is challenging and highlights the difficulty in achieving generalization in deep learning, especially when the provided training signal is scarce. Seeing isolated entailed or refuted examples, a model can easily pick up on spurious patterns in the data to make a prediction, for example, the presence of the word "tie" in "Greg Norman and Billy Mayfair tie in rank", instead of truly comparing their ranks, which is what is needed to successfully apply the model beyond the original training data.
Pre-training Tasks
Pre-training tasks can be used to "warm up" models by providing them with large amounts of readily available unlabeled data. However, pre-training typically includes primarily plain text and not tabular data. In fact, TAPAS was originally pre-trained using a simple masked language modelling objective that was not designed for tabular data applications. In order to improve the model performance on tabular data, we introduce two novel pre-training binary classification tasks called counterfactual and synthetic, which can be applied as a second stage of pre-training (often called intermediate pre-training).
In the counterfactual task, we source sentences from Wikipedia that mention an entity (person, place or thing) that also appears in a given table. Then, 50% of the time, we modify the statement by swapping the entity for another alternative. To make sure the statement is realistic, we choose a replacement among the entities in the same column in the table. The model is trained to recognize whether the statement was modified or not. This pre-training task includes millions of such examples, and although the reasoning about them is not complex, they typically will still sound natural.
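The following is a much-simplified sketch of that data generation step (the real pipeline runs over Wikipedia at scale and handles entity matching far more carefully; the function and its arguments are illustrative, not the released implementation).

```python
import random
import pandas as pd

def make_counterfactual_example(sentence: str, entity: str,
                                table: pd.DataFrame, column: str,
                                rng: random.Random):
    """With probability 0.5, swap `entity` in `sentence` for another entity
    from the same table column; the label says whether the sentence was
    modified (1 = counterfactual, 0 = original)."""
    if rng.random() < 0.5:
        alternatives = [v for v in table[column].unique() if v != entity]
        if alternatives:
            return sentence.replace(entity, rng.choice(alternatives)), 1
    return sentence, 0
```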
For the synthetic task, we follow a method similar to semantic parsing in which we generate statements using a simple set of grammar rules that require the model to understand basic mathematical operations, such as sums and averages (e.g., "the sum of earnings"), or to understand how to filter the elements in the table using some condition (e.g., "the country is Australia"). Although these statements are artificial, they help improve the numerical and logical reasoning skills of the model.
Example instances for the two novel pre-training tasks. Counterfactual examples swap entities mentioned in a sentence that accompanies the input table for a plausible alternative. Synthetic statements use grammar rules to create new sentences that require combining the information of the table in complex ways.
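As an illustration of the synthetic task (the grammar used in the paper is richer and is not reproduced here; the column names and statement template below are placeholders), a generator along these lines produces a statement together with its entailed/refuted label:

```python
import random
import pandas as pd

AGGREGATIONS = {"sum": pd.Series.sum, "average": pd.Series.mean}

def make_synthetic_example(table: pd.DataFrame, numeric_column: str,
                           filter_column: str, rng: random.Random):
    """Generate one grammar-based statement: filter rows by a value of
    `filter_column`, aggregate `numeric_column`, and with probability 0.5
    state a wrong number so the statement is refuted (label 0)."""
    op_name, op = rng.choice(list(AGGREGATIONS.items()))
    filter_value = rng.choice(list(table[filter_column].unique()))
    rows = table[table[filter_column] == filter_value]
    true_value = op(rows[numeric_column])

    entailed = rng.random() < 0.5
    stated = true_value if entailed else true_value + rng.choice([-1, 1]) * rng.randint(1, 10)
    statement = (f"the {op_name} of {numeric_column} when {filter_column} "
                 f"is {filter_value} is {stated}")
    return statement, int(entailed)
```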
Results
We evaluate the success of the counterfactual and synthetic pre-training objectives on the TabFact dataset by comparing to the baseline TAPAS model and to two prior models that have exhibited success in the textual entailment