Deep Learning Alone Isn’t Getting Us To Human-Like AI
Artificial intelligence has mostly been focusing on a technique called deep learning. It might be time to reconsider.
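Symbol manipulation, a set of processes common in logic, mathematics and computer science, treats thinking as if it were a kind of algebra. For nearly 70 years, the most fundamental debate in artificial intelligence has been whether AI systems should be built on symbol manipulation or on brain-like neural systems.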
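There is also a third possibility that occupies the middle ground: hybrid models. By integrating the data-driven learning of neural networks with the powerful abstraction capacities of symbol manipulation, hybrid models try to get the best of both worlds.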
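In an essay published in Noema magazine, Yann LeCun and Jacob Browning offered their first formal response to this line of criticism, writing that, from the start, critics have prematurely argued that neural networks had run into an insurmountable wall, and that every time it proved to be only a temporary hurdle.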
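At the beginning of the essay, they seem to reject hybrid models, which are generally defined as systems that combine the deep learning of neural networks with symbol manipulation. But by the end, in a departure from his past statements, LeCun acknowledges in so many words that hybrid systems exist, that they are important, that they are a possible way forward, and that we knew this all along. The essay contradicts itself.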
About the only sense I can make of this apparent contradiction is that LeCun and Browning somehow believe that a model isn’t hybrid if it learns to manipulate symbols. But the question of learning is a developmental one (how does the system arise?), whereas the question of how a system operates once it has developed (e.g. does it use one mechanism or two?) is a computational one: Any system that leverages both symbols and neural networks is by any reasonable standard a hybrid. (Maybe what they really mean to say is that AI is likely to be a learned hybrid, rather than an innate hybrid. But a learned hybrid is still a hybrid.)
“In the 2010s, symbol manipulation was a dirty word among deep learning proponents; in the 2020s, understanding where it comes from should be our top priority.”
I would argue that either symbol manipulation itself is directly innate, or something else (something we haven’t discovered yet) is innate, and *that* something else indirectly enables the acquisition of symbol manipulation. All of our efforts should be focused on discovering that possibly indirect basis. The sooner we can figure out what basis allows a system to get to the point where it can learn symbolic abstractions, the sooner we can build systems that properly leverage all the world’s knowledge, hence the closer we might get to AI that is safe, trustworthy and interpretable. (We might also gain insight into human minds, by examining the proof of concept that any such AI would be.)
We can’t really ponder LeCun and Browning’s essay at all, though, without first understanding the peculiar way in which it fits into the intellectual history of debates over AI.
Early AI pioneers like Marvin Minsky and John McCarthy assumed that symbol manipulation was the only reasonable way forward, while neural network pioneer Frank Rosenblatt argued that AI might instead be better built on a structure in which neuron-like “nodes” add up and process numeric inputs, such that statistics could do the heavy lifting.
It’s been known pretty much since the beginning that these two possibilities aren’t mutually exclusive. A “neural network” in the sense used by AI engineers is not literally a network of biological neurons. Rather, it is a simplified digital model that captures some of the flavor (but little of the complexity) of an actual biological brain.
In principle, these abstractions can be wired up in many different ways, some of which might directly implement logic and symbol manipulation. (One of the earliest papers in the field, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” written by Warren S. McCulloch and Walter Pitts in 1943, explicitly recognizes this possibility.)
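To make the point concrete, here is a minimal sketch, in the spirit of the McCulloch and Pitts idea rather than a reproduction of their formalism, of how simple threshold units can be wired to compute logical functions. The function names, weights and thresholds are illustrative assumptions (the 1943 paper handles inhibition differently from the negative weight used here).

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return threshold_unit([a, b], [1, 1], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_unit([a, b], [1, 1], threshold=1)


def logical_not(a):
    # Assumption: inhibition is modeled as a negative weight with threshold 0.
    return threshold_unit([a], [-1], threshold=0)


def logical_xor(a, b):
    # XOR cannot be computed by one such unit; it needs two layers:
    # (a OR b) AND NOT (a AND b).
    return logical_and(logical_or(a, b), logical_not(logical_and(a, b)))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, logical_and(a, b), logical_or(a, b), logical_xor(a, b))
```

Nothing here is learned: the wiring is fixed by hand, which is exactly why it shows only that neuron-like units *can* implement logic, not how a network might come to do so.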
Others, like Frank Rosenblatt in the 1950s and David Rumelhart and Jay McClelland in the 1980s, presented neural networks as an alternative to symbol manipulation; Geoffrey Hinton, too, has generally argued for this position.
The unacknowledged history here is that, back in the early 2010s, LeCun and his fellow deep-learning pioneers Hinton and Yoshua Bengio, with whom he shared the Turing Award, were so enthusiastic about these neural networks with multiple layers, which had only just become practical, that they hoped they might banish symbol manipulation entirely. By 2015, with deep learning still in its carefree, enthusiastic days, LeCun, Bengio and Hinton wrote a manifesto on deep learning in Nature. The article ended with an attack on symbols, arguing that “new paradigms [were] needed to replace rule-based manipulation of symbolic expressions by operations on large vectors.”
In fact, Hinton was so confident that symbols were a dead end that he gave a talk at Stanford that same year, called “Aetherial Symbols,” likening symbols to one of the biggest blunders in scientific history. (Similar arguments had been made in the 1980s as well, by two of his former collaborators, Rumelhart and McClelland, who argued in a famous 1986 book that symbols are not “of the essence of human computation,” sparking the great “past tense debate” of the 1980s and 1990s.)
When I wrote a 2018 essay defending some ongoing role for symbol manipulation, LeCun scorned my entire defense of hybrid AI, dismissing it on Twitter as “mostly wrong.” Around the same time, Hinton likened focusing on symbols to wasting time on gasoline engines when electric engines were obviously the best way forward. Even as recently as November 2020, Hinton told Technology Review, “Deep learning is going to be able to do everything.”
So when LeCun and Browning write, now, without irony, that “everyone working in DL agrees that symbolic manipulation is a necessary feature for creating human-like AI,” they are walking back decades of history. As Stanford AI Professor Christopher Manning put it, “I sense some evolution in @ylecun’s position. … Was that really true a decade ago, or is it even true now?!?”
In the context of what actually transpired throughout the 2010s, and after decades in which many in the machine learning community asserted (without real argument) that “symbols aren’t biologically plausible,” the fact that LeCun is even considering a hypothesis that embraces symbol manipulation, learned or otherwise, represents a monumental concession, if not a complete about-face. The real news here is the walk-back.
Because here’s the thing: on LeCun and Browning’s new view, symbol manipulation is actually vital — exactly as the late Jerry Fodor argued in 1988, and as Steven Pinker and I have been arguing all along.
Historians of artificial intelligence should in fact see the Noema essay as a major turning point, in which one of the three pioneers of deep learning first directly acknowledges the inevitability of hybrid AI. Significantly, two other well-known deep learning leaders also signaled support for hybrids earlier this year. Andrew Ng signaled support for such systems in March. Sepp Hochreiter — co-creator of LSTMs, one of the leading DL architectures for learning sequences — did the same, writing “The most promising approach to a broad AI is a neuro-symbolic AI … a bilateral AI that combines methods from symbolic and sub-symbolic AI” in April. As this was going to press I discovered that Jürgen Schmidhuber’s AI company NNAISENSE revolves around a rich mix of symbols and deep learning. Even Bengio (who explicitly denied the need for symbol manipulation in a December 2019 debate with me) has been busy in recent years trying to get Deep Learning to do “System 2” cognition — a project that looks suspiciously like trying to implement the kinds of reasoning and abstraction that made many of us over the decades desire symbols in the first place.
The rest of LeCun and Browning’s essay can be roughly divided into three parts: mischaracterizations of my position (there are a remarkable number of them); an effort to narrow the scope of what might be counted as hybrid models; and an argument for why symbol manipulation might be learned rather than innate.
Some sample mischaracterizations: LeCun and Browning say, “For Marcus, if you don’t have symbolic manipulation at the start, you’ll never have it,” when I in fact explicitly acknowledged in my 2001 book “The Algebraic Mind” that we didn’t know for sure whether symbol manipulation was innate. They say that I expect deep learning “is incapable of further progress” when my actual view is not that there will be no more progress of any sort on any problem whatsoever, but rather that deep learning on its own is the wrong tool for certain jobs: compositionality, reasoning and so forth.
Similarly, they say that “[Marcus] broadly assumes symbolic reasoning is all-or-nothing — since DALL-E doesn’t have symbols and logical rules underlying its operations, it isn’t actually reasoning with symbols,” when I again never said any such thing. DALL-E doesn’t reason with symbols, but that doesn’t mean that any system that incorporates symbolic reasoning has to be all-or-nothing; at least as far back as the 1970s’ expert system MYCIN, there have been purely symbolic systems that do all kinds of quantitative reasoning.
Aside from tendentiously presuming that a model is not a hybrid if it has symbols but those symbols are learned, they also try to equate hybrid models with “models [that contain] a non-differentiable symbolic manipulator,” when symbols in themselves do not inherently preclude some sort of role for differentiation. And they suggest I equate hybrid models with “simply combining the two: inserting a hard-coded symbolic manipulation module on top of a pattern-completion DL module,” when, in fact, everyone actually working in neurosymbolic AI realizes that the job is not that simple.
Rather, as we all realize, the whole game is to discover the right way of building hybrids. People have considered many different ways of combining symbols and neural networks, focusing on techniques such as extracting symbolic rules from neural networks, translating symbolic rules directly into neural networks, constructing intermediate systems that might allow for the transfer of information between neural networks and symbolic systems, and restructuring neural networks themselves. Lots of avenues are being explored.
Finally, we come to the key question: could symbol manipulation be learned rather than built in from the start?
The straightforward answer: of course it could. To my knowledge, nobody has ever denied that symbol manipulation might be learnable. In 2001, in section 6.1 of “The Algebraic Mind,” I considered it, and while I suggested it was unlikely, I hardly said it was impossible. Instead, I concluded rather mildly that, “These experiments [and theoretical considerations reviewed here] surely do not guarantee that the capacities of symbol manipulation are innate, but they are consistent with such a view, and they do pose a challenge for any theory of learning that depends on a great deal of experience.”
I had two main arguments.
The first was a “learnability” argument: throughout the book, I showed that certain kinds of systems — basically 3-layer forerunners to today’s more deeply layered systems — failed to acquire various aspects of symbol manipulation, and therefore there was no guarantee that any system regardless of its constitution would ever be able to learn symbol manipulation. As I put it then:
Something has to be innate. Although “nature” is sometimes crudely pitted against “nurture,” the two are not in genuine conflict. Nature provides a set of mechanisms that allow us to interact with the environment, a set of tools for extracting knowledge from the world, and a set of tools for exploiting that knowledge. Without some innately given learning device, there could be no learning at all.
Leaning on a favorite quotation from the developmental psychologist Elizabeth Spelke, I argued that a system that had some built-in starting point (e.g., objects, sets, places and the apparatus of symbol manipulation) would be more able to efficiently and effectively learn about the world than a purely blank slate. Indeed, LeCun’s own most famous work — on convolutional neural networks — is an example of precisely this: an innate constraint on how a neural network learns, leading to a strong gain in efficiency. Symbol manipulation, well integrated, might lead to even greater gains.
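The efficiency gain from a built-in constraint can be made concrete with a back-of-the-envelope parameter count. The sketch below compares a convolutional layer, whose small filters are shared across every image position, with a fully connected layer producing an output of the same size. The image size, filter size and number of feature maps are arbitrary assumptions chosen for illustration, not figures from any particular network.

```python
def conv_layer_params(kernel_size, in_channels, out_channels):
    """Weights plus biases for a convolutional layer with square, shared filters."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_layer_params(n_inputs, n_outputs):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Assumed setup: a 28x28 single-channel image, 5x5 filters, 32 feature maps,
# no padding and stride 1, so each feature map is 24x24.
image_side, kernel_size, feature_maps = 28, 5, 32
output_side = image_side - kernel_size + 1

conv = conv_layer_params(kernel_size, in_channels=1, out_channels=feature_maps)
dense = dense_layer_params(
    n_inputs=image_side * image_side,
    n_outputs=feature_maps * output_side * output_side,
)

print(f"convolutional layer: {conv:,} parameters")
print(f"fully connected layer with the same output size: {dense:,} parameters")
```

Under these assumptions the convolutional layer needs several orders of magnitude fewer parameters, because the assumption that the same local pattern matters everywhere in the image is wired in rather than learned.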
The second argument was that human infants show some evidence of symbol manipulation. In a set of often-cited rule-learning experiments conducted in my lab, infants generalized abstract patterns beyond the specific examples on which they had been trained. Subsequent work on human infants’ capacity for implicit logical reasoning only strengthens that case. The book also pointed to animal studies showing, for example, that bees can generalize the solar azimuth function to lighting conditions they had never seen.
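What it means to generalize an abstract pattern beyond the training examples can be illustrated with a toy contrast between a rule stated over variables and a memorized list of items. This is only a sketch: the syllables and function names are made up for illustration, and it is not a model of the experimental procedure or of infant cognition.

```python
def abstract_pattern(sequence):
    """Replace each distinct item with a variable name (A, B, ...) in order of first appearance."""
    variables = {}
    pattern = []
    for item in sequence:
        if item not in variables:
            variables[item] = chr(ord("A") + len(variables))
        pattern.append(variables[item])
    return "".join(pattern)


# Hypothetical training items that all share the structure "first item repeats at the end".
training = [("ga", "ti", "ga"), ("li", "na", "li")]

# Symbolic learner: stores the pattern over variables, not the items themselves.
learned_patterns = {abstract_pattern(seq) for seq in training}

# Item-based learner: stores only the exact sequences it has seen.
memorized_items = set(training)


def rule_accepts(sequence):
    return abstract_pattern(sequence) in learned_patterns


def memory_accepts(sequence):
    return tuple(sequence) in memorized_items


novel_same_structure = ("wo", "fe", "wo")
novel_other_structure = ("wo", "fe", "fe")

print(rule_accepts(novel_same_structure), memory_accepts(novel_same_structure))
print(rule_accepts(novel_other_structure), memory_accepts(novel_other_structure))
```

The rule over variables treats the novel sequence with the trained structure as familiar and the other as new, while the item-based memory cannot tell them apart, since it has seen neither.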
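Unfortunately, LeCun and Browning sidestepped both of these arguments entirely. Oddly, they instead equated learning symbols with things acquired relatively late, such as “maps, iconic depictions, rituals and even social roles,” apparently unaware of the thinking about infants, toddlers and nonhuman animals that I and several other cognitive scientists have drawn from the extensive cognitive science literature. If a baby goat can climb down a hillside shortly after birth, why couldn’t a newborn neural network come with a bit of symbol manipulation built in?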
Unfortunately, LeCun and Browning ducked both of t