
John Carmack: A Week of Learning Neural Networks

Editor's note: Readers familiar with video games have probably heard of John Carmack. In gaming, the name carries iconic weight: father of the 3D engine, godfather of the first-person shooter, computer whiz. Like other tech greats of his generation, he was largely self-taught, got into college, and then dropped out to take a job writing game software. He is a programmer with the temperament of both a mathematician and a philosopher, a pure talent who simply wants to write good code. So, facing the oncoming wave of artificial intelligence, what does this programmer of the old guard make of it?

After a week of hard study, John Carmack recently posted a write-up on his Facebook page titled "1-week experience learning neural networks from scratch", which drew wide attention. Below is 论智's translation of the post, with the English original appended at the end:


After a gap of several years, I finally found a corner away from the normal press of work, picked programming back up, and spent a week in hermit mode. My wife has been generously offering me this kind of retreat for the past few years, but I have never been good at pulling away from work; even on vacation I can't get any peace.

Now, as a change of pace from my current work at Oculus, I wanted to write some neural network implementations from scratch in C++, and I planned to do it on a strictly base OpenBSD system. Someone remarked that this was a pretty random pairing, but it turned out to work just fine.

To be honest, I had never actually used OpenBSD before, even though I have always been fond of the idea of it: a relatively minimal yet opinionated operating system with a cohesive vision and an emphasis on code quality and craftsmanship. Linux is many things, but cohesive is not one of them.

That is not to say I am a Unix geek. I can get around well enough, but I am most comfortable developing in Visual Studio on Windows. I simply thought that a week of fully immersed work in the old-school Unix style would be interesting, even if it meant a slower pace. It was something of an adventure in retro computing: fvwm and vi. Not vim, but actual BSD vi.

In the end I did not really explore the whole system, since I spent 95% of my time on basic vi / make / gdb operations. I appreciated the good man pages as I tried to do everything within the self-contained system, without resorting to internet searches. Seeing references to things from more than 30 years ago, like Tektronix terminals, was delightfully amusing.

One thing that surprised me was that OpenBSD's C++ support is not very good. G++ does not support C++11, and LLVM C++ does not play nicely with gdb. Gdb also crashed on me a few times, which I suspect was due to C++ issues. And yes, I know newer versions can be installed through ports, but I wanted to stick with the base system.

In hindsight, I should have gone full retro and written everything in ANSI C. Like many older programmers, I have plenty of days when I catch myself thinking that maybe C++ is not as much of a net positive as we assume... There is still a lot about it that I like, but building a small project in plain C is no hardship for me. And if there is a next time, I will try going full Emacs, another major culture I have had little exposure to.

Before this, I already had a decent overview understanding of most machine learning algorithms, and I had done some work with linear classifiers and decision trees, but for some reason I had avoided neural networks. On some level, I suspect that deep learning being so trendy tweaked the contrarian in me, and I still have a bit of a reflexive bias against the "throw everything at the neural network and let it sort things out" approach.

In the spirit of the retro theme, I printed out several of Yann LeCun's old papers and considered working completely offline, as if I really were in a mountain cabin somewhere, but in reality I ended up watching a lot of the Stanford CS231n lectures on YouTube and learned a great deal from them. Watching lecture videos is something I rarely do, since it usually feels like a poor use of time, but on a retreat it turned out to be great.

I do not think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, turning book knowledge into real experience. My working pattern was a familiar one: get first results with hacky code, then write a brand-new, clean implementation using the lessons learned, so the two versions exist side by side and can be cross-checked against each other.

I also got backpropagation wrong a couple of times along the way, and the lesson was that comparing against numerical differentiation is critical! Interestingly, things still train even when various parts are pretty badly wrong: as long as the sign is right most of the time, progress is often made.
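
(Editor's illustration: below is a minimal, self-contained C++ sketch of the kind of gradient check described here, comparing an analytic gradient against central-difference numerical differentiation on a toy one-neuron model. The model, function names, and tolerances are our own assumptions, not Carmack's code.)

#include <cmath>
#include <cstdio>
#include <vector>

// Toy model: one linear neuron with squared-error loss on a fixed example.
static double Loss(const std::vector<double>& w,
                   const std::vector<double>& x, double target) {
    double y = 0.0;
    for (size_t i = 0; i < w.size(); ++i) y += w[i] * x[i];
    double d = y - target;
    return 0.5 * d * d;
}

// Analytic gradient of the loss above: dL/dw_i = (y - target) * x_i.
static void AnalyticGrad(const std::vector<double>& w,
                         const std::vector<double>& x, double target,
                         std::vector<double>* grad) {
    double y = 0.0;
    for (size_t i = 0; i < w.size(); ++i) y += w[i] * x[i];
    for (size_t i = 0; i < w.size(); ++i) (*grad)[i] = (y - target) * x[i];
}

int main() {
    std::vector<double> w = {0.3, -0.7, 0.1};
    std::vector<double> x = {1.0, 2.0, -1.5};
    const double target = 0.5;

    std::vector<double> grad(w.size());
    AnalyticGrad(w, x, target, &grad);

    // Central differences: (L(w+h) - L(w-h)) / (2h), one parameter at a time.
    const double h = 1e-5;
    for (size_t i = 0; i < w.size(); ++i) {
        double saved = w[i];
        w[i] = saved + h; double lp = Loss(w, x, target);
        w[i] = saved - h; double lm = Loss(w, x, target);
        w[i] = saved;
        double numeric = (lp - lm) / (2.0 * h);
        double relErr = std::fabs(numeric - grad[i]) /
                        (std::fabs(numeric) + std::fabs(grad[i]) + 1e-12);
        std::printf("w[%zu]: analytic=%+.6f numeric=%+.6f rel.err=%.2e\n",
                    i, grad[i], numeric, relErr);
    }
    return 0;
}

The same pattern scales to a full network: perturb one parameter at a time, recompute the loss, and flag any parameter whose relative error is not tiny.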

I was quite happy with how my multi-layer neural network code turned out, and it ended up in a form I can drop straight into future efforts. Yes, for anything serious I should use an established third-party library, but there are plenty of times when having a single .cpp and .h file that you wrote every line of yourself is simply convenient.
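
(Editor's aside: purely as an illustration of what such a self-contained, drop-in module might look like, here is a hypothetical single-header interface. The class name, method names, and layout are our own invention, not taken from Carmack's code.)

// nn.h - hypothetical sketch of a self-contained, drop-in network module
#pragma once
#include <vector>

class NeuralNet {
public:
    // layerSizes, e.g. {784, 100, 10}: input, hidden, and output widths.
    explicit NeuralNet(const std::vector<int>& layerSizes);

    // Forward pass: returns the softmax output for a single input vector.
    std::vector<float> Forward(const std::vector<float>& input) const;

    // One SGD step on a single (input, label) pair; returns the cross-entropy loss.
    float TrainStep(const std::vector<float>& input, int label, float learningRate);

private:
    std::vector<std::vector<float>> weights;  // one row-major weight matrix per layer
    std::vector<std::vector<float>> biases;   // one bias vector per layer
};

The matching nn.cpp would hold the implementation, so the whole thing stays a two-file unit that can be copied between projects.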

My conv net code only got to the hacky-but-working stage; it could have used another day or two to become a clean, flexible implementation. One thing I found interesting: when I tested my initial network on MNIST, before adding any convolutions, it did noticeably better than the non-convolutional network reported for comparison in LeCun's 1998 paper. With a single 100-node hidden layer I got right around 2% error on the test set, versus 3% for the wider and deeper networks of that era. I attribute this to modern best practices: ReLU, softmax, and better initialization.
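
(Editor's illustration: a rough, self-contained sketch of the kind of network described above, with 784 inputs, a single 100-node ReLU hidden layer, a 10-way softmax output, and simple fan-in-scaled random initialization. The names and structure are our own assumptions and training is omitted; this is not Carmack's code.)

#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Fully connected layer: out = W * in + b, with W stored row-major (rows = outputs).
static std::vector<float> Dense(const std::vector<float>& in,
                                const std::vector<float>& W,
                                const std::vector<float>& b) {
    size_t outDim = b.size(), inDim = in.size();
    std::vector<float> out(outDim);
    for (size_t o = 0; o < outDim; ++o) {
        float sum = b[o];
        for (size_t i = 0; i < inDim; ++i) sum += W[o * inDim + i] * in[i];
        out[o] = sum;
    }
    return out;
}

static void ReluInPlace(std::vector<float>* v) {
    for (float& x : *v) x = x > 0.0f ? x : 0.0f;
}

static void SoftmaxInPlace(std::vector<float>* v) {
    float maxv = (*v)[0];
    for (float x : *v) if (x > maxv) maxv = x;          // subtract max for stability
    float sum = 0.0f;
    for (float& x : *v) { x = std::exp(x - maxv); sum += x; }
    for (float& x : *v) x /= sum;
}

// Forward pass: 784 pixels -> 100 ReLU units -> 10 class probabilities.
static std::vector<float> Forward(const std::vector<float>& pixels,
                                  const std::vector<float>& W1, const std::vector<float>& b1,
                                  const std::vector<float>& W2, const std::vector<float>& b2) {
    std::vector<float> hidden = Dense(pixels, W1, b1);
    ReluInPlace(&hidden);
    std::vector<float> logits = Dense(hidden, W2, b2);
    SoftmaxInPlace(&logits);
    return logits;
}

int main() {
    const int kIn = 784, kHidden = 100, kOut = 10;
    // Simple fan-in-scaled random init, standing in for the "better initialization" above.
    auto randInit = [](std::vector<float>* w, int fanIn) {
        for (float& x : *w)
            x = (static_cast<float>(std::rand()) / RAND_MAX - 0.5f) *
                2.0f / std::sqrt(static_cast<float>(fanIn));
    };
    std::vector<float> W1(kHidden * kIn), b1(kHidden, 0.0f);
    std::vector<float> W2(kOut * kHidden), b2(kOut, 0.0f);
    randInit(&W1, kIn);
    randInit(&W2, kHidden);

    std::vector<float> pixels(kIn, 0.5f);               // stand-in for one MNIST image
    std::vector<float> probs = Forward(pixels, W1, b1, W2, b2);
    for (int c = 0; c < kOut; ++c) std::printf("class %d: %.3f\n", c, probs[c]);
    return 0;
}

Flat row-major vectors keep the whole thing dependency-free, in the spirit of a single .cpp/.h implementation.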

If there is one takeaway from the week, it is how fascinatingly simple neural network work is: the breakthrough advances are often things that can be expressed in just a few lines of code. It feels similar to ray tracing in the graphics world, where you can implement a physically based light-transport ray tracer quite quickly and, given enough data and enough runtime patience, produce state-of-the-art images.

Likewise, by exploring a range of training parameters I got a much better gut-level understanding of overtraining / generalization / regularization. On the last night before heading home, I froze the architecture and happily played with the hyperparameters. When it comes to staying focused, "Training!" is definitely a worse excuse than "Compiling!".
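
(Editor's illustration: one concrete example of the kind of regularization knob being swept in such experiments, L2 weight decay folded into a plain SGD update. These few hypothetical lines are ours, not Carmack's.)

#include <vector>

// One SGD update with an L2 penalty: w <- w - lr * (dL/dw + lambda * w).
void SgdStepWithWeightDecay(std::vector<float>* weights,
                            const std::vector<float>& grad,
                            float learningRate, float l2Lambda) {
    for (size_t i = 0; i < weights->size(); ++i) {
        float g = grad[i] + l2Lambda * (*weights)[i];   // loss gradient + penalty gradient
        (*weights)[i] -= learningRate * g;
    }
}

Raising l2Lambda pulls the weights toward zero and fights overtraining at the risk of underfitting; learningRate and l2Lambda are exactly the sort of hyperparameters that end up being swept by hand.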

Now I get to keep my eyes open for a work opportunity that will use these new skills. As for what my inbox and desk will look like when I walk into the office tomorrow, I am already dreading it.

Unless I am coerced into it, yours truly will not be using a Mac. (No offense to Mac fans.)

Summary

Having read John Carmack's notes, we wonder what impression they left on you. As an accomplished veteran programmer, Carmack's exacting standards for code are well known, and carving out a week to learn neural networks in such a deliberately retro fashion shows a playfulness that is a breath of fresh air in this industry. This "new skill" may not bring him much worldly reward, but the joy of creating worlds in code is what makes him happy. Carmack is still Carmack.

Beyond that, many readers abroad voiced concerns about the theoretical foundations of deep learning. As the post puts it: "things still train even when various parts are pretty badly wrong; as long as the sign is right most of the time, progress is often made." In gradient descent, what theory asks for is to keep walking "downhill", and "uphill" steps are supposed to hurt. But if, in the long run, both kinds of steps can end up improving the model's predictions, what exactly is the point of going downhill?
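
(Editor's aside, using standard notation rather than anything from the post: a rough way to see why "mostly right signs" can be enough is to look at a single gradient-descent update, w ← w − η·g, where g is the estimated gradient and η is a small learning rate. To first order the loss changes by ΔL ≈ −η · g · (∂L/∂w), so whenever g has the same sign as the true derivative ∂L/∂w that product is positive and the loss goes down; a wrong magnitude only changes the step size. Occasional wrong-sign steps push the loss up, but if the sign is right most of the time the downhill steps dominate on average, which is essentially the bet that stochastic gradient descent makes.)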

Everyone in this field is chasing improvements of one kind or another, yet when asked why something works, the most common answer is simply that it "just works". If results are all we look at, how do we tell which things genuinely matter? Perhaps we also need a stricter standard for what counts as "progress".

English original

After a several year gap, I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work. My wife has been generously offering it to me the last few years, but I’m generally bad at taking vacations from work.

As a change of pace from my current Oculus work, I wanted to write some from-scratch-in-C++ neural network implementations, and I wanted to do it with a strictly base OpenBSD system. Someone remarked that is a pretty random pairing, but it worked out ok.

Despite not having actually used it, I have always been fond of the idea of OpenBSD — a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is a lot of things, but cohesive isn’t one of them.

I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows. I thought a week of full immersion work in the old school Unix style would be interesting, even if it meant working at a slower pace. It was sort of an adventure in retro computing — this was fvwm and vi. Not vim, actual BSD vi.

In the end, I didn’t really explore the system all that much, with 95% of my time in just the basic vi / make / gdb operations. I appreciated the good man pages, as I tried to do everything within the self contained system, without resorting to internet searches. Seeing references to 30+ year old things like Tektronix terminals was amusing.

I was a little surprised that the C++ support wasn’t very good. G++ didn’t support C++11, and LLVM C++ didn’t play nicely with gdb. Gdb crashed on me a lot as well, I suspect due to C++ issues. I know you can get more recent versions through ports, but I stuck with using the base system.

In hindsight, I should have just gone full retro and done everything in ANSI C. I do have plenty of days where, like many older programmers, I think “Maybe C++ isn’t as much of a net positive as we assume…”. There is still much that I like, but it isn’t a hardship for me to build small projects in plain C.

Maybe next time I do this I will try to go full emacs, another major culture that I don’t have much exposure to.

I have a decent overview understanding of most machine learning algorithms, and I have done some linear classifier and decision tree work, but for some reason I have avoided neural networks. On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”

In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere, but I wound up watching a lot of the Stanford CS231N lectures on YouTube, and found them really valuable. Watching lecture videos is something that I very rarely do — it is normally hard for me to feel the time is justified, but on retreat it was great!

I don’t think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, solidifying “book knowledge” into real experience.

I used a common pattern for me: get first results with hacky code, then write a brand new and clean implementation with the lessons learned, so they both exist and can be cross checked.

I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.

I was pretty happy with my multi-layer neural net code; it wound up in a form that I can just drop it into future efforts. Yes, for anything serious I should use an established library, but there are a lot of times when just having a single .cpp and .h file that you wrote every line of is convenient.

My conv net code just got to the hacky but working phase, I could have used another day or two to make a clean and flexible implementation.

One thing I found interesting was that when testing on MNIST with my initial NN before adding any convolutions, I was getting significantly better results than the non-convolutional NN reported for comparison in LeCun ‘98 — right around 2% error on the test set with a single 100 node hidden layer, versus 3% for both wider and deeper nets back then. I attribute this to the modern best practices — ReLU, Softmax, and better initialization.

This is one of the most fascinating things about NN work — it is all so simple, and the breakthrough advances are often things that can be expressed with just a few lines of code. It feels like there are some similarities with ray tracing in the graphics world, where you can implement a physically based light transport ray tracer quite quickly, and produce state of the art images if you have the data and enough runtime patience.

I got a much better gut-level understanding of overtraining / generalization / regularization by exploring a bunch of training parameters. On the last night before I had to head home, I froze the architecture and just played with hyperparameters. “Training!” Is definitely worse than “Compiling!” for staying focused.

Now I get to keep my eyes open for a work opportunity to use the new skills!

I am dreading what my email and workspace are going to look like when I get into the office tomorrow.

—— by John Carmack
