Making Toonify Yourself

Last touched September 20, 2020

If you’d like to keep Toonify Yourself free for everyone to play with, please consider donating to cover running costs at Ko-fi:

So Doron Adler and I recently released our toonification translation model at our Toonify Yourself website. It turned out to be pretty popular, with tens of thousands of people visiting in the 22 hours it was running, submitting almost a quarter of a million images for toonification.

It got quite a bit of interest on social media, and was picked up by a few websites. Unfortunately we had to turn off the toonification server before costs started to get out of hand, but we’re working on bringing it back so people can carry on playing with the model for free.

A lot of people have expressed interest in how the model works and how the website was run. So here’s a blog post with some details on the traffic and running costs, as well as the technical details of how to run a deep neural network in the cloud serving tens of thousands of requests an hour!

Making an efficient Toonification model

If you want to know about the details of the original Toonification model, see this blog post.

The original Toonification method involved an expensive optimisation process to encode a person’s face using the blended StyleGAN model, which can take several minutes to run even on a GPU. Clearly this wasn’t going to cut it as a web app! A common pattern in deep learning is replacing expensive optimisations with more neural networks[1]. We used the basic idea described in StyleGAN2 Distillation for Feed-Forward Image Manipulation[2], i.e. training a pix2pixHD model to apply the transformation to any arbitrary image, rather than first having to perform the optimisation step.

Left: Original, Middle: Optimised, Right: pix2pixHD

The novel part here is that the pairs of images we use for the training process are produced by the original FFHQ model and the blended model[3]. Although the pix2pixHD model is only trained on images generated by the two StyleGAN models, once training is done we should be able to apply it to any image and get the same toonification result.
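
As a rough illustration, here is how that paired data generation might look. This is a minimal sketch only: `ffhq_generator`, `blended_generator`, and `save_pair` are hypothetical stand-ins for the actual StyleGAN2 models and dataset-writing code, which aren’t shown in this post.

```python
# Hypothetical sketch: generate (real, toon) training pairs by rendering the
# same latent with the original FFHQ generator and the layer-blended one.
import torch

NUM_PAIRS = 10_000
LATENT_DIM = 512

for i in range(NUM_PAIRS):
    z = torch.randn(1, LATENT_DIM)              # random latent code
    w = ffhq_generator.mapping(z)               # map to W space once, so...
    real_face = ffhq_generator.synthesis(w)     # ...both generators render
    toon_face = blended_generator.synthesis(w)  # the same underlying face
    # Each (real_face, toon_face) pair becomes an (input, target) example
    # for training the pix2pixHD model.
    save_pair(real_face, toon_face, f"pairs/{i:05d}.png")
```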

It even works on paintings!

Deploying the model

So after the initial interest on Twitter in my experiments putting together a local web app to run the Toonify model for myself, I decided to take a crack at putting up a website where anybody could run it on their own images.

First things first: I wasn’t going to be running anything on a GPU, so I needed to get the pix2pixHD model runnable on a CPU. The original pix2pixHD repo has some bugs which prevent inference without a GPU; I fixed these on my fork, if anyone is interested. In the end I actually decided to export the model to ONNX format so I could run it using the ONNX Runtime. This makes the dependencies more lightweight than running under PyTorch, and (I hope) the ONNX Runtime is built with performance in mind.
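
For the curious, the export-and-serve flow might look something like the sketch below. It assumes `model` is the trained pix2pixHD generator in PyTorch and that inputs are 1024×1024 RGB tensors; the actual export code isn’t shown in this post.

```python
# Sketch: export the generator to ONNX once, then serve with onnxruntime only.
import numpy as np
import torch
import onnxruntime as ort

# One-time export: trace the generator with a dummy input.
dummy = torch.randn(1, 3, 1024, 1024)
torch.onnx.export(model, dummy, "toonify.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# At serving time only onnxruntime is needed, not PyTorch.
session = ort.InferenceSession("toonify.onnx",
                               providers=["CPUExecutionProvider"])

def toonify(image: np.ndarray) -> np.ndarray:
    """Run CPU inference on a (1, 3, H, W) float32 array."""
    return session.run(None, {"input": image})[0]
```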

I went for Google Cloud Run as a means of deploying the web app. All the service needs to do is accept an image, run inference, and return the result. It’s totally stateless, so a good fit for the Cloud Run model. The use of Docker containers in Cloud Run meant that it was easy for me to bundle up any required dependencies, scalability was built right in, and there is a generous free allowance (not generous enough, as it would turn out!).

So after a few evenings of putting together a small app using Flask and Bootstrap, things were ready to deploy!
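
A stripped-down version of the stateless endpoint might look like this. It’s a sketch only: `toonify` is the ONNX inference helper sketched above, and `preprocess` and `postprocess` are hypothetical helpers converting between PIL images and the network’s tensor layout.

```python
# Sketch: accept an image, run inference, return the result. Nothing stored.
import io
from flask import Flask, request, send_file
from PIL import Image, ImageOps

app = Flask(__name__)

@app.route("/toonify", methods=["POST"])
def toonify_endpoint():
    img = Image.open(request.files["image"].stream)
    img = ImageOps.exif_transpose(img)  # fix iPhone rotation (see footnote 4)
    result = toonify(preprocess(img))   # run the model on the CPU
    out = postprocess(result)
    buf = io.BytesIO()
    out.save(buf, format="JPEG")
    buf.seek(0)
    # The response is the only output: the images are never written to disk.
    return send_file(buf, mimetype="image/jpeg")
```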

Toonification in the Wild

So after some beta testing with friends, I announced the release of the Toonify Yourself website on Twitter. It quickly got some reasonable traffic, and people seemed to be enjoying trying the model out on themselves.


Some people were complaining that their faces were never detected no matter what they submitted, and I fairly quickly figured out (and many helpful people online started to point out) that it was an issue with image rotation on iPhones[4].

By the next morning traffic started to really pick up, partly due to getting on the front page of Hacker News. I was starting to get a little bit twitchy seeing the number of containers spun up on Cloud Run steadily increasing. As lunchtime approached we were getting close to 25,000 page views an hour; at times this required 100 containers to service the traffic, and things were going up fast.

[Chart: page views]

The measly number of free CPU and RAM minutes had long since evaporated, and I was getting a little concerned about what the cloud bill would be by the time I got back from an afternoon out of the house. So rather than limit things to a level where most people would get no response from the site, I decided to turn off the model and switch to an apology message.

[Chart: page views]

The numbers

In the end we had close to a quarter of a million page views before I shut down the toonification. Not every one of these corresponds to someone submitting an image, but it’s not far off, and each user submitted around 3 or 4 images for toonification.

So how much did this all cost? Not quite as much as I thought: I had set a personal limit of around $100 on this, and I didn’t break it. But here are some details of what it costs to run a service like this.

The model takes about 5 seconds to run inference on whatever hardware Cloud Run uses, and occupies around 1.4 GB of memory whilst doing it. It also takes a slightly astonishing 20 seconds to load the model the first time a container is brought up (and this happens more often than I’d like), and memory peaks at well over 2 GB during this period. All this meant that processing a thousand images probably costs around 30 cents (footnote: there are also some other smaller costs, like network egress, to think about, but that’s not much extra), which isn’t too bad, but when you’re trying to process 25,000 an hour, it starts to add up fast!
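
As a rough back-of-envelope check using those figures (all numbers approximate):

```python
# Back-of-envelope cost estimate from the figures quoted above.
COST_PER_1000_IMAGES = 0.30   # dollars, the ~30 cents per thousand estimate
IMAGES_PER_HOUR = 25_000      # peak traffic

hourly = IMAGES_PER_HOUR / 1000 * COST_PER_1000_IMAGES
print(f"~${hourly:.2f} per hour")      # ~$7.50/hour at peak
print(f"~${hourly * 24:.0f} per day")  # ~$180/day if sustained
```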

I’m still pretty amazed that it was so easy to build a site which could service so much traffic and do some serious image processing in the cloud. I’ve never used any of this scalable serverless technology before, but it was incredibly easy to get going!

Feedback

A lot of people commented that the images produced didn’t preserve enough of the original character of the person, that they ended up looking pretty generic, and that a human cartoonist could do a far better job. I fully agree; there is no way deep learning is going to outperform a skilled artist any time soon! But there is also no way you could get skilled human artists to make cartoon versions of people for 30 cents per thousand.

Despite me putting a line in the FAQ assuring people I was not storing or collecting their images (how on earth would I have afforded that!?), several people commented on it with scepticism: surely something free online must be harvesting your data somehow? But honestly, once the toonification was done, the original image and the result were gone forever (for me at least). Plus I don’t really see why people were worried: people have already happily uploaded millions of photos of themselves to the internet and social media sites, which explicitly do make money from your data. If companies want to collect face data, a silly website like mine is not the way to do it!

The future

UPDATE: Toonify Yourself is back thanks to the fantastic support of generous supporters on Ko-fi, as well as model hosting from DeepAI. There are still costs involved in running the site, so if you’d like to help keep things free for everyone to play with, please think about supporting me on Ko-fi.


Twitter

Lots of people shared fun examples on Twitter, here are a few:

Coverage

Here are some links to the coverage this got.

and many many more…

I was even interviewed on the excellent Cold Fusion YouTube channel:


  1. For a classic example see the work on neural Style Transfer.
  2. Viazovetskyi, Yuri, Vladimir Ivashkin, and Evgeny Kashin. ‘StyleGAN2 Distillation for Feed-Forward Image Manipulation’. arXiv:2003.03581 [cs], 7 March 2020. http://arxiv.org/abs/2003.03581.
  3. Doron actually spent some time assembling new datasets and training new models, so the results you see are a bit different to the ones I originally shared.
  4. Portrait images are actually saved as landscape on iPhone, with the rotation embedded in the EXIF metadata. You can apply this rotation in Pillow using ImageOps.exif_transpose; a minimal example follows below.
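
For reference, the rotation fix from footnote 4 in full (the file path is hypothetical):

```python
# Apply the EXIF Orientation tag so portrait iPhone photos come out upright.
from PIL import Image, ImageOps

img = Image.open("portrait.jpg")     # hypothetical input path
img = ImageOps.exif_transpose(img)   # rotate according to EXIF metadata
img.save("portrait_upright.jpg")
```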