I wanted to try deep learning, so I started learning Chainer a month or two ago. While learning it, I decided to experiment with coloring anime sketches, just for fun.
Coloring sketches is a supervised learning task, and as with all machine learning tasks, having a large amount of training data is crucial.
I used OpenCV to extract rough sketches from images that were already colored. Here's one example:
I gathered images of anime characters and built a dataset from them (around 600,000 images in total).
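The post does not say which OpenCV operations produced the sketches. One common recipe, assumed here, is to dilate the grayscale image, subtract the original, and invert the difference; in OpenCV that would be `cv2.dilate` followed by `cv2.absdiff`. The minimal sketch below reproduces the idea in plain NumPy so the logic is visible. `extract_lines` and the 5x5 window are illustrative choices, not the author's settings.

```python
import numpy as np


def extract_lines(gray, k=5):
    """Turn a grayscale uint8 image (H, W) into dark lines on a white background.

    A k x k maximum filter (grayscale dilation, like cv2.dilate with a k x k
    kernel of ones) spreads bright pixels outward. Subtracting the original
    leaves large values only where brightness drops sharply, i.e. along lines
    and edges; inverting the difference gives black lines on white.
    """
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    dilated = np.zeros_like(gray)
    for dy in range(k):
        for dx in range(k):
            dilated = np.maximum(dilated, padded[dy:dy + h, dx:dx + w])
    # dilated >= gray everywhere, so this plays the role of cv2.absdiff.
    diff = dilated.astype(np.int16) - gray.astype(np.int16)
    return (255 - diff).astype(np.uint8)
```

On a real illustration, flat color regions come out white and outlines come out dark; the window size controls how thick the extracted lines are.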
For the network, I used a structure called "U-net", which connects each convolution stage to the deconvolution stage of the same resolution. With this structure, the network can use both the semantic information extracted by the convolutions and the less processed information from before them. (Translator: please check the source code and the U-net paper for details.) The loss is simply the L2 loss between the generated image and the original colored image.
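To make the two ingredients concrete, here is a NumPy sketch of a U-net skip connection (a channel-wise concatenation, which is what `F.concat` does on its default axis 1 in the listing at the end of this post) and of the L2 loss. Averaging over pixels rather than summing is an assumption, since the post only says "L2 loss"; in Chainer the mean version is `F.mean_squared_error`. The function names are hypothetical.

```python
import numpy as np


def skip_concat(encoder_feat, decoder_feat):
    """U-net skip connection: join two (N, C, H, W) feature maps along the channel axis."""
    same_batch = encoder_feat.shape[0] == decoder_feat.shape[0]
    same_size = encoder_feat.shape[2:] == decoder_feat.shape[2:]
    if not (same_batch and same_size):
        raise ValueError("batch size, height and width must match")
    return np.concatenate([encoder_feat, decoder_feat], axis=1)


def l2_loss(generated, target):
    """Mean squared error between the generated image and the real colored image."""
    diff = np.asarray(generated, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(diff ** 2))
```

Only the channel counts add up in the concatenation, which is why layers such as `dc8` in the listing take 1024 input channels.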
When defining the network in Chainer, I understood that, in general, things work as long as each layer's output matches the input of the next layer, but without any examples to follow I still found it hard to use a dataset I had gathered myself.
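One way to check that each layer's output matches the next layer's input, without running the network, is to do the size arithmetic by hand. The sketch below uses the standard output-size formulas for convolution and deconvolution along one spatial axis (the deconvolution formula is the default output size of Chainer's `Deconvolution2D`) and walks through the generator from the listing at the end of the post, checking that every skip connection sees feature maps of the same height and width. The helper names are hypothetical.

```python
def conv_out(n, k, s, p):
    """Spatial size after Convolution2D(ksize=k, stride=s, pad=p) along one axis."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k, s, p):
    """Spatial size after Deconvolution2D(ksize=k, stride=s, pad=p) along one axis."""
    return s * (n - 1) + k - 2 * p


def unet_skip_sizes_match(n):
    """Return True if every skip connection of the generator lines up for input size n."""
    # Encoder: c0 is (3, 1, 1); c1..c8 alternate (4, 2, 1) halving and (3, 1, 1).
    e = [conv_out(n, 3, 1, 1)]
    for k, s, p in [(4, 2, 1), (3, 1, 1)] * 4:
        e.append(conv_out(e[-1], k, s, p))
    # e[0] .. e[8] correspond to e0 .. e8; e7 and e8 always match (size-preserving c8).
    d = e[8]
    for skip in (e[6], e[4], e[2], e[0]):
        d = deconv_out(d, 4, 2, 1)  # dc8 / dc6 / dc4 / dc2 double the size
        d = conv_out(d, 3, 1, 1)    # dc7 / dc5 / dc3 / dc1 keep it
        if d != skip:               # F.concat([skip, d]) would fail here
            return False
    return True
```

For example, 128x128 and 512x512 inputs pass, while a 100x100 input fails at the concatenation feeding `dc4` (`e4` is 25 pixels wide, `d5` only 24).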
I trained for roughly one night, and here is the result:↓
Hmm. It seemed like the network was trying to tell me: "I roughly understand what color the skin is, but I don't get the rest. How could I possibly know the color of the hair or the clothes?"
Here comes the adversarial network to save the day. I call it the Ad-boss.
The Ad-boss learns the difference between real images and generated images and tries to tell them apart. If the generator network keeps producing sepia-toned images that only color the skin, the Ad-boss throws them right back in the generator's face and tells it to work harder. (Translator: I kept thinking of manga artists getting rejected by publishers and magazines. I imagine it's not a very pleasant scene when that happens in Japan…)
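The discriminator in the listing at the end of the post ends in `L.Linear(None, 2, ...)`, i.e. two scores per image (real vs. generated), which suggests a two-class softmax cross-entropy. The NumPy sketch below shows one standard way to wire that up: the Ad-boss is trained to label real images 1 and generated ones 0, while the generator's loss adds a penalty for being caught to its L2 term. The label convention and the weight `adv_weight` are hypothetical; the post does not give the author's values.

```python
import numpy as np

REAL, FAKE = 1, 0  # hypothetical label convention for the two output scores


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy of (N, 2) scores against integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def discriminator_loss(scores_real, scores_fake):
    """The Ad-boss is rewarded for calling real images real and generated images fake."""
    real_labels = np.full(len(scores_real), REAL)
    fake_labels = np.full(len(scores_fake), FAKE)
    return (softmax_cross_entropy(scores_real, real_labels)
            + softmax_cross_entropy(scores_fake, fake_labels))


def generator_loss(generated, target, scores_fake, adv_weight=0.01):
    """L2 loss plus a penalty whenever the Ad-boss recognizes a generated image."""
    l2 = float(np.mean((generated - target) ** 2))
    fooled = np.full(len(scores_fake), REAL)  # the generator wants to be judged "real"
    return l2 + adv_weight * softmax_cross_entropy(scores_fake, fooled)
```

Raising `adv_weight` is one plausible knob for how "harsh" the Ad-boss is on the generator relative to the L2 term.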
But if the Ad-boss is too harsh, the generator network will just rebel and output weird-looking paintings like the one below:
There is color, but now it is hard to tell where the sketch lines are. You might call it a kind of modern art, but let's not go down that path just yet.
Here are the results after the adversarial network spent a while learning the difference between generated and real images.
Let there be color!
But that's not yet the full power of a CNN combined with an adversarial network. There's more ahead, so let's keep going.
In the first stage, I shrank the 512x512 sketches to 128x128 and colored those. Now, in the second stage, I let the generator network color the full 512x512 images, feeding in some extra hints as additional input channels. (Translator: more details about the hints are given later.)
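The post does not say how images were resized between the two stages. As a minimal sketch, assuming images stored as (channels, height, width) NumPy arrays, averaging 4x4 blocks takes 512x512 down to 128x128, and repeating pixels brings a 128x128 map back up to 512x512, which is what any low-resolution map would need before it could be stacked with the full-size sketch as an extra input channel. Both helpers are hypothetical stand-ins for a proper image-resizing routine.

```python
import numpy as np


def shrink(img, factor=4):
    """Downscale a (C, H, W) image by averaging factor x factor blocks, e.g. 512 -> 128."""
    c, h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = img.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


def enlarge(img, factor=4):
    """Nearest-neighbor upscale of a (C, H, W) image by repeating pixels, e.g. 128 -> 512."""
    return img.repeat(factor, axis=1).repeat(factor, axis=2)
```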
Here are the results without the adversarial network:
Pretty good, right?
Not bad at all!
It seems pretty good at coloring images from the test set, but what about real sketches? I borrowed some sketches from pixiv tagged "Feel free to color it". (Since the network is made only of convolutional layers, a different height:width ratio does not affect the output.)
It colored them without much difficulty. (Translator: the author makes it sound super easy, but it's not...)
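Being fully convolutional frees the generator from a fixed aspect ratio, but not from every size constraint: it halves the resolution four times (`c1`, `c3`, `c5`, `c7`) and doubles it back four times, so its skip connections only line up when the height and width are multiples of 2^4 = 16. A minimal sketch, assuming white (255) is a safe padding value for a sketch, pads an arbitrary pixiv image up to the next multiple and remembers the original size so the colored output can be cropped back. `pad_to_multiple` is a hypothetical helper.

```python
import numpy as np


def pad_to_multiple(img, multiple=16, fill=255):
    """Pad a (C, H, W) image at the bottom and right so H and W are multiples of `multiple`.

    Returns the padded image and the original (H, W) so the network's output
    can be cropped back afterwards.
    """
    _, h, w = img.shape
    pad_h = (-h) % multiple  # rows needed to reach the next multiple
    pad_w = (-w) % multiple  # columns needed to reach the next multiple
    padded = np.pad(img, ((0, 0), (0, pad_h), (0, pad_w)),
                    mode="constant", constant_values=fill)
    return padded, (h, w)
```

After running the generator on the padded image, slicing `output[:, :h, :w]` recovers an image of the original size.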
On second thought, wouldn't it be better if we could do at least part of the coloring ourselves? So I changed the input from one channel to four channels, adding hints about which colors to use.
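A minimal sketch of assembling that four-channel input, assuming (the post does not spell out the split) that one channel holds the grayscale sketch and the other three hold an RGB hint map of the same size. The result has the (batch, 4, height, width) layout that `c0 = L.Convolution2D(4, 32, 3, 1, 1)` in the listing expects. How pixels without a hint are encoded is left to the caller; `make_input` is a hypothetical helper.

```python
import numpy as np


def make_input(sketch, hint_rgb):
    """Stack a (H, W) grayscale sketch and a (3, H, W) RGB hint map into (1, 4, H, W)."""
    sketch = np.asarray(sketch)
    hint_rgb = np.asarray(hint_rgb)
    if sketch.ndim != 2 or hint_rgb.shape != (3,) + sketch.shape:
        raise ValueError("expected sketch (H, W) and hint (3, H, W) of the same size")
    x = np.concatenate([sketch[np.newaxis], hint_rgb], axis=0)  # channel 0 = sketch
    return x[np.newaxis].astype(np.float32)  # add the batch axis
```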
I can make requests like: "I want her to have brown hair and a light blue sweater."
The hints don't have to be exact; the network can adapt.
(You see, I sometimes feel that sketches like this are already good enough as they are, but they're even better with color.)
It is also possible to give much more detailed hints. It might be hard to see, but you just drop some colored dots here and there:
Merry Christmas! (Translator: it's almost February now… I was procrastinating over the last month.)
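Dropping colored dots can be sketched as painting small squares into an RGB hint map of shape (3, height, width). The square shape and the radius are hypothetical choices, since the post does not describe how hints were drawn, and so is treating 0 as "no hint" in the usage below.

```python
import numpy as np


def add_dot(hint, y, x, rgb, radius=2):
    """Return a copy of a (3, H, W) hint map with a square dot of color `rgb` at (y, x).

    Dots near the border are clipped to the image.
    """
    out = hint.copy()
    _, h, w = out.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    out[:, y0:y1, x0:x1] = np.asarray(rgb, dtype=out.dtype).reshape(3, 1, 1)
    return out
```

For example, `add_dot(np.zeros((3, 10, 10), dtype=np.uint8), 5, 5, (120, 80, 40), radius=1)` fills a 3x3 patch around pixel (5, 5) with that brown and leaves the rest untouched.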
Now I've been promoted from engineer to painter!
Now that the program for automatically coloring sketches gives promising enough results, I feel it's time to wrap up.
I don't think it can compete with painters' work just yet, but if you just want a rough coloring, it's pretty easy to use. Much easier than applying screentone to manga.
(The network is very good at skin colors… You know what I mean? →.→)
It does have some weak points, though. For example, if you use both the adversarial network and hints, the hints affect the adversarial network so strongly that the result becomes unstable.
↑ I only wanted to color the bikini, but the rest of the image was affected by accident.
If you just want to use it as a simple tool, it may be better to train with hints alone, without the adversarial network.
Another side note: since the network learned to color shrunk sketches, if the lines are too thin or too thick, the color may spill over the sketch lines, and the result gets pretty bad from there. There are also cases where giving detailed hints doesn't improve the result.
It might be cool to have one neural network that solves every kind of specific problem, but you'll still need to adjust it for whatever purpose you use it for.
Here are the addresses from which I borrowed the sketches.
** Regarding the dataset and the sketches, I was not immediately able to find where I took them from. I apologize for the inconvenience.
Here are the structures of the network. I used the same structure for both the first and the second stage.
import math

import chainer
import chainer.functions as F
import chainer.links as L

# Written against the Chainer v1 API used in the original code: the `test`
# argument of BatchNormalization and `wscale` of Linear were removed in v2.
# The class names are placeholders; the post does not name the classes.


class UNet(chainer.Chain):
    """Generator: a U-net taking 4 input channels and producing an RGB image."""

    def __init__(self):
        super(UNet, self).__init__(
            # Encoder: (3, 1, 1) convolutions keep the size, (4, 2, 1) halve it.
            c0=L.Convolution2D(4, 32, 3, 1, 1),
            c1=L.Convolution2D(32, 64, 4, 2, 1),
            c2=L.Convolution2D(64, 64, 3, 1, 1),
            c3=L.Convolution2D(64, 128, 4, 2, 1),
            c4=L.Convolution2D(128, 128, 3, 1, 1),
            c5=L.Convolution2D(128, 256, 4, 2, 1),
            c6=L.Convolution2D(256, 256, 3, 1, 1),
            c7=L.Convolution2D(256, 512, 4, 2, 1),
            c8=L.Convolution2D(512, 512, 3, 1, 1),
            # Decoder: (4, 2, 1) deconvolutions double the size. Layers fed by a
            # skip connection take twice the channels because of the concatenation.
            dc8=L.Deconvolution2D(1024, 512, 4, 2, 1),
            dc7=L.Convolution2D(512, 256, 3, 1, 1),
            dc6=L.Deconvolution2D(512, 256, 4, 2, 1),
            dc5=L.Convolution2D(256, 128, 3, 1, 1),
            dc4=L.Deconvolution2D(256, 128, 4, 2, 1),
            dc3=L.Convolution2D(128, 64, 3, 1, 1),
            dc2=L.Deconvolution2D(128, 64, 4, 2, 1),
            dc1=L.Convolution2D(64, 32, 3, 1, 1),
            dc0=L.Convolution2D(64, 3, 3, 1, 1),
            bnc0=L.BatchNormalization(32),
            bnc1=L.BatchNormalization(64),
            bnc2=L.BatchNormalization(64),
            bnc3=L.BatchNormalization(128),
            bnc4=L.BatchNormalization(128),
            bnc5=L.BatchNormalization(256),
            bnc6=L.BatchNormalization(256),
            bnc7=L.BatchNormalization(512),
            bnc8=L.BatchNormalization(512),
            bnd8=L.BatchNormalization(512),
            bnd7=L.BatchNormalization(256),
            bnd6=L.BatchNormalization(256),
            bnd5=L.BatchNormalization(128),
            bnd4=L.BatchNormalization(128),
            bnd3=L.BatchNormalization(64),
            bnd2=L.BatchNormalization(64),
            bnd1=L.BatchNormalization(32),
        )

    def __call__(self, x, test=False):
        e0 = F.relu(self.bnc0(self.c0(x), test=test))
        e1 = F.relu(self.bnc1(self.c1(e0), test=test))
        e2 = F.relu(self.bnc2(self.c2(e1), test=test))
        e3 = F.relu(self.bnc3(self.c3(e2), test=test))
        e4 = F.relu(self.bnc4(self.c4(e3), test=test))
        e5 = F.relu(self.bnc5(self.c5(e4), test=test))
        e6 = F.relu(self.bnc6(self.c6(e5), test=test))
        e7 = F.relu(self.bnc7(self.c7(e6), test=test))
        e8 = F.relu(self.bnc8(self.c8(e7), test=test))
        # Skip connections: F.concat joins encoder and decoder maps on axis 1.
        d8 = F.relu(self.bnd8(self.dc8(F.concat([e7, e8])), test=test))
        d7 = F.relu(self.bnd7(self.dc7(d8), test=test))
        d6 = F.relu(self.bnd6(self.dc6(F.concat([e6, d7])), test=test))
        d5 = F.relu(self.bnd5(self.dc5(d6), test=test))
        d4 = F.relu(self.bnd4(self.dc4(F.concat([e4, d5])), test=test))
        d3 = F.relu(self.bnd3(self.dc3(d4), test=test))
        d2 = F.relu(self.bnd2(self.dc2(F.concat([e2, d3])), test=test))
        d1 = F.relu(self.bnd1(self.dc1(d2), test=test))
        d0 = self.dc0(F.concat([e0, d1]))
        return d0


class Discriminator(chainer.Chain):
    """The "Ad-boss": scores a 3-channel color image as real or generated."""

    def __init__(self):
        super(Discriminator, self).__init__(
            c1=L.Convolution2D(3, 32, 4, 2, 1),
            c2=L.Convolution2D(32, 32, 3, 1, 1),
            c3=L.Convolution2D(32, 64, 4, 2, 1),
            c4=L.Convolution2D(64, 64, 3, 1, 1),
            c5=L.Convolution2D(64, 128, 4, 2, 1),
            c6=L.Convolution2D(128, 128, 3, 1, 1),
            c7=L.Convolution2D(128, 256, 4, 2, 1),
            l8l=L.Linear(None, 2, wscale=0.02 * math.sqrt(8 * 8 * 256)),
            bnc1=L.BatchNormalization(32),
            bnc2=L.BatchNormalization(32),
            bnc3=L.BatchNormalization(64),
            bnc4=L.BatchNormalization(64),
            bnc5=L.BatchNormalization(128),
            bnc6=L.BatchNormalization(128),
            bnc7=L.BatchNormalization(256),
        )

    def __call__(self, x, test=False):
        h = F.relu(self.bnc1(self.c1(x), test=test))
        h = F.relu(self.bnc2(self.c2(h), test=test))
        h = F.relu(self.bnc3(self.c3(h), test=test))
        h = F.relu(self.bnc4(self.c4(h), test=test))
        h = F.relu(self.bnc5(self.c5(h), test=test))
        h = F.relu(self.bnc6(self.c6(h), test=test))
        h = F.relu(self.bnc7(self.c7(h), test=test))
        return self.l8l(h)  # two scores per image: real vs. generated
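The `wscale` of `l8l` mentions `8*8*256`. Doing the size arithmetic for `c1` to `c7` shows where that number plausibly comes from: four stride-2 convolutions shrink a 128x128 image to 8x8 with 256 channels, so the linear layer receives 8*8*256 = 16,384 features. This suggests (an inference, not something the post states) that the discriminator was trained on 128x128 images; a 512x512 input would give 32*32*256 features instead, which the lazily initialized `L.Linear(None, 2, ...)` would also accept. The helper names are hypothetical.

```python
def conv_out(n, k, s, p):
    """Spatial size after a convolution with kernel k, stride s and padding p."""
    return (n + 2 * p - k) // s + 1


# (ksize, stride, pad) of the discriminator's c1 ... c7, as in the listing above.
DIS_CONVS = [(4, 2, 1), (3, 1, 1), (4, 2, 1), (3, 1, 1), (4, 2, 1), (3, 1, 1), (4, 2, 1)]


def dis_flat_features(size, channels=256):
    """Number of features l8l receives for a square size x size input image."""
    for k, s, p in DIS_CONVS:
        size = conv_out(size, k, s, p)
    return size * size * channels
```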
Notes from the translator:
I was actually working on something similar when I found this blog post. The results were so good that I decided to reimplement it myself. At that time the author had not yet released his source code, and I still have not managed to reproduce his results. I personally think it's amazing that he could learn deep learning and produce something this impressive within two months. Making manga and animation is labor-intensive work, so this has a lot of potential. The author has released the code, as well as a website to try it out, here:
If you would like to post this somewhere else, please cite the original author "taizan" and the blog post "初心者がchainerで線画着色してみた。わりとできた。 - Qiita" ("A beginner tried coloring line art with Chainer. It worked fairly well.").
If you have any questions about the translation, please don't hesitate to contact me at jiamingli2017(at)u.northwestern.edu or jerrylijiaming(at)gmail.com. I'll try my best to reply to each email within 1 to 3 days. I'll also check the post from time to time, but I can't guarantee when I'll reply.
Here's the link to the Chinese translation in case you need it: 知乎专栏 (Zhihu column)