This page reproduces the content of http://www.slideshare.net/enakai/dcgan-how-does-it-work.


Uploaded on 2016/09/23 in Technology

Explaining the basic mechanism of DCGAN.

Planned to be presented at TensorFlow study meetup (5) in Tokyo.

http://connpass.com/event/38073/

2016/09/23 ver1.0 Upload

2016/09/26 ver1.1 Corrected some wording

- DCGAN: How does it work?

Etsuji Nakai

Cloud Solutions Architect at Google

2016/09/26 ver1.1

GIF Animation

https://goo.gl/zXL1bV

- What is DCGAN?

▪ DCGAN: Deep Convolutional Generative Adversarial Networks

● It works in the opposite direction of the image classifier (CNN).

● CNN transforms an image to a class label (list of probabilities).

● DCGAN generates an image from random parameters.
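As a small illustration of what "a list of probabilities" means, the sketch below turns a CNN-style output into a class label. The labels and values are the example ones from this slide; the list is truncated there, so the values shown do not sum to 1. `most_likely_label` is a hypothetical helper name.

```python
# Example labels and probabilities from the slide (the list is truncated there).
labels = ["deer", "dog", "cat", "human"]
probabilities = [0.01, 0.05, 0.91, 0.02]


def most_likely_label(labels, probabilities):
    """Return the label whose probability is the largest."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index]


print(most_likely_label(labels, probabilities))  # prints "cat"
```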

[Figure: CNN maps an image to a list of probabilities, one for each entry (deer, dog, cat, human, ...), e.g. (0.01, 0.05, 0.91, 0.02, ...). DCGAN maps random parameters, e.g. (0.01, 0.05, 0.91, 0.02, ...), to an image. What do these numbers mean?]

- Examples of Convolutional Filters

▪ Convolutional filters are ... just the image filters you sometimes apply in Photoshop!
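To make this concrete, here is a minimal sketch in plain Python that applies two filters to a tiny image. The 3x3 box blur and the vertical-edge filter are assumed examples, not necessarily the filters shown on the slide. As in most CNN libraries, the filter is slid over the image without being flipped.

```python
def apply_filter(image, kernel):
    """Slide a kernel over a 2-D image (no padding) and return the filtered image."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Assumed example filters: a 3x3 box blur and a simple vertical-edge detector.
blur_filter = [[1 / 9] * 3 for _ in range(3)]
vertical_edge_filter = [[-1, 0, 1] for _ in range(3)]

# A 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

blurred = apply_filter(image, blur_filter)
edges = apply_filter(image, vertical_edge_filter)

print([round(v, 2) for v in blurred[0]])  # [0.33, 0.67, 1.0]
print(edges[0])  # [3.0, 3.0, 0.0]
```

The blur smooths the step from dark to bright, while the edge filter responds only where the brightness changes from left to right.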

Filter to blur images

Filter to extract vertical edges

- Convolutional Filters in CNN

▪ CNN applies many filters to extract various features from a single image.

▪ CNN applies multi-layered filters to a single image (to extract features of features?)

▪ A filtered image is made smaller to drop unnecessary details.

[Figure: Extracting vertical and horizontal edges using two filters.]

- Convolutional Filters in CNN

▪ This shows how filters are applied to a multi-layered image.

[Figure: Filter A and Filter B are applied to the input image to produce output image A and output image B. Apply independent filters to each layer, then sum up the resulting images from each layer.]

- Typical CNN Filtering Layers

▪ Starting from a single RGB image on the right, multiple filtering layers are applied to produce smaller (and more) images.
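One such filtering layer can be sketched as follows. This is an illustration under assumptions, not the exact layer of any library: a stride of 2 with zero padding is assumed as the way images become smaller, the filter values are random, and the sizes are scaled down (3 layers of 8x8 in, 6 layers of 4x4 out) so that plain Python finishes quickly. Each output image is the sum of the independently filtered input layers.

```python
import random


def conv_layer(images, filters, stride=2):
    """Apply a bank of filters to a stack of square images.

    images:  list of input layers, each a size x size list of lists.
    filters: filters[o][i] is the k x k filter applied to input layer i
             when building output layer o.
    Zero padding is used, so the output size is ceil(size / stride).
    """
    size = len(images[0])
    k = len(filters[0][0])
    pad = k // 2
    out_size = (size + stride - 1) // stride
    outputs = []
    for per_layer_filters in filters:
        out = [[0.0] * out_size for _ in range(out_size)]
        for image, kernel in zip(images, per_layer_filters):
            for r in range(out_size):
                for c in range(out_size):
                    for i in range(k):
                        for j in range(k):
                            y = r * stride + i - pad
                            x = c * stride + j - pad
                            if 0 <= y < size and 0 <= x < size:
                                # Results from every input layer are summed.
                                out[r][c] += image[y][x] * kernel[i][j]
        outputs.append(out)
    return outputs


random.seed(0)
# 3 input layers (R, G, B) of 8x8 pixels with random values.
rgb_image = [[[random.random() for _ in range(8)] for _ in range(8)]
             for _ in range(3)]
# 6 output layers, each with one 3x3 filter per input layer.
filters = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
            for _ in range(3)] for _ in range(6)]

layer_output = conv_layer(rgb_image, filters)
print(len(layer_output), len(layer_output[0]), len(layer_output[0][0]))  # 6 4 4
```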

[Figure: RGB layers of a single 64x64 image → 128 layers of 32x32 images → 256 layers of 16x16 images → ... → a list of probabilities. Source: http://arxiv.org/abs/1511.06434]

- Image Generation Flow of DCGAN

▪ Basically, it's just flipping the direction. No magic!

[Figure: a list of random numbers → 1024 layers of 4x4 images → 512 layers of 8x8 images → ... → RGB layers of a single 64x64 image. Source: http://arxiv.org/abs/1511.06434]

- Illustration of Convolution Operations

▪ Convolutional filters in CNN and transposed-convolutional filters in DCGAN work in opposite directions. Here's a good illustration of how they work.

Convolution: (Up to) 3x3 blue pixels contribute to generating a single green pixel. Each of the 3x3 blue pixels is multiplied by the corresponding filter value, and the results from different blue pixels are summed up to be a single green pixel.

Transposed-convolution: A single green pixel contributes to generating (up to) 3x3 blue pixels. Each green pixel is multiplied by each of the 3x3 filter values, and the results from different green pixels are summed up to be a single blue pixel.
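The two descriptions can be sketched in plain Python, keeping the blue/green naming used above. This is a minimal version that assumes a stride of 1 and no padding; the filter values are arbitrary. Convolution gathers a neighbourhood into one pixel, and transposed convolution scatters one pixel over a neighbourhood.

```python
def convolution(blue, kernel):
    """Gather: each green pixel is a weighted sum of k x k blue pixels."""
    k = len(kernel)
    n = len(blue) - k + 1
    green = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            for i in range(k):
                for j in range(k):
                    green[r][c] += blue[r + i][c + j] * kernel[i][j]
    return green


def transposed_convolution(green, kernel):
    """Scatter: each green pixel adds a scaled copy of the filter to blue."""
    k = len(kernel)
    n = len(green) + k - 1
    blue = [[0.0] * n for _ in range(n)]
    for r in range(len(green)):
        for c in range(len(green)):
            for i in range(k):
                for j in range(k):
                    blue[r + i][c + j] += green[r][c] * kernel[i][j]
    return blue


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
blue = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

green = convolution(blue, kernel)                   # 4x4 -> 2x2
blue_again = transposed_convolution(green, kernel)  # 2x2 -> 4x4
print(len(green), len(blue_again))  # 2 4
print(transposed_convolution([[2]], kernel)[0])  # [2.0, 4.0, 6.0]
```

Note that the transposed convolution restores the size of the original image, not its values.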

GIF Animation

https://goo.gl/tAY4BL

http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html

- Training Strategy of DCGAN

▪ We train two models simultaneously.

● CNN: Classifying authentic and fake images.

● "Authentic" images are provided as training data to CNN.

● DCGAN: Trained to generate images classified as authentic by CNN.

● By trying to fool CNN, DCGAN learns to generate images similar to the training data.
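The two competing goals can be written as loss values to be minimized. The slides do not spell out a loss function, so the sketch below assumes the cross-entropy form commonly used for GANs. `p_real` and `p_fake` are hypothetical names for the probabilities that CNN assigns to a training image and to a generated image being authentic.

```python
import math


def cnn_loss(p_real, p_fake):
    """Small when CNN rates training images as authentic and generated ones as fake."""
    return -math.log(p_real) - math.log(1.0 - p_fake)


def dcgan_loss(p_fake):
    """Small when CNN rates generated images as authentic."""
    return -math.log(p_fake)


# A CNN that is easily fooled has a larger loss than one that spots the fake.
print(cnn_loss(p_real=0.6, p_fake=0.5) > cnn_loss(p_real=0.9, p_fake=0.1))  # True
# DCGAN has a larger loss when its images are recognized as fake.
print(dcgan_loss(p_fake=0.1) > dcgan_loss(p_fake=0.9))  # True
```

Lowering `cnn_loss` pushes `p_fake` down, while lowering `dcgan_loss` pushes it up, which is why the two models are adversaries.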

[Figure: DCGAN, CNN, and training data. CNN: "It's a fake!"]

- Training Loop of DCGAN

▪ By repeating this loop, CNN becomes more accurate and DCGAN becomes more crafty.

[Figure: Random numbers → DCGAN → Generated image A. Generated image A and training data B → CNN → P(A) and P(B).]

P(A) : Probability that A is authentic.

P(B) : Probability that B is authentic.

DCGAN: Modify parameters such that P(A) becomes large.

CNN: Modify parameters such that P(A) becomes small and P(B) becomes large.

- Model

▪ Training data : MNIST (28x28 pixels, grayscale images)

▪ DCGAN : Generate a single 28x28 image from 64 parameters.

● 64 parameters → 128 x (7x7) → 64 x (14x14) → 1 x (28x28)

▪ CNN : Calculate a probability that a single 28x28 image is authentic.

● 1 x (28x28) → 64 x (14x14) → 128 x (7x7) → Probability of authentic image
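The image sizes in the two chains above follow from doubling or halving the width and height at each step. The sketch below only does this bookkeeping; a stride of 2 per layer is an assumption, and the function names are hypothetical.

```python
def generator_shapes(start_size=7, channels=(128, 64, 1), stride=2):
    """Image stacks on the DCGAN side: each step doubles the width and height."""
    shapes, size = [], start_size
    for index, count in enumerate(channels):
        if index > 0:
            size *= stride
        shapes.append((count, size, size))
    return shapes


def cnn_shapes(start_size=28, channels=(1, 64, 128), stride=2):
    """Image stacks on the CNN side: each step halves the width and height."""
    shapes, size = [], start_size
    for index, count in enumerate(channels):
        if index > 0:
            size //= stride
        shapes.append((count, size, size))
    return shapes


print(generator_shapes())  # [(128, 7, 7), (64, 14, 14), (1, 28, 28)]
print(cnn_shapes())        # [(1, 28, 28), (64, 14, 14), (128, 7, 7)]
```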

▪ Batch size : 32

● Modify filter parameters using 32 generated images and 32 MNIST images at a time.

- Learning Process

▪ This shows the evolution of images generated from the same input parameters during the training loop. (DCGAN's filters are initialized with random values.)

- Playing with Input Parameters

▪ If we change the input parameters, the shape of the generated image changes too. By making small, continuous changes to the input, we can achieve a morphing effect.

▪ Since the input is a point in the 64 dimensional space, we can draw a straight line between two points. The end points represent images before and after morphing.

- Playing with Input Parameters
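The straight-line morphing between two input points can be sketched as follows. The two end points are random placeholders, and drawing them uniformly from [-1, 1] is an assumption. Feeding each intermediate point to a trained DCGAN would give one frame of the morph.

```python
import random


def interpolate(z_start, z_end, steps):
    """Return `steps` (at least 2) evenly spaced points on the line from z_start to z_end."""
    points = []
    for n in range(steps):
        t = n / (steps - 1)
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points


random.seed(0)
z_start = [random.uniform(-1, 1) for _ in range(64)]  # input before morphing
z_end = [random.uniform(-1, 1) for _ in range(64)]    # input after morphing

frames = interpolate(z_start, z_end, steps=10)
print(len(frames), len(frames[0]))  # 10 64
```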

▪ Using a more complicated closed loop in the parameter space, we can even make a dancing image :)

▪ The sample image on this page is generated from a trajectory over a sphere (embedded in the 64 dimensional space).
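One simple closed loop on a sphere is a great circle. The sketch below generates such a loop in the 64 dimensional space; it is only an assumed example of a trajectory, not necessarily the one used for the sample image, and the radius, step count, and seed are arbitrary.

```python
import math
import random


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def circle_on_sphere(dim=64, radius=1.0, steps=36, seed=0):
    """Return points on a great circle of a sphere in `dim`-dimensional space."""
    rng = random.Random(seed)
    u = unit([rng.gauss(0, 1) for _ in range(dim)])
    w = [rng.gauss(0, 1) for _ in range(dim)]
    # Remove the component of w along u so that u and v are orthogonal.
    dot = sum(a * b for a, b in zip(u, w))
    v = unit([b - dot * a for a, b in zip(u, w)])
    points = []
    for n in range(steps + 1):
        angle = 2 * math.pi * n / steps
        points.append([radius * (math.cos(angle) * a + math.sin(angle) * b)
                       for a, b in zip(u, v)])
    return points


loop = circle_on_sphere()
print(len(loop), len(loop[0]))  # 37 64
```

Because the last point coincides with the first, images generated along the loop can be played back as an endlessly repeating animation.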

GIF Animation

https://goo.gl/zXL1bV

- Interpretation of Input Parameters

▪ In the DCGAN paper, it is suggested that the input parameters could have a semantic structure, as in the following example.
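Such a semantic structure would allow simple arithmetic on input parameters, for example "smiling woman" minus "neutral woman" plus "neutral man" as a candidate for "smiling man". The sketch below shows only the arithmetic. The vectors are random placeholders, 64 dimensions follows the model in this deck, and averaging three inputs per concept is an assumption here.

```python
import random


def average(vectors):
    """Element-wise mean of several input-parameter vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def combine(smiling_woman, neutral_woman, neutral_man):
    """Compute smiling woman - neutral woman + neutral man, element by element."""
    return [s - w + m
            for s, w, m in zip(smiling_woman, neutral_woman, neutral_man)]


rng = random.Random(0)


def random_inputs(count, dim=64):
    """Placeholder inputs; real ones would be picked by looking at generated images."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(count)]


smiling_woman = average(random_inputs(3))
neutral_woman = average(random_inputs(3))
neutral_man = average(random_inputs(3))

smiling_man_candidate = combine(smiling_woman, neutral_woman, neutral_man)
print(len(smiling_man_candidate))  # 64
```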

[Figure: input parameters arranged along two axes, Woman/Man and Smile/Neutral, giving Smiling Woman, Smiling Man, Neutral Woman, and Neutral Man. Source: http://arxiv.org/abs/1511.06434]

- Thank you!