What does the average Novatec employee look like? In this post, we take a closer look at generative models to answer this question. Generative Adversarial Nets (GANs) can be understood as an adversarial process for estimating generative models. Here, we show how we trained a GAN in Python to create fake images of Novatec employees. But first, let’s look at the main idea behind GANs before we turn back to our question.

## What are Generative Adversarial Networks?

According to Yann LeCun, then Facebook’s director of AI research, Generative Adversarial Networks are “the most interesting idea in the last 10 years in ML” (2016). GANs were introduced by Ian Goodfellow and colleagues in the 2014 paper *Generative Adversarial Nets*. Their defining feature is an architecture of two separate neural networks, multilayer perceptrons in the original paper, that are pitted against each other.

### How do GANs work?

As the name suggests, a GAN consists of two adversarial models: a generator and a discriminator. The generator learns the distribution of the data and tries to fool the discriminator by creating fake samples. Its counterpart, the discriminator, classifies its input as a real or a fake sample. In other words, the discriminator learns the boundary between the two classes and evaluates the data it receives, alternating between samples from the generator and from the real dataset, for authenticity.

It’s a minimax game: the generator wants to minimize the discriminator’s success, while the discriminator tries to maximize it. A great way to understand the mechanism is Goodfellow’s forger-detective metaphor. The generator is a team of forgers trying to produce fake paintings, while the discriminator is a team of detectives trying to tell real paintings from fakes. The forgers never get to see the real paintings, only the detectives’ feedback. They are *blind* forgers.
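In the original paper, Goodfellow et al. formalize this game with a value function $V(D, G)$, where $D(x)$ is the discriminator’s estimated probability that $x$ is a real sample and $G(z)$ is the sample the generator produces from a noise vector $z$. The discriminator maximizes $V$, the generator minimizes it:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

The first term rewards the discriminator for scoring real samples close to one, the second for scoring generated samples close to zero.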

Over time, both models improve until the generator becomes a master forger and the discriminator can no longer tell whether its input is real or fake. At this point, the discriminator’s output should hover around 0.5, a coin flip, for every input. The trained generator can then produce realistic outputs that cannot be distinguished from the training data.
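The intuition that the discriminator ends up at 0.5 follows from a result in the same paper: for a fixed generator, the best possible discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), where p_g is the distribution of the generator’s samples. The toy sketch below evaluates this formula on two made-up discrete distributions over three outcomes (the numbers are purely illustrative): while the generator is still off, D* is far from 0.5 on some outcomes; once p_g matches p_data, it is exactly 0.5 everywhere.

```python
def optimal_discriminator(p_data, p_g):
    """Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)).

    p_data and p_g are toy discrete distributions: lists of probabilities
    over the same outcomes. Every outcome must have p_data + p_g > 0.
    """
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]


# Made-up example distributions over three outcomes
p_data = [0.5, 0.3, 0.2]
p_g_early = [0.1, 0.1, 0.8]  # an untrained generator, far from the data

print(optimal_discriminator(p_data, p_g_early))  # roughly [0.83, 0.75, 0.2]
print(optimal_discriminator(p_data, p_data))     # [0.5, 0.5, 0.5]
```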

## Implementation of the GAN model

As an implementation example, we developed an application that generates fake Novatec employee images. To do so, we fed real photos of Novatec employees into the discriminator and trained the generator until it could create fake images on its own.

### Setting up

To build this model, we used the code from this repository: https://github.com/FelixMohr/Deep-learning-with-Python/blob/master/DCGAN-face-creation.ipynb. Let’s walk through the code to see how a GAN model is built.

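```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (placeholders, sessions, tf.contrib)

tf.reset_default_graph()
batch_size = 16
n_noise = 16

# Real images (40 x 40 RGB) and the generator's input noise
X_in = tf.placeholder(dtype=tf.float32, shape=[None, 40, 40, 3], name='X')
noise = tf.placeholder(dtype=tf.float32, shape=[None, n_noise])

keep_prob = tf.placeholder(dtype=tf.float32, name='keep_prob')
is_training = tf.placeholder(dtype=tf.bool, name='is_training')

def lrelu(x):
    # Leaky ReLU: identity for positive inputs, slope 0.2 for negative ones
    return tf.maximum(x, tf.multiply(x, 0.2))

def binary_cross_entropy(x, z):
    # x: target labels, z: predicted probabilities; eps avoids log(0)
    eps = 1e-12
    return (-(x * tf.log(z + eps) + (1. - x) * tf.log(1. - z + eps)))
```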

Our dataset was a collection of photos of Novatec employees. To keep the training time reasonable, we downscaled the images to 40 x 40 pixels. Our hyperparameters are listed in the following table:

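| Hyperparameter | Value |
| --- | --- |
| Activation function | leaky ReLU |
| Dropout (`keep_prob`) | 0.6 |
| Feature maps per layer (discriminator) | 256, 128, 64 |
| Feature maps per layer (generator) | 256, 128, 64, 3 |
| Filter width | 5 |
| Mini-batch size | 16 |
| Random noise dimension | 16 |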
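Loading the photos is not part of the code excerpts here. As a minimal sketch, assuming the photos are already available as `uint8` NumPy arrays, each image can be downscaled to 40 x 40 and scaled to [0, 1] (our assumption, chosen to match the generator’s sigmoid output), and a hypothetical `make_next_batch` factory can return a `next_batch(num)` function like the one the training loop calls. Nearest-neighbour sampling keeps the sketch dependency-free; a real pipeline would more likely use a proper resampling filter, e.g. from Pillow.

```python
import numpy as np


def preprocess(img_uint8, size=40):
    """Downscale an H x W x 3 uint8 image to size x size and scale pixels to [0, 1].

    Nearest-neighbour sampling: pick evenly spaced rows and columns.
    """
    h, w, _ = img_uint8.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    small = img_uint8[rows][:, cols]
    return small.astype(np.float32) / 255.0


def make_next_batch(images, rng=None):
    """Hypothetical helper: returns next_batch(num), sampling images without replacement."""
    if rng is None:
        rng = np.random.default_rng()

    def next_batch(num):
        idx = rng.choice(len(images), size=num, replace=False)
        return images[idx]

    return next_batch


# Random stand-in data in place of the real employee photos
photos = np.random.default_rng(0).integers(0, 256, size=(100, 120, 90, 3), dtype=np.uint8)
images = np.stack([preprocess(p) for p in photos])  # shape (100, 40, 40, 3)
next_batch = make_next_batch(images)
batch = next_batch(num=16)                           # shape (16, 40, 40, 3)
```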

At the time we ran our experiments, TensorFlow did not support the leaky ReLU function, so we used the custom `lrelu` implementation from the repository. We also ran experiments with the standard ReLU, but the generator then only produced black squares.
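For reference, the repository’s `lrelu` computes max(x, 0.2x), which for any slope between 0 and 1 equals the usual piecewise definition of leaky ReLU. Unlike ReLU, it keeps a small slope (0.2) for negative inputs, so those units still pass gradient back. The pure-Python sketch below, with scalar helper names chosen for illustration only, checks the equivalence on a few values. Later TensorFlow releases provide this directly as `tf.nn.leaky_relu(x, alpha=0.2)`.

```python
def lrelu_max(x, alpha=0.2):
    """The repository's formulation: max(x, alpha * x)."""
    return max(x, alpha * x)


def lrelu_piecewise(x, alpha=0.2):
    """Textbook leaky ReLU: x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def relu(x):
    """Standard ReLU: negative inputs are clipped to zero (zero gradient)."""
    return max(x, 0.0)


for v in [-2.0, -0.5, 0.0, 1.5]:
    # Both leaky variants agree; ReLU zeroes out the negative inputs
    print(v, relu(v), lrelu_max(v), lrelu_piecewise(v))
```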

### Discriminator

First, we implement our detective: the discriminator. It receives two kinds of input: real employee images from the dataset and fake images created by our generator. Instead of the default MLP architecture, we use a series of convolutions, which results in a special type of GAN called a Deep Convolutional Generative Adversarial Network, or DCGAN for short. The last layer uses a sigmoid activation to output the probability that the input image is a real profile picture of a Novatec employee.

```python
def discriminator(img_in, reuse=None, keep_prob=keep_prob):
    activation = lrelu
    with tf.variable_scope("discriminator", reuse=reuse):
        x = tf.reshape(img_in, shape=[-1, 40, 40, 3])
        # Three 5 x 5 convolutions; only the first one downsamples (stride 2)
        x = tf.layers.conv2d(x, kernel_size=5, filters=256, strides=2, padding='same', activation=activation)
        # Note: the second positional argument of tf.layers.dropout is `rate`
        # (the fraction of units dropped), and `training` defaults to False,
        # so as written these dropout layers pass their input through unchanged.
        x = tf.layers.dropout(x, keep_prob)
        x = tf.layers.conv2d(x, kernel_size=5, filters=128, strides=1, padding='same', activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        x = tf.layers.conv2d(x, kernel_size=5, filters=64, strides=1, padding='same', activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        # Flatten the feature maps and classify with two dense layers
        x = tf.contrib.layers.flatten(x)
        x = tf.layers.dense(x, units=128, activation=activation)
        # Sigmoid output: probability that the input is a real employee photo
        x = tf.layers.dense(x, units=1, activation=tf.nn.sigmoid)
        return x
```
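To see what the discriminator does to a 40 x 40 x 3 input, it helps to trace the tensor shapes by hand. With `padding='same'`, a convolution’s output size is the input size divided by the stride, rounded up. A small sketch (the helper name is ours) applies this rule to the three convolution layers above:

```python
import math


def conv_same_output(size, stride):
    """Spatial size after a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)


# (filters, stride) of the three conv2d layers in discriminator()
conv_layers = [(256, 2), (128, 1), (64, 1)]

size = 40  # input images are 40 x 40 x 3
for filters, stride in conv_layers:
    size = conv_same_output(size, stride)
    print(f"{size} x {size} x {filters}")

flat_features = size * size * conv_layers[-1][0]
print(flat_features)  # 25600
```

Only the first layer downsamples, so the flattened feature vector has 20 · 20 · 64 = 25,600 entries, and the following dense layer with 128 units alone holds about 3.3 million weights.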

### Generator

The generator, our blind forger, takes random noise and learns to transform it into images that look as similar as possible to real training examples. Its architecture and parameters have to be tuned carefully for training to succeed. For example, we added batch normalization and experimented with the dimensions of the different layers. This is the configuration that gave us our best result:

```python
def generator(z, keep_prob=keep_prob, is_training=is_training):
    activation = lrelu
    momentum = 0.9  # decay of the batch-norm moving averages
    with tf.variable_scope("generator", reuse=None):
        x = z
        d1 = 4
        d2 = 3
        # Project the noise vector and reshape it into a small 4 x 4 x 3 "image"
        x = tf.layers.dense(x, units=d1 * d1 * d2, activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        x = tf.contrib.layers.batch_norm(x, is_training=is_training, decay=momentum)
        x = tf.reshape(x, shape=[-1, d1, d1, d2])
        x = tf.image.resize_images(x, size=[10, 10])
        # Transposed convolutions upsample 10 x 10 -> 20 x 20 -> 40 x 40
        x = tf.layers.conv2d_transpose(x, kernel_size=5, filters=256, strides=2, padding='same', activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        x = tf.contrib.layers.batch_norm(x, is_training=is_training, decay=momentum)
        x = tf.layers.conv2d_transpose(x, kernel_size=5, filters=128, strides=2, padding='same', activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        x = tf.contrib.layers.batch_norm(x, is_training=is_training, decay=momentum)
        x = tf.layers.conv2d_transpose(x, kernel_size=5, filters=64, strides=1, padding='same', activation=activation)
        x = tf.layers.dropout(x, keep_prob)
        x = tf.contrib.layers.batch_norm(x, is_training=is_training, decay=momentum)
        # Final layer: 3 color channels, sigmoid keeps pixel values in [0, 1]
        x = tf.layers.conv2d_transpose(x, kernel_size=5, filters=3, strides=1, padding='same', activation=tf.nn.sigmoid)
        return x
```
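The same kind of bookkeeping explains the generator’s layer sizes. A transposed convolution with `padding='same'` multiplies the spatial size by the stride, so once the 4 x 4 x 3 projection has been resized to 10 x 10, two stride-2 layers bring it to exactly the 40 x 40 x 3 shape of the real images. A sketch of that trace, again with a helper name of our own:

```python
def deconv_same_output(size, stride):
    """Spatial size after a 'same'-padded transposed convolution: size * stride."""
    return size * stride


n_noise, d1, d2 = 16, 4, 3
print(f"dense: {n_noise} -> {d1 * d1 * d2}, reshaped to {d1} x {d1} x {d2}")

size = 10  # after tf.image.resize_images(x, size=[10, 10])
# (filters, stride) of the four conv2d_transpose layers in generator()
deconv_layers = [(256, 2), (128, 2), (64, 1), (3, 1)]
shapes = []
for filters, stride in deconv_layers:
    size = deconv_same_output(size, stride)
    shapes.append((size, size, filters))
    print(f"{size} x {size} x {filters}")

# The output matches the X_in placeholder shape [None, 40, 40, 3]
assert shapes[-1] == (40, 40, 3)
```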

### Losses

After defining the discriminator and generator functions, we instantiate them and wire them together. The discriminator has to be applied twice: once to real images and once to the fake images the generator produces. Both instances must share the same variables, so the second call sets `reuse=True`. From their outputs we compute two losses: for real images, the discriminator is pushed towards outputs near one, meaning “real”; for fake images, it is pushed towards outputs near zero, meaning it is confident the image comes from the generator and is fake.

The generator has the opposite goal: it wants the discriminator to fail. Its loss therefore uses a target of one for the fake images and is minimized when the discriminator rates the fakes as real. Finally, we create a saver object so the model can be saved and restored later.

```python
# One generator, plus the discriminator applied to real and to fake images
g = generator(noise, keep_prob, is_training)
d_real = discriminator(X_in)
d_fake = discriminator(g, reuse=True)  # shares the variables of d_real

vars_g = [var for var in tf.trainable_variables() if var.name.startswith("generator")]
vars_d = [var for var in tf.trainable_variables() if var.name.startswith("discriminator")]

# Light L2 regularization on each network's weights
d_reg = tf.contrib.layers.apply_regularization(tf.contrib.layers.l2_regularizer(1e-6), vars_d)
g_reg = tf.contrib.layers.apply_regularization(tf.contrib.layers.l2_regularizer(1e-6), vars_g)

# Discriminator: real -> 1, fake -> 0; generator: wants its fakes rated 1
loss_d_real = binary_cross_entropy(tf.ones_like(d_real), d_real)
loss_d_fake = binary_cross_entropy(tf.zeros_like(d_fake), d_fake)
loss_g = tf.reduce_mean(binary_cross_entropy(tf.ones_like(d_fake), d_fake))
loss_d = tf.reduce_mean(0.5 * (loss_d_real + loss_d_fake))

# Run the batch-norm moving-average updates together with each optimizer step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer_d = tf.train.RMSPropOptimizer(learning_rate=0.0001).minimize(loss_d + d_reg, var_list=vars_d)
    optimizer_g = tf.train.RMSPropOptimizer(learning_rate=0.0002).minimize(loss_g + g_reg, var_list=vars_g)

saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# To resume from an existing checkpoint instead of training from scratch:
# saver.restore(sess, 'model.ckpt')
```
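To get a feel for the numbers, here is a scalar, pure-Python version of the `binary_cross_entropy` helper defined earlier, evaluated for made-up discriminator outputs: 0.9 for a real image and 0.2 for a fake. Note that the generator’s loss uses a target of one on fake images, i.e. it minimizes -log D(G(z)) instead of log(1 - D(G(z))) from the minimax formulation. Goodfellow et al. suggest this non-saturating variant because it gives stronger gradients early in training, when the discriminator rejects fakes confidently.

```python
import math


def binary_cross_entropy(x, z, eps=1e-12):
    """Scalar version of the TensorFlow helper: x is the target label, z the predicted probability."""
    return -(x * math.log(z + eps) + (1.0 - x) * math.log(1.0 - z + eps))


# Hypothetical discriminator outputs for one real and one generated image
d_real, d_fake = 0.9, 0.2

loss_d_real = binary_cross_entropy(1.0, d_real)  # -log(0.9), about 0.105
loss_d_fake = binary_cross_entropy(0.0, d_fake)  # -log(0.8), about 0.223
loss_d = 0.5 * (loss_d_real + loss_d_fake)       # about 0.164
loss_g = binary_cross_entropy(1.0, d_fake)       # -log(0.2), about 1.609

print(f"loss_d = {loss_d:.3f}, loss_g = {loss_g:.3f}")
```

In this example the discriminator is clearly winning (low `loss_d`, high `loss_g`), which is the kind of imbalance the loss balancing in the training step reacts to.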

### Training

Now let’s train our network! We feed random noise into the generator, which learns to turn it into employee images. We also apply loss balancing so that both networks keep learning and neither becomes much stronger than the other: in each step, if one network’s loss is more than 1.35 times smaller than the other’s, that network’s update is skipped.

```python
keep_prob_train = 0.6

for i in range(60000):
    train_d = True
    train_g = True

    # Fresh noise vectors and a fresh mini-batch of real images for every step
    # (next_batch is a data-loading helper not shown in this excerpt)
    n = np.random.uniform(0.0, 1.0, [batch_size, n_noise]).astype(np.float32)
    batch = [b for b in next_batch(num=batch_size)]

    # Evaluate the current losses without updating any parameters
    d_real_ls, d_fake_ls, g_ls, d_ls = sess.run(
        [loss_d_real, loss_d_fake, loss_g, loss_d],
        feed_dict={X_in: batch, noise: n, keep_prob: keep_prob_train, is_training: True})
    d_real_ls = np.mean(d_real_ls)
    d_fake_ls = np.mean(d_fake_ls)

    # Loss balancing: skip the update of whichever network is clearly ahead
    if g_ls * 1.35 < d_ls:
        train_g = False
    if d_ls * 1.35 < g_ls:
        train_d = False

    if train_d:
        sess.run(optimizer_d, feed_dict={noise: n, X_in: batch, keep_prob: keep_prob_train, is_training: True})
    if train_g:
        sess.run(optimizer_g, feed_dict={noise: n, keep_prob: keep_prob_train, is_training: True})
```
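The balancing rule inside the loop can be isolated into a small, framework-free function. This is our own refactoring for illustration (the name `select_updates` is hypothetical), using the same factor of 1.35:

```python
def select_updates(d_loss, g_loss, ratio=1.35):
    """Return (train_d, train_g) following the loss-balancing rule of the training loop.

    A network is skipped for this step when its loss, multiplied by `ratio`,
    is still smaller than the other network's loss, i.e. when it is clearly ahead.
    """
    train_g = not (g_loss * ratio < d_loss)
    train_d = not (d_loss * ratio < g_loss)
    return train_d, train_g


print(select_updates(d_loss=0.7, g_loss=0.7))  # (True, True): balanced, update both
print(select_updates(d_loss=0.3, g_loss=1.2))  # (False, True): discriminator far ahead
print(select_updates(d_loss=1.0, g_loss=0.5))  # (True, False): generator far ahead
```

Because the factor is greater than 1 and the losses are positive, the rule can never switch off both networks in the same step.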

### Results

Here are the images drawn by our generator after 20 hours of training on a Tesla K80 GPU. Because our training images were downscaled to 40 x 40 pixels, the generated images have a low resolution. Without a powerful GPU, training would take considerably longer. As a next step, we plan to run experiments with higher-resolution images.
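To display a batch of generated samples as a single picture, one would presumably run the generator’s output tensor with `is_training` set to `False` (so batch normalization uses its moving averages) and then tile the resulting (N, 40, 40, 3) array. A minimal NumPy sketch of the tiling step, with a hypothetical `tile_images` helper and random stand-in data instead of real samples:

```python
import numpy as np


def tile_images(images, rows, cols):
    """Arrange rows * cols images of shape (H, W, C) into one (rows*H, cols*W, C) grid.

    Images are placed row by row: image i lands in grid cell (i // cols, i % cols).
    """
    n, h, w, c = images.shape
    if n < rows * cols:
        raise ValueError("not enough images for the requested grid")
    grid = images[: rows * cols].reshape(rows, cols, h, w, c)
    # Move each image's pixel-row axis next to the grid-row axis, then merge axes
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


# Random stand-in for a batch of 16 generated 40 x 40 RGB samples
samples = np.random.default_rng(0).random((16, 40, 40, 3), dtype=np.float32)
grid = tile_images(samples, rows=4, cols=4)
print(grid.shape)  # (160, 160, 3)
```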

If you take a look at the images, you may recognize some new Novatec coworkers. They combine features of real Novatec employees. Imagine one of them were a teammate you could spend your next coffee break with! Keep in mind that the generator never saw a single real photo, let alone one of a Novatec employee; it learned only from the discriminator’s feedback. And it took surprisingly little implementation effort to build a model that knows what characterizes a Novatecler.

We, Novatec’s ML community, believe this technique has a lot of potential for exciting new applications. And yes, we agree with Yann LeCun: GANs are the most interesting idea of the last decade in ML.