How do autoencoders work?

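An autoencoder is a model that finds a low-dimensional representation of a dataset by exploiting the strong non-linearity of neural networks. An autoencoder consists of two parts: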

Encoder – transforms the high-dimensional input into a short, compact code.
Decoder – transforms the code back into the high-dimensional input space.

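Suppose X is a set of samples drawn from a data generating process $p_{data}(x)$. We assume $x_i \in \mathbb{R}^n$, without placing any restriction on the structure of the support. For example, for RGB images, $x_i \in \mathbb{R}^{n \times m \times 3}$.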

Here is a simple illustration of a generic autoencoder:

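The encoder is a parameterized function $e(\cdot)$ that maps each sample to a $p$-dimensional code vector:

$$z_i = e(x_i; \theta_e), \quad z_i \in \mathbb{R}^p$$

In a similar way, the decoder is another parameterized function $d(\cdot)$:

$$\tilde{x}_i = d(z_i; \theta_d)$$

Therefore, given an input sample $x_i$, the full autoencoder is the composition of the two functions and provides the reconstruction as its output:

$$\tilde{x}_i = d(e(x_i; \theta_e); \theta_d)$$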
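
To make the composition of encoder and decoder concrete, here is a minimal NumPy sketch. The single dense layer on each side, the tanh activation, the random untrained weights, and every name in it (`encode`, `decode`, `autoencode`, `W_e`, `W_d`) are illustrative assumptions, not the architecture used later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 64, 8  # input dimension and code length (hypothetical values)

# Randomly initialised, untrained parameters (for illustration only)
W_e, b_e = rng.normal(scale=0.1, size=(n, p)), np.zeros(p)
W_d, b_d = rng.normal(scale=0.1, size=(p, n)), np.zeros(n)

def encode(x):
    """Encoder e(.): maps (batch, n) inputs to (batch, p) codes."""
    return np.tanh(x @ W_e + b_e)

def decode(z):
    """Decoder d(.): maps (batch, p) codes back to (batch, n) outputs."""
    return z @ W_d + b_d

def autoencode(x):
    """Full autoencoder: the composition d(e(x))."""
    return decode(encode(x))

X = rng.normal(size=(5, n))
Z = encode(X)
X_rec = autoencode(X)
print(Z.shape, X_rec.shape)  # (5, 8) (5, 64)
```

The code `Z` is much smaller than the input, and the reconstruction has the same shape as the input; training would adjust the parameters so that `X_rec` gets close to `X`.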

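Because autoencoders are usually implemented as neural networks, they are typically trained with the backpropagation algorithm, most often using a cost function based on the mean squared error.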
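
A mean squared error reconstruction cost can be sketched in a few lines of NumPy. The function name `mse_cost` and the choice to sum over features and average over samples are assumptions; other normalizations (for example, averaging over every element) differ only by a constant factor.

```python
import numpy as np

def mse_cost(X, X_rec):
    """Mean over samples of the squared L2 reconstruction error."""
    diff = (X - X_rec).reshape(len(X), -1)
    return np.mean(np.sum(diff ** 2, axis=1))

X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_rec = np.array([[0.0, 0.5], [1.0, 0.0]])
print(mse_cost(X, X_rec))  # 0.125
```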

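Alternatively, if you consider the data generating process, the goal can be restated in terms of a parameterized conditional distribution $q(\cdot)$:

$$q(\tilde{x} \mid x; \theta)$$

The cost function then becomes the Kullback-Leibler divergence between $p_{data}(\cdot)$ and $q(\cdot)$:

$$C(X; \theta) = D_{KL}(p_{data} \,\|\, q) = \sum_x p_{data}(x) \log \frac{p_{data}(x)}{q(x)} = -H(p_{data}) + H(p_{data}, q)$$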

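Because the entropy of $p_{data}$ is constant, it can be dropped from the optimization, so minimizing the Kullback-Leibler divergence is equivalent to minimizing the cross-entropy between $p_{data}$ and $q$. If $p_{data}$ and $q$ are assumed to be Gaussian, the Kullback-Leibler cost function is equivalent to the mean squared error, so the two approaches are interchangeable.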
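
The link between the Gaussian assumption and the mean squared error can be made explicit with a short derivation. It is a sketch under extra assumptions that the text does not state: $q$ is an isotropic Gaussian centered on the reconstruction $\tilde{x}_i$ with a fixed variance $\sigma^2$, $M$ is the number of samples, and $n$ is the input dimension.

```latex
% Negative log-likelihood of one sample under q = N(\tilde{x}_i, \sigma^2 I)
-\log q(x_i; \theta)
  = \frac{1}{2\sigma^2} \lVert x_i - \tilde{x}_i \rVert^2
  + \frac{n}{2} \log\left(2\pi\sigma^2\right)

% Sample estimate of the cross-entropy between p_data and q
H(p_{data}, q) \approx \frac{1}{M} \sum_{i=1}^{M} -\log q(x_i; \theta)
  = \frac{1}{2\sigma^2} \cdot \frac{1}{M} \sum_{i=1}^{M}
    \lVert x_i - \tilde{x}_i \rVert^2 + \text{const}
```

Since $\sigma^2$ is fixed, the parameters that minimize the cross-entropy are the same ones that minimize the mean squared error.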

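In some cases, you can assume Bernoulli distributions for $p_{data}$ and $q$. This is only possible when the data is normalized to the range (0, 1). Formally, this is not entirely correct, because a Bernoulli distribution is binary and $x_i \in \{0, 1\}^d$; however, using sigmoid output units also leads to an effective optimization for continuous samples, $x_i \in (0, 1)^d$. In this case, the cost function becomes:

$$C(X; \theta) = -\sum_i \sum_{j=1}^{d} \left[ x_i^{(j)} \log \tilde{x}_i^{(j)} + \left(1 - x_i^{(j)}\right) \log\left(1 - \tilde{x}_i^{(j)}\right) \right]$$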
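
A Bernoulli (binary cross-entropy) reconstruction cost can be sketched in NumPy as follows. The function name, the `eps` clipping used to avoid `log(0)`, and the averaging over samples are assumptions made for this illustration.

```python
import numpy as np

def bernoulli_cross_entropy(X, X_rec, eps=1e-7):
    """Cross-entropy between targets in [0, 1] and sigmoid-like outputs."""
    X_rec = np.clip(X_rec, eps, 1.0 - eps)  # keep log() finite
    per_elem = -(X * np.log(X_rec) + (1.0 - X) * np.log(1.0 - X_rec))
    return np.mean(np.sum(per_elem.reshape(len(X), -1), axis=1))

X = np.array([[0.2, 0.9], [0.5, 0.5]])
perfect = bernoulli_cross_entropy(X, X)
worse = bernoulli_cross_entropy(X, np.full_like(X, 0.5))
print(perfect < worse)  # True
```

For continuous targets the cost of a perfect reconstruction is not zero (it equals the entropy of the targets), but it is still the minimum, which is why the optimization remains effective.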

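Implementing a deep convolutional autoencoder

Now let's look at an example of a deep convolutional autoencoder based on TensorFlow. The code uses the TensorFlow 1.x graph API (`tf.placeholder`, `tf.layers`, sessions). We will use the Olivetti faces dataset because it is small, fit for purpose, and contains many facial expressions.

Step 1: Load the 400 grayscale 64 × 64 image samples to prepare the training set: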

from sklearn.datasets import fetch_olivetti_faces
  
faces = fetch_olivetti_faces(shuffle=True, random_state=1000)
X_train = faces['images']

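Step 2: To speed up computation, we resize the images to 32 × 32. This also helps avoid memory issues, at the cost of a slight loss of visual precision. The resizing is performed inside the graph, as shown in the encoder code. If you have plenty of computational resources, you can skip this step.

Step 3: Define the main constants: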

-> number of epochs (nb_epochs)
-> batch_size
-> code_length
-> graph

import tensorflow as tf
  
nb_epochs = 600
batch_size = 50
code_length = 256 
width = 32
height = 32
  
graph = tf.Graph()

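Step 4: We train the model for 600 epochs with 50 samples per batch. Since the original images have 64 × 64 = 4,096 pixels, a code of 256 values gives a compression ratio of 4,096/256 = 16 (or 1,024/256 = 4 relative to the resized 32 × 32 input). You can always try different configurations to maximize both convergence speed and final accuracy.

Step 5: Model the encoder with the following layers: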

-> 2D convolution with 16 (3 × 3) filters, (2 × 2) strides, ReLU activation, and the same padding.
-> 2D convolution with 32 (3 × 3) filters, (1 × 1) strides, ReLU activation, and the same padding.
-> 2D convolution with 64 (3 × 3) filters, (1 × 1) strides, ReLU activation, and the same padding.
-> 2D convolution with 128 (3 × 3) filters, (1 × 1) strides, ReLU activation, and the same padding.
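
The shapes produced by the encoder layers listed above can be checked with a little arithmetic. This sketch assumes the 32 × 32 single-channel input and the 256-value code used in this article, and relies on the rule that a convolution with 'same' padding produces ceil(size / stride) outputs per spatial dimension. The helper name `conv_same_out` is hypothetical.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)

width = height = 32
code_length = 256
layers = [(16, 2), (32, 1), (64, 1), (128, 1)]  # (filters, stride) per layer

h, w, c = height, width, 1
shapes = []
for filters, stride in layers:
    h, w, c = conv_same_out(h, stride), conv_same_out(w, stride), filters
    shapes.append((h, w, c))
    print((h, w, c))

flattened = h * w * c
print(flattened)                      # 32768 values enter the code layer
print(width * height / code_length)  # 4.0  (relative to the 32 x 32 input)
print(64 * 64 / code_length)          # 16.0 (relative to the 64 x 64 original)
```

Only the first layer halves the resolution; the remaining layers keep the 16 × 16 grid and only increase the number of feature maps.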

Step 6: The decoder performs the deconvolution as a sequence of transpose convolutions:

-> 2D transpose convolution with 128 (3 × 3) filters, (2 × 2) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 64 (3 × 3) filters, (1 × 1) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 32 (3 × 3) filters, (1 × 1) strides, ReLU activation, and the same padding.
-> 2D transpose convolution with 1 (3 × 3) filter, (1 × 1) strides, Sigmoid activation, and the same padding.
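
The decoder shapes can be checked the same way. This sketch assumes that the 256-value code is first reshaped into a 16 × 16 × 1 feature map (as the decoder code in this article does) and that a transpose convolution with 'same' padding multiplies each spatial dimension by its stride.

```python
code_length = 256
width = height = 32

# The code vector is reshaped into a (height/2, width/2, 1) feature map
h, w, c = height // 2, width // 2, 1
assert h * w * c == code_length  # the reshape is only valid if sizes match

layers = [(128, 2), (64, 1), (32, 1), (1, 1)]  # (filters, stride) per layer
shapes = []
for filters, stride in layers:
    h, w, c = h * stride, w * stride, filters
    shapes.append((h, w, c))
    print((h, w, c))
```

The last shape, (32, 32, 1), matches the resized input, so the reconstruction can be compared with it element by element.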

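The loss function is based on the L2 norm of the difference between the reconstructed and the original images (`tf.nn.l2_loss` computes half the sum of the squared differences). The optimizer is Adam with a learning rate α = 0.001. Now, let's look at the encoder part of the TensorFlow DAG: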

import tensorflow as tf
  
with graph.as_default():
    input_images_xl = tf.placeholder(tf.float32,
                                     shape=(None, X_train.shape[1],
                                            X_train.shape[2], 1))

    input_images = tf.image.resize_images(input_images_xl,
                                          (width, height),
                                          method=tf.image.ResizeMethod.BICUBIC)
  
    # Encoder
    conv_0 = tf.layers.conv2d(inputs=input_images,
                              filters=16,
                              kernel_size=(3, 3),
                              strides=(2, 2),
                              activation=tf.nn.relu,
                              padding='same')
  
    conv_1 = tf.layers.conv2d(inputs=conv_0,
                              filters=32,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')
  
    conv_2 = tf.layers.conv2d(inputs=conv_1,
                              filters=64,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')
  
    conv_3 = tf.layers.conv2d(inputs=conv_2,
                              filters=128,
                              kernel_size=(3, 3),
                              activation=tf.nn.relu,
                              padding='same')

The following is the code part of the DAG:

import tensorflow as tf
  
with graph.as_default(): 
    
    # Code layer
    code_input = tf.layers.flatten(inputs=conv_3)
  
    code_layer = tf.layers.dense(inputs=code_input,
                                 units=code_length,
                                 activation=tf.nn.sigmoid)
  
    code_mean = tf.reduce_mean(code_layer, axis=1)

Now, let's look at the decoder part of the DAG:

import tensorflow as tf
  
with graph.as_default(): 
  
    # Decoder
    decoder_input = tf.reshape(code_layer,
                      (-1, int(width / 2),
                       int(height / 2), 1))
  
    convt_0 = tf.layers.conv2d_transpose(inputs=decoder_input,
                                         filters=128,
                                         kernel_size=(3, 3),
                                         strides=(2, 2),
                                         activation=tf.nn.relu,
                                         padding='same')
  
    convt_1 = tf.layers.conv2d_transpose(inputs=convt_0,
                                         filters=64,
                                         kernel_size=(3, 3),
                                         activation=tf.nn.relu,
                                         padding='same')
  
    convt_2 = tf.layers.conv2d_transpose(inputs=convt_1,
                                         filters=32,
                                         kernel_size=(3, 3),
                                         activation=tf.nn.relu,
                                         padding='same')
  
    convt_3 = tf.layers.conv2d_transpose(inputs=convt_2,
                                         filters=1,
                                         kernel_size=(3, 3),
                                         activation=tf.sigmoid,
                                         padding='same')
  
    output_images = tf.image.resize_images(convt_3, (X_train.shape[1],
                                                    X_train.shape[2]), 
                                  method=tf.image.ResizeMethod.BICUBIC)

Step 7: Here is how to define the loss function and the Adam optimizer:

import tensorflow as tf
  
with graph.as_default():
    # Loss
    loss = tf.nn.l2_loss(convt_3 - input_images)
  
    # Training step
    training_step = tf.train.AdamOptimizer(0.001).minimize(loss)

Step 8: Now that the full DAG is defined, we can start a session and initialize all variables:

import tensorflow as tf
  
session = tf.InteractiveSession(graph=graph)
tf.global_variables_initializer().run()

Step 9: Once TensorFlow is initialized, we can start the training process:

import numpy as np
  
for e in range(nb_epochs):
    np.random.shuffle(X_train)
  
    total_loss = 0.0
    code_means = []
  
    for i in range(0, X_train.shape[0] - batch_size, batch_size):
        X = np.expand_dims(X_train[i:i + batch_size, :, :],
                                axis=3).astype(np.float32)
  
        _, n_loss, c_mean = session.run([training_step, loss, code_mean],
                                        feed_dict={input_images_xl: X})
  
        total_loss += n_loss
        code_means.append(c_mean)
  
    print('Epoch {}) Average loss per sample: {} (Code mean: {})'.
          format(e + 1, total_loss / float(X_train.shape[0]),
          np.mean(code_means)))

Output:

Epoch 1) Average loss per sample: 11.933397521972656 (Code mean: 0.5420681238174438)
Epoch 2) Average loss per sample: 10.294102325439454 (Code mean: 0.4132006764411926)
Epoch 3) Average loss per sample: 9.917563934326171 (Code mean: 0.38105469942092896)
...
Epoch 600) Average loss per sample: 0.4635812330245972 (Code mean: 0.42368677258491516)

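At the end of the training process, the average loss per sample is about 0.46 (computed on the 32 × 32 images) and the mean of the code is 0.42. This indicates a rather dense encoding: the code values come from sigmoid units in (0, 1), and their mean is close to 0.5. This value is mainly of interest when comparing results in terms of sparsity.

For some sample images, the autoencoder produces the following output:

When the images are upsampled back to 64 × 64, the quality of the reconstruction is partially affected. However, we can obtain better results by reducing the compression ratio, that is, by increasing the code length.

How to denoise with an autoencoder?

Autoencoders are also helpful when the application relies on the transformation from input to output itself, which is not necessarily related to their ability to find low-dimensional representations. Denoising is one such case.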

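As an example, assume that X is a zero-centered dataset and that we also have a noisy version of it, whose samples have the following structure:

$$\hat{x}_i = x_i + n(i)$$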

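Here, the goal of the autoencoder is to remove the noise term and recover the original sample $x_i$. From a mathematical viewpoint, standard and denoising autoencoders are the same thing, but the capacity requirements of the models differ. Since a denoising autoencoder must recover the original samples from corrupted inputs (whose features occupy a much larger sample space), the number and size of its layers may need to be larger than for a standard autoencoder.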

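Of course, given the complexity, it is impossible to have clear insight without running a few tests; therefore, I strongly suggest starting with smaller models and increasing the capacity until the optimal cost function reaches an acceptable value. Several strategies can be used to add noise: corrupting the samples contained in each batch, using a noise layer as the first layer of the encoder, or using a dropout layer as the first layer of the encoder.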
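
As an illustration of the first strategy, the sketch below corrupts a batch with masking noise, that is, it sets a random fraction of the pixels to zero. This mimics what a dropout layer on the input would do, except that no rescaling of the surviving values is applied. The function name `mask_corrupt` and the 30% drop probability are assumptions chosen for the example.

```python
import numpy as np

def mask_corrupt(X, drop_prob, rng):
    """Zero out each element of X independently with probability drop_prob."""
    keep = rng.random(X.shape) >= drop_prob
    return X * keep

rng = np.random.default_rng(0)
X = rng.random((4, 8, 8))          # a small batch of fake 8 x 8 "images"
Xn = mask_corrupt(X, 0.3, rng)     # corrupted copy fed to the encoder

print(np.mean(Xn == 0))            # fraction of dropped pixels, roughly 0.3
```

During training, `Xn` would be the input of the network and the clean `X` the reconstruction target.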

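One of the most common choices is to assume that the noise is Gaussian. In that case, we can create both homoscedastic and heteroscedastic noise. In the first case, the variance is the same for all components (that is, $n(i) \sim N(0, \sigma^2 I)$), while in the second each component has its own variance. Depending on the nature of the problem, one or the other may be more appropriate. When there are no other constraints, however, heteroscedastic noise is generally preferable because it tends to improve the overall robustness of the system.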
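
The difference between the two kinds of Gaussian noise is easy to see in NumPy. The standard deviations used here (0.2 for the homoscedastic case and 0.05, 0.2, 0.5 for the heteroscedastic case) are arbitrary values chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.zeros((10000, 3))  # zero-centered toy dataset with 3 components

# Homoscedastic: the same standard deviation for every component
sigma = 0.2
noise_homo = rng.normal(0.0, sigma, size=X.shape)

# Heteroscedastic: each component has its own standard deviation
sigmas = np.array([0.05, 0.2, 0.5])
noise_hetero = rng.normal(0.0, 1.0, size=X.shape) * sigmas

X_homo = X + noise_homo
X_hetero = X + noise_hetero

print(X_homo.std(axis=0).round(2))    # all close to 0.2
print(X_hetero.std(axis=0).round(2))  # close to 0.05, 0.2, 0.5
```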

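How to add noise to an autoencoder

We will modify our deep convolutional autoencoder so that it can handle noisy input samples. The DAG is almost identical; the difference is that we now need to feed both the original and the noisy images.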

import tensorflow as tf
  
with graph.as_default():
    input_images_xl = tf.placeholder(tf.float32, 
                         shape=(None, X_train.shape[1],
                         X_train.shape[2], 1))
    input_noisy_images_xl = tf.placeholder(tf.float32, 
                             shape=(None, X_train.shape[1],
                             X_train.shape[2], 1))
  
    input_images = tf.image.resize_images(input_images_xl, 
                                          (width, height), 
                     method=tf.image.ResizeMethod.BICUBIC)
  
    input_noisy_images = tf.image.resize_images(input_noisy_images_xl,
                                                      (width, height), 
                                 method=tf.image.ResizeMethod.BICUBIC)
  
    # Encoder
    conv_0 = tf.layers.conv2d(inputs=input_noisy_images,
                              filters=16,
                              kernel_size=(3, 3),
                              strides=(2, 2),
                              activation=tf.nn.relu,
                              padding='same')

    # ... the remaining encoder, code, and decoder layers are unchanged

The loss function is now computed against the original (clean) images:

with graph.as_default():
    # Loss: compare the reconstruction with the clean (noise-free) images
    loss = tf.nn.l2_loss(convt_3 - input_images)

    # Training step
    training_step = tf.train.AdamOptimizer(0.001).minimize(loss)

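Once the variables have been initialized in the standard way, we can start the training process with additive Gaussian noise, $n_i \sim N(0, \sigma^2 I)$ with $\sigma = 0.2$ (the standard deviation passed to `np.random.normal`), clipping the noisy images to the range [0, 1]: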

import numpy as np
  
for e in range(nb_epochs):
    np.random.shuffle(X_train)
  
    total_loss = 0.0
    code_means = []
  
    for i in range(0, X_train.shape[0] - batch_size, batch_size):
        X = np.expand_dims(X_train[i:i + batch_size, :, :],
                                 axis=3).astype(np.float32)
  
        Xn = np.clip(X + np.random.normal(0.0, 0.2, 
                     size=(batch_size, X_train.shape[1],
                     X_train.shape[2], 1)), 0.0, 1.0)
  
        _, n_loss, c_mean = session.run([training_step, loss, code_mean],
                                       feed_dict={ input_images_xl: X,
                                             input_noisy_images_xl: Xn })
  
        total_loss += n_loss
        code_means.append(c_mean)
  
    print('Epoch {}) Average loss per sample: {} (Code mean: {})'.
                format(e + 1, total_loss / float(X_train.shape[0]),
                                             np.mean(code_means)))

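Now that the model is trained, we can test it with some noisy samples. The output of this operation is:

The autoencoder has learned to denoise the input images, even when their quality is noticeably degraded. You can keep experimenting with other datasets and larger noise variances to see how much noise the model can tolerate.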

Reference:
Hands-On Unsupervised Learning with Python.