Autoencoders

Pratibha
Dec 3, 2020
Updated: Feb 17, 2021

What is an Autoencoder?

An autoencoder is a type of artificial neural network trained with backpropagation in an unsupervised manner: the target values are set equal to the inputs. An autoencoder compresses the input into a lower-dimensional code and then reconstructs the output from this representation. The code is a compact “summary” or “compression” of the input, also known as the latent-space representation. Because the training data is unlabelled, the network learns without supervision. An approximation of the original data can later be reconstructed from the compressed code.

The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”. Along with the reduction side, a reconstructing side is learnt, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name.

Talking of dimensionality reduction, the first thing that may cross your mind is Principal Component Analysis (PCA), which is another machine learning algorithm that performs the same task. So then why do we need Autoencoders? Let’s discuss that in brief.

Autoencoders can be preferable to PCA because:

An autoencoder can learn non-linear transformations by using non-linear activation functions and multiple layers, whereas PCA is limited to linear projections.
It is not restricted to dense layers. It can use convolutional layers, which are better suited to image, video and series data.
It can be more efficient to learn several smaller layers with an autoencoder than to learn one huge transformation with PCA.
An autoencoder yields a representation at the output of each layer.
It can make use of pre-trained layers from another model, applying transfer learning to enhance the encoder/decoder.
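
To make the first point concrete, here is a minimal sketch using NumPy. It compares a one-component PCA reconstruction with a hand-written (not learned) non-linear encoder/decoder on points lying on a parabola. The toy data and the `encode`/`decode` functions are assumptions chosen for illustration, not something a real autoencoder is guaranteed to find.

```python
import numpy as np

# Toy data: points on a parabola, i.e. a 1-D curve embedded in 2-D.
t = np.linspace(-1.0, 1.0, 50)
X = np.stack([t, t ** 2], axis=1)            # shape (50, 2)

# PCA with one component: project onto the top principal direction and back.
mean = X.mean(axis=0)
Xc = X - mean
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = vt[:1]                                 # shape (1, 2)
X_pca = (Xc @ pc1.T) @ pc1 + mean            # best rank-1 linear reconstruction

# Hand-written non-linear encoder/decoder (hypothetical, for illustration only).
def encode(x):
    return x[:, :1]                          # 1-D code: the first coordinate

def decode(z):
    return np.concatenate([z, z ** 2], axis=1)

X_nonlinear = decode(encode(X))

pca_error = np.mean((X - X_pca) ** 2)
nonlinear_error = np.mean((X - X_nonlinear) ** 2)
print("PCA error:", pca_error, "non-linear error:", nonlinear_error)
```

Both approaches use a one-dimensional code, but only the non-linear decoder can follow the curve, so the linear PCA reconstruction is left with a visible error.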

Architecture of Autoencoders

An autoencoder consists of 3 components:

encoder,
code and
decoder.

Encoder: This part of the network compresses the input into a latent-space representation. For image data, the encoder layers encode the input image as a compressed representation in a reduced dimension. The compressed image is a distorted version of the original image.

Code: This part of the network represents the compressed input that is fed to the decoder. The code is also known as the bottleneck. The bottleneck is designed to decide which aspects of the observed data are relevant and which can be discarded. It does this by balancing two criteria:

Compactness of the representation, measured as its compressibility.
Retention of behaviourally relevant variables from the input.

Decoder: This part of the network decodes the encoded image back to the original dimension. The decoded image is a lossy reconstruction of the original image, rebuilt from the latent-space representation. The decoder architecture is typically the mirror image of the encoder.
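
The three components can be sketched as a single forward pass in NumPy. The sizes (784 inputs, a 32-unit code), the ReLU and sigmoid activations, and the random weights are all assumptions made for this example; a real model would learn the weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a flattened 28x28 image compressed to a 32-unit code.
input_dim, code_dim = 784, 32

# Randomly initialised weights; training would adjust these.
W_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encoder(x):
    # Compress the input into the code (bottleneck) with a ReLU activation.
    return np.maximum(0.0, x @ W_enc + b_enc)

def decoder(code):
    # Map the code back to the original dimension, with values in (0, 1).
    return sigmoid(code @ W_dec + b_dec)

x = rng.random((5, input_dim))   # a batch of 5 inputs with values in [0, 1)
code = encoder(x)                # shape (5, 32)
x_hat = decoder(code)            # shape (5, 784)
print(x.shape, code.shape, x_hat.shape)
```

The decoder here mirrors the encoder: one layer going from 784 to 32 units, and one layer going from 32 back to 784.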

Properties and Hyperparameters

Properties of Autoencoders:

Data-specific: Autoencoders are only able to compress data similar to what they have been trained on. Therefore, we can’t expect an autoencoder trained on handwritten digits to compress landscape photos.
Lossy: The decompressed outputs will be degraded compared to the original inputs. We won't get the exact inputs back as the output; some distortion is added during the reconstruction phase.
Learned automatically from examples: It is easy to train specialized instances of the algorithm that will perform well on a specific type of input.

Hyperparameters of Autoencoders:

There are 4 hyperparameters that we need to set before training an autoencoder:

Code size: It represents the number of nodes in the middle layer. Smaller size results in more compression.
Number of layers: The autoencoder can consist of as many layers as we want.
Number of nodes per layer: The number of nodes per layer decreases with each subsequent layer of the encoder, and increases back in the decoder. The decoder is symmetric to the encoder in terms of the layer structure.
Loss function: We use either mean squared error or binary cross-entropy. If the input values are in the range [0, 1], we typically use cross-entropy; otherwise, we use mean squared error.
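
As a rough sketch of how these hyperparameters fit together, the snippet below builds a symmetric list of layer sizes and implements the two loss functions in plain Python. The helper names and the linear interpolation of hidden-layer sizes are assumptions for illustration; any decreasing schedule of sizes would serve the same purpose.

```python
import math

def symmetric_layer_sizes(input_dim, code_size, num_hidden):
    """Node counts for the encoder, the code, and a mirrored decoder.

    Hidden sizes shrink linearly from input_dim towards code_size
    (an assumed scheme, chosen only to keep the example simple).
    """
    step = (input_dim - code_size) / (num_hidden + 1)
    encoder = [round(input_dim - step * i) for i in range(num_hidden + 1)]
    return encoder + [code_size] + encoder[::-1]

def mean_squared_error(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def binary_cross_entropy(x, x_hat, eps=1e-12):
    total = 0.0
    for a, b in zip(x, x_hat):
        b = min(max(b, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= a * math.log(b) + (1.0 - a) * math.log(1.0 - b)
    return total / len(x)

sizes = symmetric_layer_sizes(input_dim=784, code_size=32, num_hidden=2)
print(sizes)   # [784, 533, 283, 32, 283, 533, 784]
print(mean_squared_error([0.0, 1.0], [0.1, 0.9]))
print(binary_cross_entropy([0.0, 1.0], [0.1, 0.9]))
```

A smaller `code_size` gives more compression, and `num_hidden` controls how many layers sit between the input and the code on each side.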

While building an autoencoder, the aim is to make sure that the autoencoder does not memorize all the information, i.e. it should not simply copy and paste the input as the output. To achieve this, constraints should be added to the network to prioritize which information is kept and which is discarded. Such constraints can be introduced in the following ways:

1. Reducing the number of units or nodes in the layers.

2. Adding some noise to the input images.

3. Adding some regularization.
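
The second option can be sketched as follows. The network receives a corrupted input but is still scored against the clean original, so copying the input straight through no longer gives a perfect result. The Gaussian noise, the `noise_std` value and the clipping to [0, 1] are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(x, noise_std=0.2):
    # Corrupt the input, then clip so values stay in the valid [0, 1] range.
    noisy = x + rng.normal(scale=noise_std, size=x.shape)
    return np.clip(noisy, 0.0, 1.0)

clean = rng.random((4, 8))          # stand-in for a batch of normalised images
noisy = add_gaussian_noise(clean)

# Training pair: the model sees `noisy` as input and `clean` as the target.
model_input, target = noisy, clean
print(np.mean((model_input - target) ** 2))   # how much corruption was added
```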

Summarising the working of an autoencoder:

We will now describe how an autoencoder works in general.

A typical autoencoder is defined with an input, an internal representation and an output (an approximation of the input). The learning occurs in the layers attached to the internal representation. There are two main blocks of layers, each of which looks like a traditional neural network. The slight difference is that the output layer must have the same size as the input, because the output is trained to match it. The original input goes into the first block, called the encoder, which compresses (reduces) the size of the input into the internal representation. The second block, the decoder, reconstructs the input from that representation. This is the decoding phase.

The model will update the weights by minimizing the loss function. The model is penalized if the reconstruction output is different from the input.
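
The weight update described above can be sketched with a tiny linear autoencoder trained by plain gradient descent in NumPy. The toy data, the absence of activation functions, the learning rate and the step count are all assumptions that keep the example short; a practical model would normally use a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 4-D that lie on a 2-D subspace.
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 4))
X = Z @ A

# Linear encoder (4 -> 2) and decoder (2 -> 4), small random initial weights.
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))
learning_rate = 0.05

def reconstruction_loss(X, W_enc, W_dec):
    X_hat = (X @ W_enc) @ W_dec
    return np.mean((X_hat - X) ** 2)

initial_loss = reconstruction_loss(X, W_enc, W_dec)

for step in range(2000):
    code = X @ W_enc
    X_hat = code @ W_dec
    error = X_hat - X                        # the target is the input itself
    grad_out = 2.0 * error / error.size      # d(loss) / d(X_hat)
    grad_W_dec = code.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= learning_rate * grad_W_dec
    W_enc -= learning_rate * grad_W_enc

final_loss = reconstruction_loss(X, W_enc, W_dec)
print("loss before:", initial_loss, "loss after:", final_loss)
```

The penalty is the mean squared difference between the reconstruction and the input, and each step moves the weights in the direction that reduces it.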

Types of Autoencoders

Convolutional Autoencoders:

Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals. Convolutional autoencoders use the convolution operator to exploit this observation. They learn to encode the input as a set of simple signals and then try to reconstruct the input from them, or to modify the geometry or the reflectance of the image.

The convolutional autoencoder uses convolutional, ReLU and pooling layers in the encoder. In the decoder, the pooling layers are replaced by upsampling layers, which increase the dimensions of the feature maps.
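
To see how pooling shrinks a feature map and upsampling restores its size, here is a minimal NumPy sketch of one encoder step followed by one upsampling step. The naive convolution, the random 3x3 kernel, the 2x2 pooling window and the nearest-neighbour upsampling are assumptions for illustration; a framework would provide learned, optimised versions of these layers.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive 2-D cross-correlation with zero padding (odd kernel sizes assumed)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)

def max_pool_2x2(x):
    # Keep the largest value in each non-overlapping 2x2 window.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    # Nearest-neighbour upsampling: repeat every row and column twice.
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.normal(size=(3, 3))

features = relu(conv2d_same(image, kernel))   # encoder: convolution + ReLU
pooled = max_pool_2x2(features)               # encoder: pooling halves each side
restored = upsample_2x(pooled)                # decoder: upsampling doubles each side
print(features.shape, pooled.shape, restored.shape)
```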

Used in the following:

Image Reconstruction
Image Colorization
Latent space clustering
Generating higher-resolution images

Sparse Autoencoders (SAE):

Sparse autoencoders offer an alternative method for introducing an information bottleneck without requiring a reduction in the number of nodes at our hidden layers. Instead, the loss function is constructed so that activations within a layer are penalized.
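
One common way to penalize activations is to add an L1 term on the hidden layer to the reconstruction loss. The sketch below assumes that choice; the function name and the `sparsity_weight` value are hypothetical, and other penalties (for example one based on KL divergence) are also used in practice.

```python
import numpy as np

def sparse_autoencoder_loss(x, x_hat, hidden_activations, sparsity_weight=1e-3):
    # Reconstruction term: how far the output is from the input.
    reconstruction = np.mean((x - x_hat) ** 2)
    # Sparsity term: mean absolute activation, which pushes activations to zero.
    l1_penalty = np.mean(np.abs(hidden_activations))
    return reconstruction + sparsity_weight * l1_penalty

x = np.ones((2, 3))
x_hat = np.ones((2, 3))                          # perfect reconstruction
sparse_code = np.array([[0.0, 0.0, 1.0, 0.0]])   # few active units
dense_code = np.ones((1, 4))                     # every unit active

loss_sparse = sparse_autoencoder_loss(x, x_hat, sparse_code)
loss_dense = sparse_autoencoder_loss(x, x_hat, dense_code)
print(loss_sparse, loss_dense)
```

With identical reconstructions, the code with fewer active units gets the lower loss, which is the pressure that creates the bottleneck.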