# A Tutorial on Autoencoders for Deep Learning

December 31, 2015

Despite its cryptic-sounding name, the autoencoder is a fairly basic machine learning model (and the name is not cryptic at all once you know what it does).

Autoencoders belong to the neural network family, but they are also closely related to PCA (principal components analysis).

• It is an unsupervised learning algorithm (like PCA)
• It minimizes the same objective function as PCA
• It is a neural network
• The neural network’s target output is its input

The last point is key here. The architecture of an autoencoder is an input layer, a (typically smaller) hidden layer, and an output layer with the same dimensionality as the input. So the dimensionality of the input is the same as the dimensionality of the output, and essentially what we want is x’ = x.

It can be shown that the objective function for PCA is:

$$J = \sum_{n=1}^{N} |x(n) - \hat{x}(n)|^2$$

Where the prediction is $$\hat{x}(n) = Q^TQx(n)$$ (project down with $$Q$$, then map back up with $$Q^T$$).

Q can be the full (square, orthogonal) transformation matrix, in which case $$Q^TQ = I$$ and we get exactly the old x back, or it can be a “rank k” matrix (i.e. keeping only the k most relevant eigenvectors as its rows), which then results in only an approximation of x.

So the objective function can be written as:

$$J = \sum_{n=1}^{N} |x(n) - Q^TQx(n)|^2$$
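As a concrete sketch of this objective (hypothetical data, numpy assumed), here is the PCA reconstruction with a rank-k Q whose rows are the top-k eigenvectors of the covariance matrix:

```python
import numpy as np

# Hypothetical data: N = 100 samples, D = 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)  # PCA assumes centered data

# Eigenvectors of the covariance matrix, sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))
order = np.argsort(eigvals)[::-1]

k = 2
Q = eigvecs[:, order[:k]].T  # rows are the top-k eigenvectors (k x D)

# Reconstruction x_hat = Q^T Q x for every sample, and the objective J
X_hat = X @ Q.T @ Q
J = np.sum((X - X_hat) ** 2)
```

With k = D (the full orthogonal matrix), the reconstruction is exact and J is zero; with k < D, J measures how much is lost by the rank-k approximation.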

Recall that to get the value at the hidden layer, we simply multiply the input->hidden weights by the input.

Like so:

$$z = f(Wx)$$

And to get the value at the output, we multiply the hidden->output weights by the hidden layer values, like so:

$$y = g(Vz)$$

The choice of $$f$$ and $$g$$ is up to us; we just have to know how to take their derivatives for backpropagation.
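A minimal sketch of this forward pass (shapes and weights are hypothetical; sigmoid is chosen for f and identity for g):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical sizes: D = 4 input features, M = 2 hidden units
D, M = 4, 2
rng = np.random.default_rng(1)
W = rng.normal(size=(M, D))  # input -> hidden weights
V = rng.normal(size=(D, M))  # hidden -> output weights

x = rng.normal(size=D)
z = sigmoid(W @ x)  # z = f(Wx), the hidden representation
y = V @ z           # y = g(Vz), with g = identity
```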

We are of course free to make them “identity” functions, such that:

$$y = g(V f(Wx)) = VWx$$

This gives us the objective:

$$J = \sum_{n=1}^{N} |x(n) - VWx(n)|^2$$

Which is the same objective as PCA!
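To check this numerically, here is a hedged sketch (hypothetical data, plain batch gradient descent) of a linear autoencoder y = VWx trained to minimize that objective:

```python
import numpy as np

# Hypothetical centered data: N samples, D features, M hidden units
rng = np.random.default_rng(2)
N, D, M = 200, 5, 2
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)

W = rng.normal(size=(M, D)) * 0.1  # plays the role of Q
V = rng.normal(size=(D, M)) * 0.1  # plays the role of Q^T
lr = 0.01

J0 = np.sum((X - X @ W.T @ V.T) ** 2)  # objective before training
for _ in range(500):
    R = X @ W.T @ V.T - X        # residuals, N x D
    gV = 2 * R.T @ (X @ W.T)     # dJ/dV
    gW = 2 * V.T @ R.T @ X       # dJ/dW
    V -= lr * gV / N
    W -= lr * gW / N
J1 = np.sum((X - X @ W.T @ V.T) ** 2)  # objective after training
```

Gradient descent drives J down toward the rank-M PCA reconstruction error, which is the minimum of this objective.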

## If autoencoders are similar to PCA, why do we need autoencoders?

Autoencoders are much more flexible than PCA.

Recall that with neural networks we have an activation function: this can be a ReLU (aka rectifier), tanh (hyperbolic tangent), or sigmoid.

This introduces nonlinearities in our encoding, whereas PCA can only represent linear transformations.

The network representation also means you can stack autoencoders to form a deep network.
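As a sketch of stacking (random, untrained weights purely to illustrate the shapes and composition; in practice each layer would be trained as its own autoencoder, often greedily):

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0)

# Hypothetical encoders: 20 -> 8 features, then 8 -> 3 features.
# Each layer's hidden representation becomes the next autoencoder's input.
rng = np.random.default_rng(4)
W1 = rng.normal(size=(8, 20))  # first encoder's input -> hidden weights
W2 = rng.normal(size=(3, 8))   # second encoder's input -> hidden weights

x = rng.normal(size=20)
z1 = relu(W1 @ x)   # first hidden layer
z2 = relu(W2 @ z1)  # deep, lower-dimensional representation
```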

## Cool theory bro, but what can autoencoders actually do for me?

Good question!

Like PCA, autoencoders can be used for finding a low-dimensional representation of your input data. Why is this useful?

Some of your features may be redundant or correlated, resulting in wasted processing time and overfitting in your model (too many parameters).

It is thus ideal to only include the features we need.

If your “reconstruction” of x is very accurate, that means your low-dimensional representation is good.

You can then use this transformation as input into another model.
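A hedged sketch of that pipeline (hypothetical data; a PCA-style linear encoder stands in here for a trained autoencoder's hidden layer):

```python
import numpy as np

# Hypothetical data: 10 features, of which 5 are near-duplicates
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
X[:, 5:] = X[:, :5] + 0.01 * rng.normal(size=(200, 5))  # redundant features
target = 2.0 * X[:, 0] + 1.0  # some downstream quantity to predict

# Encoder: top-5 principal directions (a trained autoencoder's hidden
# layer would play the same role here)
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(Xc.T @ Xc)
W = vecs[:, np.argsort(vals)[::-1][:5]].T  # 5 x 10

Z = Xc @ W.T  # 200 x 5: the low-dimensional representation

# Feed Z into another model, e.g. least-squares regression with a bias term
A = np.column_stack([Z, np.ones(len(Z))])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
```

Because the 5 redundant features add almost no information, the 5-dimensional representation keeps what the downstream model needs.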

## Training an autoencoder

Since autoencoders are really just neural networks where the target output is the input, you actually don’t need any new code.

Suppose we’re working with a scikit-learn-like interface, where training a supervised model looks like this:

```python
model.fit(X, Y)
```


To train an autoencoder, you would just have:

```python
model.fit(X, X)
```


Pretty simple, huh?

All the usual neural network training strategies work with autoencoders too:

• backpropagation
• regularization
• dropout
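Putting the pieces together, here is a minimal sketch of such a network (assuming numpy; sigmoid hidden layer, identity output, squared-error loss, plain batch gradient descent via backpropagation), trained by calling fit(X, X) exactly as above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """Minimal autoencoder: sigmoid hidden layer, identity output,
    squared-error loss, batch gradient descent."""

    def __init__(self, n_hidden, lr=0.1, epochs=300, seed=0):
        self.n_hidden = n_hidden
        self.lr = lr
        self.epochs = epochs
        self.seed = seed

    def fit(self, X, Y):
        N, D = X.shape
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(self.n_hidden, D)) * 0.1
        self.V = rng.normal(size=(D, self.n_hidden)) * 0.1
        for _ in range(self.epochs):
            Z = sigmoid(X @ self.W.T)        # hidden: z = f(Wx)
            Yhat = Z @ self.V.T              # output: y = Vz
            R = Yhat - Y                     # residuals
            gV = R.T @ Z / N                 # dJ/dV
            dZ = (R @ self.V) * Z * (1 - Z)  # backprop through the sigmoid
            gW = dZ.T @ X / N                # dJ/dW
            self.V -= self.lr * gV
            self.W -= self.lr * gW
        return self

    def transform(self, X):
        return sigmoid(X @ self.W.T)

# Hypothetical data with intrinsic dimension 2, embedded in 4 dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 4))

model = Autoencoder(n_hidden=2).fit(X, X)  # target output = input
Z = model.transform(X)                     # 100 x 2 low-dimensional representation
```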

If you want to get good with autoencoders, I would recommend taking some data and an existing neural network package you’re comfortable with, and seeing what low-dimensional representation you can come up with. How few dimensions can you use while still reconstructing the input accurately?

Autoencoders are part of a family of unsupervised deep learning methods, which I cover in-depth in my course, Unsupervised Deep Learning in Python. We discuss how to stack autoencoders to build deep belief networks, and compare them to RBMs which can be used for the same purpose. We derive all the equations and write all the code from scratch – no shortcuts. Ask me for a coupon so I can give you a discount!

P.S. “Autoencoder” means “encodes itself”. Not so cryptic now, right?