Multiple Linear Regression

Code for this tutorial is here: http://bit.ly/2IB45pp

Today we will continue our discussion of linear regression by extending the ideas from simple linear regression to multiple linear regression.

Recall that in simple linear regression, the input is 1-D. In multiple linear regression, the input is n-dimensional (any number of dimensions). The output is still just a scalar.

So now our input data looks like this:

(X1, Y1), (X2, Y2), …, (Xm, Ym)

Where each Xi is a vector and each Yi is a scalar.

But now instead of our hypothesis, h(), looking like this:

h(X) = aX + b

It looks like this:

h(x) = beta0 + beta1 x1 + beta2 x2 + … + betan xn

Where each subscripted x is a scalar.

beta0 is also known as the “bias term”.

Another, more compact way of writing this is:

h(x) = beta^T x

Where beta and x are vectors. When we transpose the first vector this is also called a “dot product” or “inner product”.

In this representation, we introduce a dummy variable x0 = 1, so that beta and x both contain the same number of elements (n+1).
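
To make the dot-product form concrete, here is a minimal NumPy sketch; the data and the beta values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical data: m = 5 samples, each with n = 3 input dimensions.
m, n = 5, 3
X = np.random.randn(m, n)

# Prepend the dummy column x0 = 1 so beta and each x have n + 1 elements.
X = np.column_stack([np.ones(m), X])

# A made-up weight vector; beta[0] plays the role of the bias term beta0.
beta = np.random.randn(n + 1)

# h(x) = beta^T x, computed for every sample at once.
Y_hat = X.dot(beta)  # shape (m,)
```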

In the case where the dimensionality of the input data is 2, we can still visualize our model, which is no longer a line, but a “plane of best fit”.

[Figure: the plane of best fit for 2-D input]

To solve for the beta vector, we do the same thing we did for simple linear regression: define an error function J (we’ll use the sum of squared errors again), take the derivative of J with respect to each parameter (beta0, beta1, …, betan), set each derivative to 0, and solve for the betas.

J = Σ (Yi − beta^T Xi)²   (sum over i = 1, …, m)

This is a lot more tedious than in the 1-D case, but I would suggest attempting at least the 2-D case as an exercise.
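
As a quick sanity check on the cost itself, here is a small NumPy sketch of the sum-of-squared-errors J on made-up data (all values below are just for illustration):

```python
import numpy as np

# Made-up data: m samples, n input dimensions, plus the dummy column x0 = 1.
m, n = 5, 3
X = np.column_stack([np.ones(m), np.random.randn(m, n)])
Y = np.random.randn(m)
beta = np.random.randn(n + 1)

# J = sum over i of (Yi - beta^T Xi)^2
J = np.sum((Y - X.dot(beta)) ** 2)
```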

As before, there is a “closed form” solution for beta:

beta = ( Σ Xi Xi^T )^-1 ( Σ Yi Xi )   (both sums over i = 1, …, m)

Here, each (Xi, Yi) is a “sample” from the data.

Notice that in the first term we transpose the second Xi. This is an “outer product” and the result is an (n+1) x (n+1) matrix.

The superscript -1 denotes a matrix inverse.
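
Here is one way this sample-by-sample form could look in NumPy, a sketch that accumulates the outer products explicitly (the data is made up; in practice you would use the matrix form below):

```python
import numpy as np

# Made-up data with the dummy column x0 = 1 already included.
m, n = 100, 3
X = np.column_stack([np.ones(m), np.random.randn(m, n)])
Y = np.random.randn(m)

# A = sum_i Xi Xi^T is an (n+1) x (n+1) matrix built from outer products,
# b = sum_i Yi Xi is an (n+1)-vector.
A = np.zeros((n + 1, n + 1))
b = np.zeros(n + 1)
for xi, yi in zip(X, Y):
    A += np.outer(xi, xi)
    b += yi * xi

# beta = A^-1 b; solving the linear system is preferred to forming the inverse.
beta = np.linalg.solve(A, b)
```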

An even more compact form of this equation arises when we consider all the samples of X together in an m x (n+1) matrix, and all the samples of Y together in an m x 1 matrix:

beta = (X^T X)^-1 X^T Y
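
In NumPy this compact form is essentially one line. A sketch on made-up data, using a linear solve rather than computing the inverse explicitly:

```python
import numpy as np

m, n = 100, 3
X = np.column_stack([np.ones(m), np.random.randn(m, n)])  # m x (n+1), with x0 = 1
Y = np.random.randn(m)                                     # m targets

# beta = (X^T X)^-1 X^T Y, written as a linear solve
beta = np.linalg.solve(X.T.dot(X), X.T.dot(Y))
```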

As in the 1-D case, we use R-squared (R²) to measure how well the model fits the actual data (the formula is exactly the same).
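
For completeness, a small sketch of that R-squared calculation, assuming Y holds the targets and Y_hat the model’s predictions (the numbers are hypothetical):

```python
import numpy as np

# Hypothetical targets and predictions of the same length.
Y = np.array([1.0, 2.0, 3.0, 4.0])
Y_hat = np.array([1.1, 1.9, 3.2, 3.8])

ss_res = np.sum((Y - Y_hat) ** 2)      # residual sum of squares
ss_tot = np.sum((Y - Y.mean()) ** 2)   # total sum of squares
r_squared = 1 - ss_res / ss_tot
```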

Learn more about Linear Regression in the course Deep Learning Prerequisites: Linear Regression in Python.