
Multiple Linear Regression

July 26, 2014

Code for this tutorial is here: http://bit.ly/2IB45pp


Today we will continue our discussion of linear regression by extending the ideas from simple linear regression to multiple linear regression.

Recall that in simple linear regression, the input is 1-D. In multiple linear regression, the input is n-dimensional (any number of dimensions). The output is still just a scalar (a single number).

So now our input data looks like this:

(X1, Y1), (X2, Y2), …, (Xm, Ym)

Where each Xi is a vector and each Yi is a scalar.

But now instead of our hypothesis, h(), looking like this:

h(X) = aX + b

It looks like this:

h(x) = beta0 + beta1 x1 + beta2 x2 + … + betan xn

Where each subscripted x is a scalar.

beta0 is also known as the “bias term”.

Another, more compact way of writing this is:

h(x) = beta^T x

Where beta and x are vectors. Transposing the first vector and multiplying it by the second is also called a “dot product” or “inner product”.

In this representation, we introduce a dummy variable x0 = 1, so that beta and x both contain the same number of elements (n+1).
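
As a quick illustration, here is what that looks like in NumPy (a minimal sketch; the numbers and variable names are made up for this example, not taken from the tutorial code):

```python
import numpy as np

# Hypothetical example with n = 3 input dimensions
beta = np.array([1.5, 0.2, -0.7, 0.4])   # (beta0, beta1, beta2, beta3), length n+1
x_raw = np.array([2.0, 5.0, 1.0])        # one input sample, length n

# Prepend the dummy variable x0 = 1 so beta and x have the same number of elements
x = np.concatenate(([1.0], x_raw))

# The hypothesis is just the dot (inner) product beta^T x
h = beta.dot(x)
print(h)  # 1.5 + 0.2*2.0 - 0.7*5.0 + 0.4*1.0 ≈ -1.2
```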

 

In the case where the dimensionality of the input data is 2, we can still visualize our model, which is no longer a line, but a “plane of best fit”.

[Figure: plane of best fit for 2-D input data]
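
If you want to produce a plot like this yourself, one way is with Matplotlib’s 3-D plotting. This is a sketch using made-up data (not the exact figure above), and it uses the closed-form solution that is derived later in this post:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # enables the '3d' projection on older Matplotlib

# Made-up 2-D inputs and noisy linear outputs
np.random.seed(0)
X = np.random.uniform(0, 10, size=(100, 2))
Y = 2 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + np.random.randn(100)

# Fit the plane with the closed-form solution beta = (X^T X)^-1 X^T Y
Xb = np.column_stack([np.ones(len(X)), X])         # add the x0 = 1 column
beta = np.linalg.solve(Xb.T.dot(Xb), Xb.T.dot(Y))

# Plot the data points and the fitted plane
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], Y)
x1g, x2g = np.meshgrid(np.linspace(0, 10, 20), np.linspace(0, 10, 20))
yhat = beta[0] + beta[1] * x1g + beta[2] * x2g
ax.plot_surface(x1g, x2g, yhat, alpha=0.3)
plt.show()
```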

 

To solve for the beta vector, we do the same thing we did for simple linear regression: define an error function J (we’ll use the sum of squared errors again), take the derivative of J with respect to each parameter (beta0, beta1, …), and set each derivative to 0 to solve for the betas.

J = Σ (yi - beta^T xi)^2   (sum over i = 1..m)

dJ/dbetaj = -2 Σ (yi - beta^T xi) xij = 0,   for j = 0, 1, …, n

This is a lot more tedious than in the 1-D case, but I would suggest attempting at least the 2-D case as an exercise.

As before, there is a “closed form” solution for beta:

beta = [ Σ xi xi^T ]^-1 [ Σ xi yi ]   (sums over i = 1..m)

Here, each (Xi, Yi) is a “sample” from the data.

Notice that in the first term we transpose the second Xi. This is an “outer product” and the result is an (n+1) x (n+1) matrix.

The superscript -1 denotes a matrix inverse.
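
In NumPy, this per-sample summation form might look like the following sketch, where X is an m x n array of raw inputs and Y is a length-m array (the function name and variables are illustrative, not from the tutorial code):

```python
import numpy as np

def fit_multiple_lr_sum(X, Y):
    """Closed-form solution built from the per-sample summation form."""
    m, n = X.shape
    Xb = np.column_stack([np.ones(m), X])  # add dummy x0 = 1 -> shape (m, n+1)

    A = np.zeros((n + 1, n + 1))  # accumulates sum_i x_i x_i^T (outer products)
    b = np.zeros(n + 1)           # accumulates sum_i x_i y_i
    for xi, yi in zip(Xb, Y):
        A += np.outer(xi, xi)
        b += xi * yi

    # beta = (sum_i x_i x_i^T)^-1 (sum_i x_i y_i)
    return np.linalg.solve(A, b)
```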

An even more compact form of this equation arises when we consider all the samples of X together in an m x (n+1) matrix, and all the samples of Y together in an m x 1 column vector:

beta = (X^T X)^-1 X^T Y
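
In code, this matrix form is usually the one you would implement. A minimal sketch, assuming X already includes the column of ones for x0 (np.linalg.solve is used instead of explicitly inverting X^T X, which is better numerically):

```python
import numpy as np

def fit_multiple_lr(X, Y):
    """Closed-form solution beta = (X^T X)^-1 X^T Y.

    X is assumed to already include the x0 = 1 column.
    """
    return np.linalg.solve(X.T.dot(X), X.T.dot(Y))
```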

As in the 1-D case, we use R-squared (R²) to measure how well the model fits the actual data (the formula is exactly the same).
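
As a quick sketch, R-squared can be computed from the residuals exactly as before (Y and Yhat here are assumed to be NumPy arrays of the targets and the model’s predictions):

```python
import numpy as np

def r_squared(Y, Yhat):
    """R^2 = 1 - SS_residual / SS_total."""
    ss_res = np.sum((Y - Yhat) ** 2)
    ss_tot = np.sum((Y - Y.mean()) ** 2)
    return 1 - ss_res / ss_tot
```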

Learn more about Linear Regression in the course Deep Learning Prerequisites: Linear Regression in Python.

#linear regression #machine learning #multiple linear regression #statistics