Lazy Programmer



Multiple Linear Regression

July 26, 2014

Code for this tutorial is here:


Today we will continue our discussion of linear regression by extending the ideas from simple linear regression to multiple linear regression.

Recall that in simple linear regression, the input is 1-D. In multiple linear regression, the input is n-dimensional (n can be any number of dimensions). The output is still just a scalar (1-D).

So now our input data looks like this:

(X1, Y1), (X2, Y2), …, (Xm, Ym)

Where each Xi is a vector with n components, each Yi is a scalar, and m is the number of samples.

But now instead of our hypothesis, h(), looking like this:

h(X) = aX + b

It looks like this:

$$h(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

Where each subscripted x is a scalar.

beta0 is also known as the “bias term”.

Another, more compact way of writing this is:

$$h(x) = \beta^T x$$

Where beta and x are both column vectors of the same length. Transposing the first vector and multiplying it by the second gives a scalar; this operation is also called a “dot product” or “inner product”.

In this representation, we introduce a dummy variable x0 = 1, so that beta and x both contain the same number of elements (n+1).
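To make the dummy variable concrete, here is a minimal sketch in plain Python. The function name `predict` and the numbers are made up for illustration; they are not from the tutorial's code.

```python
def predict(beta, x):
    """Return h(x) = beta^T x for a single sample.

    beta has n+1 entries (beta[0] is the bias term); x has n entries.
    """
    x_aug = [1.0] + list(x)  # prepend the dummy variable x0 = 1
    if len(beta) != len(x_aug):
        raise ValueError("beta must have exactly one more element than x")
    # Dot product: beta0*1 + beta1*x1 + ... + betan*xn
    return sum(b * xi for b, xi in zip(beta, x_aug))


beta = [1.0, 2.0, 3.0]  # hypothetical parameters: beta0, beta1, beta2
x = [4.0, 5.0]          # one 2-D input
y_hat = predict(beta, x)  # 1 + 2*4 + 3*5 = 24.0
```

Because x0 is always 1, the bias term beta0 is handled by the same dot product as every other parameter.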


In the case where the dimensionality of the input data is 2, we can still visualize our model, which is no longer a line, but a “plane of best fit”.



To solve for the beta vector, we do the same thing we did for simple linear regression: define an error function J (we’ll use the sum of squared errors again), take the derivative of J with respect to each parameter (beta0, beta1, …), and set each derivative to 0 to solve for each beta.

$$J = \sum_{i=1}^{m} \left(Y_i - \beta^T X_i\right)^2$$

This is a lot more tedious than in the 1-D case, but I would suggest as an exercise attempting at least the 2-D case.
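As a hint for that exercise, here is a sketch of the first step. It assumes J is the sum of squared errors over all m samples, and it introduces the notation x_{ij} (not used elsewhere in this post) for the j-th component of sample X_i, with x_{i0} = 1:

$$\frac{\partial J}{\partial \beta_j} = -2 \sum_{i=1}^{m} \left(Y_i - \beta^T X_i\right) x_{ij} = 0, \quad j = 0, 1, \dots, n$$

This gives n+1 linear equations in the n+1 unknowns beta0, …, betan, which can then be solved simultaneously.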

As before, there is a “closed form” solution for beta:

$$\beta = \left(\sum_{i=1}^{m} X_i X_i^T\right)^{-1} \left(\sum_{i=1}^{m} X_i Y_i\right)$$

Here, each (Xi, Yi) is a “sample” from the data.

Notice that in the first term we transpose the second Xi. This is an “outer product”, and the result is an (n+1) x (n+1) matrix.

The superscript -1 denotes a matrix inverse.
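The sample-by-sample form can be sketched in NumPy as follows. The toy data and variable names (`X_raw`, `true_beta`, `A`, `b`) are assumptions made up for this illustration, and the targets are generated without noise so the recovered parameters can be checked by eye.

```python
import numpy as np

# Hypothetical toy data: m = 4 samples, n = 2 input dimensions.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.0],
                  [3.0, 1.0],
                  [4.0, 3.0]])
true_beta = np.array([1.0, 2.0, 3.0])     # made-up beta0, beta1, beta2
Y = true_beta[0] + X_raw @ true_beta[1:]  # noise-free targets

n_plus_1 = X_raw.shape[1] + 1
A = np.zeros((n_plus_1, n_plus_1))  # accumulates the sum of Xi Xi^T
b = np.zeros(n_plus_1)              # accumulates the sum of Xi Yi

for x_i, y_i in zip(X_raw, Y):
    x_aug = np.concatenate(([1.0], x_i))  # dummy variable x0 = 1
    A += np.outer(x_aug, x_aug)           # (n+1) x (n+1) outer product
    b += x_aug * y_i

# Solving A beta = b is equivalent to beta = A^{-1} b.
beta = np.linalg.solve(A, b)
```

`np.linalg.solve` is used here instead of explicitly inverting the matrix, which is generally the more numerically stable choice; `np.linalg.inv(A) @ b` would follow the formula more literally.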

An even more compact form of this equation arises when we consider all the samples of X together in an m x (n+1) matrix, and all the samples of Y together in an m x 1 matrix:

$$\beta = \left(X^T X\right)^{-1} X^T Y$$

As in the 1-D case, we use R-squared to measure how well the model fits the actual data (the formula is exactly the same).
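The matrix form and the goodness-of-fit check can be sketched together in NumPy. This is an illustration under stated assumptions rather than the tutorial's own code: the data are synthetic, the names (`X_raw`, `true_beta`, `r_squared`) are made up, and R-squared is taken to be the usual 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: m = 50 samples, n = 2 input dimensions.
m, n = 50, 2
X_raw = rng.normal(size=(m, n))
true_beta = np.array([1.0, 2.0, -3.0])  # made-up beta0, beta1, beta2
Y = true_beta[0] + X_raw @ true_beta[1:] + 0.1 * rng.normal(size=m)

# Build the m x (n+1) matrix by prepending a column of ones (x0 = 1).
X = np.column_stack([np.ones(m), X_raw])

# beta = (X^T X)^{-1} X^T Y, computed by solving (X^T X) beta = X^T Y.
beta = np.linalg.solve(X.T @ X, X.T @ Y)

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
Y_hat = X @ beta
ss_res = np.sum((Y - Y_hat) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

With only a small amount of noise in the data, `beta` should land close to `true_beta` and `r_squared` close to 1. In practice, `np.linalg.lstsq(X, Y, rcond=None)` solves the same least-squares problem and is more robust when X^T X is nearly singular.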

