Lazy Programmer

Your source for the latest in deep learning, big data, data science, and artificial intelligence.

Linear Programming for Linear Regression

November 14, 2014

This is not a tutorial on linear programming (LP), but rather a tutorial on how one might apply linear programming to the problem of linear regression.


Linear Programming Recap

Linear programming solves optimization problems in which you have a linear combination of inputs x,

c(1)x(1) + c(2)x(2) + c(3)x(3) + … + c(D)x(D)

that you want to maximize or minimize, subject to constraints of the form:

a(1,1)x(1) + a(1,2)x(2) + … + a(1,D)x(D) <= b(1)
a(2,1)x(1) + a(2,2)x(2) + … + a(2,D)x(D) <= b(2)

where each a(i,j) is an entry of the constraint matrix A.

In compact form:

Ax <= b

So if x is a vector of length D, then A is an N x D matrix, where N is the number of constraints.

Note that “greater than or equal to” (>=) constraints can be converted into “less than or equal to” (<=) constraints by negating both sides, and an equality (==) can be written as a pair of <= and >= constraints. (“Not equal” (!=) constraints, however, cannot be expressed in an LP.)

Most linear programming packages allow constraints of any form, however.

Two popular algorithms for solving LP problems (which you don’t need to understand in order to follow this tutorial) are the “simplex method” and the “interior point method”. Using an LP library hides these details from us.
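As a quick illustration of the form above, here is a tiny LP solved with scipy.optimize.linprog (my choice of library; the post doesn’t prescribe one). Note that linprog minimizes, so to maximize we negate the objective:

```python
from scipy.optimize import linprog

# Maximize 3*x(1) + 2*x(2) subject to:
#   x(1) + x(2) <= 4
#   x(1)        <= 3
#          x(2) <= 2
#   x(1), x(2) >= 0
# linprog minimizes c.x, so negate c to maximize.
c = [-3, -2]
A = [[1, 1],
     [1, 0],
     [0, 1]]
b = [4, 3, 2]

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)], method="highs")
print(res.x)     # optimal point
print(-res.fun)  # optimal (maximized) objective value
```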

Setting up the linear regression objective function

Suppose you are given a set of points on or near some line:

S = {(0,1), (1,2), (2,3), (3,4), (10,12), …}

And you would like to find a line of the form:

ax + by + c = 0

That best fits the given set of points.

I already showed how to solve this problem by minimizing the mean squared error here.

What if we want to minimize the absolute (rather than squared) error?

More precisely, let’s minimize the maximum absolute error, i.e.

min over (a, b, c) of ( max |ax(i) + by(i) + c| )

where the max is taken over all (x(i), y(i)) in S.

Intuitively, you can think about it this way:

ax + by + c = 0 is the line.

If, given a point (x, y), we get:

ax + by + c > 0

Or

ax + by + c < 0

We are not on the line.

Thus, we want every (x, y) pair to make |ax + by + c| as close to 0 as possible; in other words, we want to minimize it.

The next question is: how do we turn this into “standard form” for an LP solver?

minimize: c(1)x(1) + c(2)x(2) + … + c(D)x(D)

subject to: Ax <= b

The Solution

Create a variable, “z”, such that:

z = max |ax(i) + by(i) + c| for all x(i),y(i) in S

Then our constraints become:

|ax(i) + by(i) + c| <= z

Or:

ax(i) + by(i) + c <= z and ax(i) + by(i) + c >= -z

The objective function is just “z”.

Note that our variables here are NOT (x, y, z) but (a, b, c, z).

x and y simply represent the different data points.

The variables are the parameters of the line.
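To make this concrete, here is how the constraint matrix over (a, b, c, z) might be assembled (a sketch using scipy.optimize.linprog and sample points of my choosing). It also exposes a wrinkle that the next section addresses: scaling (a, b, c) scales z by the same factor, so the optimum of this formulation is the trivial all-zero solution:

```python
import numpy as np
from scipy.optimize import linprog

# Sample points near the line y = x + 1 (with one point slightly off it).
S = [(0, 1), (1, 2), (2, 3), (3, 4), (10, 12)]

# Variables: v = (a, b, c, z). For each point we need two rows:
#    a*x + b*y + c <= z   ->  [ x,  y,  1, -1] . v <= 0
#   -a*x - b*y - c <= z   ->  [-x, -y, -1, -1] . v <= 0
A_ub = []
for x, y in S:
    A_ub.append([x, y, 1, -1])
    A_ub.append([-x, -y, -1, -1])
b_ub = np.zeros(len(A_ub))

c_obj = [0, 0, 0, 1]                        # objective: minimize z
bounds = [(None, None)] * 3 + [(0, None)]   # a, b, c free; z >= 0

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x)  # collapses to a = b = c = z = 0: the degenerate solution
```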

Practical considerations

Note that a line really has two parameters, not three, since it can be represented by:

y = mx + p

So if your best fit line is y = 2x, you might get strange answers like:

a = 2000, b = -1000, c = 0

a = 20000, b = -10000, c = 0

Because all of these are valid solutions that lead to the same line. Worse, scaling (a, b, c) toward zero also scales z toward zero, so the minimizer will collapse to the useless all-zero solution.

You can thus write your constraints as:

|mx - y + p| <= z

And solve for the variables (m, p, z).
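Putting it all together in the (m, p, z) formulation (again a sketch with scipy.optimize.linprog and sample points of my choosing):

```python
from scipy.optimize import linprog

# Points near y = x + 1, with (10, 12) sitting one unit above that line.
S = [(0, 1), (1, 2), (2, 3), (3, 4), (10, 12)]

# Variables: v = (m, p, z). The constraint |m*x - y + p| <= z becomes:
#   m*x + p - y <= z   ->  [ x,  1, -1] . v <=  y
#   y - m*x - p <= z   ->  [-x, -1, -1] . v <= -y
A_ub, b_ub = [], []
for x, y in S:
    A_ub.append([x, 1, -1])
    b_ub.append(y)
    A_ub.append([-x, -1, -1])
    b_ub.append(-y)

c_obj = [0, 0, 1]                                   # minimize z, the max absolute error
bounds = [(None, None), (None, None), (0, None)]    # m, p free; z >= 0

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
m, p, z = res.x
print(f"best fit: y = {m:.3f}x + {p:.3f}, max error = {z:.3f}")
```

Unlike the (a, b, c, z) version, this cannot collapse to zero, because the y term has a fixed coefficient of -1.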

Some example code can be found here: http://bit.ly/2IAM5iX

#linear programming #linear regression