Lazy Programmer

Your source for the latest in deep learning, big data, data science, and artificial intelligence. Sign up now

New course! Reinforcement Learning in Python

January 27, 2017


I would like to announce my latest course – Artificial Intelligence: Reinforcement Learning in Python.

This has been one of my most requested topics since I started covering deep learning. This course has been brewing in the background for months.

The result: This is my most MASSIVE course yet.

Usually, my courses will introduce you to a handful of new algorithms (which is a lot for people to handle already). This course covers SEVENTEEN (17!) new algorithms.

This will keep you busy for a LONG time.

If you’re used to supervised and unsupervised machine learning, realize this: Reinforcement Learning is a whole new ball game.

There are so many new concepts to learn, and so much depth. It’s COMPLETELY different from anything you’ve seen before.

That’s why we build everything slowly, from the ground up.

There’s tons of new theory, but as you’ve come to expect, anytime we introduce new theory it is accompanied by full code examples.

What is Reinforcement Learning? It’s the technology behind self-driving cars, AlphaGo, video game-playing programs, and more.

You’ll learn that while deep learning has been very useful for tasks like driving and playing Go, it’s in fact just a small part of the picture.

Reinforcement Learning provides the framework that allows deep learning to be useful.

Without reinforcement learning, all we have is a basic (albeit very accurate) labeling machine.

With Reinforcement Learning, you have intelligence.

Reinforcement Learning has even been used to model processes in psychology and neuroscience. It’s truly the closest thing we have to “machine intelligence” and “general AI”.

What are you waiting for? Sign up now!!


#artificial intelligence #deep learning #reinforcement learning

Go to comments

New course! Bayesian Machine Learning in Python: A/B Testing

November 17, 2016


[If you already know you want to sign up for my Bayesian machine learning course, just scroll to the bottom to get your $10 coupon!]

Boy, do I have some exciting news today!

You guys have already been keeping up with my deep learning series.

Hopefully, you’ve noticed that I’ve been releasing non-deep learning machine learning courses as well, in parallel (and they often tie into the deep learning series quite nicely).

Well today, I am announcing the start of a BRAND NEW series on Bayesian machine learning.

Bayesian methods require an entirely new way of thinking – a paradigm shift.

But don’t worry, it’s not just all theory.

In fact, the first course I’m releasing in the series is VERY practical – it’s on A/B testing.

Every online advertiser, e-commerce store, marketing team, etc etc etc. does A/B testing.

But did you know that traditional A/B testing is both horribly confusing and inefficient?

Did you know that there are cool, new adaptive methods inspired by reinforcement learning that improve on those old crusty tests?

(Those old methods, and the way they are traditionally taught, are probably the reason you cringe when you hear the word “statistics”)

Well, Bayesian methods not only represent a state-of-the-art solution to many A/B testing challenges, they are also surprisingly theoretically simpler!

You’ll end the course by doing your own simulation – comparing and contrasting the various adaptive A/B testing algorithms (including the final Bayesian method).

This is VERY practical stuff and any digital media, newsfeed, or advertising startup will be EXTREMELY IMPRESSED if you know this stuff.

This WILL advance your career, and any company would be lucky to have someone that knows this stuff on their team.

Awesome coincidence #1: As I mentioned above, a lot of these techniques cross-over with reinforcement learning, so if you are itching for a preview of my upcoming deep reinforcement learning course, this will be very interesting for you.

Awesome coincidence #2: Bayesian learning also crosses over with deep learning, one example being the variational autoencoder, which I may incorporate into a more advanced deep learning course in the future. They heavily rely on concepts from both Bayesian learning AND deep learning, and are very powerful state-of-the-art algorithms.

Due to all the black Friday madness going on, I am going to do a ONE-TIME ONLY $10 special for this course. With my coupons, the price will remain at $10, even if Udemy’s site-wide sale price goes up (which it will).

See you in class!

As promised, here is the coupon:

UPDATE: The Black Friday sale is over, but the early bird coupon is still up for grabs:

LAST THING: Udemy is currently having an awesome Black Friday sale. $10 for ANY course starting Nov 15, but the price goes up by $1 every 2 days, so you need to ACT FAST.

I was going to tell you earlier but I was hard at work on my course. =)

Just click this link to get ANY course on Udemy for $10 (+$1 every 2 days):

#bayesian #data science #machine learning #statistics

Go to comments

Linear Regression

July 25, 2014

Code for this tutorial is here:


Prerequisites for understanding this material:

  • calculus (taking partial derivatives)

Linear regression is one of the simplest machine learning techniques you can use. It is often useful as a baseline relative to more powerful techniques.

To start, we will look at a simple 1-D case.

Like all regressions, we wish to map some input X to some input Y.


Y = f(X)

With linear regression:

Y = aX + b

Or we can say:

h(X) = aX + b

Where “h” is our “hypothesis”.

You may recall from your high school studies that this is just the equation for a straight line.


When X is 1-D, or when “Y has one explanatory variable”, we call this “simple linear regression”.

When we use linear regression, we are using it to model linear relationships, or what we think may be linear relationships.

As with all supervised machine learning problems, we are given labeled data points:

(X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn)

And we will try to fit the line (aX + b) as best we can to these data points.


This means we have to optimize the parameters “a” and “b”.

How do we do this?

We will define an error function and then find the “a” and “b” that will make the error as small as possible.

You will see that many regression problems work this way.

What is our error function?

We could use the difference between the predicted Y and the actual Y like so:


But if we had equal amounts of errors where Y was bigger than the prediction, and where Y was smaller than the prediction, then the errors would cancel out, even though the absolute difference in errors is large.

Typically in machine learning, the squared error is a good place to start.


Now, whether or not the difference in the actual and predicted output is positive or negative, its contribution to the total error is still positive.

We call this sum the “sum of squared errors”.

Recall that we want to minimize it.

Recall from calculus that to minimize something, you want to take its derivative.


Because there are two parameters, we have to take the derivatives both with respect to a and with respect to b, set them to 0, and solve for a and b.

Luckily, because the error function is a quadratic it increases as (a,b) get further and further away from the minimum.

As an exercise I will let you calculate the derivatives.

You will get 2 equations (the derivatives) and 2 unknowns (a, b). From high school math you should know how to solve this by rearranging the terms.

Note that these equations can be solved analytically. Meaning you can just plug and chug the values of your inputs and get the final value of a and b by blindly using a formula.

Note that this method is also called “ordinary least squares”.

Measuring the error (R-squared)

To determine how well our model fits the data, we need a measure called the “R-square”.

Note that in classification problems, we can simply use the “classification rate”, which is the number of correctly classified inputs divided by the total number of inputs. With the real-valued outputs we have in regression, this is not possible.

Here are the equations we use to predict the R-square.


SS(residual) is the sum of squared error between the actual and predicted output. This is the same as the error we were trying to minimize before!

SS(total) is the sum of squared error between each sample output and the mean of all the sample outputs, i.e. What the residual error would be if we just predicted the average output every time.

So the R-square then, is just how much better our model is compared to predicting the mean each time. If we just predicted the mean each time, the R-square would be 1-1=0. If our model is perfect, then the R-square would be 1-0=1.

Something to think about: If our model performs worse than predicting the mean each time, what would be the R-square value?

Limitations of Linear Regression

  • It only models linear equations. You can model higher order polynomials (link to later post) but the model is still linear in its parameters.
  • It is sensitive to outliers. Meaning if we have one data point very far away from all the others, it could “pull” the regression line in its direction, away from all the other data points, just to minimize the error.
#calculus #linear regression #machine learning #statistics

Go to comments