Monte Carlo with Importance Sampling for Reinforcement Learning

March 7, 2021

In this post, we’ll extend our toolset for Reinforcement Learning by considering the Monte Carlo method with importance sampling.

In my course, “Artificial Intelligence: Reinforcement Learning in Python”, you learn about the Monte Carlo method. But that’s just the beginning. There is still more that can be done to improve the agent’s learning capabilities.

Review of Monte Carlo for Reinforcement Learning

Let’s begin by reviewing the regular Monte Carlo method covered in my Reinforcement Learning course.

Your job in a reinforcement learning task is to program an agent (characterized by a policy) that interacts with an environment (characterized by state transition dynamics). More precisely, this article considers a Markov Decision Process (MDP), in which the agent and the environment interact in a loop:

The agent reads in a state \( S_t \) and decides what action \( A_t \) to perform based on that state. The rule it uses to make this decision is called the policy, and it can be characterized by a probability distribution, \( \pi( A_t | S_t) \).

Performing this action changes the environment, which results in the next state \( S_{t+1} \). The agent also receives a reward signal \( R_{t+1} \).

The goal of an agent is to maximize its sum of future rewards, called the return, \( G_t \). The discounted return is defined as:

$$ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots + \gamma^{T-t-1} R_T $$

Since both the policy and environment transitions can be random, the return can also be random. Because of this, we can’t maximize “the” return (since there are many possible values the return can ultimately be), but only the expected return.

The expected return, given that the agent is in state \( S_t = s \) and performs action \( A_t = a \) at time \( t \), is called the action-value, and a table of these values is called the Q-table. Specifically:

$$ Q_\pi(s, a) = E_\pi[ G_t | S_t = s, A_t = a] $$

The Q-table can be used to determine what the best action will be since we can just choose whichever action \( a \) maximizes \( Q(s,a) \).

The problem is, we do not know \( Q(s,a) \)! Furthermore, we cannot calculate it directly, since the expected value requires summing over the transition distribution \( p(s', r | s, a) \).

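Generally speaking, this distribution is unknown. For example, imagine building a self-driving car: there is no way to write down the probability of every possible next traffic situation.

The Monte Carlo approach is to estimate the action-value using the sample average. That is: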

$$ Q_\pi(s, a) \approx \frac{1}{N}\sum_{i=1}^{N} G^{(i)}(s,a) $$

Where \( G^{(i)}(s,a) \) was the sample return when the agent was in state \( s \) and performed action \( a \) during the \( i \)’th episode.

Put simply: play a bunch of episodes, collect all the state-action-reward sequences, calculate the returns (by summing up the rewards), and then compute the average return for each state-action pair.
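Here is a minimal sketch of that recipe, assuming each episode is stored as a list of `(state, action, reward)` tuples where `reward` is \( R_{t+1} \), the reward received after taking `action` in `state`. The function names and the episode format are hypothetical, and this is the every-visit variant (every occurrence of a pair contributes a sample); a first-visit variant would only count the first occurrence per episode.

```python
from collections import defaultdict


def returns_from_rewards(rewards, gamma):
    """Compute G_t for every step by looping backwards: G_t = R_{t+1} + gamma * G_{t+1}."""
    G = 0.0
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        returns[t] = G
    return returns


def mc_estimate_q(episodes, gamma=0.9):
    """Every-visit Monte Carlo estimate of Q(s, a) as the sample average of returns."""
    total = defaultdict(float)   # sum of returns seen for each (s, a)
    count = defaultdict(int)     # number of returns seen for each (s, a)
    for episode in episodes:
        rewards = [r for (_, _, r) in episode]
        for (s, a, _), G in zip(episode, returns_from_rewards(rewards, gamma)):
            total[(s, a)] += G
            count[(s, a)] += 1
    return {sa: total[sa] / count[sa] for sa in total}


# Two tiny hand-made episodes (hypothetical data, for illustration only)
episodes = [
    [("s0", "left", 0.0), ("s1", "right", 1.0)],
    [("s0", "left", 1.0)],
]
Q = mc_estimate_q(episodes, gamma=0.5)
print(Q)
```

Looping backwards over each episode lets every return be computed in constant time from the one after it, instead of re-summing the rewards for every step.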


How can we ensure that we visit every state-action pair so that the whole Q-table is filled up with a sufficient number of samples?

In practice, we usually employ some kind of exploration strategy, such as epsilon-greedy. With epsilon-greedy, we perform the greedy action (the one with the highest current Q-value) with probability \( 1-\varepsilon \), and we pick a random action with probability \( \varepsilon \). So \( \varepsilon \) is the probability of exploration.

The problem with this approach is that it leads to a suboptimal policy. Why? It means that approximately \( \varepsilon \) of the time, you are going to do something suboptimal!

(Note: it’s not exactly \( \varepsilon \) since choosing an action randomly can still lead to choosing the optimal action by chance.)

This is where we transition to the main topic of this article, which is how Importance Sampling can help us overcome this problem.

Monte Carlo with epsilon-greedy exploration is called an on-policy control method, because the action-value (Q-table) being estimated corresponds to the policy that the agent is following.

On the other hand, off-policy methods allow the agent to act according to one policy (called the behavior policy), while the action-value is computed for a different, eventually optimal policy (called the target policy).

Henceforth we will denote the target policy as \( \pi( a | s) \) and the behavior policy as \( b(a | s) \).

Importance Sampling

Suppose that we would like to estimate the expected value of some function \( s \) of a random variable \( X \) under some distribution \( \pi \). We write this as:

$$ E_\pi[ s(X) ] $$

If we know \( \pi \), then (assuming \( X \) is a discrete random variable) this can be computed as:

$$ E_\pi[ s(X) ] = \sum_x \pi(x)s(x) $$

If we don’t know \( \pi \), but \( X \) is sampled according to \( \pi \), then this expectation can be estimated by using the sample mean:

$$ E_\pi[ s(X) ] \approx \frac{1}{N}\sum_{i=1}^N s(X_i) $$

Now suppose that something prevents us from gathering samples according to the distribution \( \pi \), but it’s still possible to gather samples from a different distribution \( b \). That is, we want \( X \sim \pi \) but we have \( X \sim b \) instead.

In this case, the above sample average does not give us the desired expectation. We would be estimating \( E_b[ s(X)] \) instead of \( E_\pi[ s(X) ] \).

The importance sampling solution is found by recognizing the following equalities:

$$
\begin{aligned}
E_\pi \left[ s(X) \right] &= \sum_x \pi(x)s(x) \\
&= \sum_x \pi(x)s(x)\frac{b(x)}{b(x)} \\
&= \sum_x b(x)s(x)\frac{\pi(x)}{b(x)} \\
&= E_b \left[ s(X)\frac{\pi(X)}{b(X)} \right]
\end{aligned}
$$

This tells us that it’s possible to estimate the expectation under \( \pi \) even when the samples are drawn according to \( b \). All we have to do is multiply by the importance sampling ratio, \( \frac{\pi(X)}{b(X)} \).

The only requirement is that \( b(x) \) is not 0 when \( \pi(x) \) is not 0. This is called “coverage”.
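A quick numerical check of the identity, using a made-up three-valued random variable (the distributions `pi` and `b` and the function `s` below are arbitrary illustrative choices, with `b` covering every value `pi` can take). The plain sample mean of draws from `b` estimates the wrong expectation, while reweighting each draw by \( \pi(X)/b(X) \) recovers \( E_\pi[s(X)] \).

```python
import numpy as np

rng = np.random.default_rng(0)

xs = np.array([0, 1, 2])          # support of X
pi = np.array([0.1, 0.2, 0.7])    # target distribution (pretend we cannot sample from it)
b = np.array([0.4, 0.4, 0.2])     # behavior distribution; b > 0 wherever pi > 0 (coverage)


def s(x):
    """The function whose expectation we want."""
    return x ** 2


exact = np.sum(pi * s(xs))                 # E_pi[s(X)] = sum_x pi(x) s(x)

N = 100_000
samples = rng.choice(xs, size=N, p=b)      # X ~ b, not pi
naive = np.mean(s(samples))                # this estimates E_b[s(X)] instead
ratio = pi[samples] / b[samples]           # importance sampling ratio pi(X) / b(X)
importance = np.mean(s(samples) * ratio)   # estimates E_pi[s(X)]

print(exact, naive, importance)
```

Note how much the ratio can vary: draws of `x = 2` get weight 3.5 while draws of `x = 0` get weight 0.25, which is a hint of why importance sampling estimates can have high variance.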

Applying Importance Sampling to Reinforcement Learning

In reinforcement learning, the return \( G_t \) is generated by acting according to the behavior policy \( b(a | s) \) with transition dynamics \( p(s', r | s, a) \). But we would like to know the expectation of \( G_t \) under the target policy \( \pi(a | s) \) with the same transition dynamics.

In this case, the importance sampling ratio is a bit more complicated, but it can still be derived. \( G_t \) is a sample from the distribution \( p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim b \mspace{5mu} \forall \tau) \).

Basically this says: “the distribution of all the actions and states that happened after arriving in state \( S_t \), following the policy \( b \)”.

The distribution we want the expectation with respect to is the same thing, but with actions drawn according to \( \pi \), i.e. \( p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim \pi \mspace{5mu} \forall \tau) \).

Thanks to the Markov property these distributions are easy to expand.

$$ p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim b) = \prod_{\tau=t}^{T-1} b(A_\tau | S_\tau)p(S_{\tau+1} | S_\tau, A_\tau)$$

$$ p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim \pi) = \prod_{\tau=t}^{T-1} \pi(A_\tau | S_\tau)p(S_{\tau+1} | S_\tau, A_\tau)$$

The importance sampling ratio is then just:

$$ \frac{p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim \pi)}{p(A_t, S_{t+1}, A_{t+1}, \dots, S_T | S_t, A_\tau \sim b)} = \prod_{\tau=t}^{T-1} \frac{\pi(A_\tau | S_\tau)}{b(A_\tau | S_\tau)}$$

The transition dynamics cancel out because the same factors appear in both the numerator and the denominator.

This is convenient, because we know \( \pi \) and we know \( b \), but we do not know \( p \) (which is why we have to use Monte Carlo in the first place).

Let’s define this importance sampling ratio using the symbol \( \rho \).

$$ \rho_{t:T-1} \dot{=} \prod_{\tau=t}^{T-1} \frac{\pi(A_\tau | S_\tau)}{b(A_\tau | S_\tau)}$$

Using this importance sampling ratio, we can estimate \( Q_\pi(s,a) \) even though we are acting according to a different policy \( b \) and using the returns generated from this other policy.

$$ Q_\pi(s,a) \approx \frac{1}{N}\sum_{i=1}^N \rho^{(i)}(s,a) G^{(i)}(s,a) $$

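Where \( G^{(i)}(s,a) \) is the sample return when the agent was in state \( s \) and performed action \( a \) during the \( i \)’th episode, and \( \rho^{(i)}(s,a) \) is the corresponding importance sampling ratio. One subtlety: because the action \( a \) is given (it is part of the condition in \( Q_\pi(s,a) \)), its own probability needs no correction. So if the pair \( (s,a) \) was visited at time \( t \), the ratio to use is \( \rho_{t+1:T-1} \), i.e. the product starts at \( \tau = t+1 \).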
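A direct (deliberately unoptimized) sketch of this estimator, assuming the same hypothetical `(state, action, reward)` episode format as before and policies given as functions `pi(a, s)` and `b(a, s)` that return probabilities. Since the action at time \( t \) is already given when estimating \( Q(S_t, A_t) \), the ratio here multiplies only over steps \( t+1 \) to \( T-1 \). Everything, including the toy policies, is illustrative rather than a reference implementation.

```python
import math
from collections import defaultdict


def off_policy_mc_q(episodes, pi, b, gamma=0.9):
    """Ordinary importance sampling estimate of Q_pi(s, a) from episodes generated by b."""
    total = defaultdict(float)
    count = defaultdict(int)
    for episode in episodes:
        T = len(episode)
        for t, (s, a, _) in enumerate(episode):
            # G_t = sum_{k=t}^{T-1} gamma^(k-t) * R_{k+1}
            G = sum(gamma ** (k - t) * episode[k][2] for k in range(t, T))
            # A_t = a is given, so only the actions from t+1 onward are reweighted
            rho = math.prod(
                pi(episode[k][1], episode[k][0]) / b(episode[k][1], episode[k][0])
                for k in range(t + 1, T)
            )
            total[(s, a)] += rho * G
            count[(s, a)] += 1
    return {sa: total[sa] / count[sa] for sa in total}


def pi(a, s):
    """Hypothetical greedy target policy: always choose "right"."""
    return 1.0 if a == "right" else 0.0


def b(a, s):
    """Hypothetical uniform behavior policy over {"left", "right"}."""
    return 0.5


episodes = [
    [("s0", "left", 0.0), ("s1", "right", 1.0)],
    [("s0", "left", 0.0), ("s1", "left", 0.0)],
]
Q = off_policy_mc_q(episodes, pi, b, gamma=1.0)
print(Q)
```

In this toy case the second episode gets weight 0 for `("s0", "left")` because the target policy would never choose `"left"` in `s1`, and the first gets weight 2, so the estimate for `("s0", "left")` comes out as 1.0, which is exactly the return the greedy target policy would earn. Recomputing the sum and product for every \( t \) is quadratic in the episode length, which is what the backward recursion discussed next avoids.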

Importance Sampling for Monte Carlo Implementation

At this point, you know all the theory. All that you have to do now is plug in the above importance sampling ratio in the appropriate places in your existing Monte Carlo code, and you’ll be doing Monte Carlo with importance sampling.

Here are some important considerations.

Like the return \( G_t \), the importance sampling ratio is defined in terms of future values. i.e. the importance sampling ratio at time \( t \) depends on the probabilities of the behavior and target policies at time \( t+1, t+2, … \).

Therefore, it would make sense to loop backwards in time to compute this ratio, just like we loop backwards in time to compute the return.

Just like the return, the importance sampling ratio can be computed recursively.
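A sketch of the backward loop, using \( G_t = R_{t+1} + \gamma G_{t+1} \) and \( \rho_{t:T-1} = \frac{\pi(A_t | S_t)}{b(A_t | S_t)} \rho_{t+1:T-1} \). It assumes you have already looked up \( \pi(A_t | S_t) \) and \( b(A_t | S_t) \) for every step; the function name and argument layout are hypothetical. Because the weight used for \( Q(S_t, A_t) \) excludes the action at time \( t \) itself, the ratio is recorded before it is multiplied by step \( t \)'s factor.

```python
def backward_returns_and_ratios(rewards, pi_probs, b_probs, gamma):
    """Loop backwards once to get G_t and rho_{t+1:T-1} for every t.

    rewards[t]  = R_{t+1}
    pi_probs[t] = pi(A_t | S_t),  b_probs[t] = b(A_t | S_t)
    """
    T = len(rewards)
    returns = [0.0] * T
    ratios = [0.0] * T   # ratios[t] holds rho_{t+1:T-1}, the weight used for Q(S_t, A_t)
    G = 0.0
    W = 1.0              # empty product: rho_{T:T-1} = 1
    for t in reversed(range(T)):
        G = rewards[t] + gamma * G
        returns[t] = G
        ratios[t] = W
        W *= pi_probs[t] / b_probs[t]   # W is now rho_{t:T-1}, needed at step t-1
    return returns, ratios


returns, ratios = backward_returns_and_ratios(
    [0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0
)
print(returns, ratios)
```

A side effect worth noticing: if \( \pi(A_t | S_t) = 0 \) at some step (for example, a greedy target policy that would never take the exploratory action), `W` becomes 0 and every earlier step in that episode gets zero weight.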

Finally, you’ll recall that for the regular unweighted sample mean, it’s possible to perform constant-time updates every time we collect a new sample, instead of naively summing up all the samples and dividing by N.

$$ Q^{(i)}(s,a) \leftarrow Q^{(i-1)}(s,a) + \frac{1}{i}\left(G^{(i)}(s,a) - Q^{(i-1)}(s,a)\right) $$
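A minimal sketch of this running update for a single state-action pair: each new sample moves the old estimate a fraction \( 1/i \) of the way toward it, and the final value equals the ordinary average without ever storing the samples. The function name is hypothetical.

```python
def incremental_mean(samples):
    """Running sample mean with O(1) work per new sample:
    Q_i = Q_{i-1} + (1/i) * (G_i - Q_{i-1})."""
    Q = 0.0
    for i, G in enumerate(samples, start=1):
        Q = Q + (G - Q) / i
    return Q


print(incremental_mean([1.0, 2.0, 3.0, 4.0]))
```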

Similarly, the weighted sample mean can be updated with a constant-time operation. That is:

$$ Q^{(i)}(s,a) \leftarrow Q^{(i-1)}(s,a) + \alpha^{(i)}\left(G^{(i)}(s,a) - Q^{(i-1)}(s,a)\right) $$

As an exercise, try to derive what \( \alpha^{(i)} \) should be.

Last point: I haven’t discussed weighted importance sampling, which can be used to reduce the variance of the estimate. The weighted importance sampling estimate looks like this:

$$ Q_\pi(s,a) \approx \frac{\sum_{i=1}^N \rho^{(i)}(s,a) G^{(i)}(s,a)}{ \sum_{i=1}^N \rho^{(i)}(s,a) } $$
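To see the difference between the two estimators, here is a batch (non-incremental) sketch of both, given lists of ratios and returns for one state-action pair. The function names are hypothetical, and returning 0 from the weighted version when every ratio is 0 is an arbitrary convention chosen here to avoid dividing by zero.

```python
import numpy as np


def ordinary_is(ratios, returns):
    """Ordinary importance sampling: sum(rho * G) / N."""
    ratios, returns = np.asarray(ratios, float), np.asarray(returns, float)
    return np.sum(ratios * returns) / len(returns)


def weighted_is(ratios, returns):
    """Weighted importance sampling: sum(rho * G) / sum(rho) (0 if all ratios are 0)."""
    ratios, returns = np.asarray(ratios, float), np.asarray(returns, float)
    denom = np.sum(ratios)
    return np.sum(ratios * returns) / denom if denom > 0 else 0.0


# A single sample with return 1 and ratio 4
print(ordinary_is([4.0], [1.0]), weighted_is([4.0], [1.0]))
```

With one sample, the weighted estimate is just the observed return, while the ordinary estimate is the return scaled by the ratio, which can be far outside the range of returns actually observed. The weighted version trades a small bias for this reduction in variance.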


Let’s review why this is different from regular Monte Carlo.

Regular Monte Carlo (what I covered in my Reinforcement Learning course) is an on-policy control method.

We use epsilon-greedy because exploration is required to collect enough samples for all state-action pairs. Epsilon-greedy is suboptimal by definition, so our Q-table and final policy will also be suboptimal.

What might be better is an off-policy control method, where we act according to a behavior policy that allows for exploration, but compute the Q-table for the greedy target policy (the optimal policy).



[VIP COURSE UPDATE] Artificial Intelligence: Reinforcement Learning in Python

March 18, 2020

Artificial Intelligence: Reinforcement Learning in Python

VIP Promotion

Hello all!

In this post, I am announcing the VIP coupon to my course titled “Artificial Intelligence: Reinforcement Learning in Python”.

There are 2 places to get the course.

  1. Udemy, with this VIP coupon: (expires May 25, 2022)
  2. Deep Learning Courses (coupon automatically applied):

You may recognize this course as one that has already existed in my catalog. However, the course I am announcing today contains ALL-NEW material: the entire course has been rebuilt, and none of its lectures existed in the original version.

One of the most common questions I get from students in my PyTorch, Tensorflow 2, and Financial Engineering courses is: “How can I learn reinforcement learning?”

While I do cover RL in those courses, it’s very brief. I’ve essentially summarized 12 hours of material into 2. So by necessity, you will be missing some things.

While that serves as a good way to scratch the surface of RL, it doesn’t give you a true, in-depth understanding that you will get by actually learning each component of RL step-by-step, and most importantly, getting a chance to put everything into code!

This course covers:

  • The explore-exploit dilemma and the Bayesian bandit method
  • MDPs (Markov Decision Processes)
  • Dynamic Programming solution for MDPs
  • Monte Carlo Method
  • Temporal Difference Method (including Q-Learning)
  • Approximation Methods using Radial Basis Functions
  • Applying your code to OpenAI Gym with zero effort / code changes
  • Building a stock trading bot (different approach in each course!)


When you get the Deep Learning Courses version, note that you will get both versions (new and old) of the course, totalling nearly 20 hours of material.

If you want access to the tic-tac-toe project, this is the version you should get.

Otherwise, if you prefer to use Udemy, that’s fine too. If you purchase on Udemy but would like access to the Deep Learning Courses version, I will allow this since they are the same price. Just send me an email and show me your proof of purchase.

Note that I’m not able to offer the reverse (I can’t give you access to Udemy if you purchase on Deep Learning Courses, due to operational reasons).

So what are you waiting for?



SPECIAL SALE 90% OFF: Avoid public spaces; study Deep Learning

March 3, 2020


Hello deep learning and AI enthusiasts!

As we all know, the near future is somewhat uncertain. With an invisible virus spreading around the world at an alarming rate, some experts have suggested that it may reach a significant portion of the population.

Schools may close, you may be ordered to work from home, or you may want to avoid going outside altogether. This is not fiction – it’s already happening.

There will be little warning, and as students of science and technology, we should know how rapidly things can change when we have exponential growth (just look at AI itself).

Have you decided how you will spend your time?

I find moments of quiet self-isolation to be excellent for learning advanced or difficult concepts – particularly those in machine learning and artificial intelligence.

To that end, I’ll be releasing several coupons today – hopefully that helps you out and you’re able to study along with me.

Modern Deep Learning in Python


Despite the fact that I just released a huge course on Tensorflow 2, this course is more relevant than ever. You might take a course that uses batch norm, Adam optimization, dropout, batch gradient descent, etc. without any clue how they work. Perhaps, like me, you find doing “batch norm in 1 line of code” to be unsatisfactory. What’s really going on?

And yes, although it was originally designed for Tensorflow 1 and Theano, everything has been done in Tensorflow 2 as well (you’ll see what I mean).

Cutting-Edge AI: Deep Reinforcement Learning in Python

Learn about awesome algorithms such as A2C, DDPG, and Evolution Strategies (ES). This course continues where my first Deep Reinforcement Learning course left off and is the third course in my Reinforcement Learning series.

Support Vector Machines


A lot of people think SVMs are obsolete. Wrong! A lot of you students want a nice “plug-and-play” model that works well out of the box. Guess what one of the best models is for that? SVM!

Many of the concepts from SVMs are extremely useful today – like quadratic programming (used for portfolio optimization) and constrained optimization.

Constrained optimization appears in modern Reinforcement Learning, for you non-believers (see: TRPO, PPO).


GANs and Variational Autoencoders


Well, I don’t need to tell you how popular GANs are. They sparked a mini-revolution in deep learning with the ability to generate photo-realistic images, create music, and enhance low-resolution photos.

Variational autoencoders are a great tool (often forgotten by beginner courses) for understanding and generating data (much like GANs) from a principled, probabilistic viewpoint.

Ever seen those cool illustrations where they can change a picture of a person from smiling to frowning on a continuum? That’s VAEs in action!


Supervised Machine Learning in Python


This is one of my favorite courses. Every beginner ML course these days teaches you how to plug into scikit-learn.

This is trivial. Everyone can do this. Nobody will give you a job just because you can write 3 lines of code when there are 1000s of others lining up beside you who know just as much.

It’s so trivial I teach it for FREE.

That’s why, in this course (a real ML course), I teach you how to not just use, but implement each of the algorithms (the fundamental supervised models).

At the same time, I haven’t forgotten about the “practical” aspect of ML, so I also teach you how to build a web API to serve your trained model.

This is the eventual place where many of your machine learning models will end up. What? Did you think you would just write a script that prints your accuracy and then call it a day? Who’s going to use your model?

The answer is, you’re probably going to serve it (over a server, duh) using a web server framework, such as Django, Flask, Tornado, etc.

Never written your own backend web server application before? I’ll show you how.

Alright, that’s all from me. Stay safe out there folks!

Note: these coupons will last 31 days – don’t wait!


What is the difference between epsilon-greedy and epsilon-soft policies?

February 27, 2020

Learn more about Reinforcement Learning in my course (75% off or more when you use the following link):

Artificial Intelligence: Reinforcement Learning in Python

A common question I get in my Reinforcement Learning class is:

“What is the difference between epsilon-greedy and epsilon-soft policies?”

At first glance, it may seem that these are the same thing. At times, in Sutton & Barto, it seems these 2 terms are used interchangeably.

Here’s the difference.

An epsilon-soft (\( \varepsilon \)-soft) policy is any policy in which, for every state \( s \), every action has at least some minimum probability. Specifically:

$$ \pi(a | s) \ge \frac{\varepsilon}{| \mathcal{A}(s) |} , \forall a \in \mathcal{A}(s) $$

The epsilon-greedy (\( \varepsilon \)-greedy) policy is a specific instance of an epsilon-soft policy.

Specifically, an epsilon-greedy policy is defined with respect to an action-value function \( Q(s,a) \).

Let \( a^* \) be the greedy action with respect to \( Q(s,a) \), so that:

$$ a^* = \arg\max_a Q(s,a) $$

Then the epsilon-greedy policy assigns the following probabilities to the actions in \( \mathcal{A}(s) \):

$$
\pi(a | s) =
\begin{cases}
1 - \varepsilon + \frac{\varepsilon}{| \mathcal{A}(s) |}, & a = a^* \\
\frac{\varepsilon}{| \mathcal{A}(s) |}, & a \ne a^*
\end{cases}
$$

This can be accomplished using the following code:

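import numpy as np
def epsilon_greedy(Q, s, epsilon):
  # Explore: with probability epsilon, pick any action uniformly (may still pick a*)
  if np.random.random() < epsilon:
    return np.random.choice(Q.shape[1])
  # Exploit: otherwise return the greedy action a* = argmax_a Q(s, a)
  return np.argmax(Q[s, :])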

This assumes that Q is a Numpy-like array with 2 dimensions corresponding to the possible states and actions.
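To check that the function really produces the probabilities in the formula above, one can compare empirical action frequencies against the exact values. This is a sketch with a hypothetical one-state, three-action Q-table; `epsilon_greedy_probs` is a helper introduced here, not part of the original code.

```python
import numpy as np


def epsilon_greedy(Q, s, epsilon):
    if np.random.random() < epsilon:
        return np.random.choice(Q.shape[1])   # explore: any action, uniformly
    return np.argmax(Q[s, :])                 # exploit: greedy action a*


def epsilon_greedy_probs(Q, s, epsilon):
    """Exact action probabilities: eps/|A| for every action, plus 1 - eps for a*."""
    n_actions = Q.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s, :])] += 1 - epsilon
    return probs


np.random.seed(0)
Q = np.array([[0.0, 1.0, 0.5]])   # one state, three actions; a* = 1
actions = [epsilon_greedy(Q, 0, 0.3) for _ in range(50_000)]
counts = np.bincount(actions, minlength=3)
print(counts / counts.sum())
print(epsilon_greedy_probs(Q, 0, 0.3))
```

The greedy action's frequency should land near \( 1 - 0.3 + 0.3/3 = 0.8 \), not 0.7, because the random branch can also select it.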

Based on this, we can see that all epsilon-greedy policies are epsilon-soft policies, but not all epsilon-soft policies are epsilon-greedy policies.
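A small sketch of the definition as a check on a single state's action probabilities (the function name and the example distributions are hypothetical). The second distribution passes the epsilon-soft test but cannot be epsilon-greedy for any \( \varepsilon \), since an epsilon-greedy policy gives all non-greedy actions the same probability.

```python
import numpy as np


def is_epsilon_soft(pi_s, epsilon):
    """True if every action has probability >= epsilon / |A(s)| in this state."""
    pi_s = np.asarray(pi_s, dtype=float)
    # small tolerance so values like 0.1 vs 0.3/3 are not rejected by float rounding
    return bool(np.all(pi_s >= epsilon / len(pi_s) - 1e-12))


eps = 0.3
greedy_version = [0.1, 0.8, 0.1]   # epsilon-greedy: 1 - eps + eps/3 on a*, eps/3 elsewhere
other_soft = [0.2, 0.5, 0.3]       # epsilon-soft (all >= 0.1), but not epsilon-greedy
print(is_epsilon_soft(greedy_version, eps), is_epsilon_soft(other_soft, eps))
```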

Learn more about Reinforcement Learning in my course (75% off or more when you use the following link):

Artificial Intelligence: Reinforcement Learning in Python


How to setup NVIDIA GPU laptop with Ubuntu for Deep Learning (CUDA and CuDNN)

January 5, 2020

See the corresponding YouTube video lecture here:


In this article, I will teach you how to setup your NVIDIA GPU laptop (or desktop!) for deep learning with NVIDIA’s CUDA and CuDNN libraries.

The main thing to remember before we start is that these steps are constantly in flux: things change, and they change quickly in the field of deep learning. Therefore I remind you of my slogan: “Learn the principles, not the syntax”. We are not doing any coding here, so there’s no “syntax” per se, but the general idea still applies: learn the principles at a high level, and don’t try to memorize details that may change on you and confuse you if you forget what the principles are.

This article is more like a personal story rather than a strict tutorial. It’s meant to help you understand the many obstacles you may encounter along the way, and what practical strategies you can take to get around them.

There are about 10 different ways to install the things we need. Some will work; some won’t. That’s just how cutting-edge software is. If that makes you uncomfortable, well, stop being a baby. Yes, it’s going to be frustrating. No, I didn’t invent this stuff, it is not within my control. Learn the principles, not the syntax!

This article will be organized into the following sections:

  1. Why you need this guide
  2. Choosing your laptop (i.e. a laptop that has an NVIDIA GPU)
  3. Choosing your Operating System
  4. Installing CUDA and CuDNN on Ubuntu and similar Linux OSes (Debian, Pop!_OS, Xubuntu, Lubuntu, etc.)
  5. Installing CUDA and CuDNN on Windows
  6. Installing GPU-enabled Tensorflow
  7. Installing GPU-enabled PyTorch
  8. Installing GPU-enabled Keras
  9. Installing GPU-enabled Theano

Why you need this guide

If you’ve never setup your laptop for GPU-enabled deep learning before, then you might assume that there’s nothing you need to do beyond buying a laptop with a GPU. WRONG!

You need to have a specific kind of laptop with specific software and drivers installed. Everything must work together.

You can think of all the software on your computer as a “stack” of layers.


At the lowest layer, you have the kernel (very low-level software that interacts with the hardware) and at higher levels you have runtimes and libraries such as SQLite, SSL, etc.

When you write an application, you need to make use of lower-level runtimes and libraries – your code doesn’t just run all by itself.

For example, Tensorflow depends on lower-level libraries (such as CUDA and CuDNN), which interact with the GPU (hardware).

If any of the layers in your stack are missing (all the way from the hardware up to high-level libraries), your code will not work.

(Figure: the software stack, with hardware at the low level and libraries and frameworks at the high level.)


Choosing your laptop

Not all GPUs are created equal. If you buy a MacBook Pro these days, you’ll get a Radeon Pro Vega GPU. If you buy a Dell laptop, it might come with an Intel UHD GPU.

These are no good for GPU-accelerated deep learning: the mainstream libraries rely on NVIDIA’s CUDA, which only runs on NVIDIA GPUs.

You will need a laptop with an NVIDIA GPU.

Some laptops come with a “mobile” NVIDIA GPU, such as the GTX 950M. These are OK, but ideally you want a GPU whose name doesn’t end with “M”. As always, check performance benchmarks if you want the full story.

I would also recommend a GPU with at least 4GB of memory (VRAM); otherwise, you won’t be able to use larger batch sizes, which will affect training.

In fact, some of the newer neural networks won’t even fit in GPU memory to do prediction, never mind training!


One thing you have to consider is if you actually want to do deep learning on your laptop vs. just provisioning a GPU-enabled machine on a service such as AWS (Amazon Web Services).

These will cost you a few cents to a dollar per hour (depending on the machine type), so if you just have a one-off job to run, you may want to consider this option.

I already have a walkthrough tutorial in my course Modern Deep Learning in Python about that, so I assume if you are reading this article, you are rather interested in purchasing your own GPU-enabled computer and installing everything yourself.


Personally, I would recommend Lenovo laptops. The main reason is they always play nice with Linux (we’ll go over why that’s important in the next section). Lenovo is known for their high-quality and sturdy laptops and most professionals who use PCs for work use Thinkpads. They have a long history (decades) of serving the professional community so it’s nearly impossible to go wrong. Other brands generally have lots of issues (e.g. sound not working, WiFi not working, etc.) with Linux.

Here are some good laptops with NVIDIA GPUs:

Lenovo Ideapad L340 Gaming Laptop, 15.6 Inch FHD (1920 X 1080) IPS Display, Intel Core i5-9300H Processor, 8GB DDR4 RAM, 512GB Nvme SSD, NVIDIA GeForce GTX 1650, Windows 10, 81LK00HDUS, Black ($694.95)


This one only has an i5 processor and 8GB of RAM, but on the plus side it’s cost-effective. Note that the prices were taken when I wrote this article; they might change.


2019 Newest Lenovo Premium Gaming PC Laptop L340: 15.6″ FHD IPS Anti-Glare Display, 9th Gen Intel 6-core i7-9750H, 16GB Ram, 256GB SSD, NVIDIA GeForce GTX 1650, WiFi, USB-C, HDMI, Win 10 ($964.00)


Same as above but different specs. 16GB RAM with an i7 processor, but only 256GB of SSD space. Same GPU. So there are some tradeoffs to be made.

2019 Lenovo Legion Y540 15.6″ FHD Gaming Laptop Computer, 9th Gen Intel Hexa-Core i7-9750H Up to 4.5GHz, 24GB DDR4 RAM, 1TB HDD + 512GB PCIE SSD, GeForce GTX 1650 4GB, 802.11ac WiFi, Windows 10 Home ($998.00)


This is the best option in my opinion. Better or equal specs compared to the previous two. i7 processor, 24GB of RAM (32GB would be ideal!), lots of space (1TB HD + 512GB SSD), and the same GPU. Bonus: it’s nearly the same price as the above (currently).

Dell XPS 15 7590, 15.6″ 4K UHD Touch, 9th Gen Intel Core i7-6 Core 9750H, NVIDIA GeForce GTX 1650 4GB GDDR5, 16GB DDR4 RAM, 1TB SSD ($1,830.00)


Pricier, but great specs. Same GPU!

Lenovo ThinkPad P53 Mobile Workstation 20QN0018US – Intel Six Core i7-9850H, 16GB RAM, 512GB PCIe Nvme SSD, 15.6″ HDR 400 FHD IPS 500Nits Display, NVIDIA Quadro RTX 5000 16GB GDDR6, Windows 10 Pro ($3,472.69)


If you really want to splurge, consider one of these big boys. Thinkpads are classic professional laptops. These come with real beast GPUs – NVIDIA Quadro RTX 5000 with 16GB of VRAM.

You’ve still got the i7 processor, 16GB of RAM, and a 512GB NVMe SSD (basically a faster version of already-super-fast SSDs). Personally, I think if you’re going to splurge, you should opt for 32GB of RAM and a 1TB SSD.


If you’ve watched my videos, you might be wondering: what about a Mac? (I use a Mac for screen recording).

Macs are great in general for development, and they used to come with NVIDIA GPUs (although those GPUs are not as powerful as the ones currently available for PCs). Support for Mac has dropped off in the past few years, so you won’t be able to install, say, the latest version of Tensorflow, CUDA, and CuDNN without a significant amount of effort (I spent probably a day and just gave up). On top of that, the GPU won’t even be that great. Overall, not recommended.

Choosing your Operating System

As I mentioned earlier, you probably want to be running Linux (Ubuntu is my favorite).

Why, you might ask?

“Tensorflow works on Windows, so what’s the problem?”

Remember my motto: “Learn the principles, not the syntax”.

What’s the principle here? Many of you probably haven’t been around long enough to know this, but the problem is, many machine learning and deep learning libraries didn’t work with Windows when they first came out.

So, unless you want to wait a year or more for new inventions and software to reach your OS, try to avoid Windows.

Don’t take my word for it, look at the examples:

  • Early on, even installing Numpy, Matplotlib, Pandas, etc. was very difficult on Windows. I’ve spent hours with clients on this. Nowadays you can just use Anaconda, but that’s not always been the case. At the time of this writing, things only started to shape up a few years ago.
  • Theano (the original GPU-enabled deep learning library) initially did not work on Windows for many years.
  • Tensorflow, Google’s deep learning library and the most popular today, initially did not work on Windows.
  • PyTorch, a deep learning library popular with the academic community, initially did not work on Windows.
  • OpenAI Gym, the most popular reinforcement learning library, only partially works on Windows. Some environments, such as MuJoCo and Atari, still have no support for Windows.

There are more examples, but these are the major historical “lessons” I point to for why I normally choose Linux over Windows.

One benefit of using Windows is that installing CUDA is very easy, and it’s very likely that your Windows OS (on your Lenovo laptop) will come with it pre-installed. The original use-case for GPUs was gaming, so it’s pretty user-friendly.

If you purchase one of the above laptops and you choose to stick with Windows, then you will not have to worry about installing CUDA – it’s already there. There is a nice user interface so whenever you need to update the CUDA drivers you can do so with just a few clicks.

Installing CuDNN is less trivial, but NVIDIA’s instructions are pretty clear. Simply download the zip file, unzip it, copy the files to the locations specified in the instructions, and set a few environment variables. Easy!



Aside from the Python libraries below (such as Tensorflow / PyTorch) you need to install 2 things from NVIDIA first:

  1. CUDA (already comes with Windows if you purchase one of the above laptops, Ubuntu instructions below)
  2. CuDNN (you have to install it yourself, following the instructions on NVIDIA’s website)



I always find it useful to have both Windows and Ubuntu on-hand, and if you get the laptop above that has 2 drives (1TB HD and 512GB SSD) dual-booting is a natural choice.

These days, dual booting is not too difficult. Usually, one starts with Windows. Then, you insert your Ubuntu installer (USB stick), and choose the option to install Ubuntu alongside the existing OS. There are many tutorials online you can follow.

Hint: Upon entering the BIOS, you may have to disable the Secure Boot / Fast Boot options.



I already have lectures on how to install Python with and without Anaconda. These days, Anaconda works well on Linux, Mac, and Windows, so I recommend it for easy management of your virtual environments.

Environment Setup for UNIX-Like systems (includes Ubuntu and MacOS) without Anaconda

Environment Setup for Windows and/or Anaconda


Installing CUDA and CuDNN on Ubuntu and similar Linux OSes (Debian, Pop!_OS, Xubuntu, Lubuntu, etc.)


Ok, now we get to the hard stuff. You have your laptop and your Ubuntu/Debian OS.

Can you just install Tensorflow and magically start making use of your super powerful GPU? NO!

Now you need to install the “low-level” software that Tensorflow/Theano/PyTorch/etc. make use of – which are CUDA and CuDNN.

This is where things get tricky, because there are many ways to install CUDA and CuDNN, and some of these ways don’t always work (from my experience).

Examples of how things can “randomly go wrong”:

  • I installed CUDA on Linux Mint. After this, I was unable to boot the machine and get into the OS.
  • Pop!_OS (System76) has their own versions of CUDA and CuDNN that you can install with simple apt commands. Didn’t work. Had to install them the “regular way”.
  • Updating CUDA and CuDNN sucks. You may find the nuclear option easier (installing the OS and drivers from scratch)

Here is a method that consistently works for me:

  1. Go to NVIDIA’s CUDA download page and choose the options appropriate for your system (Linux / x86_64 (64-bit) / Ubuntu / etc.). Note that Pop!_OS is a derivative of Ubuntu, as is Linux Mint.
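  2. You’ll download a .deb file. Install it with “sudo dpkg -i <filename>.deb”, then run the remaining commands shown on the download page (for the deb installer, these end with “sudo apt-get update” and “sudo apt-get install cuda”). CUDA is installed!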
  3. Next, you’ll want to install CuDNN. Instructions from NVIDIA are here:

Those instructions are subject to change, but basically you can just copy and paste what they give you (don’t copy the below, check the site to get the latest version):

sudo dpkg -i \
sudo apt-get update && sudo apt-get install libcudnn7 libcudnn7-dev



Installing CUDA and CuDNN on Windows

If you decided you hate reinforcement learning and you’re okay with not being able to use new software until it becomes mainstream, then you may have decided you want to stick with Windows.

Luckily, there’s still lots you can do in deep learning.

As mentioned previously, installing CUDA and CuDNN on Windows is easy.

If you did not get a laptop that has CUDA preinstalled, then you’ll have to install it yourself. Go to NVIDIA’s CUDA download page and choose the options appropriate for your system (Windows 10 / x86_64 (64-bit) / etc.).

This will give you a .exe file to download. Simply click on it and follow the onscreen prompts.

As mentioned earlier, installing CuDNN is a little more complicated, but not too troublesome. Just follow NVIDIA’s CuDNN installation guide, which tells you where to put the files and which environment variables to set.


Installing GPU-enabled Tensorflow

Unlike the other libraries we’ll discuss, Tensorflow has separate packages for its CPU and GPU versions.

The Tensorflow website will give you the exact command to run to install Tensorflow (it’s the same whether you are in Anaconda or not).

So you would install it using either:

pip install tensorflow
pip install tensorflow-gpu

Since this article is about GPU-enabled deep learning, you’ll want to install tensorflow-gpu.

UPDATE: Starting with version 2.1, installing “tensorflow” automatically gives you GPU support, so there’s no need to install a GPU-specific package (although “pip install tensorflow-gpu” still worked at the time of this update).

After installing Tensorflow, you can verify that it is using the GPU:


This will return True if Tensorflow is using the GPU.
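For TensorFlow 2.x, one way to run this kind of check is to list the physical GPUs TensorFlow can see; an empty list means it will fall back to the CPU. A minimal sketch, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available outside the experimental namespace); the wrapper function tensorflow_gpu_report is hypothetical and returns None when TensorFlow isn’t installed.

```python
def tensorflow_gpu_report():
    """Summarize TensorFlow's GPU support, or return None if TensorFlow isn't installed.

    Hypothetical helper; assumes TensorFlow 2.1+.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        # True if this TensorFlow build was compiled against CUDA
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Names like "/physical_device:GPU:0"; empty if no usable GPU was found
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

If built_with_cuda is True but gpus is empty, TensorFlow most likely couldn’t load the CUDA or CuDNN libraries; its log messages at import time usually point at the missing one.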


Installing GPU-enabled PyTorch

Nothing special nowadays! Just do:

pip install torch

as usual.

To check whether PyTorch is using the GPU, you can use the following commands:

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device at 0x7efce0b03be0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'GeForce GTX 950M'

In [6]: torch.cuda.is_available()
Out[6]: True
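Those calls only tell you that a GPU is visible; tensors and models still live on the CPU unless you put them on the GPU. A common PyTorch idiom, sketched below, is to pick a torch.device once and create or move everything onto it. The helper name matmul_on_best_device and the matrix size are hypothetical; the function returns None when PyTorch isn’t installed.

```python
def matmul_on_best_device(n=512):
    """Multiply two random n x n matrices on the GPU if one is available, else on the CPU.

    Hypothetical helper: returns the device type of the result ("cuda" or "cpu"),
    or None if PyTorch isn't installed.
    """
    try:
        import torch
    except ImportError:
        return None
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(n, n, device=device)  # allocated directly on the chosen device
    b = torch.randn(n, n, device=device)
    c = a @ b  # runs on whichever device a and b live on
    return c.device.type


if __name__ == "__main__":
    print(matmul_on_best_device())
```

For a model, the equivalent is model.to(device), and the inputs must be moved to the same device before the forward pass.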

Installing GPU-enabled Keras

Luckily, Keras is just a wrapper around other libraries such as Tensorflow and Theano. Therefore, there is nothing special you have to do, as long as you already have the GPU-enabled version of the base library.

Therefore, just install Keras as you normally would:

pip install keras

As long as Keras is using Tensorflow as a backend, you can use the same method as above to check whether or not the GPU is being used.

Installing GPU-enabled Theano

For both Ubuntu and Windows, as always I recommend using Anaconda. In this case, the command to install Theano with GPU support is simply:

conda install theano pygpu

If necessary, further details can be found at:



SIDE NOTE: Unfortunately, I will not provide technical support for your environment setup. You are welcome to schedule a 1-on-1 but availability is limited.

Disclaimer: this post contains Amazon affiliate links.


Tensorflow 2.0 is here! Get the VIP version now

August 14, 2019

Tensorflow 2.0 is here!


Old coupon no longer works. Use this one instead:

PLEASE NOTE: VIP material will be removed from Udemy on November 27. If you signed up for the VIP version (using the VIP coupon) and want access beyond that point, you must email me at info [at] lazyprogrammer [dot] me.

If you want the VIP (full) version of the course beyond that date, you now need to purchase the “main” part and the “VIP” part separately. The “main” part can be purchased on Udemy and the “VIP” part can be purchased from:



I am happy to announce my latest and most massive course yet – Tensorflow 2.0: Deep Learning and Artificial Intelligence.

Guys I am not joking – this really is my most massive course yet – check out the curriculum.

Many of you will be interested in the stock prediction example, because you’ve been tricked by marketers posing as data scientists in the past – I will demonstrate why their results are seriously flawed.

[if you don’t want to read my little spiel just click here to get your VIP coupon:]

This is technically Deep Learning in Python part 12, but importantly this need not be the 12th deep learning course of mine that you take!

There are quite a few important points to cover in this announcement, so let me outline what I will discuss:

A) What’s covered in this course
B) Why there are almost zero prerequisites for this course
C) The VIP content and near-term additions
D) The story behind this course (if you’ve been following my courses for some time you will be interested in this)

What’s covered in this course

As mentioned – this course is massive. It’s going to take you from basic linear models (the neuron) to ANNs, CNNs, and RNNs.

Thanks to the new standardized Tensorflow 2.0 API – we can move quickly.

The theme of this course is breadth, not depth. If you’re looking for heavy theory (e.g. backpropagation), well, I already have courses for those. So there’s no point in repeating that.

We will however go pretty in-depth to ensure that convolution (the main component of CNNs) and recurrent units (the main component of RNNs) are explained intuitively and from multiple perspectives.

These will include explanations and intuitions you have likely not seen before in my courses, so even if you’ve taken my CNN and RNN courses before, you will still want to see this.

There are many applications in this course. Here are a few:

– we will prove Moore’s Law using a neuron
– image classification with modern CNN design and data augmentation
– time series analysis and forecasting with RNNs

Anyone who is interested in stock prediction should check out the RNN section. Most RNN resources out there only look at NLP (natural language processing), including my old RNN course, but very few look at time series and forecasting.

And out of the ones that do, many do forecasting totally wrong!

There is one stock forecasting example I see everywhere, but its methodology is flawed. I will demonstrate why it’s flawed, and why stock prediction is not as simple as you have been led to believe.

There’s also a ton of Tensorflow-specific content, such as:

– Tensorflow serving (i.e. how to build a web service API from a Tensorflow model)
– Distributed training for faster training times (what Tensorflow calls “distribution strategies”)
– Low-level Tensorflow – this has changed completely from Tensorflow 1.x
– How to build your own models using the new Tensorflow 2.0 API
– Tensorflow Lite (how to export your models for mobile devices – iOS and Android) (coming soon)
– Tensorflow.js (how to export your models for the browser) (coming soon)

Why there are almost zero prerequisites for this course

Due to the new standardized Tensorflow 2.0 API, writing neural networks is easier than ever before.

This means that we’ll be able to blast through each section with very little theory (no backpropagation).

All you will need is a basic understanding of Python, Numpy, and Machine Learning, which are all taught in my free Numpy course.

As I always say, it’s free, so you have no excuses!

Tensorflow 2.0, however, does not invalidate or replace my other courses. If you haven’t taken them yet, you should take this course first for breadth, and then take the other courses, which focus on individual models (CNNs, RNNs), for depth.

The VIP content and near-term additions

I had so much content in mind for this course, but I wanted to get this into your hands as soon as possible. With Tensorflow 2.0 due to be released any day now, I wanted to give you all a head start.

This field is moving so fast things were changing while I was making the course. Insane!

I’ll be adding more content in the coming weeks, possibly including but not limited to:

– Transfer Learning
– Natural Language Processing
– GANs
– Recommender Systems
– Reinforcement Learning

For this release, only the VIP version will be available for some time. That is why you do not see the usual Udemy discount.

You may be wondering: Which parts of the content are VIP content, and which are not?

This time, I wanted to do something interesting: it’s a surprise!

The VIP content will be added to a special section called the “VIP Section”, and this will be removed once the course becomes “Non-VIP”.

I will make an announcement well before that happens, so you will have the chance to download the VIP content before then, as well as get access to the VIP content permanently from

The story behind this course

Originally, this course was going to be an RNN course only (which is why the RNN sections have so much more content – both time series and NLP).

The reason for this was that my original RNN course was tied to Theano and to building RNNs from scratch.

In Tensorflow, building RNNs is completely different from how it’s done in Theano. This is unlike ANNs and CNNs, which are built in relatively similar ways in both libraries.

Thus, I could never reconcile the differences between the Theano approach and the Tensorflow approach in my original RNN course. So, I decided that simply making a new course for RNNs in Tensorflow would be best.

But lo and behold – Tensorflow was evolving so fast that a new version was about to be released – so I thought, it’s probably best to just cover everything in Tensorflow 2.0!

And that is how this current course came to be.

I hope you enjoy this action-packed course.

I’ll see you in class!


[June 2019] AI / Machine Learning HUGE Summer Sale! $9.99

June 10, 2019

AI / Machine Learning Summer Sale

For the next week, all my Deep Learning and AI courses are available for just $9.99! (In addition to other courses on the site for the next few days)

For those of you who have been around for some time, you know that this sale doesn’t come around very often – just a few times per year. If you’ve been on the fence about getting a course, NOW is the time to do so. Get it now – save it for later.

For my courses, please use the coupons below (included in the links), or if you want, enter the coupon code: JUN2019.

As usual, if you want to know what order to take my courses in, check out the lecture “What order should I take your courses in?” in the Appendix of any of my courses (including the free Numpy course).

For prerequisite courses (math, stats, Python programming) and all other courses, follow the links at the bottom for sales of up to 90% off!

Since ALL courses on Udemy are on sale, if you want any course not listed here, just click the general (site-wide) link, and search for courses from that page.



And just as important, $9.99 coupons for some helpful prerequisite courses. You NEED to know this stuff to understand machine learning in-depth:

General (site-wide):
Calc 1
Calc 2
Calc 3
Linalg 1
Linalg 2
Probability (option 1)
Probability (option 2)
Probability (option 3)



As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!


iOS courses:

Android courses:

Ruby on Rails courses:

Python courses:

Big Data (Spark + Hadoop) courses:

Javascript, ReactJS, AngularJS courses:



Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!


New Course! Cutting-Edge AI: Deep Reinforcement Learning in Python

May 9, 2019

Quite a few of you have been asking when I’d do another Reinforcement Learning course… well, how about today? 😉

[if you don’t want to read my little spiel just click here to get your VIP coupon:]

This is technically Deep Learning in Python part 11, and my 3rd reinforcement learning course, which is super awesome.

The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.

Recently, these advances have allowed us to showcase just how powerful reinforcement learning can be.

We’ve seen how AlphaZero can master the game of Go using only self-play.

This came just a few years after the original AlphaGo beat a world champion in Go.

We’ve seen real-world robots learn how to walk, and even recover after being kicked over, despite only being trained using simulation.

Simulation is nice because it doesn’t require actual hardware, which is expensive. If your agent falls down, no real damage is done.

We’ve seen real-world robots learn hand dexterity, which is no small feat.

Walking is one thing, but that involves coarse movements. Hand dexterity is complex – you have many degrees of freedom and many of the forces involved are extremely subtle.

Last but not least – video games.

Even just considering the past few months, we’ve seen some amazing developments. AIs are now beating professional players in CS:GO and Dota 2.

So what makes this course different from the first two?

Now that we know deep learning works with reinforcement learning, the question becomes: how do we improve these algorithms?

This course is going to show you a few different ways: including the powerful A2C (Advantage Actor-Critic) algorithm, the DDPG (Deep Deterministic Policy Gradient) algorithm, and evolution strategies.

Evolution strategies is a fresh take on reinforcement learning that kind of throws away all the old theory in favor of a more “black box” approach, inspired by biological evolution.

What’s also great about this new course is the variety of environments we get to look at.

First, we’re going to look at the classic Atari environments. These are important because they show that reinforcement learning agents can learn based on images alone.

Second, we’re going to look at MuJoCo, which is a physics simulator. This is the first step to building a robot that can navigate the real world and understand physics – we first have to show it can work with simulated physics.

Finally, we’re going to look at Flappy Bird, everyone’s favorite mobile game just a few years ago.

What do you get if you sign up for the VIP version of this course? A brand new exclusive section covering an entirely new algorithm: TD3! As usual, both theory and code for this powerful state-of-the-art algorithm are provided.

I’ll see you in class!

P.S. As usual, if you primarily use another site (e.g. Udemy) you will automatically get free access (upon request) if you’ve already purchased the VIP version of the course from


Udemy St. Patrick’s Day Sale 🍀

March 13, 2019

Do beer and AI go together?

For the next week, all my Deep Learning and AI courses are available for just $11.99! ($1.00 less than the current sale, woohoo!)

For my courses, please use the coupons below (included in the links), or if you want, enter the coupon code: MAR2019.

For prerequisite courses (math, stats, Python programming) and all other courses, follow the links at the bottom for sales of up to 90% off!

Since ALL courses on Udemy are on sale, if you want any course not listed here, just click the general (site-wide) link, and search for courses from that page.



And just as important, $11.99 coupons for some helpful prerequisite courses. You NEED to know this stuff to understand machine learning in-depth:

General (site-wide):
Calc 1
Calc 2
Calc 3
Linalg 1
Linalg 2
Probability (option 1)
Probability (option 2)
Probability (option 3)



As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!


iOS courses:

Android courses:

Ruby on Rails courses:

Python courses:

Big Data (Spark + Hadoop) courses:

Javascript, ReactJS, AngularJS courses:



Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!


How to Meet Your New Years Resolutions in 2019 (Udemy Coupons $9.99)

January 1, 2019

Deep Learning and AI Courses for just $9.99

New Years 2019

How to meet your New Years resolutions in 2019

Firstly, I’d like to wish everyone on this list a happy new year; we are off to a great start. The new year is a time to set goals, turn things around, and be better than we were before.

What better way than to learn from thousands of experts around the world who are the best at what they do? Luckily, I’ve got something that will make it just a little easier.

I know a lot of you have been waiting for this – well here it is – the LOWEST price possible on ALL Udemy courses (yes, the whole site!)

For the next 10 days, ALL courses on Udemy (not just mine) are available for just $9.99!

For my courses, please use the Udemy coupons below (included in the links below), or if you want, enter the coupon code: JAN2019.

For prerequisite courses (math, stats, Python programming) and all other courses (Bitcoin, meditation, yoga, guitar, photography, whatever else you want to learn), follow the links at the bottom (or go to my website).

Since ALL courses on Udemy are on sale, if you want any course not listed here, just click the general (site-wide) link, and search for courses from that page.



And just as important, $9.99 coupons for some helpful prerequisite courses. You NEED to know this stuff to understand machine learning in-depth:

General (site-wide):
Calc 1
Calc 2
Calc 3
Linalg 1
Linalg 2
Probability (option 1)
Probability (option 2)
Probability (option 3)



As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!


iOS courses:

Android courses:

Ruby on Rails courses:

Python courses:

Big Data (Spark + Hadoop) courses:

Javascript, ReactJS, AngularJS courses:



Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!
