What is the difference between epsilon-greedy and epsilon-soft policies?

February 27, 2020

Learn more about Reinforcement Learning in my course (75% off or more when you use the following link):

Artificial Intelligence: Reinforcement Learning in Python

A common question I get in my Reinforcement Learning class is:

“What is the difference between epsilon-greedy and epsilon-soft policies?”

At first glance, these may seem to be the same thing, and at times Sutton & Barto appear to use the two terms interchangeably.

Here’s the difference.

An epsilon-soft (\( \varepsilon \)-soft) policy is any policy where the probability of every action given a state \(s\) is at least some minimum value, specifically:

$$ \pi(a | s) \ge \frac{\varepsilon}{| \mathcal{A}(s) |} , \forall a \in \mathcal{A}(s) $$
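As a quick illustration (the helper name and the policy table below are made up for this post, not from any particular library), the \( \varepsilon \)-soft condition can be checked directly in NumPy:

```python
import numpy as np

def is_epsilon_soft(pi, epsilon):
    # pi: 2-D array where pi[s, a] is the probability of action a in state s
    n_actions = pi.shape[1]
    # Every entry must be at least epsilon / |A(s)|
    return np.all(pi >= epsilon / n_actions)

# A uniform random policy over 4 actions is epsilon-soft for any epsilon <= 1
pi_uniform = np.full((3, 4), 0.25)

# A near-deterministic policy with zero-probability actions is not epsilon-soft
pi_hard = np.array([[0.9, 0.1, 0.0, 0.0]])
```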

The epsilon-greedy (\( \varepsilon \)-greedy) policy is a specific instance of an epsilon-soft policy.

Specifically, an epsilon-greedy policy is one that acts greedily with respect to the action-value \( Q(s,a) \) with probability \( 1 - \varepsilon \), and chooses an action uniformly at random otherwise.

Let \( a^* \) be the greedy action with respect to \( Q(s,a) \), so that:

$$ a^* = \arg\max_a Q(s,a) $$

Then the epsilon-greedy policy assigns the following probabilities to the actions in \( \mathcal{A}(s) \):

$$ \pi(a | s) = \begin{cases} 1 - \varepsilon + \frac{\varepsilon}{| \mathcal{A}(s) |}, & a = a^* \\ \frac{\varepsilon}{| \mathcal{A}(s) |}, & a \ne a^* \end{cases} $$

This can be accomplished using the following code:

import numpy as np

def epsilon_greedy(Q, s, epsilon):
  # With probability epsilon, explore: pick an action uniformly at random
  if np.random.random() < epsilon:
    return np.random.choice(Q.shape[1])
  # Otherwise, exploit: pick the greedy action
  return np.argmax(Q[s])

This assumes that Q is a 2-D NumPy array whose rows correspond to the possible states and whose columns correspond to the possible actions.
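As a sanity check (the Q-values and epsilon below are made up, and the function above is repeated so the snippet is self-contained), we can sample from this policy many times and compare the empirical action frequencies against the formula:

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon):
    # Explore with probability epsilon, otherwise take the greedy action
    if np.random.random() < epsilon:
        return np.random.choice(Q.shape[1])
    return np.argmax(Q[s])

np.random.seed(0)
epsilon = 0.1
Q = np.array([[1.0, 2.0, 0.5, 0.0]])  # one state, four actions; greedy action is 1

n_trials = 100_000
counts = np.zeros(4)
for _ in range(n_trials):
    counts[epsilon_greedy(Q, 0, epsilon)] += 1
freqs = counts / n_trials

# Formula predicts: greedy action 1 - eps + eps/4 = 0.925, others eps/4 = 0.025
```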

Based on this, we can see that all epsilon-greedy policies are epsilon-soft policies, but not all epsilon-soft policies are epsilon-greedy policies.



All Data is the Same

February 9, 2020

In this brief post, I am going to discuss my motto, “all data is the same”. Many years ago, when I began making courses, I created this motto because I found that many beginner students didn’t understand the purpose or goal of a machine learning course.

They often assumed that a machine learning course should consist of 3 simple lines of Scikit-Learn code, applied to whatever data they happened to be interested in (so-called "real-world" examples). Some examples might be: customer segmentation, fraud detection, disease prediction, and so on.

Problem: students are only interested in their own problems, not the problems of other students!

It is an inherently selfish desire, but furthermore, it’s a desire that can never be satisfied, because no two students are interested in the same “real-world” examples and applications.

You can’t do a biology example, because the finance students would not understand it. You can’t do a finance example, because the biology students would not understand it.

The goal isn’t to learn the finance part. The goal isn’t to learn the biology part. It’s to learn the machine learning part!

Finance and biology are what we call “domain knowledge”. That’s the part you learn by yourself during your finance degree or your biology degree. It is not part of a machine learning course.

The goal, then, is to use simple examples that everyone can understand easily. Especially important are visualizable examples. I often use "Gaussian clouds" in my courses because they provide geometric intuition for what machine learning is actually doing.

When you realize that all you’re trying to do is separate the purple dots from the red dots, you realize that it’s not magic after all.
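To make that concrete, here is a minimal sketch (the centers, sizes, and the nearest-class-mean rule are all illustrative choices for this post, not any particular course's method) of generating two Gaussian clouds and separating them:

```python
import numpy as np

np.random.seed(1)
# Two "Gaussian clouds": class 0 centered at (-2, -2), class 1 at (+2, +2)
X0 = np.random.randn(100, 2) + np.array([-2.0, -2.0])
X1 = np.random.randn(100, 2) + np.array([2.0, 2.0])
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Nearest-class-mean classifier: "separate the purple dots from the red dots"
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
pred = (np.linalg.norm(X - mu1, axis=1) < np.linalg.norm(X - mu0, axis=1)).astype(int)
accuracy = (pred == y).mean()
```

Because the clouds are well separated, even this trivial rule classifies nearly every point correctly, which is exactly the geometric point.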

A beginner may say: “Yeah, but Gaussian clouds are not REAL DATA!”

This is not correct thinking.

The correct thinking is: it doesn’t matter what the data is. The code would be the same anyway.

The most important fact to realize is that the point of "learning" machine learning isn't those 3 lines of Scikit-Learn code.

You should be able to do that all by yourself after spending a few minutes reading the documentation.

“Learning” machine learning means learning what goes on inside those 3 lines of Scikit-Learn code, and realizing that it encapsulates perhaps tens or hundreds of lines of code. True understanding and competence would be the ability to implement that code yourself, without needing Scikit-Learn.

Furthermore, those 3 lines of Scikit-Learn code are the same for any dataset.

You wouldn’t ask, “How can I adapt this algorithm to work on my finance dataset?”

The algorithm doesn’t change just because you are using your special finance dataset or your special biology dataset.

Linear regression is always linear regression, the same linear regression that has existed for hundreds of years. There’s no such thing as “linear regression for biology” or “linear regression for finance”.

Linear regression is: \(w = (X^TX)^{-1}X^Ty\). This is the case for any dataset.
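In NumPy, the closed-form solution is essentially one line (using `np.linalg.solve` rather than an explicit matrix inverse, which is the numerically preferred way to evaluate \( (X^TX)^{-1}X^Ty \)); the dataset below is synthetic, precisely because it doesn't matter what the data is:

```python
import numpy as np

np.random.seed(0)
# Any dataset will do: here, 50 samples with 3 features
X = np.random.randn(50, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * np.random.randn(50)

# Solve the normal equations (X^T X) w = X^T y
w = np.linalg.solve(X.T @ X, X.T @ y)
```

Swap in a finance dataset or a biology dataset for `X` and `y`, and this line does not change.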

That’s why “all data is the same”.

Further notes:

If it takes you 20+ minutes to understand these 3 lines of code:

model = RandomForestClassifier()
model.fit(X, Y)
predictions = model.predict(X)

then something is really, really wrong. Nobody who is at the level where they’re ready for ML should need a significant amount of time to understand this.

