What is the difference between epsilon-greedy and epsilon-soft policies?

February 27, 2020

Learn more about Reinforcement Learning in my course (75% off or more when you use the following link):

Artificial Intelligence: Reinforcement Learning in Python

A common question I get in my Reinforcement Learning class is:

“What is the difference between epsilon-greedy and epsilon-soft policies?”

At first glance, it may seem that these are the same thing. At times, in Sutton & Barto, it seems these 2 terms are used interchangeably.

Here’s the difference.

An epsilon-soft (\( \varepsilon \)-soft) policy is any policy where the probability of every action given a state \(s\) is at least some minimum value, specifically:

$$ \pi(a | s) \ge \frac{\varepsilon}{| \mathcal{A}(s) |} , \forall a \in \mathcal{A}(s) $$
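
As a quick illustration, the condition can be checked numerically for a single state. This is only a minimal sketch: the function name `is_epsilon_soft`, the small tolerance `tol`, and the choice to represent the policy as a probability vector over actions are assumptions made for this example.

```python
import numpy as np

def is_epsilon_soft(pi_s, epsilon, tol=1e-12):
    """Return True if every action probability is at least epsilon / |A(s)|.

    pi_s is a hypothetical 1-D array holding pi(a | s) for each action a.
    """
    pi_s = np.asarray(pi_s, dtype=float)
    floor = epsilon / len(pi_s)  # the minimum probability allowed
    # tol guards against floating-point round-off in the comparison
    return bool(np.all(pi_s >= floor - tol))

print(is_epsilon_soft([0.7, 0.2, 0.1], epsilon=0.3))    # True: the floor is 0.1
print(is_epsilon_soft([0.95, 0.05, 0.0], epsilon=0.3))  # False: one action has probability 0
```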

The epsilon-greedy (\( \varepsilon \)-greedy) policy is a specific instance of an epsilon-soft policy.

Specifically, an epsilon-greedy policy is defined with respect to an action-value function \( Q(s,a) \).

Let \( a^* \) be the greedy action with respect to \( Q(s,a) \), so that:

$$ a^* = \arg\max_a Q(s,a) $$

Then the epsilon-greedy policy assigns the following probabilities to the actions in \( \mathcal{A}(s) \):

$$\begin{eqnarray}
\pi(a | s) &=& 1 - \varepsilon + \frac{\varepsilon}{| \mathcal{A}(s) |}, \quad a = a^* \\
\pi(a | s) &=& \frac{\varepsilon}{| \mathcal{A}(s) |}, \quad a \ne a^*
\end{eqnarray}$$
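
To make these probabilities concrete, here is a minimal sketch that builds the full probability vector for one state. The function name `epsilon_greedy_probs` and the example action values are assumptions for illustration. Ties between maximizing actions are broken by taking the first one, which is simply what `np.argmax` does.

```python
import numpy as np

def epsilon_greedy_probs(q_s, epsilon):
    """Return pi(. | s) for an epsilon-greedy policy, given the action values of state s."""
    q_s = np.asarray(q_s, dtype=float)
    n_actions = len(q_s)
    # Every action gets the baseline epsilon / |A(s)| ...
    probs = np.full(n_actions, epsilon / n_actions)
    # ... and the greedy action gets the remaining 1 - epsilon on top
    probs[np.argmax(q_s)] += 1.0 - epsilon
    return probs

probs = epsilon_greedy_probs([1.0, 3.0, 2.0], epsilon=0.1)
print(probs)        # the greedy action is index 1
print(probs.sum())  # the probabilities sum to 1 (up to floating-point error)
```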

This can be accomplished using the following code:

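import numpy as np

def epsilon_greedy(Q, s, epsilon):
  if np.random.random() < epsilon:
    # explore: pick uniformly among all actions (columns of Q)
    return np.random.choice(Q.shape[1])
  else:
    # exploit: pick the greedy action for state s
    return np.argmax(Q[s, :])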

This assumes that Q is a NumPy-like array with 2 dimensions, indexed by state (rows) and action (columns).
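
To see why this sampling procedure gives the probabilities above, here is a short derivation. It assumes the exploratory branch samples uniformly over all actions in \( \mathcal{A}(s) \), including the greedy one. The greedy action can be chosen in either branch, while any other action can only be chosen in the exploratory branch:

$$\begin{eqnarray}
\pi(a^* | s) &=& (1 - \varepsilon) \cdot 1 + \varepsilon \cdot \frac{1}{| \mathcal{A}(s) |} \\
\pi(a | s) &=& (1 - \varepsilon) \cdot 0 + \varepsilon \cdot \frac{1}{| \mathcal{A}(s) |}, \quad a \ne a^*
\end{eqnarray}$$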

Based on this, we can see that all epsilon-greedy policies are epsilon-soft policies, but not all epsilon-soft policies are epsilon-greedy policies.
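
For a concrete case of the second half of that statement, here is a small sketch with made-up numbers (the action values, the policy, and the value of epsilon are all hypothetical). The policy below satisfies the epsilon-soft bound, but it is not epsilon-greedy, because its probabilities do not match the two-case form. In fact it is not epsilon-greedy for any epsilon, since the two non-greedy actions have different probabilities.

```python
import numpy as np

epsilon = 0.3
q_s = np.array([1.0, 3.0, 2.0])     # hypothetical action values for one state
pi_s = np.array([0.15, 0.6, 0.25])  # hypothetical policy for that state

n_actions = len(q_s)
floor = epsilon / n_actions  # minimum probability required by epsilon-soft

# Epsilon-soft: every action has probability at least epsilon / |A(s)|
soft = bool(np.all(pi_s >= floor))

# Epsilon-greedy: the probabilities must match the exact two-case form
greedy_form = np.full(n_actions, floor)
greedy_form[np.argmax(q_s)] += 1.0 - epsilon
greedy = bool(np.allclose(pi_s, greedy_form))

print(soft, greedy)  # True False
```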



All Data is the Same

February 9, 2020

In this brief post, I am going to discuss my motto, “all data is the same”. Many years ago, when I began making courses, I created this motto because I found that many beginner students didn’t understand the purpose or goal of a machine learning course.

 

The origin of this phrase

Beginners often ask questions such as:

  • “How can I apply this algorithm in the ‘real world’?” (after using it on synthetic data)
  • “How can I apply this algorithm to my dataset?”
  • “How can I apply this algorithm for fraud detection?”
  • “How can I apply this algorithm to sentiment classification?”
  • “How can I apply this algorithm to disease prediction?”

The basic answer is: there is no difference (in relation to the code I gave you in the course).

The code to apply some algorithm in any of these cases is exactly the same.

Because the code is the same and requires no change to adapt to different datasets, this means that in the eyes of the machine learning model, “all data is the same”.

“Data” is just a table of numbers.

ML algorithms don’t care if your data comes from biology, finance, ecology, physics, etc. 

A table of numbers is just a table of numbers. The “real world” meaning is irrelevant to the ML algorithm.

If you want to apply an algorithm from the course to your dataset, no change is required.

This question typically arises when beginner students get frustrated about learning the “theory” behind an algorithm.

They get flustered because there’s math and beginners tend to have very poor math skills.

They want to skip the math and go straight to “applying the algorithm to real-world data”.

As a side note: that’s not what it means to ‘learn’ machine learning. To ‘learn’ machine learning is to learn how the models actually work.

If you want to ‘apply ML to data’, this is trivial. It should take no more than 15 minutes to learn how to plug your data into a scikit-learn model with 3 lines of code. You don’t need a 20-hour course for that.

 

Why what you want doesn’t work

So, why can’t I just make a course showing you examples of how to apply ML algorithms to whatever so-called “real world data” you are interested in?

Problem: students are only interested in their own problems, not the problems of other students!

It is an inherently selfish desire, but furthermore, it’s a desire that can never be satisfied, because no two students are interested in the same “real-world” examples and applications.

You can’t do a biology example, because the finance students would not understand it. You can’t do a finance example, because the biology students would not understand it.

The goal isn’t to learn the finance part. The goal isn’t to learn the biology part. It’s to learn the machine learning part!

Finance and biology are what we call “domain knowledge”. That’s the part you learn by yourself during your finance degree or your biology degree. It is not part of a machine learning course.

The goal, then, is to use simple examples that everyone can understand easily. Especially important are visualizable examples. I often use the “Gaussian clouds” in my courses, because they provide geometric intuition for what machine learning is actually doing.

When you realize that all you’re trying to do is separate the purple dots from the red dots, you realize that it’s not magic after all.

A beginner may say: “Yeah, but Gaussian clouds are not REAL DATA!”

This is not correct thinking.

The correct thinking is: it doesn’t matter what the data is. The code would be the same anyway.

The most important fact to realize is that the point of “learning” machine learning isn’t those 3 lines of Scikit-Learn code.

You should be able to do that all by yourself after spending a few minutes reading the documentation.

“Learning” machine learning means learning what goes on inside those 3 lines of Scikit-Learn code, and realizing that it encapsulates perhaps tens or hundreds of lines of code. True understanding and competence would be the ability to implement that code yourself, without needing Scikit-Learn.

Furthermore, those 3 lines of Scikit-Learn code are the same for any dataset.

You wouldn’t ask, “How can I adapt this algorithm to work on my finance dataset?”

The algorithm doesn’t change just because you are using your special finance dataset or your special biology dataset.

Linear regression is always linear regression, the same linear regression that has existed for hundreds of years. There’s no such thing as “linear regression for biology” or “linear regression for finance”.

The least-squares solution for linear regression is: \(w = (X^TX)^{-1}X^Ty\) (assuming \(X^TX\) is invertible). This is the case for any dataset.
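
To illustrate the point, here is a minimal sketch of that formula in NumPy, applied unchanged to two synthetic datasets. The function name `fit_linear_regression`, the "finance" and "biology" labels, and the true weights are all made up for this example. The data is noise-free and there is no bias term, just to keep the sketch short.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Solve the normal equations (X^T X) w = X^T y for w."""
    # np.linalg.solve avoids forming the explicit inverse;
    # it assumes X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)

# Two synthetic "datasets" with different pretend origins.
# To the code, they are just arrays of numbers.
X_finance = rng.normal(size=(100, 3))
y_finance = X_finance @ np.array([1.0, -2.0, 0.5])

X_biology = rng.normal(size=(50, 2))
y_biology = X_biology @ np.array([3.0, 4.0])

w_finance = fit_linear_regression(X_finance, y_finance)
w_biology = fit_linear_regression(X_biology, y_biology)
print(w_finance)  # close to [1, -2, 0.5]
print(w_biology)  # close to [3, 4]
```

The same function is called on both datasets. Only the numbers in the arrays differ.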

That’s why “all data is the same”.


Further notes:

If it takes you 20+ minutes to understand these 3 lines of code:

model = RandomForestClassifier()  # from sklearn.ensemble
model.fit(X, Y)
model.predict(X)

then something is really, really wrong. Aim to reach a higher level of understanding.
