NEW COURSE: Time Series Analysis, Forecasting, and Machine Learning in Python

June 16, 2021

Time Series Analysis, Forecasting, and Machine Learning in Python

VIP Promotion

The complete Time Series Analysis course has arrived

Hello friends!

2 years ago, I asked the students in my Tensorflow 2.0 course if they’d be interested in a course on time series. The answer was a resounding YES.

Don’t want to read the rest of this little spiel? Just get the coupon:

(Updated: Expires May 25, 2022)

(note: this VIP coupon expires in 30 days!)

Time series analysis is becoming an increasingly important analytical tool.

  • With inflation on the rise, many are turning to the stock market and cryptocurrencies in order to ensure their savings do not lose their value.
  • COVID-19 has shown us how forecasting is an essential tool for driving public health decisions.
  • Businesses are becoming increasingly efficient, forecasting inventory and operational needs ahead of time.

Let me cut to the chase. This is not your average Time Series Analysis course. This course covers modern developments such as deep learning, time series classification (which can drive user insights from smartphone data, or read your thoughts from electrical activity in the brain), and more.

We will cover techniques such as:

  • ETS and Exponential Smoothing
  • Holt’s Linear Trend Model
  • Holt-Winters Model
  • ACF and PACF
  • Vector Autoregression and Moving Average Models (VAR, VMA, VARMA)
  • Machine Learning Models (including Logistic Regression, Support Vector Machines, and Random Forests)
  • Deep Learning Models (Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks)
  • GRUs and LSTMs for Time Series Forecasting

We will cover applications such as:

  • Time series forecasting of sales data
  • Time series forecasting of stock prices and stock returns
  • Time series classification of smartphone data to predict user behavior

The VIP version of the course (obtained by purchasing the course NOW during the VIP period) will cover even more exciting topics, such as:

  • AWS Forecast (Amazon’s state-of-the-art low-code forecasting API)
  • GARCH (financial volatility modeling)
  • FB Prophet (Facebook’s time series library)
  • Granger Causality

As always, please note that the VIP period may not last forever, and if / when the course becomes “non-VIP”, the VIP contents will be removed. If you purchased the VIP version, you will retain permanent access to the VIP content via my website, simply by letting me know via email you’d like access (you only need to email if I announce the VIP period is ending).

So what are you waiting for? Get the VIP version of Time Series Analysis NOW:

Go to comments

NEW COURSE: Financial Engineering and Artificial Intelligence in Python

September 8, 2020

Financial Engineering and Artificial Intelligence in Python

VIP Promotion

The complete Financial Engineering course has arrived

Hello once again friends!

Today, I am announcing the VIP version of my latest course: Financial Engineering and Artificial Intelligence in Python.

If you don’t want to read my little spiel just click here to get your VIP coupon: (expires Oct 9, 2020) (expires May 25, 2022)

(as usual, this coupon lasts only 30 days, so don’t wait!)

This is a MASSIVE (21 hours) Financial Engineering course covering the core fundamentals of financial engineering and financial analysis from scratch. We will go in-depth into all the classic topics, such as:

  • Exploratory data analysis, significance testing, correlations
  • Alpha and beta
  • Advanced Pandas Data Frame manipulation for time series and finance
  • Time series analysis, simple moving average, exponentially-weighted moving average
  • Holt-Winters exponential smoothing model
  • Efficient Market Hypothesis
  • Random Walk Hypothesis
  • Time series forecasting (“stock price prediction”)
  • Modern portfolio theory
  • Efficient frontier / Markowitz bullet
  • Mean-variance optimization
  • Maximizing the Sharpe ratio
  • Convex optimization with Linear Programming and Quadratic Programming
  • Capital Asset Pricing Model (CAPM)
  • Algorithmic trading

In addition, we will look at various non-traditional techniques which stem purely from the field of machine learning and artificial intelligence, such as:

  • Regression models
  • Classification models
  • Unsupervised learning
  • Reinforcement learning and Q-learning

We will learn about the greatest flub made in the past decade by marketers posing as “machine learning experts” who promise to teach unsuspecting students how to “predict stock prices with LSTMs”. You will learn exactly why their methodology is fundamentally flawed and why their results are complete nonsense. It is a lesson in how not to apply AI in finance.

List of VIP-only Contents

As with my Tensorflow 2 release, some of the VIP content will be a surprise and will be released in stages. Currently, the entirety of the Algorithmic Trading sections are VIP sections. Newly added VIP sections include Statistical Factor Models and “The Lazy Programmer Bonus Offer”. Here’s a full list:

Classic Algorithmic Trading – Trend Following Strategy

You will learn how moving averages can be applied to do algorithmic trading.

Machine Learning-Based Trading Strategy

Forecast returns in order to determine when to buy and sell.

Reinforcement Learning-Based (Q-Learning) Trading Strategy

I give you a full introduction to Reinforcement Learning from scratch, and then we apply it to build a Q-Learning trader. Note that this is *not* the same as the example I used in my Tensorflow 2, PyTorch, and Reinforcement Learning courses. I think the example included in this course is much more principled and robust.

Statistical Factor Models

The CAPM is one of the most renowned financial models in history, but did you know it’s only the simplest factor model, with just a single factor? To go beyond just this single factor model, we will learn about statistical factor models, where the multiple “factors” are found automatically using only the data.

Regime Detection with Hidden Markov Models (HMMs)

In the first section on financial basics, we learn how to model the distribution of returns. But can we really say “the” distribution, as if there is only one?

One important “stylized fact” about returns is that volatility “clusters” or “persists”. That is, large returns tend to be surrounded by more large returns, and small returns by more small returns.

In other words, returns are actually nonstationary and to build a more accurate model we should not assume that they all come from the same distribution at all times.

Using HMMs, we can model this behavior. HMMs allow you to model hidden state sequences (high volatility and low volatility regimes), from which observations (the actual returns) are generated.

The Lazy Programmer Bonus Offer

There are marketers out there who want to capitalize on your enthusiastic interest in finance, and unfortunately what they are teaching you is utter and complete garbage.

They will claim that they can “predict stock prices with LSTMs” and show you charts like this with nearly perfect stock price predictions.

Hint: if they can do this, why do they bother putting effort into making courses? Wouldn’t they already be billionaires?

Have you ever wondered if you are taking such a course from a fake data scientist / marketer? If so, just send me a message, and I will tell you whether or not you are taking such a course. (Hint: many of you are) I will give you a list of mistakes they made so you can look out for them yourself, and avoid “learning” things which will ultimately make YOU look very bad in front of potential future employers.

Believe me, if you ever try to get a job in machine learning or data science and you talk about a project where you “predicted stock prices with LSTMs”, all you will be demonstrating is how incompetent you are. I don’t want to see any of my students falling for this! Save yourself from this embarrassing scenario by taking the “Lazy Programmer Offer”!

Please note: The VIP coupon will work only for the next month (starting from the coupon creation time). It’s unknown whether the VIP period will renew after that time.

After that, although the VIP content will be removed from Udemy, all who purchased the VIP course will get permanent free access to these VIP contents on

In case it’s not clear, the process is very easy. For those folks who want the “step-by-step” instructions:

STEP 1) I announce the VIP content will be removed.

STEP 2) You email me with proof that you purchased the course during the VIP period. Do NOT email me earlier as it will just get buried.

STEP 3) I will give you free access to the VIP materials for this course on

Benefits of taking this course

  • Learn the knowledge you need to work at top tier investment firms
  • Gain practical, real-world quantitative skills that can be applied within and outside of finance
  • Make better decisions regarding your own finances

Personally, I think this is the most interesting and action-packed course I have created yet. My last few courses were cool, but they were all about topics which I had already covered in the past! GANs, NLP, Transfer Learning, Recommender Systems, etc etc. all just machine learning topics I have covered several times in different libraries. This course contains new, fresh content and concepts I have never covered in any of my courses, ever.

This is the first course I’ve created that extends into a niche area of AI application. It goes outside of AI and into domain expertise. An in-depth topic such as finance deserves its own course. This is that course. These are topics you will never learn in a generic data science or machine learning course. However, as a student of AI, you will recognize many of our tools and methods being applied, such as statistical inference, supervised and unsupervised learning, convex optimization, and optimal control. This allows us to go deeper than your run of the mill financial engineering course, and it becomes more than just the sum of its parts.

So what are you waiting for?

Go to comments

The complete PyTorch course for AI and Deep Learning has arrived

April 1, 2020

PyTorch: Deep Learning and Artificial Intelligence

VIP Promotion

The complete PyTorch course has arrived

Hello friends!

I hope you are all staying safe. Well, I’m sure you’ve heard enough about that so how about some different news?

Today, I am announcing the VIP version of my latest course: PyTorch: Deep Learning and Artificial Intelligence

[If you don’t want to read my little spiel just click here to get your VIP coupon:] (expires May 25, 2022)

This is a MASSIVE (over 24 hours) Deep Learning course covering EVERYTHING from scratch. That includes:

  • Machine learning basics (linear neurons)
  • ANNs, CNNs, and RNNs for images and sequence data
  • Time series forecasting and stock predictions (+ why all those fake data scientists are doing it wrong)
  • NLP (natural language processing)
  • Recommender systems
  • Transfer learning for computer vision
  • GANs (generative adversarial networks)
  • Deep reinforcement learning and applying it by building a stock trading bot

IN ADDITION, you will get some unique and never-before-seen VIP projects:

Estimating prediction uncertainty

Drawing the standard deviation of the prediction along with the prediction itself. This is useful for heteroskedastic data (that means the variance changes as a function of the input). The most popular application where heteroskedasticity appears is stock prices and stock returns – which I know a lot of you are interested in.

It allows you to draw your model predictions like this:

Sometimes, the data is simply such that a spot-on prediction can’t be made. But we can do better by letting the model tell us how certain it is in its predictions.

Facial recognition with siamese networks

This one is cool. I mean, I don’t have to tell you how big facial recognition has become, right? It’s the single most controversial technology to come out of deep learning. In the past, we looked at simple ways of doing this with classification, but in this section I will teach you about an architecture built specifically for facial recognition.

You will learn how this can work even on small datasets – so you can build a network that recognizes your friends or can even identify all of your coworkers!

You can really impress your boss with this one. Surprise them one day with an app that calls out your coworkers by name every time they walk by your desk. 😉

Please note: The VIP coupon will work only for the next month (ending May 1, 2020). It’s unknown whether the VIP period will renew after that time.

After that, although the VIP content will be removed from Udemy, all who purchased the VIP course will get permanent free access on

Minimal Prerequisites

This course is designed to be a beginner to advanced course. All that is required is that you take my free Numpy prerequisites to learn some basic scientific programming in Python. And it’s free, so why wouldn’t you!?

You will learn things that took me years to learn on my own. For many people, that is worth tens of thousands of dollars by itself.

There is no heavy math, no backpropagation, etc. Why? Because I already have courses on those things. So there’s no need to repeat them here, and PyTorch doesn’t use them. So you can relax and have fun. =)

Why PyTorch?

All of my deep learning courses until now have been in Tensorflow (and prior to that Theano).

So why learn PyTorch?

Does this mean my future deep learning courses will use PyTorch?

In fact, if you have traveled in machine learning circles recently, you will have noticed that there has been a strong shift to PyTorch.

Case in point: OpenAI switched to PyTorch earlier this year (2020).

Major AI shops such as Apple, JPMorgan Chase, and Qualcomm have adopted PyTorch.

PyTorch is primarily maintained by Facebook (Facebook AI Research to be specific) – the “other” Internet giant who, alongside Google, have a strong vested interest in developing state-of-the-art AI.

But why PyTorch for you and me? (aside from the fact that you might want to work for one of the above companies)

As you know, Tensorflow has adopted the super simple Keras API. This makes common things easy, but it makes uncommon things hard.

With PyTorch, common things take a tiny bit of extra effort, but the upside is that uncommon things are still very easy.

Creating your own custom models and inventing your own ideas is seamless. We will see many examples of that in this course.

For this reason, it is very possible that future deep learning courses will use PyTorch, especially for those advanced topics that many of you have been asking for.

Because of the ease at which you can do advanced things, PyTorch is the main library used by deep learning researchers around the world. If that’s your goal, then PyTorch is for you.

In terms of growth rate, PyTorch dominates Tensorflow. PyTorch now outnumbers Tensorflow by 2:1 and even 3:1 at major machine learning conferences. Researchers hold that PyTorch is superior to Tensorflow in terms of the simplicity of its API, and even speed / performance!

Do you need more convincing?

Go to comments

[VIP COURSE UPDATE] Artificial Intelligence: Reinforcement Learning in Python

March 18, 2020

Artificial Intelligence: Reinforcement Learning in Python

VIP Promotion

Hello all!

In this post, I am announcing the VIP coupon to my course titled “Artificial Intelligence: Reinforcement Learning in Python”.

There are 2 places to get the course.

  1. Udemy, with this VIP coupon: (expires May 25, 2022)
  2. Deep Learning Courses (coupon automatically applied):

You may recognize this course as one that has already existed in my catalog – however, the course I am announcing today contains ALL-NEW material. The entire course has been gutted and every lecture contained within the course did not exist in the original version.

One of the most common questions I get from students in my PyTorch, Tensorflow 2, and Financial Engineering courses is: “How can I learn reinforcement learning?”

While I do cover RL in those courses, it’s very brief. I’ve essentially summarized 12 hours of material into 2. So by necessity, you will be missing some things.

While that serves as a good way to scratch the surface of RL, it doesn’t give you a true, in-depth understanding that you will get by actually learning each component of RL step-by-step, and most importantly, getting a chance to put everything into code!

This course covers:

  • The explore-exploit dilemma and the Bayesian bandit method
  • MDPs (Markov Decision Processes)
  • Dynamic Programming solution for MDPs
  • Monte Carlo Method
  • Temporal Difference Method (including Q-Learning)
  • Approximation Methods using Radial Basis Functions
  • Applying your code to OpenAI Gym with zero effort / code changes
  • Building a stock trading bot (different approach in each course!)


When you get the version, note that you will get both versions (new and old) of the course – totalling nearly 20 hours of material.

If you want access to the tic-tac-toe project, this is the version you should get.

Otherwise, if you prefer to use Udemy, that’s fine too. If you purchase on Udemy but would like access to, I will allow this since they are the same price. Just send me an email and show me your proof of purchase.

Note that I’m not able to offer the reverse (can’t give you access to Udemy if you purchase on, due to operational reasons).

So what are you waiting for?


Go to comments

Become a Millionaire by Taking my Financial Engineering Course

May 17, 2022

I just got an excellent question today about my Financial Engineering course, which allowed me to put into words many thoughts and ideas I’d been pondering recently.

Through this post, I hope to get all these ideas into one place for future reference.


The question was: “How practical is this course? I’ve skimmed through several top ratings on Udemy but have yet seen one boasting how much money the student made after taking it

Will you become a millionaire after taking my financial engineering course?


Let’s answer this question by starting with my own definition of “practical”, and then subsequently addressing the student’s definition of practical which appears to mean “making money”.

In my view, “practical” simply means you’re applying knowledge to a real-world dataset.

For example, my Recommender Systems course is practical because you apply the algorithms we learn to real-world ratings datasets.

My Bayesian Machine Learning: A/B Testing course is practical because you can apply the algorithms to any business scenario where you have to decide between multiple choices based on some numerical objective (e.g. clicks, page view time, etc.)

In the same way, the Financial Engineering course is extremely practical, because the whole course is about applying algorithms to real-world financial datasets. The application is a real-world problem.

This is unlike, say, reading Pattern Recognition and Machine Learning by Bishop, which is all about the algorithms and not the fields of application. The implication is that, you know what you’re doing and can take those algorithms and apply them to your own data.

On one hand, that’s powerful – because you can apply these algorithms to any field (like biology, astronomy, chemistry, robotics, control systems, and yes, finance), but at the same time, you have to be pretty smart to do it. The average Udemy student would struggle.

In that sense, this is the most practical you can get. Everything you learn in this course is being directly applied to real-world data in a specific field (finance).

You can grab one of the algorithms taught in the course and start using it today on your own investing account. There’s a lecture about that in the Summary section called “Applying This Course” for those who need extra help.

Importantly, do keep in mind that while I can teach you what to do, I can’t actually make you do it.

In A/B Testing, I can show you the code, but the rest is up to the student to make it practical, by actually getting a job where they get to do that in a production system, or by inserting the code into their own production website so they can feed it to live users.

Funny enough, A/B Testing isn’t even about finance nor money. But will you make money with those techniques? YES. Amazon, Facebook, Netflix, etc. are already using the same techniques with great success.

The only reason some students might say it’s not practical is because they are too lazy/incompetent to get off their butts and actually do it!

Same here. I can teach the algorithms, but I can’t go into your brokerage account and run them for you.


Now let’s consider the definition of “practical” in the sense of being guaranteed to “make money”.

This is a common concern among students who are new to finance and don’t really know yet what to expect.

Let’s suppose I could guarantee that by taking this course, you could make money.

Consider some obvious questions:

  • If this were true, anyone (including myself) would just scale it up and become extremely wealthy without doing any work. Clearly, no such thing exists (that is public and that we know of).
  • If this were true, why would anyone work? Financial engineering graduates wouldn’t bother to apply for jobs, they would just run algorithms all day. They would teach their friends / family to do the same. No one would ever bother to get a job.
  • If this were true, why would hedge funds bother to hire employees? After inventing an algorithm, they could just run it forever. What’s the point of wasting money to hire humans? What would they even do?
  • If this were true, why would hedge funds bother to hire PhDs and why would people bother to get PhDs? Imagine you could increase your investments infinitely from a 20 hour online course. What kind of insane person would work for 4-7 years just to get a pittance and a paper that says “PhD”?

On the contrary, the reality is this.

The financial sector does hire very smart people and it is well-known that they have poor work-life balance.

They must be working hard. What are they doing?

Why can’t they just learn an algorithm and sit back and relax?


Instead, let’s expand the definition of “practical”.

Originally, this question was asked in a comment on a video I made about predicting stock prices with LSTMs. Is this video practical? YES. If you didn’t know this, you could have spent weeks / months / maybe even your whole life trying to “predict stock prices with LSTMs”, with zero clue that it didn’t actually work. That would be sad.

Spending weeks or months doing something that doesn’t even make sense is what I would consider to be very impractical. And hence, learning how to avoid it would be very practical.

A lot of the course is about how to properly model and analyze. How to stay away from stupidity.

One of the major themes of the course is that “Santa Claus doesn’t exist”.

A naive person might think “there must be some way to predict the stock price, you are just not telling me about the most advanced algos!”

But the “Santa Claus doesn’t exist” moment is when we prove mathematically why certain predictions are impossible.

This is practical because it saves you from attempting something which doesn’t make any logical sense.

Obviously, it doesn’t fulfill the childhood dream of meeting Santa (predicting an unpredictable time series), but I would posit that trying to meet Santa is what is really impractical.

What is actually practical is learning how to determine whether you can or cannot predict a time series (at which point, you can then make your predictions as normal).

I’ll give you another example lesson.

If you used the simplest trading strategy from this course, you could have beat the market from 2000 – 2018.

Using the same algorithm, you would have underperformed the market from 2018 to now.

The practical lesson there is that “past performance doesn’t indicate future performance”.

This is how you can have a “practical” lesson, which doesn’t automatically imply “guaranteed rate of return” (which is impossible).

Go to comments

Machine Learning in Finance by Dixon, Halperin, Bilokon – A Critique

May 16, 2022

Check out the video version of this post on YouTube:


In this post, I’m going to write about one of my all-time favorite subjects: the wrong way to predict stock and cryptocurrency prices.

Despite the fact that I’ve discussed this many times before, I’m very excited about this one.

It’s not everyday I get to critique a published book by a big name like Springer.

The book I’m referring to is called “Machine Learning in Finance: From Theory to Practice”, by Matthew Dixon, Igor Halperin, and Paul Bilokon.

Now you might think I’m beating a dead horse with this video, which is kind of true.

I’ve already spoken at length about the many mistakes people make when trying to predict stock prices.

But there are a few key differences with this video.

Firstly, in past videos, I’ve mentioned that it is typically bloggers and marketers who put out this bad content.

This time, it’s not a blogger or marketer, but an Assistant Professor of Applied Math at the Illinois Institute of Technology.

Secondly, while I’ve spoken about what the mistakes are, I’ve never done a case study where I’ve broken down actual code that makes these mistakes.

This is the first.

Thirdly, in my opinion, this is the most important topic to cover for beginners to finance, because it’s always the first thing people try to do. They want to predict future prices so they know what to invest in today.

If you take my course on Financial Engineering, you’ll learn that this is completely untrue. Price prediction barely scratches the surface of true finance.


In order to get the code I’ve used in this video, please use this link:

Note that it’s a copy of the code provided with the textbook, with added code for my own experiments (computing the naive forecast and the corresponding train / test MSE).

I also removed code for a different type of RNN called the “alpha RNN”, which uses an old version of Keras. Removing this code doesn’t make a difference in our results because this model didn’t perform well.


The mistakes I’ll cover in this post are as follows.

1) They only standardize the price time series, which does nothing about the problem of extrapolation.

2) They never check whether their model can beat the naive forecast. Spoiler alert. I checked, and it doesn’t. The models they built are worse than useless.

3) Misleading train-test split.


So let’s talk about mistake #1, which is why standardizing a price time series does not work.

The problem with prices is that they are ever increasing. This wasn’t the case for the time period used in the textbook, but it is the case in general.

Why is this an issue?

The train set is always in the past, and the test set is always in the future.

Therefore, the values in the test set in general will be higher than the values in the train set.

If you build an autoregressive model based on this data, your model will have to extrapolate to a domain never seen before in the train set.

This is not good, because machine learning models suck at extrapolation.

How they extrapolate has more to do with the model itself, than it has to do with the data.

We analyzed this phenomena in my course on time series analysis.

For instance, decision trees tend to extrapolate by going horizontally outward.

Neural networks, Gaussian Processes, and other models all behave differently, and none of these behaviors are related to the data.


Mistake #2, which is the worst mistake, is that the authors never check against the naive forecast.

As you recall, the naive forecast is when your prediction is simply the past known value.

In their notebook, the authors predict 4 time steps ahead.

So effectively, our naive prediction is the price from 4 time steps in the past.

Even this very dumb prediction beats their fancy RNN models. Surprisingly, this happens not just for the test set, but the train set as well.


Mistake #3 is the misleading train-test split.

In the notebook, the authors make a plot of their models’ predictions against the true price.

Of course, the error looks very small and very close to the true price in all cases.

But remember that this is misleading. It doesn’t tell you that these models actually suck.

In time series analysis, when we think of a test set, we normally think of it as the forecast horizon.

Instead, the forecast horizon is actually 4 time steps, and the plot actually just shows the incremental predictions at each time step using true past data.

To be clear, although this is not a forecast, it’s also not technically wrong, but it’s still misleading and totally useless for evaluating the efficacy of these models.

As we saw from mistake #2, even just the naive forecast beats these models, which you wouldn’t know from these seemingly good plots.


So I hope this post serves as a good lesson that you always have to be careful about how you apply machine learning in finance.

Even big name publishers like Springer, and reputable authors who might even be college professors, are not immune to these mistakes.

Don’t trust everything you see, and always experiment and stress test any claims.

Go to comments

Using Granger Causality to Determine Whether Twitter Sentiment Predicts Bitcoin Price Movement

February 22, 2022

In this article, we are again going to combine my current favorite subjects: natural language processing, time series analysis, and financial analysis.

Recently, I created a couple lectures covering Granger causality, so this topic is fresh on my mind.

In short, Granger causality is used to determine whether one time series can be used to forecast another (i.e. predict the future).

In these lectures, I demonstrated that some economics variables are Granger causal (in particular, GDP and term spread).

Of course, another easy application is to determine whether or not Twitter sentiment can predict cryptocurrency movements.

This post is based on this short publication: “Does Twitter Predict Bitcoin?” by Shen, D., Urquhart, A. and Wang, P. (2019) and can be found at

The premise is quite simple and you really have to just understand these 3 components in order to implement this yourself:

1) How to get a Twitter sentiment time series

2) How to get Bitcoin price time series

3) How to implement the Granger causality test

If you can do 1-3, you can predict Bitcoin! (at least, partially)

So let’s go over each of these 3 topics in order.


How to get a Twitter sentiment time series

This is going to probably be the most difficult part for most students. Most students are used to downloading a CSV dataset that I typically make very nice and simple for my courses.

Unfortunately, real life is not like this.

This becomes a data engineering problem.

Which tweets by which authors do you choose?

How do you use Twitter’s API to download the tweets?

Where do you store the tweets?

Once you’ve figured that out, you need to convert the tweets into a number (sentiment) such that the numbers collectively form a time series.

That part is not so hard.

I’ve demonstrated several methods of doing this, such as:

a) training your own model on sentiment data (you could even create your own dataset)

b) using a pretrained Transformer model


How to get Bitcoin price time series

In contrast to the first task, this is probably the easiest.

In the past, I’ve demonstrated how you can easily get minute, daily, monthly, etc. data for essentially any ticker using the yfinance Python package.


How to implement the Granger causality test

For those of you who haven’t learned Time Series Analysis with me in the past, you perhaps have never heard of Granger causality.

In short, we build a multivariate autoregressive time series model called a VAR model.

It takes the form of:

$$y(t) = \sum_{\tau=1}^L A_\tau y(t-\tau) + \varepsilon(t)$$

Essentially, if you find any component \( A_\tau(j,i) \) is “big enough” (in magnitude), then you can conclude that \( y_i(t) \) Granger causes \( y_j(t) \).

As in regression analysis, one decides whether these model coefficients are statistically significant by using hypothesis testing.

It’s important to note that Granger causality is not “true” causality as one usually thinks of it (e.g. eating food causes me to be satiated). Granger causal simply means that one time series is useful in forecasting another (hence the cross-coefficients being non-zero).

Luckily, the Granger causality test is very easy to use in Python with the statsmodels package.

Suppose you have your 2 time series (BTC returns and Twitter sentiment) in a 2-column dataframe (sidenote: your time series should be stationary so you should use returns and not prices).

Then you simply call the statsmodels function:

This will output p-values for every lag so you can see whether or not the sentiment at that particular lag affects the BTC return.

Final note: unfortunately, the paper only shows that Twitter sentiment Granger causes some function of the squared return. This means we lose information about whether the return is actually going up or down!

Go to comments

FREE Exercise: Predict Stocks with News, + Other ML News

January 19, 2022

TL;DR: this is an article about how to predict stocks using the news.

In this article, we are going to do an exercise involving my 2 current favorite subjects: natural language processing and financial engineering!

I’ll present this as an exercise / tutorial, so hopefully you can follow along on your own.

One comment I frequently make about predicting stocks is that autoregressive time series models aren’t really a great idea.

Basic analysis (e.g. ACF, PACF) shows no serial correlation in returns (that is, there’s no correlation between past and future) and hence, the future is not predictable from the past.

The best-fitting ARIMA model is more often than not, a simple random walk.

What is a random walk? If you haven’t yet learned this from me, then basically think of it like flipping a coin at each time step. The result of the coin flip tells you which way to walk: up the street or down the street.

Just as you can’t predict the result of a coin flip from past coin flips (by the way, this is essentially the gambler’s fallacy!), so too is it impossible to predict the next step of a random walk.

In these situations, the best prediction is simply the last-known value.

This is why, when one tries to fit an LSTM to a stock price time series, all it ends up doing is predicting close to the previous value.

There is a nice quote which is unfortunately (as far as I know) unattributed, that says something like: “trying to predict the future from the past is like trying to drive by looking through the rearview mirror”.

Anyway, this brings us to the question: “If I don’t use past prices, then what do I use?”

One common approach is to use the news.

We’ve all seen that news and notable events can have an impact on stock / cryptocurrency prices. Examples:

  • The Omicron variant of COVID-19
  • High inflation
  • Supply-chain issues
  • Elon Musk tweeting about Dogecoin
  • Mark Zuckerberg being grilled by the government

Luckily, I’m not going to make you scrape the web to download news yourself.

Instead, we’re going to use a pre-built dataset, which you can get at:

Briefly, you’ll want to look at the “combined” CSV file which has the following columns:

  • Date (e.g. 2008-08-11 – daily data)
  • Label (0 or 1 – whether or not the DJIA went up or down)
  • Top1, Top2, …, Top25 (news in the form of text, retrieved from the top 25 Reddit news posts)

Note that this is a binary classification problem.

Thanks to my famous rule, “all data is the same“, your code should be no different than a simple sentiment analysis / spam detection script.

To start you off, I’ll present some basic starter code / tips.


Tip 1) Some text contains weird formatting, e.g.

b”Georgia ‘downs two Russian warplanes’ as cou…

Basically, it looks like how a binary string would be printed out, but the “b” is part of the actual string.

Here’s a simple way to remove unwanted characters:


Tip 2) Don’t forget that this is time-ordered data, so you don’t want to do a train-test split with shuffling (mixing future and past in the train and test sets). The train set should only contain data that comes before the test set.


Tip 3) A simple way to form feature vectors from the news would be to just concatenate all 25 news columns into a single text, and then apply TF-IDF. E.g.

I’ll leave the concatenation part as an exercise for you.


Here are some extra thoughts to consider:

  • How were the labels created? Does that method make sense? Is it based on close-close or open-close?
  • What were the exact times that the news was posted? Was there sufficient time between the latest news post and the result from which the label is computed?
  • Returns tend to be very noisy. If you’re getting something like 85% test accuracy, you should be very suspicious that you’ve done something wrong. A more realistic result would be around 50-60%. Even 60% would be considered suspiciously high.


So that’s basically the exercise. It is simple, yet hopefully thought-provoking.


Now I didn’t know where else to put this ML news I found recently, but I enjoyed it so I want to share it with you all.

First up: “Chatbots: Still Dumb After All These Years

I enjoyed this article because I get a lot of requests to cover Chatbots.

Unfortunately, Chatbot technology isn’t very good.

Previously, we used seq2seq (and also seq2seq with attention) which basically just learns to copy canned responses to various inputs. seq2seq means “sequence to sequence” so the input is a sequence (a prompt) and the target/output is a sequence (the chatbot’s response).

Even with Transformers, the best results are still lacking.


Next: “PyTorch vs TensorFlow in 2022

Wait, people are still talking about this in 2022? You betcha!

Read this article. It says a lot of the same stuff I’ve been saying myself. But it’s nice to hear it from someone else.

It also provides actual metrics which I am too lazy to do.


Finally: “Facebook’s advice to students interested in artificial intelligence

This isn’t really “new news” (in fact, Facebook isn’t even called Facebook anymore) but I recently came across this old article I saved many years earlier.

Probably the most common beginner question I get is “why do I need to do all this math?” (in my ML courses).

You’ve heard the arguments from me hundreds of times.

Perhaps you are hesitant to listen to me. That would be like listening to your parents. Yuck.

Instead, why not listen to Yann LeCun? Remember that guy? The guy who invented CNNs?

He’s the Chief AI Scientist at Facebook (Meta) now, so if you want a job there, you should probably listen to his advice…

And if you think Google, Netflix, Amazon, Microsoft, etc. are any different, well, that is wishful thinking my friends.

What do you think?

Is this convincing? Or is Yann LeCun just as wrong as I am?

Let me know!

Go to comments

[NEW COURSE] Machine Learning: Natural Language Processing in Python (V2)

December 20, 2021

VIP Promotion

Machine Learning: Natural Language Processing in Python (V2)

===The Complete Natural Language Processing Course Has Arrived===

Hello friends!

Welcome to my latest course, on Natural Language Processing (NLP).

Don’t want to read my little spiel? Just click here to get the VIP discount (expires in 30 days – Jan 20, 2022!):

UPDATE: The opportunity to get the VIP version on Udemy has expired. However, the main part of the course (without the VIP parts) is now available at a new low price. Click here to automatically get the current lowest price:

UPDATE 2: Some of you may see the full price of $199 USD without any discount. This is because promotions going forward will now be decided by Udemy, so you will only get what they give. Such is the downside of not getting the VIP version. From what I hear, promotions happen quite often, so you should not have to wait too long.

UPDATE 3: I’ve updated the above with an actual coupon code, so ALL students should see a discount.

UPDATE 4: For those of you waiting for me to finish the rest of the course (e.g. deep learning sections) that has now been done. I’ve also added a big handful of advanced notebooks to the VIP content! (see “part 6” below)

IMPORTANT INFO: For those of you who missed the VIP discount but still want access to the VIP content, scroll to the bottom of this post. For those who got the VIP version on Udemy and want to access the VIP content for free at its new permanent home, scroll to the bottom of this post.


“Wait a minute… don’t you already have like, 3 courses on NLP?”


My first NLP course was released over 5 years ago. While there have been updates to it over the years, it has turned into a Frankenstein monster of sorts.

Therefore, the logical action was to simply start anew.

This course is another MASSIVE one – I say it’s basically 4 courses in 1 (not including the VIP section).

One of those “courses” (the ML part) is a revamp of my original 2016 NLP course. And therefore, this new course is actually a superset of NLP V1. The TL;DR: way more content, better organization.

Let’s get to the details:

Part 1: Vector models and text-preprocessing

  • Tokenization, stemming, lemmatization, stopwords, etc.
  • CountVectorizer and TF-IDF
  • Basic intro to word2vec and GloVe
  • Build a text classifier
  • Build a recommendation engine

Part 2: Probability models

  • Markov models and language models
  • Article spinner
  • Cipher decryption

Part 3: Machine learning

  • Spam detection with Naive Bayes
  • Sentiment analysis with Logistic Regression
  • Text summarization with TF-IDF and TextRank
  • Topic modeling with Latent Dirichlet Allocation and Non-negative Matrix Factorization*
  • Latent semantic indexing (LSI / LSA) with PCA / SVD*
  • VIP only: Applying LSI to text summarization, topic modeling, classification, and recommendations*

Part 4: Deep learning*

  • Embeddings
  • Feedforward ANNs
  • CNNs
  • RNNs / LSTMs

Part 5: Beginner’s Corner on Transformers with Hugging Face (VIP only)

  • Sentiment analysis revisit
  • Text generation revisit
  • Article spinning revisit
  • Question-answering
  • Zero-shot classification

Part 6: Even MORE bonus VIP notebooks (VIP only)

  • Stock Movement Prediction Using News
  • LSA / LSI for Recommendations
  • LSA / LSI for Classification (Feature Engineering)
  • LSA / LSI for Topic Modeling
  • LSA / LSI for Text Summarization (2 methods)
  • LSTM for Text Generation Notebook (i.e. the “decoder” part of an encoder-decoder network)
  • Masked language model with LSTM Notebook (revisiting the article spinner)

I’m sure many of you are most excited about the Transformers VIP section. Please note that this is not a full course on Transformers. As you know, I like to go very in-depth and as such, this is a topic which deserves its own course. This VIP section is a “beginner’s corner”-style set of lectures, which outlines the tasks that Transformers can do (listed above), along with code examples for each task. The Transformers “code” is very simple – basically just 1 or 2 lines. Don’t worry, the actual notebooks are much longer than that, and demonstrate real meaningful use-cases. The Transformer-specific part is just 1 or 2 lines – and that is great for practical purposes. It does not show you how to train or fine-tune a Transformer, only how to use existing models. If you just want to use Transformers and make use of these state-of-the-art models, but you don’t care about the nitty gritty details, this is perfect for you.

Is the VIP section only ideal for beginners? NO! Despite the name, this section will be useful for everyone, especially those who are interested in Transformers. This is quite a complex topic, and getting “good” with Transformers really requires a step-by-step approach. Think of this as the first step.

What is the “VIP version”? As usual, the VIP version of the course contains extra VIP content only available to those who purchase the course during the VIP period (i.e now). This content will be removed when it becomes a regular, non-VIP course, at which point I will make an announcement. All who sign up for the VIP version will retain access to the VIP content forever via my website, simply by letting me know via email you’d like access (you only need to email if I announce the VIP period is ending).

NOTE: If you are interested in Transformers, a lot of this course contains important prerequisites. The language models and article spinner from part 2 (“probability models”) are very important for understanding pre-training methods. The deep learning sections are very important for learning about embeddings and how neural networks deal with sequences.

NOTE: As per the last few releases, I’ve wanted to get the course into your hands as early as possible. Some sections are still in progress, specifically, those denoted with an asterisk (*) above. UPDATE: All post-release sections have been uploaded!

So what are you waiting for? Get the VIP version of Natural Language Processing (V2) NOW:



For those who missed the VIP version but still want it:

Yes, you can still get the VIP contents! They can now be purchased separately on

You can get it here:


For those of you who already purchased the VIP version and want to get setup with the VIP content on

Email me with your name (exactly as it appears on Udemy) along with your date of purchase. I will look up your details to confirm.

If required, read more about how the “VIP version” works here:


Does this course replace “Natural Language Processing with Deep Learning in Python”, or “Deep Learning: Advanced NLP and RNNs”?

In fact, this course replaces neither of these more advanced NLP courses.

Let’s first consider “Natural Language Processing with Deep Learning in Python”.

This course covers more advanced topics, generally.

For instance, both variants of word2vec (skip-gram and CBOW) are discussed in detail and implemented from scratch. In the current course, only the very basic ideas are discussed.

Another word embedding algorithm called GloVe is taught in detail, along with a from-scratch implementation. In the current course, again it is only mentioned very briefly.

This course reviews RNNs, but goes into great detail on a completely new architecture, the “Recursive Neural Tensor Network”.

Essentially, this is a neural network structured like a tree, which is very useful for tasks such as sentiment analysis where negation of whole phrases may be desired (and easily accomplished with a tree structure).


How about “Deep Learning: Advanced NLP and RNNs”?

Again, there is essentially no overlap.

As the title suggests, this course covers more advanced topics. Like the previously mentioned course, it can be thought of as another sequel to the current course.

This course covers topics such as: bidirectional RNNs, seq2seq (for many-to-many tasks where the input length is not equal to the target length), attention (the central mechanism in transformers), and memory networks.


Go to comments

Deep Learning and Artificial Intelligence Newsletter

Get discount coupons, free machine learning material, and new course announcements