Using Granger Causality to Determine Whether Twitter Sentiment Predicts Bitcoin Price Movement

February 22, 2022

In this article, we are again going to combine my current favorite subjects: natural language processing, time series analysis, and financial analysis.

Recently, I created a couple of lectures covering Granger causality, so this topic is fresh in my mind.

In short, Granger causality is used to determine whether one time series can be used to forecast another (i.e. predict the future).

In these lectures, I demonstrated that some economic variables are Granger causal (in particular, GDP and term spread).

Of course, another easy application is to determine whether or not Twitter sentiment can predict cryptocurrency movements.

This post is based on this short publication: “Does Twitter Predict Bitcoin?” by Shen, D., Urquhart, A. and Wang, P. (2019) and can be found at

The premise is quite simple: you just have to understand these 3 components in order to implement this yourself:

1) How to get a Twitter sentiment time series

2) How to get Bitcoin price time series

3) How to implement the Granger causality test

If you can do 1-3, you can predict Bitcoin! (at least, partially)

So let’s go over each of these 3 topics in order.


How to get a Twitter sentiment time series

This is probably going to be the most difficult part for most students, who are used to downloading a CSV dataset that I typically make very nice and simple for my courses.

Unfortunately, real life is not like this.

This becomes a data engineering problem.

Which tweets by which authors do you choose?

How do you use Twitter’s API to download the tweets?

Where do you store the tweets?

Once you’ve figured that out, you need to convert the tweets into a number (sentiment) such that the numbers collectively form a time series.

That part is not so hard.

I’ve demonstrated several methods of doing this, such as:

a) training your own model on sentiment data (you could even create your own dataset)

b) using a pretrained Transformer model
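Once the tweets are stored, turning them into a time series mostly comes down to two steps: score each tweet, then aggregate the scores per time period. Here is a minimal pandas sketch. It assumes a DataFrame with hypothetical `created_at` and `text` columns. The `score_tweet` function is a deliberately crude, hypothetical word-list scorer that stands in for option a) or b) above. In practice you would swap in your own trained model or a pretrained Transformer, for example one loaded through Hugging Face's `pipeline("sentiment-analysis")`.

```python
import string

import pandas as pd

# Hypothetical toy word lists: a stand-in for a real sentiment model.
POSITIVE = {"moon", "bullish", "buy", "up", "great"}
NEGATIVE = {"crash", "bearish", "sell", "down", "scam"}


def score_tweet(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_sentiment(tweets: pd.DataFrame) -> pd.Series:
    """Average per-tweet scores into one value per calendar day.

    Days with no tweets come out as NaN so that gaps stay visible.
    """
    scores = tweets["text"].map(score_tweet).to_numpy()
    timestamps = pd.DatetimeIndex(pd.to_datetime(tweets["created_at"]))
    per_tweet = pd.Series(scores, index=timestamps).sort_index()
    return per_tweet.resample("D").mean().rename("sentiment")


# Example usage with made-up tweets.
tweets = pd.DataFrame({
    "created_at": ["2022-02-01 09:00", "2022-02-01 17:30", "2022-02-03 12:00"],
    "text": ["BTC to the moon!", "Looks like a crash", "Bullish, buy the dip"],
})
daily = daily_sentiment(tweets)
print(daily)
```

Whether to drop, forward-fill, or zero out the NaN days is a modeling choice. Averaging is also only one option. The number of tweets per day (`.count()` instead of `.mean()`) is another series you could test.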


How to get Bitcoin price time series

In contrast to the first task, this is probably the easiest.

In the past, I’ve demonstrated how you can easily get minute, daily, monthly, etc. data for essentially any ticker using the yfinance Python package.
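Here is a minimal sketch of that workflow. It assumes `BTC-USD`, the ticker Yahoo Finance uses for Bitcoin. `download_btc_close` and `log_returns` are hypothetical helper names. The download needs network access. The `interval` argument (e.g. `"1d"`, `"1h"`, `"1m"`) controls granularity, subject to Yahoo's limits on how far back intraday data goes.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.Series) -> pd.Series:
    """Convert a price series into log returns (the first value is dropped)."""
    return np.log(prices).diff().dropna().rename("return")


def download_btc_close(start: str, end: str, interval: str = "1d") -> pd.Series:
    """Download BTC-USD closing prices with yfinance (requires network access)."""
    import yfinance as yf  # imported here so log_returns works without yfinance

    df = yf.download("BTC-USD", start=start, end=end, interval=interval, progress=False)
    close = df["Close"]
    # Newer yfinance versions return MultiIndex columns, so "Close" can be a
    # one-column DataFrame instead of a Series.
    if isinstance(close, pd.DataFrame):
        close = close.iloc[:, 0]
    return close.rename("close")
```

For example, `log_returns(download_btc_close("2021-01-01", "2022-01-01"))` gives a daily return series that you can line up with a daily sentiment series by date. Log returns are used here because they add up over time. Simple returns via `prices.pct_change()` would work as well.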


How to implement the Granger causality test

If you haven’t studied Time Series Analysis with me before, you may never have heard of Granger causality.

In short, we build a multivariate autoregressive time series model called a VAR (vector autoregression) model.

It takes the form of:

$$y(t) = \sum_{\tau=1}^L A_\tau y(t-\tau) + \varepsilon(t)$$

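Here \( y(t) \) is a vector containing every time series at time \( t \), each \( A_\tau \) is a square coefficient matrix, and \( L \) is the number of lags. Essentially, if you find that any component \( A_\tau(j,i) \) is “big enough” (in magnitude), then you can conclude that \( y_i(t) \) Granger causes \( y_j(t) \).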

As in regression analysis, one decides whether these model coefficients are statistically significant by using hypothesis testing.
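Concretely, with two series the \( j \)-th row of the VAR above reads as follows. The test asks whether all of the cross-coefficients are zero at once:

$$y_j(t) = \sum_{\tau=1}^L A_\tau(j,j)\, y_j(t-\tau) + \sum_{\tau=1}^L A_\tau(j,i)\, y_i(t-\tau) + \varepsilon_j(t)$$

$$H_0: A_1(j,i) = A_2(j,i) = \cdots = A_L(j,i) = 0$$

The usual \( F \)-statistic compares two residual sums of squares. \( \mathrm{RSS}_U \) comes from this unrestricted regression, and \( \mathrm{RSS}_R \) comes from a restricted regression that drops the lags of \( y_i \). \( T \) is the number of usable observations:

$$F = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/L}{\mathrm{RSS}_U/(T - 2L)}$$

Under \( H_0 \), \( F \) approximately follows an \( F(L, T-2L) \) distribution, and a small p-value is evidence that \( y_i \) Granger causes \( y_j \). If an intercept is included (as statsmodels does by default), the denominator degrees of freedom become \( T - 2L - 1 \).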

It’s important to note that Granger causality is not “true” causality as one usually thinks of it (e.g. eating food causes me to be satiated). Granger causal simply means that one time series is useful in forecasting another (hence the cross-coefficients being non-zero).

Luckily, the Granger causality test is very easy to use in Python with the statsmodels package.

Suppose you have your 2 time series (BTC returns and Twitter sentiment) in a 2-column dataframe (side note: your time series should be stationary, so you should use returns rather than prices).

Then you simply call the statsmodels function:

This will output p-values for every lag order up to the maximum, so you can see whether sentiment (using up to that many lags) helps forecast the BTC return.

Final note: unfortunately, the paper only shows that Twitter sentiment Granger causes some function of the squared return. This means we lose information about whether the return is actually going up or down!


Mistakes in Stock Prediction: Trying to Predict the Price

November 15, 2021

Hello friends! Curious about how to properly predict stock prices?


I’ve now released the third video in this YouTube mini-series.

For those unfamiliar: this is a video series that debunks common mistakes found in nearly all blog articles / GitHub repos claiming to do “stock price predictions with LSTMs”.

These are typically written by non-experts in the field just looking for clicks, and I have a lot of fun breaking down precisely what they’re doing wrong.

Why is this important?

Beginners are often fooled by such content, wasting money on courses to learn things that don’t work. Even worse, they may end up putting such examples on their own GitHub accounts or in their portfolios / resumes, worsening their chances of getting a job in the field.

To be clear: the bad part isn’t that they learned something that doesn’t work (although that is pretty bad by itself). The bad part is, they don’t even understand why it doesn’t work. They are confident that it does and will fight me over it! (That is, until I ask them to verify or rebut any of the claims I’ve made.)

Thus, not only does this stuff not work (due to all the mistakes I outline in my video series), but it could actually be detrimental to getting a job and working in this area. These are not just harmless mistakes!


The first video discussed why min-max scaling over the train set doesn’t work.

The second video discussed why using prices as inputs (i.e. lagged prices to build an autoregressive model) doesn’t work.

This third video (the one in this post) discusses why using prices as targets does not work.



Why you shouldn’t use prices as inputs to predict stock prices in machine learning (YouTube Episode 20)

October 12, 2021

Ever come across a machine learning / data science blog demonstrating how to predict stock prices using an autoregressive model, with past stock prices as input?

It’s been a while, but I am finally continuing this YouTube mini-series I started a while back, which goes over common mistakes in popular blogs on predicting stock prices with machine learning. This is the 2nd installment.

It is about why you shouldn’t use prices as inputs.



NEW COURSE: Time Series Analysis, Forecasting, and Machine Learning in Python

June 16, 2021


VIP Promotion

The complete Time Series Analysis course has arrived

Hello friends!

2 years ago, I asked the students in my TensorFlow 2.0 course if they’d be interested in a course on time series. The answer was a resounding YES.

Don’t want to read the rest of this little spiel? Just get the coupon:

(Updated: Expires May 25, 2022)

(note: this VIP coupon expires in 30 days!)

Time series analysis is becoming an increasingly important analytical tool.

  • With inflation on the rise, many are turning to the stock market and cryptocurrencies in order to ensure their savings do not lose their value.
  • COVID-19 has shown us that forecasting is an essential tool for driving public health decisions.
  • Businesses are becoming increasingly efficient, forecasting inventory and operational needs ahead of time.

Let me cut to the chase. This is not your average Time Series Analysis course. This course covers modern developments such as deep learning, time series classification (which can drive user insights from smartphone data, or read your thoughts from electrical activity in the brain), and more.

We will cover techniques such as:

  • ETS and Exponential Smoothing
  • Holt’s Linear Trend Model
  • Holt-Winters Model
  • ACF and PACF
  • Vector Autoregression and Moving Average Models (VAR, VMA, VARMA)
  • Machine Learning Models (including Logistic Regression, Support Vector Machines, and Random Forests)
  • Deep Learning Models (Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks)
  • GRUs and LSTMs for Time Series Forecasting

We will cover applications such as:

  • Time series forecasting of sales data
  • Time series forecasting of stock prices and stock returns
  • Time series classification of smartphone data to predict user behavior

The VIP version of the course (obtained by purchasing the course NOW during the VIP period) will cover even more exciting topics, such as:

  • AWS Forecast (Amazon’s state-of-the-art low-code forecasting API)
  • GARCH (financial volatility modeling)
  • FB Prophet (Facebook’s time series library)
  • Granger Causality

As always, please note that the VIP period may not last forever, and if / when the course becomes “non-VIP”, the VIP contents will be removed. If you purchased the VIP version, you will retain permanent access to the VIP content via my website, simply by letting me know via email you’d like access (you only need to email if I announce the VIP period is ending).

So what are you waiting for? Get the VIP version of Time Series Analysis NOW:


Data Science Interview Questions: Random Walk Hypothesis and Stock Price Prediction

December 10, 2019

Welcome to another episode of Data Science Interview Questions! In this episode, I discuss the Random Walk Hypothesis and Stock Price Prediction.

Why is stock price data often considered to be a random walk?

If your data is best modeled as a random walk, how can you do a time series forecast into the future?

How can you draw a confidence interval around the forecast?

What does this mean for stock price predictions?

Find out here:


What you will learn:

  • How to make the best forecast possible if your data is from a random walk model
  • How to find the confidence bounds for your forecast (also called confidence limits or prediction intervals)
  • Why pretty much all the “data science” instructors out there are really just marketers who have been selling you lies for years
  • Hint: No, LSTMs will not help you predict stock prices and in fact perform worse than the simple model described above
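Without giving away the whole video, here is a minimal sketch of the standard textbook answer. It assumes a driftless random walk \( y(t) = y(t-1) + \varepsilon(t) \) with Gaussian noise. For stock prices this model is usually applied to log prices. The best point forecast at every horizon is simply the last observed value. The \( h \)-step forecast error is a sum of \( h \) independent shocks, so its variance is \( h\sigma^2 \) and the interval widens with \( \sqrt{h} \). `random_walk_forecast` is a hypothetical helper name.

```python
import numpy as np


def random_walk_forecast(y, horizon: int, z: float = 1.96):
    """Point forecasts and prediction bounds for a driftless Gaussian random walk.

    Returns (point, lower, upper) arrays for steps 1..horizon.
    """
    y = np.asarray(y, dtype=float)
    # Estimate the step noise std from the increments (zero mean assumed: no drift).
    sigma = np.sqrt(np.mean(np.diff(y) ** 2))
    h = np.arange(1, horizon + 1)
    point = np.full(horizon, y[-1])      # best forecast: the last observed value
    half_width = z * sigma * np.sqrt(h)  # h-step error variance is h * sigma^2
    return point, point - half_width, point + half_width
```

For example, `random_walk_forecast(log_prices, horizon=30)` gives a flat forecast with a fan-shaped band. With `z=1.96` the band is an approximate 95% prediction interval.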
