Using Granger Causality to Determine Whether Twitter Sentiment Predicts Bitcoin Price Movement

February 22, 2022

In this article, we are again going to combine my current favorite subjects: natural language processing, time series analysis, and financial analysis.

Recently, I created a couple of lectures covering Granger causality, so this topic is fresh in my mind.

In short, Granger causality is used to determine whether one time series can be used to forecast another (i.e. predict the future).

In these lectures, I demonstrated that some economic variables are Granger causal (in particular, GDP and term spread).

Of course, another easy application is to determine whether or not Twitter sentiment can predict cryptocurrency movements.

This post is based on the short publication “Does Twitter Predict Bitcoin?” by Shen, D., Urquhart, A. and Wang, P. (2019), which can be found at https://centaur.reading.ac.uk/80420/1/Twitter.Bitcoin.pdf

The premise is quite simple and you really have to just understand these 3 components in order to implement this yourself:

1) How to get a Twitter sentiment time series

2) How to get a Bitcoin price time series

3) How to implement the Granger causality test

If you can do 1-3, you can predict Bitcoin! (at least, partially)

So let’s go over each of these 3 topics in order.

How to get a Twitter sentiment time series

This is probably going to be the most difficult part for most students. Most students are used to downloading a CSV dataset that I typically make very nice and simple for my courses.

Unfortunately, real life is not like this.

This becomes a data engineering problem.

Which tweets by which authors do you choose?

Where do you store the tweets?

Once you’ve figured that out, you need to convert the tweets into a number (sentiment) such that the numbers collectively form a time series.

That part is not so hard.

I’ve demonstrated several methods of doing this, such as:

a) training your own model on sentiment data (you could even create your own dataset)

b) using a pretrained Transformer model
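To make the "tweets to numbers to time series" step concrete, here is a minimal sketch using only the standard library. The word lists and the score_tweet function are hypothetical placeholders standing in for whichever model you pick from (a) or (b), and the tweets are made up. The part that carries over to a real pipeline is the aggregation: score each tweet, then average the scores per day.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy lexicon; a real pipeline would use a trained or pretrained model.
POSITIVE = {"moon", "bullish", "buy", "up"}
NEGATIVE = {"crash", "bearish", "sell", "down"}


def score_tweet(text):
    """Return a crude sentiment score in [-1, 1] (placeholder for a real model)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_sentiment(tweets):
    """Average the tweet scores per calendar day, returned in date order."""
    by_day = defaultdict(list)
    for timestamp, text in tweets:
        by_day[timestamp.date()].append(score_tweet(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Made-up tweets as (timestamp, text) pairs.
tweets = [
    (datetime(2022, 2, 20, 9, 30), "bitcoin to the moon"),
    (datetime(2022, 2, 20, 18, 5), "feeling bullish buy more"),
    (datetime(2022, 2, 21, 11, 0), "crash incoming sell now"),
    (datetime(2022, 2, 21, 15, 45), "bitcoin going up"),
]

series = daily_sentiment(tweets)
for day, score in series.items():
    print(day, score)
```

Averaging per day is just one choice. You could aggregate per hour or per minute instead, as long as the sentiment series ends up on the same time grid as the price series.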

How to get a Bitcoin price time series

In contrast to the first task, this is probably the easiest.

In the past, I’ve demonstrated how you can easily get minute, daily, monthly, etc. data for essentially any ticker using the yfinance Python package.
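As a rough sketch of what that looks like, the snippet below wraps yfinance's Ticker.history call and converts prices to log returns. The helper names (download_btc_close, log_returns), the "BTC-USD" ticker symbol, and the example dates are my own choices for illustration, and the download needs network access, so the example at the bottom uses made-up prices instead.

```python
import math


def download_btc_close(start, end, interval="1d"):
    """Fetch BTC-USD closing prices as a pandas Series (needs network access)."""
    import yfinance as yf  # imported lazily so log_returns works without it

    history = yf.Ticker("BTC-USD").history(start=start, end=end, interval=interval)
    return history["Close"]


def log_returns(prices):
    """Convert a sequence of prices into log returns (one fewer element)."""
    prices = list(prices)
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Example with made-up prices; with real data you would pass something like
# download_btc_close("2021-01-01", "2022-01-01") instead.
example_returns = log_returns([100.0, 110.0, 99.0])
print(example_returns)
```

Working with returns rather than raw prices matters later on, since the Granger causality test expects stationary inputs.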

How to implement the Granger causality test

For those of you who haven’t learned Time Series Analysis with me in the past, you perhaps have never heard of Granger causality.

In short, we build a multivariate autoregressive time series model called a VAR model.

It takes the form of:

$$y(t) = \sum_{\tau=1}^L A_\tau y(t-\tau) + \varepsilon(t)$$

Essentially, if you find any component $$A_\tau(j,i)$$ is “big enough” (in magnitude), then you can conclude that $$y_i(t)$$ Granger causes $$y_j(t)$$.

As in regression analysis, one decides whether these model coefficients are statistically significant by using hypothesis testing.
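To make this concrete, here is the two-variable case written out. The labeling is an assumption for illustration: take $$y_1(t)$$ to be the BTC return and $$y_2(t)$$ to be the Twitter sentiment. The first row of the VAR is then:

$$y_1(t) = \sum_{\tau=1}^L A_\tau(1,1)\, y_1(t-\tau) + \sum_{\tau=1}^L A_\tau(1,2)\, y_2(t-\tau) + \varepsilon_1(t)$$

The null hypothesis being tested is that all the cross-coefficients are zero:

$$H_0: A_\tau(1,2) = 0 \quad \text{for all } \tau = 1, \ldots, L$$

If we reject $$H_0$$, past sentiment adds forecasting power beyond what past returns already provide, i.e. $$y_2(t)$$ Granger causes $$y_1(t)$$.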

It’s important to note that Granger causality is not “true” causality as one usually thinks of it (e.g. eating food causes me to be satiated). Granger causal simply means that one time series is useful in forecasting another (hence the cross-coefficients being non-zero).

Luckily, the Granger causality test is very easy to use in Python with the statsmodels package.

Suppose you have your 2 time series (BTC returns and Twitter sentiment) in a 2-column dataframe (sidenote: your time series should be stationary so you should use returns and not prices).

Then you simply call the statsmodels function grangercausalitytests on that dataframe.

This will output p-values for each number of lags tested (each test includes all lags up to that number), so you can see whether or not past sentiment helps to forecast the BTC return.

Final note: unfortunately, the paper only shows that Twitter sentiment Granger causes some function of the squared return. This means we lose information about whether the return is actually going up or down!

Mistakes in Stock Prediction: Trying to Predict the Price

November 15, 2021

Hello friends! Curious about how to properly predict stock prices?

I’ve now released the third video in this YouTube mini-series.

For those unfamiliar: this is a video series that debunks common mistakes found in nearly all blog articles / Github repos claiming to do “stock price predictions with LSTMs”.

These are typically written by non-experts in the field just looking for clicks, and I have a lot of fun breaking down precisely what they’re doing wrong.

Why is this important?

Beginners are often fooled by such content, wasting money on courses to learn things that don’t work. Even worse, they may end up putting such examples on their own Github accounts or in their portfolios / resumes, worsening their chances of getting a job in the field.

To be clear: the bad part isn’t that they learned something that doesn’t work (although that is pretty bad by itself). The bad part is, they don’t even understand why it doesn’t work. They are confident that it does and will fight me over it! (That is, until I ask them to verify or rebut any of the claims I’ve made).

Thus, not only does this stuff not work (due to all the mistakes I outline in my video series), but it could actually be detrimental to getting a job and working in this area. These are not just harmless mistakes!

The first video discussed why min-max scaling over the train set doesn’t work.

The second video discussed why using prices as inputs (i.e. lagged prices to build an autoregressive model) doesn’t work.

This third video (this video) will discuss why using prices as targets does not work.
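The videos give the full arguments. As a small illustration of the first point, here is one commonly cited problem (this is my assumption about the simplest way to show it, not a summary of the video): if prices trend, scaling with the min and max of the train set pushes the test set outside the range the model was trained on. The prices below are made up.

```python
def min_max_scale(values, lo, hi):
    """Scale values using a fixed min (lo) and max (hi)."""
    return [(v - lo) / (hi - lo) for v in values]


prices = [100.0 + i for i in range(100)]  # made-up, steadily rising prices
train, test = prices[:80], prices[80:]

# Fit the scaler on the train set only, as the blog posts in question do.
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)

print(min(train_scaled), max(train_scaled))  # train set spans exactly [0, 1]
print(min(test_scaled), max(test_scaled))    # test set lies entirely above 1
```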

Enjoy!

Why you shouldn’t use prices as inputs to predict stock prices in machine learning (YouTube Episode 20)

October 12, 2021

Ever come across a machine learning / data science blog demonstrating how to predict stock prices using an autoregressive model, with past stock prices as input?

It’s been a while, but I am finally continuing this YouTube mini-series I started a while back, which goes over common mistakes in popular blogs on predicting stock prices with machine learning. This is the 2nd installment.

It is about why you shouldn’t use prices as inputs.

June 16, 2021

VIP Promotion

The complete Time Series Analysis course has arrived

Hello friends!

2 years ago, I asked the students in my Tensorflow 2.0 course if they’d be interested in a course on time series. The answer was a resounding YES.

Don’t want to read the rest of this little spiel? Just get the coupon:

https://www.udemy.com/course/time-series-analysis/?couponCode=TIMEVIP

(Updated: Expires May 25, 2022) https://www.udemy.com/course/time-series-analysis/?couponCode=TIMEVIP11

(note: this VIP coupon expires in 30 days!)

Time series analysis is becoming an increasingly important analytical tool.

• With inflation on the rise, many are turning to the stock market and cryptocurrencies in order to ensure their savings do not lose their value.
• COVID-19 has shown us how forecasting is an essential tool for driving public health decisions.
• Businesses are becoming increasingly efficient, forecasting inventory and operational needs ahead of time.

Let me cut to the chase. This is not your average Time Series Analysis course. This course covers modern developments such as deep learning, time series classification (which can drive user insights from smartphone data, or read your thoughts from electrical activity in the brain), and more.

We will cover techniques such as:

• ETS and Exponential Smoothing
• Holt’s Linear Trend Model
• Holt-Winters Model
• ARIMA, SARIMA, SARIMAX, and Auto ARIMA
• ACF and PACF
• Vector Autoregression and Moving Average Models (VAR, VMA, VARMA)
• Machine Learning Models (including Logistic Regression, Support Vector Machines, and Random Forests)
• Deep Learning Models (Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks)
• GRUs and LSTMs for Time Series Forecasting

We will cover applications such as:

• Time series forecasting of sales data
• Time series forecasting of stock prices and stock returns
• Time series classification of smartphone data to predict user behavior

The VIP version of the course (obtained by purchasing the course NOW during the VIP period) will cover even more exciting topics, such as:

• AWS Forecast (Amazon’s state-of-the-art low-code forecasting API)
• GARCH (financial volatility modeling)
• FB Prophet (Facebook’s time series library)
• Granger Causality

As always, please note that the VIP period may not last forever, and if / when the course becomes “non-VIP”, the VIP contents will be removed. If you purchased the VIP version, you will retain permanent access to the VIP content via my website, simply by letting me know via email you’d like access (you only need to email if I announce the VIP period is ending).

So what are you waiting for? Get the VIP version of Time Series Analysis NOW:

NEW COURSE: Financial Engineering and Artificial Intelligence in Python

September 8, 2020

VIP Promotion

The complete Financial Engineering course has arrived

Hello once again friends!

Today, I am announcing the VIP version of my latest course: Financial Engineering and Artificial Intelligence in Python.

https://www.udemy.com/course/ai-finance/?couponCode=FINANCEVIP (expires Oct 9, 2020)

https://www.udemy.com/course/ai-finance/?couponCode=FINANCEVIP20 (expires May 25, 2022)

(as usual, this coupon lasts only 30 days, so don’t wait!)

This is a MASSIVE (21 hours) Financial Engineering course covering the core fundamentals of financial engineering and financial analysis from scratch. We will go in-depth into all the classic topics, such as:

• Exploratory data analysis, significance testing, correlations
• Alpha and beta
• Advanced Pandas Data Frame manipulation for time series and finance
• Time series analysis, simple moving average, exponentially-weighted moving average
• Holt-Winters exponential smoothing model
• ARIMA and SARIMA
• Efficient Market Hypothesis
• Random Walk Hypothesis
• Time series forecasting (“stock price prediction”)
• Modern portfolio theory
• Efficient frontier / Markowitz bullet
• Mean-variance optimization
• Maximizing the Sharpe ratio
• Convex optimization with Linear Programming and Quadratic Programming
• Capital Asset Pricing Model (CAPM)

In addition, we will look at various non-traditional techniques which stem purely from the field of machine learning and artificial intelligence, such as:

• Regression models
• Classification models
• Unsupervised learning
• Reinforcement learning and Q-learning

We will learn about the greatest flub made in the past decade by marketers posing as “machine learning experts” who promise to teach unsuspecting students how to “predict stock prices with LSTMs”. You will learn exactly why their methodology is fundamentally flawed and why their results are complete nonsense. It is a lesson in how not to apply AI in finance.

List of VIP-only Contents

As with my Tensorflow 2 release, some of the VIP content will be a surprise and will be released in stages. Currently, all of the Algorithmic Trading sections are VIP sections. Newly added VIP sections include Statistical Factor Models and “The Lazy Programmer Bonus Offer”. Here’s a full list:

Classic Algorithmic Trading – Trend Following Strategy

You will learn how moving averages can be applied to do algorithmic trading.
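To give a flavor of the idea, here is a minimal sketch of a moving average crossover rule, one common trend-following approach (I am assuming it here for illustration; it is not necessarily the exact strategy built in the course). The window sizes and prices are made up.

```python
def simple_moving_average(prices, window):
    """Return the SMA series; entries before a full window are None."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def crossover_positions(prices, fast=3, slow=5):
    """1 = hold the asset when the fast SMA is above the slow SMA, else 0."""
    fast_sma = simple_moving_average(prices, fast)
    slow_sma = simple_moving_average(prices, slow)
    return [
        1 if f is not None and s is not None and f > s else 0
        for f, s in zip(fast_sma, slow_sma)
    ]


prices = [10, 10, 10, 10, 10, 11, 12, 13, 14, 15, 14, 13, 12, 11, 10]
positions = crossover_positions(prices)
print(positions)
```

Note that the position computed at time t uses the price at time t, so in a backtest you would only act on it from the next period onward to avoid lookahead.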

Forecast returns in order to determine when to buy and sell.

I give you a full introduction to Reinforcement Learning from scratch, and then we apply it to build a Q-Learning trader. Note that this is *not* the same as the example I used in my Tensorflow 2, PyTorch, and Reinforcement Learning courses. I think the example included in this course is much more principled and robust.

Statistical Factor Models

The CAPM is one of the most renowned financial models in history, but did you know it’s only the simplest factor model, with just a single factor? To go beyond just this single factor model, we will learn about statistical factor models, where the multiple “factors” are found automatically using only the data.
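Written out, with notation that is my own choice for illustration ($$r_i$$ is the return of asset $$i$$, $$r_m$$ the market return, $$r_f$$ the risk-free rate), the CAPM as a single-factor model is:

$$r_i(t) - r_f = \alpha_i + \beta_i \left( r_m(t) - r_f \right) + \varepsilon_i(t)$$

A model with $$K$$ factors simply replaces the single market factor with several factors $$f_k(t)$$:

$$r_i(t) = \alpha_i + \sum_{k=1}^K \beta_{ik} f_k(t) + \varepsilon_i(t)$$

In a statistical factor model, the $$f_k(t)$$ are not chosen by hand but estimated from the return data itself.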

Regime Detection with Hidden Markov Models (HMMs)

In the first section on financial basics, we learn how to model the distribution of returns. But can we really say “the” distribution, as if there is only one?

One important “stylized fact” about returns is that volatility “clusters” or “persists”. That is, large returns tend to be surrounded by more large returns, and small returns by more small returns.

In other words, returns are actually nonstationary, and to build a more accurate model we should not assume that they all come from the same distribution at all times.

Using HMMs, we can model this behavior. HMMs allow you to model hidden state sequences (high volatility and low volatility regimes), from which observations (the actual returns) are generated.
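Here is a minimal sketch of that generative story: a hidden two-state Markov chain (low and high volatility) from which returns are drawn. All the numbers (the probability of staying in a regime, the two volatilities, the seed) are made up for illustration. This only simulates from the model. Fitting an HMM to real returns would need an HMM library and is not shown.

```python
import random
import statistics


def simulate_regime_returns(n, stay_prob=0.95, sigmas=(0.005, 0.03), seed=0):
    """Simulate returns from a 2-state regime model.

    State 0 is the low-volatility regime, state 1 the high-volatility regime.
    At each step the hidden state stays the same with probability stay_prob.
    """
    rng = random.Random(seed)
    state = 0
    states, returns = [], []
    for _ in range(n):
        if rng.random() > stay_prob:
            state = 1 - state  # switch regime
        states.append(state)
        returns.append(rng.gauss(0.0, sigmas[state]))  # observation given state
    return states, returns


states, returns = simulate_regime_returns(2000)
low = [r for s, r in zip(states, returns) if s == 0]
high = [r for s, r in zip(states, returns) if s == 1]

print(len(low), statistics.pstdev(low))
print(len(high), statistics.pstdev(high))
```

Because the state tends to persist, large returns show up in runs, which is exactly the volatility clustering described above.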

The Lazy Programmer Bonus Offer

There are marketers out there who want to capitalize on your enthusiastic interest in finance, and unfortunately what they are teaching you is utter and complete garbage.

They will claim that they can “predict stock prices with LSTMs” and show you charts with nearly perfect stock price predictions.

Hint: if they can do this, why do they bother putting effort into making courses? Wouldn’t they already be billionaires?

Have you ever wondered if you are taking such a course from a fake data scientist / marketer? If so, just send me a message, and I will tell you whether or not you are taking such a course. (Hint: many of you are) I will give you a list of mistakes they made so you can look out for them yourself, and avoid “learning” things which will ultimately make YOU look very bad in front of potential future employers.

Believe me, if you ever try to get a job in machine learning or data science and you talk about a project where you “predicted stock prices with LSTMs”, all you will be demonstrating is how incompetent you are. I don’t want to see any of my students falling for this! Save yourself from this embarrassing scenario by taking the “Lazy Programmer Offer”!

Please note: The VIP coupon will work only for the next month (starting from the coupon creation time). It’s unknown whether the VIP period will renew after that time.

After that, although the VIP content will be removed from Udemy, all who purchased the VIP course will get permanent free access to these VIP contents on deeplearningcourses.com.

In case it’s not clear, the process is very easy. For those folks who want the “step-by-step” instructions:

STEP 1) I announce the VIP content will be removed.

STEP 2) You email me with proof that you purchased the course during the VIP period. Do NOT email me earlier as it will just get buried.

STEP 3) I will give you free access to the VIP materials for this course on deeplearningcourses.com.

Benefits of taking this course

• Learn the knowledge you need to work at top tier investment firms
• Gain practical, real-world quantitative skills that can be applied within and outside of finance
• Make better decisions regarding your own finances

Personally, I think this is the most interesting and action-packed course I have created yet. My last few courses were cool, but they were all about topics which I had already covered in the past! GANs, NLP, Transfer Learning, Recommender Systems, etc. are all just machine learning topics I have covered several times in different libraries. This course contains new, fresh content and concepts I have never covered in any of my courses, ever.

This is the first course I’ve created that extends into a niche area of AI application. It goes outside of AI and into domain expertise. An in-depth topic such as finance deserves its own course. This is that course. These are topics you will never learn in a generic data science or machine learning course. However, as a student of AI, you will recognize many of our tools and methods being applied, such as statistical inference, supervised and unsupervised learning, convex optimization, and optimal control. This allows us to go deeper than your run-of-the-mill financial engineering course, and it becomes more than just the sum of its parts.

So what are you waiting for?

Data Science Interview Questions: Random Walk Hypothesis and Stock Price Prediction

December 10, 2019

Welcome to another episode of Data Science Interview Questions! In this episode, I discuss the Random Walk Hypothesis and Stock Price Prediction.

Why is stock price data often considered to be a random walk?

If your data is best modeled as a random walk, how can you do a time series forecast into the future?

How can you draw a confidence interval around the forecast?

What does this mean for stock price predictions?

Find out here:

What you will learn:

• How to make the best forecast possible if your data is from a random walk model
• How to find the confidence bounds for your forecast (also called confidence limits or prediction intervals)
• Why pretty much all the “data science” instructors out there are really just marketers who have been selling you lies for years
• Hint: No, LSTMs will not help you predict stock prices and in fact perform worse than the simple model described above
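The first two points in the list above can be sketched in a few lines. I am assuming a random walk without drift, $$x(t) = x(t-1) + \varepsilon(t)$$ with noise variance $$\sigma^2$$. Under that assumption the best forecast $$h$$ steps ahead is simply the last observed value, and the forecast variance grows as $$h \sigma^2$$, so the bounds widen with $$\sqrt{h}$$. The function name, the made-up log prices, and the 1.96 multiplier (for an approximate 95% interval with Gaussian noise) are my own choices for illustration.

```python
import math


def random_walk_forecast(series, horizon, z=1.96):
    """Forecast a driftless random walk `horizon` steps ahead.

    Returns (point, lower, upper). The point forecast is the last observed
    value; the interval half-width is z * sigma * sqrt(horizon).
    """
    steps = [b - a for a, b in zip(series, series[1:])]
    # Step standard deviation under the zero-drift assumption.
    sigma = math.sqrt(sum(d * d for d in steps) / len(steps))
    point = series[-1]
    half_width = z * sigma * math.sqrt(horizon)
    return point, point - half_width, point + half_width


log_prices = [4.60, 4.62, 4.61, 4.65, 4.63, 4.66]  # made-up log prices
one_step = random_walk_forecast(log_prices, horizon=1)
four_step = random_walk_forecast(log_prices, horizon=4)

print(one_step)
print(four_step)
```

The point forecast is the same flat line for every horizon. Only the bounds change, which is why a chart of "predictions" that hugs the true price so closely should make you suspicious.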