In this article, we are again going to combine my current favorite subjects: natural language processing, time series analysis, and financial analysis.
Recently, I created a couple of lectures covering Granger causality, so this topic is fresh in my mind.
In short, Granger causality is used to determine whether one time series can be used to forecast another (i.e. predict the future).
In these lectures, I demonstrated that some economic variables are Granger causal (in particular, GDP and term spread).
Of course, another easy application is to determine whether or not Twitter sentiment can predict cryptocurrency movements.
This post is based on a short publication, “Does Twitter Predict Bitcoin?” by Shen, D., Urquhart, A. and Wang, P. (2019), which can be found at https://centaur.reading.ac.uk/80420/1/Twitter.Bitcoin.pdf
The premise is quite simple and you really have to just understand these 3 components in order to implement this yourself:
1) How to get a Twitter sentiment time series
2) How to get Bitcoin price time series
3) How to implement the Granger causality test
If you can do 1-3, you can predict Bitcoin! (at least, partially)
So let’s go over each of these 3 topics in order.
How to get a Twitter sentiment time series
This is probably going to be the most difficult part for most students. Most students are used to downloading a CSV dataset that I typically make very nice and simple for my courses.
Unfortunately, real life is not like this.
This becomes a data engineering problem.
Which tweets by which authors do you choose?
How do you use Twitter’s API to download the tweets?
Where do you store the tweets?
Once you’ve figured that out, you need to convert the tweets into a number (sentiment) such that the numbers collectively form a time series.
That part is not so hard.
I’ve demonstrated several methods of doing this, such as:
a) training your own model on sentiment data (you could even create your own dataset)
b) using a pretrained Transformer model
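Whichever method produces the per-tweet numbers, the last step is turning them into a regularly spaced series. Here is a minimal sketch of that step, assuming the tweets have already been scored. The `tweets` table, its column names, the scores themselves, and the choice to treat days with no tweets as neutral are all made up for illustration; they do not come from the paper.

```python
import pandas as pd

# Hypothetical scored tweets: in practice the "score" column would come from
# your own classifier or a pretrained Transformer. These values are made up.
tweets = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            [
                "2021-01-01 09:15",
                "2021-01-01 17:40",
                "2021-01-02 08:05",
                "2021-01-04 12:30",
                "2021-01-04 21:10",
            ]
        ),
        "score": [0.8, -0.2, 0.5, -0.6, -0.4],
    }
)


def daily_sentiment(scored: pd.DataFrame) -> pd.Series:
    """Average per-tweet scores into one value per calendar day."""
    series = scored.set_index("created_at")["score"].sort_index()
    daily = series.resample("D").mean()
    # Days with no tweets come out as NaN. Treating them as neutral (0.0)
    # is an assumption made for this sketch.
    return daily.fillna(0.0).rename("sentiment")


sentiment = daily_sentiment(tweets)
print(sentiment)
```

Averaging is only one option. You could just as well count positive versus negative tweets per day, as long as the result is one number per time step.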
How to get Bitcoin price time series
In contrast to the first task, this is probably the easiest.
In the past, I’ve demonstrated how you can easily get minute, daily, monthly, etc. data for essentially any ticker using the yfinance Python package.
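As a rough sketch of what this might look like: the helper names `download_btc_close` and `to_log_returns` are hypothetical, and the `BTC-USD` ticker symbol and daily interval are assumptions about what you want to download. The download function needs network access and the yfinance package, so the example only exercises the returns calculation, using made-up prices.

```python
import numpy as np
import pandas as pd


def download_btc_close(start: str, end: str) -> pd.Series:
    """Fetch daily BTC-USD closing prices (needs network access and yfinance)."""
    import yfinance as yf  # imported lazily so the rest runs without it

    history = yf.Ticker("BTC-USD").history(start=start, end=end, interval="1d")
    return history["Close"]


def to_log_returns(close: pd.Series) -> pd.Series:
    """Convert prices to log returns, dropping the undefined first value."""
    return np.log(close).diff().dropna().rename("btc_return")


# Offline check with made-up prices (not real market data).
toy_close = pd.Series([100.0, 110.0, 99.0])
toy_returns = to_log_returns(toy_close)
print(toy_returns)
```

With real data you would call something like `to_log_returns(download_btc_close("2021-01-01", "2021-12-31"))`.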
How to implement the Granger causality test
For those of you who haven’t learned Time Series Analysis with me in the past, you perhaps have never heard of Granger causality.
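In short, we build a multivariate autoregressive time series model called a VAR (vector autoregression) model. With \( p \) lags, it has the form:
\( y(t) = b + \sum_{\tau=1}^{p} A_\tau \, y(t-\tau) + \varepsilon(t) \)
Here \( y(t) \) is the vector of all the time series at time \( t \), \( b \) is a bias vector, each \( A_\tau \) is the matrix of coefficients for lag \( \tau \), and \( \varepsilon(t) \) is noise.
Essentially, if you find any component \( A_\tau(j,i) \) is “big enough” (in magnitude), then you can conclude that \( y_i(t) \) Granger causes \( y_j(t) \).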
As in regression analysis, one decides whether these model coefficients are statistically significant by using hypothesis testing.
It’s important to note that Granger causality is not “true” causality as one usually thinks of it (e.g. eating food causes me to be satiated). Granger causality simply means that one time series is useful in forecasting another (hence the cross-coefficients being non-zero).
Luckily, the Granger causality test is very easy to use in Python with the statsmodels package.
Suppose you have your 2 time series (BTC returns and Twitter sentiment) in a 2-column dataframe (sidenote: your time series should be stationary so you should use returns and not prices).
Then you simply call the statsmodels function:
This will output p-values for every lag so you can see whether or not the sentiment at that particular lag affects the BTC return.
Final note: unfortunately, the paper only shows that Twitter sentiment Granger causes some function of the squared return. This means we lose information about whether the return is actually going up or down!
Hello friends! Curious about how to properly predict stock prices?
I’ve now released the third video in this YouTube mini-series.
For those unfamiliar: this is a video series that debunks common mistakes found in nearly all blog articles / GitHub repos claiming to do “stock price predictions with LSTMs”.
These are typically written by non-experts in the field just looking for clicks, and I have a lot of fun breaking down precisely what they’re doing wrong.
Why is this important?
Beginners are often fooled by such content, wasting money on courses to learn things that don’t work. Even worse, they may end up putting such examples on their own GitHub accounts or in their portfolios / resumes, worsening their chances of getting a job in the field.
To be clear: the bad part isn’t that they learned something that doesn’t work (although that is pretty bad by itself). The bad part is, they don’t even understand why it doesn’t work. They are confident that it does and will fight me over it! (That is, until I ask them to verify or rebut any of the claims I’ve made).
Thus, not only does this stuff not work (due to all the mistakes I outline in my video series), but it could actually be detrimental to getting a job and working in this area. These are not just harmless mistakes!
The first video discussed why min-max scaling over the train set doesn’t work.
The second video discussed why using prices as inputs (i.e. lagged prices to build an autoregressive model) doesn’t work.
This third video discusses why using prices as targets doesn’t work.
Ever come across a machine learning / data science blog demonstrating how to predict stock prices using an autoregressive model, with past stock prices as input?
It’s been a while, but I am finally continuing this YouTube mini-series I started a while back, which goes over common mistakes in popular blogs on predicting stock prices with machine learning. This is the 2nd installment.
It is about why you shouldn’t use prices as inputs.
Time series analysis is becoming an increasingly important analytical tool.
With inflation on the rise, many are turning to the stock market and cryptocurrencies in order to ensure their savings do not lose their value.
COVID-19 has shown us how forecasting is an essential tool for driving public health decisions.
Businesses are becoming increasingly efficient, forecasting inventory and operational needs ahead of time.
Let me cut to the chase. This is not your average Time Series Analysis course. This course covers modern developments such as deep learning, time series classification (which can drive user insights from smartphone data, or read your thoughts from electrical activity in the brain), and more.
We will cover techniques such as:
ETS and Exponential Smoothing
Holt’s Linear Trend Model
Holt-Winters Model
ARIMA, SARIMA, SARIMAX, and Auto ARIMA
ACF and PACF
Vector Autoregression and Moving Average Models (VAR, VMA, VARMA)
Machine Learning Models (including Logistic Regression, Support Vector Machines, and Random Forests)
Deep Learning Models (Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks)
GRUs and LSTMs for Time Series Forecasting
We will cover applications such as:
Time series forecasting of sales data
Time series forecasting of stock prices and stock returns
Time series classification of smartphone data to predict user behavior
The VIP version of the course (obtained by purchasing the course NOW during the VIP period) will cover even more exciting topics, such as:
As always, please note that the VIP period may not last forever, and if / when the course becomes “non-VIP”, the VIP contents will be removed. If you purchased the VIP version, you will retain permanent access to the VIP content via my website, simply by letting me know via email you’d like access (you only need to email if I announce the VIP period is ending).