In my latest course (Time Series Analysis), I subtly hinted in the section on Convolutional Neural Networks that, instead of using 1-D convolutions on a 1-D time series, it is possible to convert the time series into an image and use 2-D convolutions instead.
CNNs with 2-D convolutions are the “typical” kind of neural network used in deep learning; they are normally applied to images (e.g. ImageNet, object detection, segmentation, medical imaging and diagnosis, etc.)
In this article, we will look at 2 ways to convert a time series into an image:
- Gramian Angular Field
- Markov Transition Field
Before we continue, just a gentle reminder that the VIP discount coupons for Time Series Analysis, Financial Engineering and PyTorch: Deep Learning and Artificial Intelligence are expiring in just two weeks!
CLICK HERE to get 75% OFF “Time Series Analysis, Forecasting, and Machine Learning in Python”
Topics covered:
- ETS and Exponential Smoothing
- Holt’s Linear Trend Model
- Holt-Winters Model
- ARIMA, SARIMA, SARIMAX, and Auto ARIMA
- ACF and PACF
- Vector Autoregression and Moving Average Models (VAR, VMA, VARMA)
- Machine Learning Models (including Logistic Regression, Support Vector Machines, and Random Forests)
- Deep Learning Models (Artificial Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks)
- GRUs and LSTMs for Time Series Forecasting
- Time series forecasting of sales data
- Time series forecasting of stock prices and stock returns
- Time series classification of smartphone data to predict user behavior
- AWS Forecast (Amazon’s state-of-the-art low-code forecasting API)
- GARCH (financial volatility modeling)
- FB Prophet (Facebook’s time series library)
CLICK HERE to get 75% OFF “Financial Engineering and Artificial Intelligence in Python”
Topics covered:
- Exploratory data analysis, significance testing, correlations
- Alpha and beta
- Time series analysis, simple moving average, exponentially-weighted moving average
- Holt-Winters exponential smoothing model
- ARIMA and SARIMA
- Efficient Market Hypothesis
- Random Walk Hypothesis
- Time series forecasting (“stock price prediction”)
- Modern portfolio theory
- Efficient frontier / Markowitz bullet
- Mean-variance optimization
- Maximizing the Sharpe ratio
- Convex optimization with Linear Programming and Quadratic Programming
- Capital Asset Pricing Model (CAPM)
- Algorithmic trading
CLICK HERE to get 75% OFF “PyTorch: Deep Learning and Artificial Intelligence in Python”
Topics covered:
- Machine learning basics (linear neurons)
- ANNs, CNNs, and RNNs for images and sequence data
- Time series forecasting and stock predictions (+ why all those fake data scientists are doing it wrong)
- NLP (natural language processing)
- Recommender systems
- Transfer learning for computer vision
- GANs (generative adversarial networks)
- Deep reinforcement learning and applying it by building a stock trading bot
Gramian Angular Field
The Gramian Angular Field is quite involved mathematically, so this article will discuss the intuition only, along with the code.
Those interested in all the gory details are encouraged to read the paper, titled “Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks” by Zhiguang Wang and Tim Oates.
We’ll build the intuition in a series of steps.
Let us begin by recalling that the dot product or inner product is a measure of similarity between two vectors.
$$\langle a, b\rangle = \lVert a \rVert \lVert b \rVert \cos \theta$$
Where \( \theta \) is the angle between \( a \) and \( b \).
Ignoring the magnitude of the vectors, if the angle between them is small (i.e. close to 0), then the cosine of that angle will be nearly 1. If the two vectors are perpendicular, the cosine of the angle is 0. If the two vectors point in opposite directions, then the cosine of the angle will be -1.
The Gram Matrix is just the repeated application of the inner product between every vector in a set of vectors, and every other vector in that same set of vectors.
i.e. Suppose that we store a set of column vectors in a matrix called \( X \).
The Gram Matrix is:
$$ G = X^TX $$
This expands to:
$$G = \begin{bmatrix} \langle x_1, x_1 \rangle & \langle x_1, x_2 \rangle & \dots & \langle x_1, x_N \rangle \\ \langle x_2, x_1 \rangle & \langle x_2, x_2 \rangle & \dots & \langle x_2, x_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle x_N, x_1 \rangle & \langle x_N, x_2 \rangle & \dots & \langle x_N, x_N \rangle \end{bmatrix} $$
In other words, if we think of the inner product as the similarity between two vectors, then the Gram Matrix just gives us the pairwise similarity between every vector and every other vector.
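For example, in NumPy (a toy sketch of my own, not from the paper):

```python
import numpy as np

X = np.random.randn(3, 5)  # 5 column vectors, each of dimension 3
G = X.T @ X                # G[i, j] = <x_i, x_j>, so G has shape (5, 5)
```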
Note that the Gramian Angular Field (GAF) does not apply the Gram Matrix directly (in fact, each value of the time series is a scalar, not a vector).
The first step in computing the GAF is to normalize the time series to be in the range [-1, +1].
Let’s assume we are given a time series \( X = \{x_1, x_2, …, x_N \} \).
The normalized values are denoted by \( \tilde{x_i} \).
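For example, a standard min-max rescaling maps the series into this range:
$$ \tilde{x_i} = \frac{2(x_i - \min(X))}{\max(X) - \min(X)} - 1 $$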
The second step is to convert each value in the normalized time series into polar coordinates.
We use the following transformation:
$$ \phi_i = \arccos \tilde{x_i}$$
$$ r_i = \frac{t_i}{N} $$
Where \( t_i \in \mathbb{N} \) represents the timestamp of data point \( x_i \).
Finally, the GAF method defines its own “special” inner product as:
$$ \langle x_1, x_2 \rangle = \cos(\phi_1 + \phi_2) $$
From here, the above formula for \( G \) still applies (except using \( \tilde{X} \) instead of \( X \), and using the custom inner product instead of the usual version).
Here is an illustration of the process in code: a minimal from-scratch sketch in NumPy (the function name, the toy sine series, and the particular min-max rescaling are my own illustrative choices):
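```python
import numpy as np

def gaf(ts):
    """Gramian Angular (Summation) Field of a 1-D time series."""
    # Step 1: rescale the series to [-1, +1]
    x = 2 * (ts - ts.min()) / (ts.max() - ts.min()) - 1
    # Step 2: map each rescaled value to an angle
    phi = np.arccos(x)
    # Step 3: apply the "special" inner product cos(phi_i + phi_j) to every pair
    return np.cos(phi[:, None] + phi[None, :])

ts = np.sin(np.linspace(0, 4 * np.pi, 100))  # toy time series
G = gaf(ts)
print(G.shape)  # (100, 100) -- an "image" suitable for a 2-D CNN
```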
So why use the GAF?
Like the original Gram Matrix, it gives you a “picture” (no pun intended) of the relationship between every point and every other point in the time series.
That is, it displays the temporal correlation structure in the time series.
Here’s how you can use it in code.
Firstly, you need to install the pyts library. Then, run the following code on a time series of your choice:
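A minimal sketch (the toy sine series and parameter values are my own; see the pyts documentation for all available options):

```python
# pip install pyts
import numpy as np
from pyts.image import GramianAngularField

# pyts expects a 2-D array of shape (n_samples, n_timestamps)
ts = np.sin(np.linspace(0, 4 * np.pi, 100))
X = ts.reshape(1, -1)

gaf = GramianAngularField(image_size=32, method='summation')
X_gaf = gaf.fit_transform(X)
print(X_gaf.shape)  # (1, 32, 32) -- one "image" per sample
```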
Note that the library allows you to rescale the image with the image_size argument.
As an exercise, try using this method instead of the 1-D CNNs we used in the course and compare their performance!
Markov Transition Field
The Markov Transition Field (MTF) is another method of converting a time series into an image.
The process is a bit simpler than that of the GAF.
If you have taken any of my courses which involve Markov Models (like Natural Language Processing, or HMMs) you should feel right at home.
Let’s assume we have an N-length time series.
We begin by putting each value in the time series into quantiles (i.e. we “bin” each value).
For example, if we use quartiles (4 bins), the smallest 25% of values fall into the first bin, the next 25% into the second bin, and so on.
We can think of each bin as a ‘state’ (using Markov model terminology).
Intuitively, we know that what we’d like to do when using Markov models is to form the state transition matrix.
This matrix has the values:
$$A_{ij} = P(s_t = j | s_{t-1} = i)$$
That is, \( A_{ij} \) is the probability of transitioning from state i to state j.
As usual, we estimate this value by maximum likelihood: \( A_{ij} \) is the count of transitions from state i to state j, divided by the total number of times we were in state i.
Note that if we have \( Q \) quantiles (i.e. we have \( Q \) “states”), then \( A \) is a \( Q \times Q \) matrix.
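As a concrete sketch (my own toy code, not taken from the paper), here is one way to bin a series into \( Q \) quantile bins and estimate \( A \) in NumPy:

```python
import numpy as np

def estimate_transitions(ts, Q=4):
    """Bin each value into one of Q quantile bins, then estimate the
    Q x Q transition matrix by counting one-step transitions."""
    edges = np.quantile(ts, np.linspace(0, 1, Q + 1)[1:-1])  # interior bin boundaries
    states = np.digitize(ts, edges)                          # bin index (0..Q-1) per time step
    A = np.zeros((Q, Q))
    for s, s_next in zip(states[:-1], states[1:]):
        A[s, s_next] += 1                                    # count one-step transitions
    row_sums = A.sum(axis=1, keepdims=True)
    return A / np.where(row_sums == 0, 1, row_sums), states  # normalize rows safely

ts = np.sin(np.linspace(0, 4 * np.pi, 100))
A, states = estimate_transitions(ts)
print(A.round(2))  # each visited row sums to 1
```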
The MTF follows a similar concept.
The MTF (denoted by \( M \)) is an \( N \times N \) matrix where:
$$M_{kl} = A_{q_k q_l}$$
And where \( q_k \) is the quantile (“bin”) for \( x_k \), and \( q_l \) is the quantile for \( x_l \).
Note: I haven’t re-used the letters i and j to index \( M \), as most resources do; I find that super confusing.
Do not mix up the indices for \( M \) and \( A \)! The indices in \( A \) refer to states. The indices for \( M \) are temporal.
\( A_{ij} \) is the probability of transitioning from state i to state j.
\( M_{kl} \) is the probability of a one-step transition from the bin for \( x_k \), to the bin for \( x_l \).
That is, it looks at \( x_k \) and \( x_l \), which are 2 points in the time series at arbitrary time steps \( k \) and \( l \).
\( q_k \) and \( q_l \) are the corresponding quantiles.
\( M_{kl} \) is then just the probability that we saw a direct one-step (i.e. Markovian) transition from \( q_k \) to \( q_l \) in the time series.
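Continuing the toy sketch above, constructing \( M \) from \( A \) is just a lookup:

```python
# M[k, l] = A[states[k], states[l]] for every pair of time steps (k, l)
M = A[states[:, None], states[None, :]]
print(M.shape)  # (N, N) -- one "pixel" per pair of time steps
```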
So why use the MTF?
It shows us how related two arbitrary points in the time series are, as measured by how often a value in the first point’s bin transitions directly into a value in the second point’s bin.
Here’s how you can use it in code.
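A minimal sketch, analogous to the GAF example above (again, the toy series and parameter values are mine; see the pyts documentation for details):

```python
# pip install pyts
import numpy as np
from pyts.image import MarkovTransitionField

ts = np.sin(np.linspace(0, 4 * np.pi, 100))
X = ts.reshape(1, -1)  # shape (n_samples, n_timestamps)

mtf = MarkovTransitionField(image_size=32, n_bins=8)
X_mtf = mtf.fit_transform(X)
print(X_mtf.shape)  # (1, 32, 32)
```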
Note that the library allows you to rescale the image with the image_size argument.
As an exercise, try using this method instead of the 1-D CNNs we used in the course and compare their performance!
Enjoy!