Convert a Time Series Into an Image with Gramian Angular Fields and Markov Transition Fields

August 30, 2021

In my latest course (Time Series Analysis), I made subtle hints in the section on Convolutional Neural Networks that instead of using 1-D convolutions on 1-D time series, it is possible to convert a time series into an image and use 2-D convolutions instead.

CNNs with 2-D convolutions are the “typical” kind of neural network used in deep learning, and are normally applied to images (e.g. ImageNet, object detection, segmentation, medical imaging and diagnosis, etc.).

In this article, we will look at 2 ways to convert a time series into an image:

  1. Gramian Angular Field
  2. Markov Transition Field

 

 

Gramian Angular Field

 

The Gramian Angular Field is quite involved mathematically, so this article will discuss the intuition only, along with the code.

Those interested in all the gory details are encouraged to read the paper, titled “Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks” by Zhiguang Wang and Tim Oates.

We’ll build the intuition in a series of steps.

Let us begin by recalling that the dot product or inner product is a measure of similarity between two vectors.

$$\langle a, b\rangle = \lVert a \rVert \lVert b \rVert \cos \theta$$

Where \( \theta \) is the angle between \( a \) and \( b \).

Ignoring the magnitude of the vectors, if the angle between them is small (i.e. close to 0) then the cosine of that angle will be nearly 1. If the vectors are perpendicular, the cosine of the angle is 0. If the two vectors are pointing in opposite directions, then the cosine of the angle will be -1.

The Gram Matrix is just the repeated application of the inner product between every vector in a set of vectors, and every other vector in that same set of vectors.

That is, suppose that we store a set of column vectors in a matrix called \( X \).

The Gram Matrix is:

$$ G = X^TX $$

This expands to:

$$G = \begin{bmatrix} \langle x_1, x_1 \rangle & \langle x_1, x_2 \rangle & \cdots & \langle x_1, x_N \rangle \\ \langle x_2, x_1 \rangle & \langle x_2, x_2 \rangle & \cdots & \langle x_2, x_N \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle x_N, x_1 \rangle & \langle x_N, x_2 \rangle & \cdots & \langle x_N, x_N \rangle \end{bmatrix} $$

In other words, if we think of the inner product as the similarity between two vectors, then the Gram Matrix just gives us the pairwise similarity between every vector and every other vector.
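
As a quick illustration of the formula above, here is a minimal NumPy sketch. The three 2-D vectors are made up for this example; they are chosen so that the first and second are perpendicular and the first and third point in opposite directions.

```python
import numpy as np

# Three 2-D column vectors stored side by side in X (shape: 2 x 3).
# These values are illustrative only.
X = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

# Gram matrix: G[i, j] is the inner product of column i and column j.
G = X.T @ X
print(G)
```

The matrix is symmetric, the perpendicular pair produces a 0, and the opposite pair produces a -1, matching the cosine intuition from earlier.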

 

Note that the Gramian Angular Field (GAF) does not apply the Gram Matrix directly (in fact, each value of the time series is a scalar, not a vector).

The first step in computing the GAF is to normalize the time series to be in the range [-1, +1].

Let’s assume we are given a time series \( X = \{x_1, x_2, \ldots, x_N \} \).

The normalized values are denoted by \( \tilde{x_i} \).
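
The rescaling itself is not written out above. One standard choice, and the one given in the Wang and Oates paper, is min-max scaling:

$$ \tilde{x_i} = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} $$

With this formula, the smallest value in the series maps to -1 and the largest maps to +1, so every \( \tilde{x_i} \) is a valid input to \( \arccos \).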

The second step is to convert each value in the normalized time series into polar coordinates.

We use the following transformation:

$$ \phi_i = \arccos \tilde{x_i}$$

$$ r_i = \frac{t_i}{N} $$

Where \( t_i \in \mathbb{N} \) represents the timestamp of data point \( x_i \).

Finally, the GAF method defines its own “special” inner product as:

$$ \langle x_1, x_2 \rangle = \cos(\phi_1 + \phi_2) $$

From here, the above formula for \( G \) still applies (except using \( \tilde{X} \) instead of \( X \), and using the custom inner product instead of the usual version).
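
To make the steps concrete, here is a minimal from-scratch sketch in NumPy. The function name `gramian_angular_field` is a hypothetical helper written for this illustration (it is not a library function), and min-max scaling is assumed for the normalization step. Note that the radius \( r_i \) does not enter the matrix; only the angles \( \phi_i \) do.

```python
import numpy as np

def gramian_angular_field(x):
    """Summation-style GAF of a 1-D series (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    # Step 1: min-max rescale to [-1, 1] (assumes the series is not constant).
    x_tilde = (2 * x - x.max() - x.min()) / (x.max() - x.min())
    # Guard against tiny floating-point overshoot before arccos.
    x_tilde = np.clip(x_tilde, -1.0, 1.0)
    # Step 2: angular coordinate of each point.
    phi = np.arccos(x_tilde)
    # Step 3: G[i, j] = cos(phi_i + phi_j), computed for all pairs via broadcasting.
    return np.cos(phi[:, None] + phi[None, :])

# Toy series: two periods of a sine wave.
series = np.sin(np.linspace(0, 4 * np.pi, 50))
G = gramian_angular_field(series)
print(G.shape)  # (50, 50)
```

An N-length series produces an N x N image, and the result is symmetric because \( \cos(\phi_i + \phi_j) = \cos(\phi_j + \phi_i) \).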

Here is an illustration of the process:

So why use the GAF?

Like the original Gram Matrix, it gives you a “picture” (no pun intended) of the relationship between every point and every other point in the time series.

That is, it displays the temporal correlation structure in the time series.

Here’s how you can use it in code.

Firstly, you need to install the pyts library. Then, run the following code on a time series of your choice:

 

Note that the library allows you to rescale the image with the image_size argument.

As an exercise, try using this method instead of the 1-D CNNs we used in the course and compare their performance!

 

Markov Transition Field

The Markov Transition Field (MTF) is another method of converting a time series into an image.

The process is a bit simpler than that of the GAF.

If you have taken any of my courses which involve Markov Models (like Natural Language Processing, or HMMs) you should feel right at home.

Let’s assume we have an N-length time series.

We begin by putting each value in the time series into quantiles (i.e. we “bin” each value).

For example, if we use quartiles (4 bins), the smallest 25% of values would define the boundaries of the first quartile, the second smallest 25% of values would define the boundaries of the second quartile, etc.
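
The binning step might look like the following in NumPy. This is a sketch under assumptions: the random data is made up for illustration, and `np.quantile` with `np.digitize` is just one reasonable way to do the binning.

```python
import numpy as np

# Made-up data: 20 random values standing in for a time series.
rng = np.random.default_rng(0)
x = rng.normal(size=20)

n_bins = 4
# Interior quartile boundaries (25th, 50th and 75th percentiles).
edges = np.quantile(x, [0.25, 0.5, 0.75])

# Bin index (0, 1, 2 or 3) for every value in the series.
states = np.digitize(x, edges)

print(states)
print(np.bincount(states, minlength=n_bins))  # how many points fall in each bin
```

Because the bin edges come from quantiles, each bin ends up with roughly the same number of points.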

We can think of each bin as a ‘state’ (using Markov model terminology).

Intuitively, we know that what we’d like to do when using Markov models is to form the state transition matrix.

This matrix has the values:

$$A_{ij} = P(s_t = j | s_{t-1} = i)$$

That is, \( A_{ij} \) is the probability of transitioning from state i to state j.

As usual, we estimate this value by maximum likelihood: \( A_{ij} \) is the count of transitions from i to j, divided by the total number of transitions out of state i.

Note that if we have \( Q \) quantiles (i.e. we have \( Q \) “states”), then \( A \) is a \( Q \times Q \) matrix.

The MTF follows a similar concept.

The MTF (denoted by \( M \)) is an \( N \times N \) matrix where:

$$M_{kl} = A_{q_k q_l}$$

And where \( q_k \) is the quantile (“bin”) for \( x_k \), and \( q_l \) is the quantile for \( x_l \).

Note: I have deliberately not reused the letters i and j to index \( M \). Most resources do, and it’s super confusing.

Do not mix up the indices for \( M \) and \( A \)! The indices in \( A \) refer to states. The indices for \( M \) are temporal.

\( A_{ij} \) is the probability of transitioning from state i to state j.

\( M_{kl} \) is the probability of a one-step transition from the bin for \( x_k \), to the bin for \( x_l \).

That is, it looks at \( x_k \) and \( x_l \), which are 2 points in the time series at arbitrary time steps \( k \) and \( l \).

\( q_k \) and \( q_l \) are the corresponding quantiles.

\( M_{kl} \) is then just the probability that we saw a direct one-step (i.e. Markovian) transition from \( q_k \) to \( q_l \) in the time series.
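
Putting the pieces together, a from-scratch sketch of the MTF could look like this. The function name `markov_transition_field` is a hypothetical helper for this illustration, and quantile binning via `np.quantile` and `np.digitize` is an assumption about how the bins are built.

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """MTF of a 1-D series (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    # Assign each point to a quantile bin (state) in 0..n_bins-1.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)
    # Q x Q transition matrix A, estimated from one-step transitions.
    A = np.zeros((n_bins, n_bins))
    np.add.at(A, (q[:-1], q[1:]), 1)
    row_sums = A.sum(axis=1, keepdims=True)
    A = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    # N x N field: M[k, l] = A[q_k, q_l], looked up for every pair of time steps.
    return A[q[:, None], q[None, :]]

# Toy series: two periods of a sine wave.
series = np.sin(np.linspace(0, 4 * np.pi, 50))
M = markov_transition_field(series)
print(M.shape)  # (50, 50)
```

Notice that `A` is only 4 x 4 here, while `M` is 50 x 50: the small state-indexed matrix is "spread out" over every pair of time steps.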

So why use the MTF?

It shows us how related 2 arbitrary points in the time series are, based on how often their bins appear next to each other in the time series.

 

Here’s how you can use it in code.

Note that the library allows you to rescale the image with the image_size argument.

As an exercise, try using this method instead of the 1-D CNNs we used in the course and compare their performance!

Enjoy!





