Neural Ordinary Differential Equations

Very interesting paper that got the Best Paper award at NIPS 2018.

“Neural Ordinary Differential Equations” by Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud.

Comes out of Geoffrey Hinton’s Vector Institute in Toronto, Canada (although he is not an author on the paper).

For those of you who have ever programmed simulations of systems of differential equations, the motivation behind this should be quite intuitive.

Recall that a derivative is the same thing as the slope of a tangent line, and can be approximated by the usual “rise over run” formula for small time steps \( \Delta t \).

$$ \frac{dh}{dt} \approx \frac{h(t + \Delta t) – h(t)}{\Delta t}$$

Here’s a picture of that if you forgot what it looks like:


Normally, the derivative is known to be some function \( \frac{dh}{dt} = f(h, t) \).

Your job in writing a simulation is to find out how \( h(t) \) evolves over time.

Here’s a picture of how that works (using different symbols):


Since our job is to find the next value of \( h(t) \), we can rearrange the above to get:

$$ h(t + \Delta t) = h(t) + f(h(t), t) \Delta t $$

Typically the time step is just \( 1 \), so we can rewrite the above as:

$$ h_{t+1} = h_t + f(h_t, t) $$

Researchers noticed that this looks a lot like the residual network layer that is often used in deep learning!

In a residual network layer, \( h_t \) represents the input value, \( h_{t+1} \) represents the output value, and \( f(h_t, t) \) represents the residual.

Here’s a picture of that (using different symbols):



At this point, the question to ask is, if a residual network layer is just a difference equation that approximates a differential equation, can there be a neural network layer that is an actual differential equation?

How would backpropagation be done?

This paper goes over all that and more.

Read the paper here!