Should you study the theory behind machine learning?

In this post, I want to discuss why you should not study the theory behind machine learning.

This may surprise some of you, since my courses can appear to be more “theoretical” than other ML courses on popular websites such as Udemy.

However, that is not the kind of “theory” I am talking about.


Most popular courses in ML don’t look at any math at all.

They are popular precisely for this reason: the lack of math makes them accessible to the average Joe.

This does a disservice to you, the student, because you end up without any solid understanding of how the algorithms work.

You may end up:

  • doing things that don’t make sense, due to that lack of understanding.
  • being able to copy code from others, but not write any code yourself.
  • not knowing how to apply algorithms to new kinds of data unless someone shows you how first.

For more discussion on that, see my post: “Why do you need math for machine learning and deep learning?”

But let’s make this clear: math != theory.


When we look at math in my courses, we only look at the math needed to derive the algorithm and understand how it works at an intuitive level.

Yes, believe it or not, we are using math to improve our intuition.

This is despite what many beginners might think. When they see math, they automatically assume “math” = “not intuitive”, and that “intuitive” = “pictures, animations, and purposely avoiding math”.

That’s OK if you want to read a news article in the NY Times about ML, but not when you want to be a practitioner of ML.

Those are 2 different levels of “intuition” (layman vs. practitioner).

To see an extreme example of this, one need not look any further than Albert Einstein. Einstein was great at communicating his ideas to the public. Everyone can easily understand the layman interpretation of general relativity (mass bends space and time). But this is not the same as being a practitioner of relativistic physics.

Everyone has seen the familiar pictures of mass bending space and time, and understands what they mean at a high level. But does that mean you are a physicist or that you can “do physics”?

Anyway, that was just an aside so we don’t confuse “math used for intuition”, “layman intuition”, and “theory”. These are 3 separate things. Just because you’re looking at some math does not mean you’re looking at “theory”.



What do we mean by “theory”?

Here’s a simple question to consider. Why does gradient descent work?

Although we have used gradient descent in many of my courses, and derived its update rules for neural networks, SVMs, and other models, we have never discussed why it works.

And that’s OK!

The “mathematical intuition” is enough.
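To make that concrete, here is roughly what the “mathematical intuition” level looks like: the update rule w ← w − η·∇L(w), coded in Python and applied to a toy one-dimensional objective L(w) = (w − 3)². The function names, the learning rate, and the objective are hypothetical choices for illustration, not code taken from any particular course.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Hypothetical toy objective: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
def grad_L(w):
    return 2 * (w - 3)


w_star = gradient_descent(grad_L, w0=0.0)
print(round(w_star, 4))  # prints 3.0, the minimizer of L
```

For this toy objective you can check by hand that each step shrinks the distance to w = 3 by a factor of 1 − 2·0.1 = 0.8. Proving when and why gradient descent works for general (e.g. non-convex) objectives is a much harder problem.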

But let’s get back to the question of this article: Why is the Lazy Programmer saying we should not study theory?


Well, this is the kind of “theory” that gets so deep that it:

  • Does not produce any near-term gains in your work
  • Requires a very high level of math ability (e.g. real analysis, optimization, dynamical systems)
  • Is on the cutting edge of understanding, and is thus very difficult and likely to be disputed or even superseded in the near future


Case in point: although we have been using gradient descent for years in my courses (and it has been in use for decades before that), our understanding of it is still incomplete.

Here’s an article on gradient descent that came out this year (August 2021): “Computer Scientists Discover Limits of Major Research Algorithm”.

The corresponding paper is called “The Complexity of Gradient Descent: CLS = PPAD ∩ PLS”.

There will be more papers on these “theory” topics in the years to come.


My advice is not to go down this path unless you really enjoy it, you are doing graduate (e.g. PhD-level) research, you don’t mind that ideas you spent years and years working on might be proven incorrect, and you have a very high level of math ability in subjects like real analysis, optimization, and dynamical systems.