Why do you need math for machine learning and deep learning?

July 9, 2021

In this article, I will demonstrate why math is necessary for machine learning, data science, deep learning, and AI.

Most of my students have already heard this from me countless times. College-level math is already a prerequisite for nearly all of my courses.

This article is a bit different.

Perhaps you believe I am biased, because I’m the one teaching the courses that require all this math.

It might seem that I am just some crazy guy making things extra hard for you because I enjoy it.


You’ve heard it from me many times. Now you’ll hear it from others.

This article is a collection of resources where people other than myself explain the importance of math in ML.


Example #1

Let’s begin with one of the most famous professors in ML, Daphne Koller, who co-founded Coursera.

In this clip, Lex Fridman asks what advice she would have for those interested in beginning a journey into AI and machine learning.

One important thing she mentions, which I have seen time and time again in my own experience, is that those without typical prerequisite math backgrounds often make mistakes and do things that don’t make sense.

She’s being nice here, but I’ve met many of these folks who not only have no idea that what they are doing does not make sense, they also tend to be overly confident about it!

Then it becomes a burden for me, because I have to put in more effort explaining the basics to you just to convince you that you are wrong.

For that reason, I generally advise against hiring people for ML roles if they do not know basic math.


Example #2

I enjoyed this strongly worded Reddit comment.

[Screenshots: the original Reddit post and its top comment]


Example #3

Not exactly machine learning, but a closely related field: quant finance.

In fact, many students taking my courses dream about applying ML to finance.

Well, it’s going to be pretty hard if you can’t pass these interview questions.


Think about this logically: All quants who have a job can pass these kinds of interview questions. But you cannot. How well do you think you will do compared to them?


Example #4

On the Joe Rogan Experience, entrepreneur and angel investor Naval Ravikant explains why deriving (what we do in all of my in-depth machine learning courses) is much more important than memorizing.

Most beginner-level Udemy courses don’t derive anything – they just tell you random facts about ML algorithms and then jump straight to the usual 3 lines of scikit-learn code. Useless!

Link: https://www.youtube.com/watch?v=3qHkcs3kG44&t=5610s (Skips to 1:33:30 automatically)



Example #5

I found this in a thread about Lambda School (one of the many “developer bootcamps” in existence these days) getting sued for lying about its job placement rates and cutting down on its staff.

Here are two interesting comments from people “in the know” about how bootcamps did not really help unless the student already had a math / science / STEM background. The first comment is striking because it was written by a former recruiter (someone in a position to see who does and doesn’t get the job).

In other words, it is difficult to go from random guy off the street to professional software engineer with a bootcamp alone (and the implication is that similar reasoning applies to online courses).

In this case, it wasn’t even that the math was being directly applied. A math / science background is important because it teaches you how to think properly. If two people complete a bootcamp or online course, but only one has a STEM background and knows how to apply what they learned, that one will get the job and the other will not.

Importantly, note that it’s not about credentials; it’s purely about ability, as the comments below point out.



Example #6

This is from a thread about Yann LeCun’s deep learning course at NYU. As usual, someone comments that you don’t need such courses when you can just plug your data into TensorFlow like everyone else. Another, more experienced developer sets them straight.



Example #7

Hey, you guys have heard of Yann LeCun, right? Remember that guy? The guy who invented CNNs?

Let’s see what he has to say:

Math. Math. Oh and perhaps some more math.

That’s the gist of the advice to students interested in AI from Facebook’s Yann LeCun and Joaquin Quiñonero Candela, who run the company’s Artificial Intelligence Lab and Applied Machine Learning group respectively.

Tech companies often advocate STEM (science, technology, engineering and math), but today’s tips are particularly pointed. The pair specifically note that students should eat their vegetables, that is, take Calc I, Calc II, Calc III, Linear Algebra, and Probability and Statistics as early as possible.

From: https://techcrunch.com/2016/12/01/facebooks-advice-to-students-interested-in-artificial-intelligence/


This article will be updated over time. Keep checking back!


Time Series: How to convert AR(p) to VAR(1) and VAR(p) to VAR(1)

July 1, 2021

This is a very condensed post, mainly just so I could write down the equations I need for my Time Series Analysis course. 😉

However, if you find it useful, I am happy to hear that!


Start with an AR(2):

$$ y_t = b + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t $$


Suppose we create a vector containing both \( y_t \) and \( y_{t -1} \):

$$\begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix}$$


We can write our AR(2) as follows:

$$\begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix} + \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$


Exercise: expand the above to see that you get back the original AR(2). Note that the 2nd line just ends up giving you \( y_{t-1} = y_{t-1} \).
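As a quick numerical sanity check of this exercise, the sketch below (Python with NumPy) simulates the same AR(2) twice with identical noise draws: once with the scalar recursion and once with the 2×2 matrix form above. The parameter values, series length, random seed, and zero initial values are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical AR(2) parameters, chosen only for illustration.
b, phi1, phi2 = 0.5, 0.6, -0.2
T = 50
rng = np.random.default_rng(0)
eps = rng.standard_normal(T)

# Scalar recursion: y_t = b + phi1*y_{t-1} + phi2*y_{t-2} + eps_t,
# with y_0 = y_1 = 0 as (assumed) initial values.
y = np.zeros(T)
for t in range(2, T):
    y[t] = b + phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

# Matrix form: [y_t, y_{t-1}] = [b, 0] + Phi @ [y_{t-1}, y_{t-2}] + [eps_t, 0]
b_vec = np.array([b, 0.0])
Phi = np.array([[phi1, phi2],
                [1.0, 0.0]])
z = np.array([y[1], y[0]])  # stacked vector at t = 1
y_matrix = [y[0], y[1]]
for t in range(2, T):
    z = b_vec + Phi @ z + np.array([eps[t], 0.0])
    y_matrix.append(z[0])  # first entry is y_t; second entry just repeats y_{t-1}

y_matrix = np.array(y_matrix)
print(np.allclose(y, y_matrix))  # True
```

Because the two versions group the floating-point additions differently, the comparison uses `np.allclose` rather than exact equality.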

The above is just a VAR(1)!

You can see this by letting:

$$ \textbf{z}_t = \begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix}$$

$$ \textbf{b}' = \begin{bmatrix} b \\ 0 \end{bmatrix} $$

$$ \boldsymbol{\Phi}'_1 = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} $$

$$ \boldsymbol{\eta}_t = \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}$$

Then we get:

$$ \textbf{z}_t = \textbf{b}' + \boldsymbol{\Phi}'_1\textbf{z}_{t-1} + \boldsymbol{\eta}_t$$

Which is a VAR(1).


Now let us try to do the same thing with an AR(3).

$$ y_t = b + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \varepsilon_t $$


We can write our AR(3) as follows:

$$\begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ 0 \end{bmatrix}$$

Note that this is also a VAR(1).


Of course, we can just repeat the same pattern for AR(p).
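Written out for a general AR(p), the pattern stacks the p most recent values into one vector. The top row of the matrix holds the AR coefficients, and the ones on the subdiagonal simply shift each lag down by one position (the oldest lag, \( y_{t-p} \), drops out):

$$\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ \vdots \\ 0 \end{bmatrix} + \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Setting p = 3 gives back exactly the AR(3) matrix above.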


The cool thing is, we can extend this to VAR(p) as well, to show that any VAR(p) can be expressed as a VAR(1).

Suppose we have a VAR(3).

$$ \textbf{y}_t = \textbf{b} + \boldsymbol{\Phi}_1 \textbf{y}_{t-1} + \boldsymbol{\Phi}_2 \textbf{y}_{t-2} + \boldsymbol{\Phi}_3 \textbf{y}_{t-3} + \boldsymbol{ \varepsilon }_t $$


Now suppose that we create a new vector by concatenating \( \textbf{y}_t \), \( \textbf{y}_{t-1} \), and \( \textbf{y}_{t-2} \). We get:

$$\begin{bmatrix} \textbf{y}_t \\ \textbf{y}_{t-1} \\ \textbf{y}_{t-2} \end{bmatrix} = \begin{bmatrix} \textbf{b} \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} \boldsymbol{\Phi}_1 & \boldsymbol{\Phi}_2 & \boldsymbol{\Phi}_3 \\ I & 0 & 0 \\ 0 & I & 0 \end{bmatrix} \begin{bmatrix} \textbf{y}_{t-1} \\ \textbf{y}_{t-2} \\ \textbf{y}_{t-3} \end{bmatrix} + \begin{bmatrix} \boldsymbol{\varepsilon_t} \\ 0 \\ 0 \end{bmatrix}$$

This is a VAR(1)!
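The same construction works for any lag order and any number of series. Below is a minimal NumPy sketch, assuming k series and coefficient matrices stored as NumPy arrays. The helper name `var_to_var1`, the seed, and the randomly generated coefficients are hypothetical, made up for this illustration rather than taken from any library. It builds the stacked intercept and block matrix, then checks that iterating the resulting VAR(1) reproduces a directly simulated VAR(3).

```python
import numpy as np


def var_to_var1(b, Phis):
    """Stack a VAR(p) into VAR(1) form (hypothetical helper).

    b    : intercept vector, shape (k,)
    Phis : list [Phi_1, ..., Phi_p] of (k, k) coefficient matrices
    Returns (b_big, Phi_big) with shapes (k*p,) and (k*p, k*p).
    """
    p = len(Phis)
    k = b.shape[0]
    b_big = np.zeros(k * p)
    b_big[:k] = b                            # [b, 0, ..., 0]
    Phi_big = np.zeros((k * p, k * p))
    Phi_big[:k, :] = np.hstack(Phis)         # top block row: Phi_1 ... Phi_p
    Phi_big[k:, :-k] = np.eye(k * (p - 1))   # identity blocks shift each lag down
    return b_big, Phi_big


# Illustrative VAR(3) with k = 2 series and random (hypothetical) coefficients.
rng = np.random.default_rng(1)
k, p, T = 2, 3, 40
b = np.array([0.1, -0.2])
Phis = [0.2 * rng.standard_normal((k, k)) for _ in range(p)]
eps = rng.standard_normal((T, k))

# Direct VAR(3) recursion; y_0, y_1, y_2 are left at zero as initial values.
y = np.zeros((T, k))
for t in range(p, T):
    y[t] = b + sum(Phis[i] @ y[t - 1 - i] for i in range(p)) + eps[t]

# Same series from the stacked VAR(1), where z_t = [y_t, y_{t-1}, y_{t-2}].
b_big, Phi_big = var_to_var1(b, Phis)
z = np.concatenate([y[p - 1 - i] for i in range(p)])  # z_2 = [y_2, y_1, y_0]
y_stacked = [y[t] for t in range(p)]
for t in range(p, T):
    eta = np.concatenate([eps[t], np.zeros(k * (p - 1))])  # noise only in the top block
    z = b_big + Phi_big @ z + eta
    y_stacked.append(z[:k])  # the top block of z_t is y_t

y_stacked = np.array(y_stacked)
print(np.allclose(y, y_stacked))  # True
```

With k = 1 (so each coefficient is a 1×1 matrix), the same function returns the scalar AR(p) matrix from the previous section.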



