April 15, 2020

In this short article, I will describe what “in-depth” means as it pertains to courses on machine learning and data science.

To do this, I will provide several examples.

Example 1) Typical Udemy Course

Example 2) Typical College (“Academic”) Course

Example 3) My Courses

### Example 1: Typical Udemy Course

A typical Udemy course on machine learning spends ~20 minutes talking about how “cool” and “powerful” an algorithm is (useless).

Then, the instructor spends ~20 minutes talking about “intuition” (maybe a few pictures here and there). Importantly, **no math is used, no pseudocode is presented.**

Then, the instructor spends ~30-45 minutes explaining 3 lines of code:

    model = MyModel()
    model.fit(X, Y)
    model.predict(X)

**Why is this bad?**

- You still have no idea how the model really works. Your “intuition” is often wrong.
- Sometimes, even the instructor himself is wrong! (That’s because most instructors teaching these courses are marketers, not practicing data scientists.) They cannot teach the math or the algorithms, because they don’t understand them themselves.
- Look at the code above. Is it even related to any algorithm? This code has no relation to understanding how the model works. As such, it is useless for learning “machine learning”.
- You can’t implement the model. As I always say, “If you can’t implement it, then you don’t understand it”. Or, as the famous physicist Richard Feynman once said, “What I cannot create, I do not understand”.
- You just paid someone to teach you 3 lines of code and some flaky “intuition”. You could have gotten that for FREE! Check the Scikit-Learn documentation.
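To make the point about those three lines concrete, here is a minimal sketch in plain Python. The two classes (`MeanModel` and `NearestNeighborModel`) are hypothetical toy models invented for this illustration, not part of any library. They use completely different algorithms, yet the calling code is identical for both, so reading the calling code tells you nothing about how either model works.

```python
# Two hypothetical toy models with different algorithms
# but the same fit/predict interface (illustration only).

class MeanModel:
    """Predicts the average training target for every input."""

    def fit(self, X, Y):
        self.mean_ = sum(Y) / len(Y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborModel:
    """Predicts the target of the closest training input (1-D inputs)."""

    def fit(self, X, Y):
        self.X_ = list(X)
        self.Y_ = list(Y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # Index of the training point closest to x.
            nearest = min(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            predictions.append(self.Y_[nearest])
        return predictions


X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]

# The calling code is the same three lines for both models.
results = {}
for MyModel in (MeanModel, NearestNeighborModel):
    model = MyModel()
    model.fit(X, Y)
    results[MyModel.__name__] = model.predict(X)
    print(MyModel.__name__, results[MyModel.__name__])
```

Everything that matters happens inside `fit` and `predict`, which is exactly the part a three-line walkthrough never opens up.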

### Example 2: Typical College / University Course

If I only had a choice between #1 and #2, I would usually choose #2. Despite the problems listed below, a college course on ML is far superior to the typical “for dummies” Udemy course.

The typical college course proceeds as follows:

Show all the math and pseudocode (usually), but go through them at **super speed!** Usually, each slide contains like 3-5 equations (way too much packed onto one slide).

Good example:

*[Image not available: example lecture slide]*

Yikes!

Luckily, college ML courses **do not use Scikit-Learn** (unlike typical Udemy courses). Students do get to practice implementing algorithms (good).

Unfortunately, because the class moves so quickly, they will not get to implement everything. Usually, each lecture will cover 1 or 2 algorithms (suppose there are 12 lectures per semester, so you will learn ~12-20 different algorithms in the whole course – too many, too fast in my opinion).

But, since you can’t have 20 homework assignments (don’t get any crazy ideas, professors!), you will likely only get to implement a few (e.g. logistic regression, k-means clustering, neural networks).

The **end result**: the course goes so fast that students end up learning very little. They have *some idea* how each algorithm works, but not **in-depth** (even though all the equations were shown, it was simply too fast).

**Another problem:** Because implementation is only a homework assignment, you never get to see a reference implementation. As you probably know, not everyone is an A+ student. Why does that matter?

Well, you lose marks when you get things wrong. A student who gets a B or a C on these assignments got some things wrong. They still passed the course, but they never learned to code the algorithm correctly.

So even if you “pass the course”, you are still missing knowledge!

### Example 3: My Courses

In my courses, I aim to fix these shortcomings of both the typical Udemy course and the typical academic course.

How? By going **in-depth** into each subject.

Specifically:

- Don’t cover every machine learning algorithm under the sun. Focus on one or a few algorithms per course.
- For each algorithm, derive any equations used **from first principles** (typically, that means calculus, linear algebra, and probability, unless the course has other prerequisites).
- For each algorithm, derive the “algorithmic” part / pseudocode from first principles.
- For each algorithm, implement it in code from scratch. This requires *true understanding*. It forces you to understand the equations and not skip over the details. It forces you to think about how the algorithm works, rather than just using flawed “intuition”.
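As a small illustration of what “from scratch” can look like, here is a minimal sketch of simple linear regression trained by gradient descent in plain Python. The choice of algorithm, the function names (`fit_linear_regression`, `predict`), and the hyperparameter values are assumptions made for this example and are not taken from any particular course. The docstring states the loss and the gradients that the loop implements, so each line of code can be traced back to an equation.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error.

    Loss:      J(w, b) = (1/N) * sum_i (w * x_i + b - y_i)^2
    Gradients: dJ/dw   = (2/N) * sum_i (w * x_i + b - y_i) * x_i
               dJ/db   = (2/N) * sum_i (w * x_i + b - y_i)
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Prediction error for every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the loss, exactly as written in the docstring.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(xs, w, b):
    """Apply the fitted line to new inputs."""
    return [w * x + b for x in xs]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(w, b)  # should be close to 2 and 1 for this toy data
```

If the learning rate is set too high for a given dataset, the loop diverges. That is the kind of detail you only run into once you write the update yourself.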

**Advantage #1:** Anyone should be able to fully understand the material as long as they meet the prerequisites (usually basic calculus, linear algebra, probability, and programming). This is because everything is built up logically from these basic first principles.

**Advantage #2:** Because everything is **derived** from first principles, it’s unlike a typical academic course, where too many equations are shown all at once, and it’s not clear how the flow of logic proceeds unless you go through them slowly by yourself. Instead, each equation is shown one-by-one.

If you don’t understand it, then simply pause the video, and review it again, or use the Q&A to clarify your misunderstanding.

In a college class, things go by so fast you rarely have the opportunity to even think of the right question to ask before the lecturer moves on to the next subject.

**Advantage #3:** You get to implement everything from scratch. This means you have true understanding. Unlike the typical academic course, where implementation is just homework, I am going to show you a reference example of the code. So even if you get it wrong, you still have a chance to fix your mistakes.

### So what does “in-depth” mean?

It means:

- Deriving all the math (unlike Udemy courses)
- Deriving all the algorithms and pseudocode (unlike Udemy courses)
- Showing the implementation of the code (unlike academic courses)

### Common Beginner Mistake

Many beginners confuse the word “**depth**” with “**breadth**”.

For example, a beginner may come across a 20-40h course and proclaim, “This course covers so many topics! It is very in-depth!”

In actuality, it’s the opposite of deep. It’s shallow.

Spending 10 hours on basic Python, 1 hour on linear regression, 1 hour on logistic regression, etc. etc. means you only learned very basic stuff.

You didn’t learn anything “deep”. You only have superficial understanding.

Beginners often think that to be “in-depth”, a course must cover *many different algorithms*. That is the opposite of depth. That is called **breadth**.

Depth means going over the **details** of an algorithm. Specifically:

- Deriving each equation being used and explaining the intuition behind it.
- Deriving any algorithms that must be put into code.
- Walking through the pseudocode.
- Implementing the actual code (pseudocode is nice, but it can’t run on your computer).

This is called “depth” because it goes over every nook and cranny of an algorithm or model, leaving no stone unturned.
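As a sketch of what “deriving each equation” means in practice, consider the gradient used to train logistic regression. The notation here ($w$ for the weights, $x_i$ and $y_i$ for the inputs and binary targets, $p_i$ for the predicted probability, $J$ for the cross-entropy loss) is chosen for this illustration. Each line follows from the one before it using only the chain rule:

```latex
\begin{aligned}
p_i &= \sigma(a_i), \qquad a_i = w^\top x_i, \qquad \sigma(a) = \frac{1}{1 + e^{-a}} \\
J(w) &= -\sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \\
\frac{d\sigma}{da} &= \sigma(a)\,\bigl(1 - \sigma(a)\bigr) \\
\frac{\partial J}{\partial a_i}
  &= -\left[ \frac{y_i}{p_i} - \frac{1 - y_i}{1 - p_i} \right] p_i (1 - p_i)
   = -\left[ y_i (1 - p_i) - (1 - y_i)\, p_i \right]
   = p_i - y_i \\
\nabla_w J &= \sum_{i=1}^{N} \frac{\partial J}{\partial a_i}\,\frac{\partial a_i}{\partial w}
   = \sum_{i=1}^{N} (p_i - y_i)\, x_i
\end{aligned}
```

The final line is what goes into code: a gradient descent step updates $w \leftarrow w - \eta \nabla_w J$, where the learning rate $\eta$ is a hyperparameter you choose.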

“Breadth” is the opposite. This means covering many different topics in a **shallow fashion** (the word “shallow” is the opposite of “deep”).

Typically that involves what I described in Example 1 (very brief explanation, a few lines of code to call an API).

Breadth is useless. I can obtain breadth for free by reading the Scikit-Learn documentation.
