August 29, 2020
Note: You can find the video lecture for this article at https://youtu.be/C-RZUWOBDpY
Summary of the video:
The following books can be used to study core computer science topics at the college / university level, to prepare yourself for machine learning, deep learning, artificial intelligence, and data science.
These are the books I recommend for building your own computer science degree. Remember! The goal is to do as many exercises as you can. It’s not to just watch 5-minute YouTube videos and then conclude “I understand everything! There’s no need for exercises!”
The purpose of actually learning this material is so you can “do” useful things.
It’s not about memorizing a set of rules about calculus, a set of rules about probability, etc.
Side note: if you prefer video courses over books, then check out the links at the bottom of this page (under “other resources”).
However, note that these courses do not come with sufficient exercises. For those, I still strongly recommend trying the ones from these books, which should test you at the appropriate level.
These books cover common core courses that are relevant for many sub-fields of Computer Science and Engineering, including machine learning, deep learning, AI, and data science, but also related fields such as operations research, statistics, quantitative finance, software engineering, digital communications, wireless communications, control systems (e.g. autopilot), robotics, and many more.
To recap, these are the courses and why you want to take them:
Calculus
Nearly all machine learning algorithms boil down to optimization problems. What is optimization? Generally speaking, it means finding the input that makes a function as large (maximization) or as small (minimization) as possible.
If you’ve taken calculus, you should recall that this is exactly what calculus teaches you to do, for example by finding where a function’s derivative equals zero.
Therefore, calculus is an essential tool in artificial intelligence, data science, etc.
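Here is a minimal sketch of that idea, assuming a one-variable function whose derivative we already know: gradient descent minimizes f(x) = (x - 3)^2 by repeatedly stepping against the derivative. Calculus gives the answer directly (f'(x) = 2(x - 3) = 0 at x = 3), and the loop recovers it numerically. The function name, learning rate, and step count are illustrative choices, not taken from any particular library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a 1-D function, given its derivative `grad`, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move downhill, against the derivative
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3), which is zero at x = 3.
def f_prime(x):
    return 2 * (x - 3)

x_min = gradient_descent(f_prime, x0=0.0)
print(round(x_min, 4))  # 3.0
```

Libraries like TensorFlow automate the derivative, but the loop they run is conceptually this one.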
Linear Algebra
In machine learning and deep learning especially, we work with vectors, matrices, and higher-dimensional objects. This is the realm of linear algebra.
Luckily, you don’t have to go that far in linear algebra to get what you need for deep learning and AI.
For example, concepts such as span, subspaces, and rank rarely show up in machine learning.
On the other hand, the basics, such as matrix-vector and matrix-matrix multiplication, eigenvalues, and eigenvectors, show up often.
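To make those basics concrete, here is a small pure-Python sketch (matrices as lists of rows, an assumption for readability) of matrix-vector multiplication, plus power iteration, one simple way to estimate the largest-magnitude eigenvalue of a matrix. It assumes that eigenvalue is unique and that the starting vector is not orthogonal to its eigenvector; in practice you would reach for a library routine such as `numpy.linalg.eig`.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def power_iteration(A, steps=100):
    """Estimate the largest-magnitude eigenvalue of A and a matching eigenvector."""
    v = [float(i + 1) for i in range(len(A))]  # arbitrary nonzero starting vector
    for _ in range(steps):
        w = mat_vec(A, v)               # repeatedly multiply by A...
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]       # ...then rescale to unit length
    eigenvalue = dot(v, mat_vec(A, v))  # Rayleigh quotient (v has unit length)
    return eigenvalue, v

A = [[2.0, 1.0],
     [1.0, 2.0]]
print(mat_vec(A, [1.0, 0.0]))  # [2.0, 1.0]
lam, vec = power_iteration(A)
print(round(lam, 6))           # 3.0
```

The matrix above has eigenvalues 3 and 1, so the iteration settles on 3, with an eigenvector proportional to (1, 1).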
Probability
Probability is the language you must speak if you want to do machine learning and AI.
Recall from above that machine learning often boils down to optimization. What are we trying to optimize? Often, it’s an expected value. What is an expected value? Well, you have to learn probability to find out.
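As a small illustration, assuming a discrete random variable given as a list of outcomes and their probabilities, the expected value is the probability-weighted average of the outcomes. The second half shows that the average of many random draws approximates it; minimizing an average loss over training data as a stand-in for an expected loss (often called empirical risk minimization) is one common way this shows up in machine learning. The helper name is hypothetical.

```python
import random

def expected_value(outcomes, probabilities):
    """E[X] = sum of x * p(x) over the outcomes of a discrete random variable."""
    return sum(x * p for x, p in zip(outcomes, probabilities))

faces = [1, 2, 3, 4, 5, 6]
ev = expected_value(faces, [1 / 6] * 6)  # fair six-sided die
print(ev)  # approximately 3.5 (floating-point rounding may show extra digits)

# A sample average of many random draws approximates E[X] (law of large numbers).
random.seed(0)
rolls = [random.choice(faces) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)
print(sample_mean)  # close to 3.5
```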
For a time, “probabilistic graphical models” were the state of the art in artificial intelligence. Clearly, probability would be a prerequisite.
Probability shows up nearly everywhere in machine learning, from soft k-means clustering to hidden Markov models to artificial neural networks and beyond.
Side note: if you were thinking earlier, “who needs calculus when TensorFlow can do it for me!?”, think again. Calculus is a prerequisite for probability (for example, the expected value of a continuous random variable is defined by an integral). So if you want to learn probability, you still need calculus anyway.
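To see where calculus enters, compare the discrete and continuous definitions of expected value, followed by a worked case assuming X is uniform on [0, 1] (so its density is p(x) = 1 on that interval and 0 elsewhere):

```latex
\mathbb{E}[X] = \sum_{x} x \, p(x) \quad \text{(discrete)}
\qquad
\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, p(x) \, dx \quad \text{(continuous)}

% Worked case: X uniform on [0, 1]
\mathbb{E}[X] = \int_{0}^{1} x \, dx = \left[ \frac{x^2}{2} \right]_{0}^{1} = \frac{1}{2}
```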
Programming
Obviously, at some point, you need to be able to actually write a computer program in order to use machine learning.
Nowadays, things can seem very simple when all you need is 3 lines of boilerplate code to use scikit-learn.
However, that’s not really what you should imagine when you think of “learning machine learning”.
Check any college-level machine learning course (not that weak sauce being sold by marketers online) to confirm what I am saying.
As the great physicist Richard Feynman once said, “What I cannot create, I do not understand”.
In order to best understand a machine learning algorithm, you should implement it.
No, you are not “reinventing the wheel”. This is called “learning”.
I would posit that if you can’t implement basic algorithms like k-means clustering, logistic regression, k-nearest neighbors, and naive Bayes, you do not understand those algorithms.
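In that spirit, here is a minimal from-scratch sketch of k-means (the standard Lloyd's iteration), written in Python for brevity; the same loops translate directly to Java. It assumes 2-D points stored as tuples and random initialization from the data, and `kmeans` is a hypothetical helper name, not a library function.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of (x, y) tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points chosen without replacement
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(data, k=2)
print(sorted(centroids))
```

On this toy data the centroids should land near (0.1, 0.1) and (5.0, 5.03). A more serious implementation would add smarter initialization (such as k-means++) and several restarts, since k-means can converge to a poor local optimum.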
So why do I suggest Java over something like Python, which has easily become the most popular language for doing data science?
The problem with Python is that it’s too high level. It doesn’t make you think about the program you are writing at the algorithmic level.
You should understand the difference between an efficient and an inefficient algorithm. (No, that doesn’t mean memorizing facts like “Python list comprehensions are better than for loops”).
In fact, you should recognize that a list comprehension has exactly the same time complexity as the equivalent for loop.
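To illustrate, both functions below make exactly one pass over the input, so both are O(n); the function names are just for illustration. In CPython the comprehension is often somewhat faster by a constant factor, but that does not change how the running time grows with n.

```python
def squares_loop(nums):
    result = []
    for n in nums:               # touches each element once: O(n)
        result.append(n * n)
    return result

def squares_comprehension(nums):
    return [n * n for n in nums]  # also touches each element once: O(n)

nums = list(range(10))
print(squares_loop(nums) == squares_comprehension(nums))  # True
```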
Java, being slightly lower level, forces you to think algorithmically.
That brings us to the final topic.
Algorithms and Data Structures
In order to really understand algorithms, you should study… algorithms.
There are many famous algorithms contained in the book I’ve suggested below.
Realistically, you are not going to use these in your day-to-day work (a very common complaint from software developers seeking employment).
However, that’s not really the point.
The point is exercising your brain and learning how to think in certain ways that help you write better code.
Also, you should understand the pros and cons of basic data structures such as lists, sets, dictionaries, and trees.
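If you’re coding up some algorithm, why might a set be better than a list? The study of algorithms tells you why: checking whether an item is in a hash-based set takes constant time on average, while checking whether it is in a list requires scanning the list, which takes linear time.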
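A minimal sketch of that difference, assuming we want the items of one list that also appear in another (the helper names are hypothetical): the list version rescans `b` for every element of `a`, while the set version builds a hash set once.

```python
def common_items_list(a, b):
    # `x in b` scans the list b, so this is O(len(a) * len(b)) overall.
    return [x for x in a if x in b]

def common_items_set(a, b):
    b_set = set(b)  # one O(len(b)) pass to build a hash set
    # `x in b_set` is O(1) on average, so this is O(len(a) + len(b)) overall.
    return [x for x in a if x in b_set]

a = list(range(0, 2000, 2))  # even numbers
b = list(range(0, 2000, 3))  # multiples of 3
common = common_items_set(a, b)
print(common_items_list(a, b) == common)  # True
print(len(common))  # 334 (the multiples of 6 below 2000)
```

Both return the same items in the same order; the difference only shows up as the inputs grow, where the list version slows down quadratically.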
One major data structure you might want to learn about is graphs (along with their associated algorithms). Graph neural networks seem to be picking up steam, and they are being used for all kinds of interesting problems like social network analysis, chemistry, and more.
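As a taste of graph algorithms, here is breadth-first search over a graph stored as an adjacency list (a dict mapping each node to its neighbors), computing each node's distance in edges from a starting node. The tiny friendship graph and its node names are made up for illustration, and `bfs_distances` is a hypothetical helper name.

```python
from collections import deque

def bfs_distances(graph, start):
    """Number of edges on a shortest path from start to each reachable node."""
    dist = {start: 0}          # doubles as the "visited" record (O(1) average lookups)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# A tiny, made-up friendship graph (undirected, so each edge is listed both ways).
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
distances = bfs_distances(friends, "ann")
print(distances)  # {'ann': 0, 'bob': 1, 'cat': 1, 'dan': 2, 'eve': 3}
```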
To summarize: the core courses I would consider essential for building your own Computer Science degree in preparation for machine learning, deep learning, data science, and artificial intelligence are calculus, linear algebra, probability, programming, and algorithms.
Don’t just watch a bunch of videos on YouTube or Khan Academy and then proclaim you understand the subject. I’ve suggested books because they contain exercises / homework problems. These are what you must be able to do in order to claim that you understand something. It’s not about “absorbing information”, it’s about “producing useful output”.
Common question: What about C++? Yes, C++ is excellent! Ideally, you will learn both C++ and Java, but obviously, these are not hard prerequisites for machine learning or data science.
Calculus: Early Transcendentals by James Stewart
Introduction to Linear Algebra by Gilbert Strang
Introduction to Probability by Bertsekas and Tsitsiklis
Big Java by Cay Horstmann
Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein
Disclaimer: this post contains Amazon affiliate links.