Transformers have changed deep learning immensely.
They’ve massively improved the state of the art across NLP tasks such as sentiment analysis, machine translation, and question answering.
They’re even expanding their influence into other fields, such as computational biology and computer vision. DeepMind’s AlphaFold 2 has been said to “solve” a longstanding problem in molecular biology, known as protein structure prediction. Recently, DALL-E 2 demonstrated the ability to generate amazing art and photo-realistic images based only on simple text prompts. Imagine that – creating a realistic image out of just an idea!
Just within the past week, DeepMind introduced “Gato”, which is what they call a “generalist agent”, an AI that can do multiple things, like chat (i.e. do NLP!), play Atari games, caption images (i.e. computer vision!), manipulate a real, physical robot arm to stack blocks, and more!
Gato does all this by converting the usual inputs from other domains into a sequence of tokens, so that they can be processed just as we process text in NLP. This is a great example of my oft-repeated rule, “all data is the same” (and also, another great reason to learn NLP since it would be a prerequisite to understanding this).
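To make the "everything becomes a sequence of tokens" idea concrete, here is a toy sketch. The three tokenizers, the bin counts, and the id offsets are all assumptions invented for illustration; this is not Gato's actual tokenization scheme.

```python
# Toy illustration only: these schemes are assumptions, not Gato's real tokenizers.

def tokenize_text(text, vocab):
    """Map each whitespace-separated word to an integer id, growing the vocab as needed."""
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]

def tokenize_continuous(values, low, high, n_bins, offset):
    """Discretize continuous readings (e.g. joint angles) into n_bins integer tokens."""
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        bin_index = min(int((clipped - low) / (high - low) * n_bins), n_bins - 1)
        tokens.append(offset + bin_index)
    return tokens

def tokenize_image(pixels, n_levels, offset):
    """Flatten a 2D grayscale image (values 0-255) row by row and quantize each pixel."""
    return [offset + (p * n_levels) // 256 for row in pixels for p in row]

# Hypothetical id layout: text ids below 1000, then 4 bins for joints, then 4 pixel levels.
vocab = {}
text_tokens = tokenize_text("stack the red block", vocab)
joint_tokens = tokenize_continuous([-1.0, 0.0, 0.5], low=-1.0, high=1.0, n_bins=4, offset=1000)
image_tokens = tokenize_image([[0, 255], [128, 64]], n_levels=4, offset=1004)

# One flat integer sequence, regardless of where each piece came from.
sequence = text_tokens + joint_tokens + image_tokens
print(sequence)
```

Once every modality is a list of integers, the same sequence model can in principle consume all of them, which is the sense in which "all data is the same".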
The course is split into 3 major parts:
Using Transformers (Beginner)
Fine-Tuning Transformers (Intermediate)
Transformers In-Depth (Expert – VIP only)
In part 1, you will learn how to use transformers that have already been trained for you. Training these models costs millions of dollars, so it’s not something you want to try by yourself!
We’ll see how these prebuilt models can already be used for a wide array of tasks, including:
text classification (e.g. spam detection, sentiment analysis, document categorization)
named entity recognition
generating (believable) text
masked language modeling (article spinning)
This is already very practical.
If you need to do sentiment analysis, document categorization, entity recognition, translation, summarization, etc. on documents at your workplace or for your clients – you already have the most powerful state-of-the-art models at your fingertips with very few lines of code.
One of the most amazing applications is “zero-shot classification”, where you will observe that a pretrained model can categorize your documents, even without any training at all.
In part 2, you will learn how to improve the performance of transformers on your own custom datasets. By using “transfer learning”, you can leverage the millions of dollars of training that have already gone into making transformers work very well.
You’ll see that you can fine-tune a transformer for many of the above tasks with relatively little work (and little cost).
In part 3 (the VIP sections), you will learn how transformers really work. The previous sections are nice, but a little too nice. Libraries are OK for people who just want to get the job done, but they don’t work if you want to do anything new or interesting.
Let’s be clear: this is very practical.
How practical, you might ask?
Well, this is where the big bucks are.
Those who have a deep understanding of these models and can do things no one has ever done before are in a position to command higher salaries and prestigious titles. Machine learning is a competitive field, and a deep understanding of how things work can be the edge you need to come out on top.
We’ll also look at how to implement transformers from scratch.
As the great Richard Feynman once said, “what I cannot create, I do not understand”.
As usual, I wanted to get this course into your hands as early as possible! There are a few sections and lectures still in the works, including (but not limited to): fine-tuning for question-answering, more theory about transformers, and implementing transformers from scratch. As usual, I will update this post as new lectures are released.
Everyone makes mistakes (including me)! Because this is such a large course, if I forgot anything (e.g. a GitHub link), just email me and let me know.
Due to the way Udemy now works, if you purchase the course on deeplearningcourses.com, I cannot give you access to the Udemy version. It hasn’t always been this way, and Udemy has tended to make changes over the years that negatively impact both me and you, unfortunately.
If you don’t know how “VIP courses” work, check out my post on that here. Short version: deeplearningcourses.com always houses all the content (both VIP and non-VIP). Udemy will house all the content initially, but the VIP content is removed later on.
So what are you waiting for? Get the VIP version of Transformers for Natural Language Processing NOW:
I just got an excellent question today about my Financial Engineering course, which allowed me to put into words many thoughts and ideas I’d been pondering recently.
Through this post, I hope to get all these ideas into one place for future reference.
The question was: “How practical is this course? I’ve skimmed through several top ratings on Udemy but have yet seen one boasting how much money the student made after taking it”
Will you become a millionaire after taking my financial engineering course?
Let’s answer this question by starting with my own definition of “practical”, and then subsequently addressing the student’s definition of practical which appears to mean “making money”.
In my view, “practical” simply means you’re applying knowledge to a real-world dataset.
For example, my Recommender Systems course is practical because you apply the algorithms we learn to real-world ratings datasets.
My Bayesian Machine Learning: A/B Testing course is practical because you can apply the algorithms to any business scenario where you have to decide between multiple choices based on some numerical objective (e.g. clicks, page view time, etc.).
In the same way, the Financial Engineering course is extremely practical, because the whole course is about applying algorithms to real-world financial datasets. The application is a real-world problem.
This is unlike, say, reading Pattern Recognition and Machine Learning by Bishop, which is all about the algorithms and not the fields of application. The implication is that you know what you’re doing and can take those algorithms and apply them to your own data.
On one hand, that’s powerful – because you can apply these algorithms to any field (like biology, astronomy, chemistry, robotics, control systems, and yes, finance), but at the same time, you have to be pretty smart to do it. The average Udemy student would struggle.
In that sense, this is the most practical you can get. Everything you learn in this course is being directly applied to real-world data in a specific field (finance).
You can grab one of the algorithms taught in the course and start using it today on your own investing account. There’s a lecture about that in the Summary section called “Applying This Course” for those who need extra help.
Importantly, do keep in mind that while I can teach you what to do, I can’t actually make you do it.
In A/B Testing, I can show you the code, but the rest is up to the student to make it practical, by actually getting a job where they get to do that in a production system, or by inserting the code into their own production website so they can feed it to live users.
Funnily enough, A/B Testing isn’t even about finance or money. But will you make money with those techniques? YES. Amazon, Facebook, Netflix, etc. are already using the same techniques with great success.
The only reason some students might say it’s not practical is that they are too lazy/incompetent to get off their butts and actually do it!
Same here. I can teach the algorithms, but I can’t go into your brokerage account and run them for you.
Now let’s consider the definition of “practical” in the sense of being guaranteed to “make money”.
This is a common concern among students who are new to finance and don’t really know yet what to expect.
Let’s suppose I could guarantee that by taking this course, you could make money.
Consider some obvious questions:
If this were true, anyone (including myself) would just scale it up and become extremely wealthy without doing any work. Clearly, no such thing exists (that is public and that we know of).
If this were true, why would anyone work? Financial engineering graduates wouldn’t bother to apply for jobs; they would just run algorithms all day. They would teach their friends / family to do the same. No one would ever bother to get a job.
If this were true, why would hedge funds bother to hire employees? After inventing an algorithm, they could just run it forever. What’s the point of wasting money to hire humans? What would they even do?
If this were true, why would hedge funds bother to hire PhDs and why would people bother to get PhDs? Imagine you could increase your investments infinitely from a 20-hour online course. What kind of insane person would work for 4-7 years just to get a pittance and a paper that says “PhD”?
On the contrary, the reality is this.
The financial sector does hire very smart people and it is well-known that they have poor work-life balance.
They must be working hard. What are they doing?
Why can’t they just learn an algorithm and sit back and relax?
Instead, let’s expand the definition of “practical”.
Originally, this question was asked in a comment on a video I made about predicting stock prices with LSTMs. Is this video practical? YES. If you didn’t know this, you could have spent weeks / months / maybe even your whole life trying to “predict stock prices with LSTMs”, with zero clue that it didn’t actually work. That would be sad.
Spending weeks or months doing something that doesn’t even make sense is what I would consider to be very impractical. And hence, learning how to avoid it would be very practical.
A lot of the course is about how to properly model and analyze. How to stay away from stupidity.
One of the major themes of the course is that “Santa Claus doesn’t exist”.
A naive person might think “there must be some way to predict the stock price, you are just not telling me about the most advanced algos!”
But the “Santa Claus doesn’t exist” moment is when we prove mathematically why certain predictions are impossible.
This is practical because it saves you from attempting something which doesn’t make any logical sense.
Obviously, it doesn’t fulfill the childhood dream of meeting Santa (predicting an unpredictable time series), but I would posit that trying to meet Santa is what is really impractical.
What is actually practical is learning how to determine whether you can or cannot predict a time series (at which point, you can then make your predictions as normal).
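As one standard illustration of this kind of argument (assuming the price follows a random walk with unpredictable increments; the course’s own proofs may be set up differently), the impossibility can be written in a few lines:

```latex
% Assumption: the price p_t follows a random walk whose increments
% cannot be predicted from past prices (and have finite variance).
p_t = p_{t-1} + \epsilon_t, \qquad
\mathbb{E}[\epsilon_t \mid p_{t-1}, p_{t-2}, \dots] = 0

% Taking the conditional expectation of both sides:
\mathbb{E}[p_t \mid p_{t-1}, p_{t-2}, \dots] = p_{t-1}

% The conditional mean minimizes expected squared error, so for any
% forecast \hat{p}_t built only from past prices:
\mathbb{E}\big[(p_t - \hat{p}_t)^2\big] \;\ge\;
\mathbb{E}\big[(p_t - p_{t-1})^2\big] = \mathbb{E}[\epsilon_t^2]
```

Under that assumption, no model built on past prices can do better in expected squared error than simply repeating the last observed price.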
I’ll give you another example lesson.
If you used the simplest trading strategy from this course, you could have beaten the market from 2000 – 2018.
Using the same algorithm, you would have underperformed the market from 2018 to now.
The practical lesson there is that “past performance doesn’t indicate future performance”.
This is how you can have a “practical” lesson, which doesn’t automatically imply “guaranteed rate of return” (which is impossible).
Addendum: actually, it is possible to guarantee a rate of return. Just purchase a fixed-income security like a CD (certificate of deposit) at your bank. The downside is that the rate of return is very low. This is yet another practical lesson from the course – the tradeoff between risk and reward and how real-world entities automatically adjust themselves to match present conditions. In other words, you’ll never find a zero-risk asset that guarantees 1000x returns. Why is this practical? Again, you want to avoid wasting time searching for that which does not exist.
Check out the video version of this post on YouTube:
In this post, I’m going to write about one of my all-time favorite subjects: the wrong way to predict stock and cryptocurrency prices.
Despite the fact that I’ve discussed this many times before, I’m very excited about this one.
It’s not every day I get to critique a published book by a big name like Springer.
The book I’m referring to is called “Machine Learning in Finance: From Theory to Practice”, by Matthew Dixon, Igor Halperin, and Paul Bilokon.
Now you might think I’m beating a dead horse with this video, which is kind of true.
I’ve already spoken at length about the many mistakes people make when trying to predict stock prices.
But there are a few key differences with this video.
Firstly, in past videos, I’ve mentioned that it is typically bloggers and marketers who put out this bad content.
This time, it’s not a blogger or marketer, but an Assistant Professor of Applied Math at the Illinois Institute of Technology.
Secondly, while I’ve spoken about what the mistakes are, I’ve never done a case study where I’ve broken down actual code that makes these mistakes.
This is the first.
Thirdly, in my opinion, this is the most important topic to cover for beginners to finance, because it’s always the first thing people try to do. They want to predict future prices so they know what to invest in today.
If you take my course on Financial Engineering, you’ll learn that this is completely untrue. Price prediction barely scratches the surface of true finance.
Note that it’s a copy of the code provided with the textbook, with added code for my own experiments (computing the naive forecast and the corresponding train / test MSE).
I also removed code for a different type of RNN called the “alpha RNN”, which uses an old version of Keras. Removing this code doesn’t make a difference in our results because this model didn’t perform well.
The mistakes I’ll cover in this post are as follows.
1) They only standardize the price time series, which does nothing about the problem of extrapolation.
2) They never check whether their model can beat the naive forecast. Spoiler alert. I checked, and it doesn’t. The models they built are worse than useless.
3) Misleading train-test split.
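The check described in mistake 2 can be sketched in a few lines. This is a minimal illustration, not the code from the textbook or from my experiments: the prices and the “model” predictions below are made-up numbers, and the helper names are hypothetical.

```python
import numpy as np

def naive_forecast(test, last_train_value):
    """Predict each test value with the value that came immediately before it."""
    test = np.asarray(test, dtype=float)
    return np.concatenate(([last_train_value], test[:-1]))

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up numbers purely for illustration.
last_train_value = 100.0
test_prices = [101.0, 103.0, 102.0]
model_predictions = [100.5, 100.5, 100.5]  # stand-in for some trained model's output

naive_mse = mse(test_prices, naive_forecast(test_prices, last_train_value))
model_mse = mse(test_prices, model_predictions)

print("naive MSE:", naive_mse)
print("model MSE:", model_mse)
print("model beats naive forecast:", model_mse < naive_mse)
```

If a model cannot produce a lower test error than this one-line baseline, its predictions add nothing over just repeating the last known price.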
So let’s talk about mistake #1, which is why standardizing a price time series does not work.
The problem with prices is that they are ever increasing. This wasn’t the case for the time period used in the textbook, but it is the case in general.
Why is this an issue?
The train set is always in the past, and the test set is always in the future.
Therefore, the values in the test set in general will be higher than the values in the train set.
If you build an autoregressive model based on this data, your model will have to extrapolate to a domain never seen before in the train set.
This is not good, because machine learning models suck at extrapolation.
How they extrapolate has more to do with the model itself than with the data.
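Here is a small sketch of why standardizing does not help. It assumes a synthetic, steadily rising price series (not the textbook’s data) and standardizes using statistics computed on the train set only.

```python
import numpy as np

# Synthetic, steadily increasing "prices" (an assumption for illustration).
prices = 100.0 + np.arange(100)

# The train set is the past, the test set is the future.
train, test = prices[:80], prices[80:]

# Standardize using statistics computed on the train set only.
mean, std = train.mean(), train.std()
train_z = (train - mean) / std
test_z = (test - mean) / std

print("train range:", train_z.min(), "to", train_z.max())
print("test range: ", test_z.min(), "to", test_z.max())

# Every standardized test value lies above anything seen during training.
print("test entirely outside train range:", bool(test_z.min() > train_z.max()))
```

Standardizing only shifts and rescales the values. The test set still sits entirely outside the range the model was trained on, so the model is still being asked to extrapolate.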