# Deep Learning: The Swish Activation Function

October 18, 2017

The Google Brain team has just released a new paper (https://arxiv.org/abs/1710.05941) that demonstrates the superiority of a new activation function called Swish on a number of different neural network architectures.

This is interesting because people often ask me, “which activation function should I use?”

These days, it is common to just use the ReLU by default.

To refresh your memory, the ReLU looks like this:

And it is defined by the equation:

$$f(x) = \max(0, x)$$

One major problem with the ReLU is that its derivative is 0 for all negative values of the input $$x$$, which is half of the input range. Because we use “gradient descent” as our parameter update algorithm, if the gradient is 0 for a parameter, then that parameter will not be updated!

In other words, when I do:

$$\theta = \theta - \alpha \frac{\partial J}{\partial \theta }$$

And:

$$\frac{\partial J}{\partial \theta } = 0$$

Then my update is just:

$$\theta = \theta$$

Which just assigns the parameter back to itself.

This leads to the problem of “dead neurons”. Experiments have shown that neural networks trained with ReLUs can have up to 40% dead neurons!
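Here is a minimal sketch of that effect for a single neuron with one weight. The names `relu_grad` and `sgd_step`, the learning rate, and the input values are all made up for illustration; the derivative at exactly 0 is set to 0 by convention.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative inputs (0 at x == 0 by convention)
    return 1.0 if x > 0 else 0.0

def sgd_step(w, x, upstream_grad, lr=0.1):
    """One gradient descent step for a single neuron a = relu(w * x)."""
    z = w * x
    grad_w = upstream_grad * relu_grad(z) * x  # chain rule
    return w - lr * grad_w

# Pre-activation z = 1.0 > 0: the weight moves.
w_active = sgd_step(w=0.5, x=2.0, upstream_grad=1.0)

# Pre-activation z = -1.0 < 0: the gradient is 0, so the weight is unchanged.
w_dead = sgd_step(w=-0.5, x=2.0, upstream_grad=1.0)

print(w_active, w_dead)
```

If the pre-activation stays negative for every training sample, the weight never gets a nonzero gradient, which is what "dead" means here.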

There have been some proposed alternatives to this, such as the leaky ReLU, the ELU, and the SELU.

Interestingly, none of these seems to have caught on and it’s still ReLU by default.

So how does the Swish activation function work?

The function itself is very simple:

$$f(x) = x \sigma(x)$$

Where $$\sigma(x)$$ is the usual sigmoid activation function.

$$\sigma(x) = (1 + e^{-x})^{-1}$$
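The two equations above translate almost directly into code. This is a plain-Python sketch using only the standard library; splitting the sigmoid into two branches is my own addition to avoid overflow in `math.exp` for large negative inputs, and is not something taken from the paper.

```python
import math

def sigmoid(x):
    # Two algebraically equal forms, chosen so exp() never sees a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x):
    # f(x) = x * sigmoid(x)
    return x * sigmoid(x)

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(x, swish(x))
```

For large positive inputs the sigmoid is close to 1, so Swish behaves like the identity. For large negative inputs it is close to 0, much like the ReLU.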

It looks like this:

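What’s interesting about this is that, unlike the ReLU, sigmoid, tanh, and most other commonly used activation functions, it is not monotonically increasing: it dips slightly below zero for negative inputs before rising again. Does it matter? It seems the answer is no!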
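You can check the non-monotonic part numerically. The sketch below does a crude grid search over negative inputs to find where the function bottoms out; the grid range and step size are arbitrary choices of mine.

```python
import math

def swish(x):
    return x / (1.0 + math.exp(-x))  # same as x * sigmoid(x); fine for moderate x

# Going from x = -3 to x = -1 the output goes DOWN, then it comes back up towards 0.
print(swish(-3.0), swish(-1.0), swish(0.0))

# Crude grid search on [-5, 0] with step 0.01 for the location of the minimum.
xs = [i / 100.0 for i in range(-500, 1)]
x_min = min(xs, key=swish)
print(x_min, swish(x_min))
```

The minimum sits a little to the left of $$x = -1$$, so the function is decreasing to the left of that point and increasing to the right of it.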

The derivative looks like this:
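The derivative can also be written in closed form. Using the product rule and the standard identity $$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$:

$$f'(x) = \sigma(x) + x \sigma(x)(1 - \sigma(x))$$

Since $$f(x) = x \sigma(x)$$, this can be rewritten in terms of the function itself:

$$f'(x) = f(x) + \sigma(x)(1 - f(x))$$

At $$x = 0$$ this gives $$f'(0) = 1/2$$, and for large positive $$x$$ it approaches 1. Unlike the ReLU, the derivative is not identically 0 for negative inputs.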

One interesting thing we can do is re-parameterize the Swish, in order to “stretch out” the sigmoid:

$$f(x) = 2x \sigma(\beta x)$$

We can see that if $$\beta = 0$$, then $$\sigma(0) = 1/2$$ and we get the identity activation $$f(x) = x$$. If $$\beta \rightarrow \infty$$, the sigmoid converges to the unit step, and multiplying that by $$2x$$ gives us $$f(x) = 2 \max(0, x)$$, which is just the ReLU multiplied by a constant factor.

So including $$\beta$$ is a way for us to nonlinearly interpolate between identity and ReLU.
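A quick way to see the interpolation is to evaluate the re-parameterized function at the two extremes. This sketch uses the $$2x \sigma(\beta x)$$ form from this post; `swish_beta` is a name I made up, and $$\beta = 50$$ is just standing in for "very large".

```python
import math

def sigmoid(x):
    # Overflow-safe sigmoid
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish_beta(x, beta):
    # f(x) = 2 * x * sigmoid(beta * x)
    return 2.0 * x * sigmoid(beta * x)

for x in [-2.0, -0.5, 0.5, 2.0]:
    identity_like = swish_beta(x, beta=0.0)   # sigmoid(0) = 0.5, so this is just x
    relu_like = swish_beta(x, beta=50.0)      # close to 2 * max(0, x)
    print(x, identity_like, relu_like)
```

Values of $$\beta$$ in between give a smooth curve that sits somewhere between the two.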

The title of the paper is “Swish: a Self-Gated Activation Function”, which might make you wonder, “Why is it self-gated?”

This should remind you of the LSTM, where we have “gates” in the form of sigmoids that control how much of a vector gets passed on to the next stage, by multiplying it by the output of the sigmoid, which is a number between 0 and 1.

So “self-gated” means that the gate is just the sigmoid of the activation itself.

Gate: $$\sigma(x)$$

Value to pass through: $$x$$

But that’s enough theory. For most of us, we want to know: “Does it work?”

And more practically, “Can I just use this by default instead of the ReLU?”

The best thing to do is just to try it for yourself and see how robust it is to different settings of hyperparameters (learning rate, architecture, etc.), but let’s look at some results so we can be confident when it comes to using Swish:

To compare Swish with the baseline activation functions, a statistical test called the one-sided paired sign test was used.
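If you haven’t seen the sign test before, here is a minimal sketch of how a one-sided paired sign test works in general. It is not the paper’s code, and the accuracy numbers below are made up purely for illustration. The idea: for each paired experiment, record only whether the candidate beat the baseline, drop ties, and ask how likely that many wins would be if each outcome were a fair coin flip.

```python
from math import comb

def one_sided_sign_test(baseline, candidate):
    """Return (wins, losses, p_value) for H1: candidate tends to beat baseline."""
    wins = sum(c > b for b, c in zip(baseline, candidate))
    losses = sum(c < b for b, c in zip(baseline, candidate))
    n = wins + losses  # ties are dropped
    # P(X >= wins) where X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, losses, p_value

# Hypothetical paired accuracies (same model and dataset, only the activation differs)
baseline_acc = [0.70, 0.72, 0.68, 0.75, 0.71, 0.69, 0.73, 0.74]
candidate_acc = [0.71, 0.73, 0.70, 0.74, 0.73, 0.70, 0.75, 0.76]

wins, losses, p_value = one_sided_sign_test(baseline_acc, candidate_acc)
print(wins, losses, p_value)
```

With 7 wins out of 8 non-tied pairs, the p-value is 9/256, or about 0.035. Note that the test only looks at who won each pair, not by how much.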

Conclusion: Try Swish for yourself!

# Python 2-to-3 Tips

October 17, 2017

This is a short post to help those of you who need help translating code from Python 2 to Python 3.

Python 2 is the most popular Python version (at least at this time and certainly at the time my courses were created), which is why it was used.

It comes pre-installed with Mac OS and Ubuntu, so when you type “python” into your command line, you get Python 2.

This list is not exhaustive. It shows only code that appears commonly in my machine learning scripts, to assist the students taking my machine learning courses (https://deeplearningcourses.com).

Integer Division

OLD:

a / b

NEW:

a // b

For Loops

OLD:

for i in xrange(N):

NEW:

for i in range(N):

Printing

OLD:

print "hello world"

NEW:

print("hello world")
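Putting the three tips together, here is a small Python 3 snippet you can run to see the behavior. The variable names are just for illustration.

```python
a, b = 7, 2

# In Python 3, / is always true division and // is floor division.
true_div = a / b    # a float
floor_div = a // b  # an int, same result Python 2 gave for a / b with ints

# range() in Python 3 is lazy, like xrange() was in Python 2.
squares = [i * i for i in range(5)]

# print is a function, so it needs parentheses.
print("hello world")
print(true_div, floor_div, squares)
```

If you need a script to behave the same way under Python 2, adding `from __future__ import print_function, division` at the top of the file gives you the Python 3 behavior for printing and division.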

# Deep Learning and Machine Learning FALL SALE 90% OFF

October 7, 2017

“Hey Lazy Programmer, when is your next course coming out?”

I’ve been really busy adding tons of free updates to my existing courses! You can scroll down to the very bottom to see what they are.

But in the meantime we are going to do another HUGE sale. ALL courses on Udemy are now $12. Take this opportunity to grab as many courses as you can because you never know when the next sale is going to be!

As usual, I’m providing $12 coupons for all my courses in the links below. Please use these links and share them with your friends!

You can also just type in the coupon code “OCT123” (except Deep Learning pt 1 because I messed it up =), for that use “OCT123A”).

The promo goes until October 10. Don’t wait!

At the end of this post, I’m going to provide you with some additional links to get machine learning prerequisites (calculus, linear algebra, Python, etc…) for $12 too!

But that’s not all… I’m the Lazy Programmer, not just the Lazy Data Scientist – I’ve got $12 coupons for iOS development, Android development, Ruby on Rails, Python, Big Data / Hadoop / Spark, React.js, Angular, and MORE. All important skillsets on ANY engineering team. Got any friends or coworkers in mobile / backend / big data development? Let them know!

If you don’t know what order to take the courses in, please check here: https://deeplearningcourses.com/course_order

Here are the links for my courses:

Deep Learning Prerequisites: Linear Regression in Python
https://www.udemy.com/data-science-linear-regression-in-python/?couponCode=OCT123

Deep Learning Prerequisites: Logistic Regression in Python
https://www.udemy.com/data-science-logistic-regression-in-python/?couponCode=OCT123

Deep Learning in Python
https://www.udemy.com/data-science-deep-learning-in-python/?couponCode=OCT123A

Practical Deep Learning in Theano and TensorFlow
https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow/?couponCode=OCT123

Deep Learning: Convolutional Neural Networks in Python
https://www.udemy.com/deep-learning-convolutional-neural-networks-theano-tensorflow/?couponCode=OCT123

Unsupervised Deep Learning in Python
https://www.udemy.com/unsupervised-deep-learning-in-python/?couponCode=OCT123

Deep Learning: Recurrent Neural Networks in Python
https://www.udemy.com/deep-learning-recurrent-neural-networks-in-python/?couponCode=OCT123

Advanced Natural Language Processing: Deep Learning in Python
https://www.udemy.com/natural-language-processing-with-deep-learning-in-python/?couponCode=OCT123

Advanced AI: Deep Reinforcement Learning in Python
https://www.udemy.com/deep-reinforcement-learning-in-python/?couponCode=OCT123

Deep Learning: GANs and Variational Autoencoders
https://www.udemy.com/deep-learning-gans-and-variational-autoencoders/?couponCode=OCT123

Easy Natural Language Processing in Python
https://www.udemy.com/data-science-natural-language-processing-in-python/?couponCode=OCT123

Cluster Analysis and Unsupervised Machine Learning in Python
https://www.udemy.com/cluster-analysis-unsupervised-machine-learning-python/?couponCode=OCT123

Unsupervised Machine Learning: Hidden Markov Models in Python
https://www.udemy.com/unsupervised-machine-learning-hidden-markov-models-in-python/?couponCode=OCT123

Data Science: Supervised Machine Learning in Python
https://www.udemy.com/data-science-supervised-machine-learning-in-python/?couponCode=OCT123

Bayesian Machine Learning in Python: A/B Testing
https://www.udemy.com/bayesian-machine-learning-in-python-ab-testing/?couponCode=OCT123

Ensemble Machine Learning in Python: Random Forest and AdaBoost

Artificial Intelligence: Reinforcement Learning in Python
https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python/?couponCode=OCT123

SQL for Newbs and Marketers
https://www.udemy.com/sql-for-marketers-data-analytics-data-science-big-data/?couponCode=OCT123

PREREQUISITE COURSE COUPONS

And just as important, $12 coupons for some helpful prerequisite courses. You NEED to know this stuff before you study machine learning:

General (site-wide): http://bit.ly/2oCY14Z
Python http://bit.ly/2pbXxXz
Calc 1 http://bit.ly/2okPUib
Calc 2 http://bit.ly/2oXnhpX
Calc 3 http://bit.ly/2pVU0gQ
Linalg 1 http://bit.ly/2oBBir1
Linalg 2 http://bit.ly/2q5SGEE
Probability (option 1) http://bit.ly/2prFQ7o
Probability (option 2) http://bit.ly/2p8kcC0
Probability (option 3) http://bit.ly/2oXa2pb
Probability (option 4) http://bit.ly/2oXbZSK

OTHER UDEMY COURSE COUPONS

As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!

And I’ve got sales for everything:

iOS courses:
https://lazyprogrammer.me/ios

Android courses:
https://lazyprogrammer.me/android

Ruby on Rails courses:
https://lazyprogrammer.me/ruby-on-rails

Python courses:
https://lazyprogrammer.me/python

Big Data (Spark + Hadoop) courses:
https://lazyprogrammer.me/big-data-hadoop-spark-sql

Javascript, ReactJS, AngularJS courses:
https://lazyprogrammer.me/js

EVEN MORE COOL STUFF

Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!

Remember, these links will self-destruct on October 10 (5 days). Act NOW!

COURSE UPDATES

Recent updates to existing courses because my students are awesome and deserve free stuff:

Deep Learning pt 2 (Theano / Tensorflow):
* Brand new section on batch normalization (7 new lectures!)

Deep Reinforcement Learning (Advanced AI):
* Continuous Mountain Car in Theano and Tensorflow with Policy Gradient

Cluster Analysis / Unsupervised ML:
* Simulating Biological Evolution + Applying Clustering
* Applying Clustering to Donald Trump + Hillary Clinton Tweets from 2016 Election

Numpy Stack:
* Improved quality / resolution of all slides
* Pushed code up so video player controls won’t block it

Unsupervised Deep Learning:
* Visualizing t-SNE

# Goodbye Theano

September 29, 2017

It’s a sad day for us Theano fans.

The developers of Theano have announced that they are halting development following the 1.0 release.

Here’s the original post:

https://groups.google.com/forum/#!topic/theano-users/7Poq8BZutbY

Dear users and developers,

After almost ten years of development, we have the regret to announce that we will put an end to our Theano development after the 1.0 release, which is due in the next few weeks. We will continue minimal maintenance to keep it working for one year, but we will stop actively implementing new features. Theano will continue to be available afterwards, as per our engagement towards open source software, but MILA does not commit to spend time on maintenance or support after that time frame.

The software ecosystem supporting deep learning research has been evolving quickly, and has now reached a healthy state: open-source software is the norm; a variety of frameworks are available, satisfying needs spanning from exploring novel ideas to deploying them into production; and strong industrial players are backing different software stacks in a stimulating competition.

We are proud that most of the innovations Theano introduced across the years have now been adopted and perfected by other frameworks. Being able to express models as mathematical expressions, rewriting computation graphs for better performance and memory usage, transparent execution on GPU, higher-order automatic differentiation, for instance, have all become mainstream ideas.

In that context, we came to the conclusion that supporting Theano is no longer the best way we can enable the emergence and application of novel research ideas. Even with the increasing support of external contributions from industry and academia, maintaining an older code base and keeping up with competitors has come in the way of innovation.

MILA is still committed to supporting researchers and enabling the implementation and exploration of innovative (and sometimes wild) research ideas, and we will keep working towards this goal through other means, and making significant open source contributions to other projects.

Thanks to all of you who for helping develop Theano, and making it better by contributing bug reports, profiles, use cases, documentation, and support.

— Yoshua Bengio, Head of MILA

# Deep Learning and Machine Learning September 2017 Coupons

September 13, 2017

Since I am still busy hacking away at my next course, we are going to do another HUGE sale. ALL courses on Udemy are now $12. Take this opportunity to grab as many courses as you can because you never know when the next sale is going to be!

As usual, I’m providing $12 coupons for all my courses in the links below. Please use these links and share them with your friends!

You can also just type in the coupon code “SEP123”.

The promo goes until September 20. Don’t wait!

At the end of this post, I’m going to provide you with some additional links to get machine learning prerequisites (calculus, linear algebra, Python, etc…) for $12 too!

But that’s not all… I’m the Lazy Programmer, not just the Lazy Data Scientist – I’ve got $12 coupons for iOS development, Android development, Ruby on Rails, Python, Big Data / Hadoop / Spark, React.js, Angular, and MORE. All important skillsets on ANY engineering team. Got any friends or coworkers in mobile / backend / big data development? Let them know!

If you don’t know what order to take the courses in, please check here: https://deeplearningcourses.com/course_order

Here are the links for my courses:

Deep Learning Prerequisites: Linear Regression in Python
https://www.udemy.com/data-science-linear-regression-in-python/?couponCode=SEP123

Deep Learning Prerequisites: Logistic Regression in Python
https://www.udemy.com/data-science-logistic-regression-in-python/?couponCode=SEP123

Deep Learning in Python
https://www.udemy.com/data-science-deep-learning-in-python/?couponCode=SEP123

Practical Deep Learning in Theano and TensorFlow
https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow/?couponCode=SEP123

Deep Learning: Convolutional Neural Networks in Python
https://www.udemy.com/deep-learning-convolutional-neural-networks-theano-tensorflow/?couponCode=SEP123

Unsupervised Deep Learning in Python
https://www.udemy.com/unsupervised-deep-learning-in-python/?couponCode=SEP123

Deep Learning: Recurrent Neural Networks in Python
https://www.udemy.com/deep-learning-recurrent-neural-networks-in-python/?couponCode=SEP123

Advanced Natural Language Processing: Deep Learning in Python
https://www.udemy.com/natural-language-processing-with-deep-learning-in-python/?couponCode=SEP123

Advanced AI: Deep Reinforcement Learning in Python
https://www.udemy.com/deep-reinforcement-learning-in-python/?couponCode=SEP123

Deep Learning: GANs and Variational Autoencoders
https://www.udemy.com/deep-learning-gans-and-variational-autoencoders/?couponCode=SEP123

Easy Natural Language Processing in Python
https://www.udemy.com/data-science-natural-language-processing-in-python/?couponCode=SEP123

Cluster Analysis and Unsupervised Machine Learning in Python
https://www.udemy.com/cluster-analysis-unsupervised-machine-learning-python/?couponCode=SEP123

Unsupervised Machine Learning: Hidden Markov Models in Python
https://www.udemy.com/unsupervised-machine-learning-hidden-markov-models-in-python/?couponCode=SEP123

Data Science: Supervised Machine Learning in Python
https://www.udemy.com/data-science-supervised-machine-learning-in-python/?couponCode=SEP123

Bayesian Machine Learning in Python: A/B Testing
https://www.udemy.com/bayesian-machine-learning-in-python-ab-testing/?couponCode=SEP123

Ensemble Machine Learning in Python: Random Forest and AdaBoost
https://www.udemy.com/machine-learning-in-python-random-forest-adaboost/?couponCode=SEP123

Artificial Intelligence: Reinforcement Learning in Python
https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python/?couponCode=SEP123

SQL for Newbs and Marketers
https://www.udemy.com/sql-for-marketers-data-analytics-data-science-big-data/?couponCode=SEP123

PREREQUISITE COURSE COUPONS

And last but not least, $12 coupons for some helpful prerequisite courses. You NEED to know this stuff before you study machine learning:

General (site-wide): http://bit.ly/2oCY14Z
Python http://bit.ly/2pbXxXz
Calc 1 http://bit.ly/2okPUib
Calc 2 http://bit.ly/2oXnhpX
Calc 3 http://bit.ly/2pVU0gQ
Linalg 1 http://bit.ly/2oBBir1
Linalg 2 http://bit.ly/2q5SGEE
Probability (option 1) http://bit.ly/2prFQ7o
Probability (option 2) http://bit.ly/2p8kcC0
Probability (option 3) http://bit.ly/2oXa2pb
Probability (option 4) http://bit.ly/2oXbZSK

OTHER UDEMY COURSE COUPONS

As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!

And I’ve got sales for everything:

iOS courses:
https://lazyprogrammer.me/ios

Android courses:
https://lazyprogrammer.me/android

Ruby on Rails courses:
https://lazyprogrammer.me/ruby-on-rails

Python courses:
https://lazyprogrammer.me/python

Big Data (Spark + Hadoop) courses:

Javascript, ReactJS, AngularJS courses:
https://lazyprogrammer.me/js

EVEN MORE COOL STUFF

Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!

Remember, these links will self-destruct on September 20 (7 days). Act NOW!