Lazy Programmer

Your source for the latest in deep learning, big data, data science, and artificial intelligence.

Deep Learning: The Swish Activation Function

October 18, 2017

The Google Brain team has just released a new paper (https://arxiv.org/abs/1710.05941) that demonstrates the superiority of a new activation function called Swish on a number of different neural network architectures.

This is interesting because people often ask me, “which activation function should I use?”

These days, it is common to just use the ReLU by default.

To refresh your memory, the ReLU looks like this:

[Image: plot of the ReLU]

And it is defined by the equation:

$$ f(x) = \max(0, x) $$

One major problem with the ReLU is that its derivative is 0 for every negative input \( x \), which is half of the input range. Because we use “gradient descent” as our parameter update algorithm, if the gradient is 0 for a parameter, then that parameter will not be updated!

In other words, when I do:

$$ \theta = \theta - \alpha \frac{\partial J}{\partial \theta } $$

And:

$$ \frac{\partial J}{\partial \theta } = 0 $$

Then my update is just:

$$ \theta = \theta $$

Which just assigns the parameter back to itself.

This leads to the problem of “dead neurons”. Experiments have shown that neural networks trained with ReLUs can have up to 40% dead neurons!
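To make the zero-gradient argument concrete, here is a minimal sketch in plain Python of a single ReLU unit whose pre-activation is negative. The one-weight toy model, the helper names `relu_grad` and `sgd_step`, and the numbers are illustrative assumptions, not taken from the paper.

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, 0 for x < 0 (0 is used at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def sgd_step(theta, grad, alpha=0.1):
    """One gradient descent update: theta <- theta - alpha * dJ/dtheta."""
    return theta - alpha * grad


# Toy model (assumed for illustration): a = relu(w * x), with cost J = a.
w, x = -0.5, 2.0
pre_activation = w * x                   # negative, so the unit outputs 0
grad_w = relu_grad(pre_activation) * x   # chain rule: dJ/dw = relu'(w * x) * x
w_new = sgd_step(w, grad_w)

print(relu(pre_activation), grad_w, w_new)
```

Because the gradient is exactly 0, `w_new` equals `w`. As long as the pre-activation stays negative for every training input, this weight never moves.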

There have been some proposed alternatives to this, such as the leaky ReLU, the ELU, and the SELU.

Interestingly, none of these have seemed to catch on and it’s still ReLU by default.

 

 

So how does the Swish activation function work?

The function itself is very simple:

$$ f(x) = x \sigma(x) $$

Where \( \sigma(x) \) is the usual sigmoid activation function.

$$ \sigma(x) = (1 + e^{-x})^{-1} $$

It looks like this:

[Image: plot of the Swish activation function]

What’s interesting about this is that, unlike most common activation functions, it is not monotonically increasing. Does it matter? It seems the answer is no!
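The definition and the non-monotonic dip are easy to check numerically. The sketch below uses only the standard library; the function names and the sample points are arbitrary choices for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: (1 + e^{-x})^{-1}."""
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish activation: f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


# Evaluate at a few points; the values first decrease, then increase.
for x in (-5.0, -2.0, -1.0, 0.0, 1.0):
    print(x, round(swish(x), 4))
```

Moving from \( x = -5 \) to \( x = -1 \) the output gets more negative, and only then does it start rising towards 0, so the function is not monotonic.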

The derivative looks like this:

[Image: plot of the derivative of Swish]
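Applying the product rule to \( f(x) = x \sigma(x) \), together with \( \sigma'(x) = \sigma(x)(1 - \sigma(x)) \), gives the closed form

$$ f'(x) = \sigma(x) + x \sigma(x) (1 - \sigma(x)) $$

A quick sketch that checks this expression against a central finite difference (the helper names and the step size `h` are illustrative assumptions):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    return x * sigmoid(x)


def swish_grad(x):
    """Closed-form derivative: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


def numeric_grad(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, swish_grad(x), numeric_grad(swish, x))
```

The derivative is negative for sufficiently negative inputs (for example at \( x = -3 \)), which corresponds to the region where Swish is decreasing.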

One interesting thing we can do is re-parameterize the Swish, in order to “stretch out” the sigmoid:

$$ f(x) = 2x \sigma(\beta x) $$

We can see that if \( \beta = 0 \), then \( \sigma(0) = 1/2 \) and we get the identity activation \( f(x) = x \). If \( \beta \rightarrow \infty \), the sigmoid converges to the unit step, and multiplying that by \( 2x \) gives us \( f(x) = 2 \max(0, x) \), which is just the ReLU multiplied by a constant factor.

So including \( \beta \) is a way for us to nonlinearly interpolate between identity and ReLU.
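The two limits can be checked with a few lines of Python. This is a sketch under the re-parameterization given above; the name `swish_beta`, the overflow-safe sigmoid, and the value \( \beta = 50 \) standing in for "very large" are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid (avoids overflow in exp for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish_beta(x, beta):
    """Re-parameterized Swish: f(x) = 2 * x * sigmoid(beta * x)."""
    return 2.0 * x * sigmoid(beta * x)


# beta = 0 gives the identity; a large beta approaches 2 * max(0, x).
for beta in (0.0, 1.0, 50.0):
    print(beta, [round(swish_beta(x, beta), 4) for x in (-3.0, -1.0, 1.0, 3.0)])
```

Intermediate values of `beta` give curves that sit between the straight line and the scaled ReLU.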

The title of the paper is “Swish: a Self-Gated Activation Function”, which might make you wonder, “Why is it self-gated?”

This should remind you of the LSTM, where we have “gates” in the form of sigmoids that control how much of a vector gets passed on to the next stage, by multiplying it by the output of the sigmoid, which is a number between 0 and 1.

So “self-gated” means that the gate is just the sigmoid of the activation itself.

Gate: \( \sigma(x) \)

Value to pass through: \( x \)

But that’s enough theory. For most of us, we want to know: “Does it work?”

And more practically, “Can I just use this by default instead of the ReLU?”

The best thing to do is to try it for yourself and see how robust it is to different hyperparameter settings (learning rate, architecture, etc.). But let’s look at some results, so we can be confident when it comes to using Swish:

[Image: results from the Swish paper]

To compare Swish with baseline, a statistical test called the one-sided paired sign test was used.
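For readers who have not seen it before, the sign test only looks at which side wins each pairing. Here is a minimal sketch of the standard one-sided version, assuming ties are discarded; the scores are made-up numbers for illustration, NOT results from the paper, and the paper's exact procedure may differ.

```python
from math import comb  # requires Python 3.8+


def sign_test_p_value(wins, losses):
    """One-sided paired sign test.

    Under the null hypothesis each non-tied pair is a fair coin flip,
    so the number of wins is Binomial(n, 0.5). Returns P(X >= wins).
    """
    n = wins + losses  # ties are dropped before calling this function
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical paired accuracies (made-up, one pair per model/dataset).
swish_scores = [0.91, 0.88, 0.93, 0.79, 0.85, 0.90, 0.87, 0.92, 0.84]
baseline_scores = [0.90, 0.86, 0.93, 0.80, 0.83, 0.89, 0.85, 0.91, 0.82]

wins = sum(s > b for s, b in zip(swish_scores, baseline_scores))
losses = sum(s < b for s, b in zip(swish_scores, baseline_scores))
p_value = sign_test_p_value(wins, losses)

print(wins, losses, p_value)
```

A small p-value means that winning this many of the non-tied pairs would be unlikely if the two activation functions were equally good.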

Conclusion: Try Swish for yourself!



Python 2-to-3 Tips

October 17, 2017

This is a short post to help those of you who need help translating code from Python 2 to Python 3.

Python 2 is the most popular Python version (at least at this time and certainly at the time my courses were created), which is why it was used.

It comes pre-installed on Mac OS and Ubuntu, so when you type “python” into your command line, you get Python 2.

This list is not exhaustive. It shows only code that appears commonly in my machine learning scripts, to assist the students taking my machine learning courses (https://deeplearningcourses.com).

 

Integer Division

OLD:

a / b

NEW:

a // b
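A short Python 3 sketch of the difference (the variable names and values are illustrative):

```python
a, b = 7, 2

true_quotient = a / b    # Python 3: always true division, even for two ints
floor_quotient = a // b  # floor division, matches Python 2's int / int

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2

# A common place this matters: computing the number of batches for a loop.
N, batch_size = 1000, 32     # illustrative values
n_batches = N // batch_size  # an int, safe to pass to range()

print(true_quotient, floor_quotient, negative_floor, n_batches)
```

If `N / batch_size` is left unchanged, Python 3 produces a float, and passing that float to `range()` raises a `TypeError`.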

 

For Loops

OLD:

for i in xrange(N):

NEW:

for i in range(N):
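A small Python 3 sketch (the loop body is just an illustration):

```python
squares = []
for i in range(5):  # Python 3: range is lazy, like Python 2's xrange
    squares.append(i * i)

# range() no longer returns a list; wrap it in list() when a real list is needed.
indices = list(range(3))

print(squares, indices)
```

Code that relied on Python 2's `range()` returning a list (for example, to shuffle it in place) needs the explicit `list(...)` call.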

 

Printing

OLD:

print "hello world"

NEW:

print("hello world")
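Since `print` is an ordinary function in Python 3, it also accepts several arguments and keyword arguments. A brief sketch, with made-up values for `epoch` and `cost`:

```python
epoch, cost = 3, 0.25  # illustrative values

print("hello world")
print("epoch:", epoch, "cost:", cost)  # arguments are joined with spaces
print("no newline at the end", end="")
print()  # emit the newline separately

# String formatting works the same as before; only the call syntax changed.
message = "epoch: %d cost: %.2f" % (epoch, cost)
print(message)
```

For scripts that must still run under Python 2.6 or 2.7, placing `from __future__ import print_function` as the first statement of the file enables the same syntax there.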


Deep Learning and Machine Learning FALL SALE 90% OFF

October 7, 2017

“Hey Lazy Programmer, when is your next course coming out?”

I’ve been really busy adding tons of free updates to my existing courses! You can scroll down to the very bottom to see what they are. But in the meantime we are going to do another HUGE sale. ALL courses on Udemy are now $12. Take this opportunity to grab as many courses as you can because you never know when the next sale is going to be!

As usual, I’m providing $12 coupons for all my courses in the links below. Please use these links and share them with your friends!

You can also just type in the coupon code “OCT123” (except Deep Learning pt 1 because I messed it up =), for that use “OCT123A”).

The promo goes until October 10. Don’t wait!

At the end of this post, I’m going to provide you with some additional links to get machine learning prerequisites (calculus, linear algebra, Python, etc…) for $12 too!

But that’s not all… I’m the Lazy Programmer, not just the Lazy Data Scientist – I’ve got $12 coupons for iOS development, Android development, Ruby on Rails, Python, Big Data / Hadoop / Spark, React.js, Angular, and MORE. All important skillsets on ANY engineering team. Got any friends or coworkers in mobile / backend / big data development? Let them know!

If you don’t know what order to take the courses in, please check here: https://deeplearningcourses.com/course_order

Here are the links for my courses:

Deep Learning Prerequisites: Linear Regression in Python
https://www.udemy.com/data-science-linear-regression-in-python/?couponCode=OCT123

Deep Learning Prerequisites: Logistic Regression in Python
https://www.udemy.com/data-science-logistic-regression-in-python/?couponCode=OCT123

Deep Learning in Python
https://www.udemy.com/data-science-deep-learning-in-python/?couponCode=OCT123A

Practical Deep Learning in Theano and TensorFlow
https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow/?couponCode=OCT123

Deep Learning: Convolutional Neural Networks in Python
https://www.udemy.com/deep-learning-convolutional-neural-networks-theano-tensorflow/?couponCode=OCT123

Unsupervised Deep Learning in Python
https://www.udemy.com/unsupervised-deep-learning-in-python/?couponCode=OCT123

Deep Learning: Recurrent Neural Networks in Python
https://www.udemy.com/deep-learning-recurrent-neural-networks-in-python/?couponCode=OCT123

Advanced Natural Language Processing: Deep Learning in Python
https://www.udemy.com/natural-language-processing-with-deep-learning-in-python/?couponCode=OCT123

Advanced AI: Deep Reinforcement Learning in Python
https://www.udemy.com/deep-reinforcement-learning-in-python/?couponCode=OCT123

Deep Learning: GANs and Variational Autoencoders
https://www.udemy.com/deep-learning-gans-and-variational-autoencoders/?couponCode=OCT123

Easy Natural Language Processing in Python
https://www.udemy.com/data-science-natural-language-processing-in-python/?couponCode=OCT123

Cluster Analysis and Unsupervised Machine Learning in Python
https://www.udemy.com/cluster-analysis-unsupervised-machine-learning-python/?couponCode=OCT123

Unsupervised Machine Learning: Hidden Markov Models in Python
https://www.udemy.com/unsupervised-machine-learning-hidden-markov-models-in-python/?couponCode=OCT123

Data Science: Supervised Machine Learning in Python
https://www.udemy.com/data-science-supervised-machine-learning-in-python/?couponCode=OCT123

Bayesian Machine Learning in Python: A/B Testing
https://www.udemy.com/bayesian-machine-learning-in-python-ab-testing/?couponCode=OCT123

Ensemble Machine Learning in Python: Random Forest and AdaBoost
https://www.udemy.com/machine-learning-in-python-random-forest-adaboost/?couponCode=OCT123

Artificial Intelligence: Reinforcement Learning in Python
https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python/?couponCode=OCT123

SQL for Newbs and Marketers
https://www.udemy.com/sql-for-marketers-data-analytics-data-science-big-data/?couponCode=OCT123

PREREQUISITE COURSE COUPONS

And just as important, $12 coupons for some helpful prerequisite courses. You NEED to know this stuff before you study machine learning:

General (site-wide): http://bit.ly/2oCY14Z
Python http://bit.ly/2pbXxXz
Calc 1 http://bit.ly/2okPUib
Calc 2 http://bit.ly/2oXnhpX
Calc 3 http://bit.ly/2pVU0gQ
Linalg 1 http://bit.ly/2oBBir1
Linalg 2 http://bit.ly/2q5SGEE
Probability (option 1) http://bit.ly/2prFQ7o
Probability (option 2) http://bit.ly/2p8kcC0
Probability (option 3) http://bit.ly/2oXa2pb
Probability (option 4) http://bit.ly/2oXbZSK

OTHER UDEMY COURSE COUPONS

As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!

And I’ve got sales for everything:

iOS courses:
https://lazyprogrammer.me/ios

Android courses:
https://lazyprogrammer.me/android

Ruby on Rails courses:
https://lazyprogrammer.me/ruby-on-rails

Python courses:
https://lazyprogrammer.me/python

Big Data (Spark + Hadoop) courses:
https://lazyprogrammer.me/big-data-hadoop-spark-sql

Javascript, ReactJS, AngularJS courses:
https://lazyprogrammer.me/js

 

 

EVEN MORE COOL STUFF

Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!

Remember, these links will self-destruct on October 10 (3 days). Act NOW!

 

 

COURSE UPDATES

Recent updates to existing courses because my students are awesome and deserve free stuff:

Deep Learning pt 2 (Theano / Tensorflow):
* Brand new section on batch normalization (7 new lectures!)

Deep Reinforcement Learning (Advanced AI):
* Continuous Mountain Car in Theano and Tensorflow with Policy Gradient

Cluster Analysis / Unsupervised ML:
* Simulating Biological Evolution + Applying Clustering
* Applying Clustering to Donald Trump + Hillary Clinton Tweets from 2016 Election

Numpy Stack:
* Improved quality / resolution of all slides
* Pushed code up so video player controls won’t block it

Unsupervised Deep Learning
* Visualizing t-SNE



Goodbye Theano

September 29, 2017

It’s a sad day for us Theano fans. The developers of Theano have announced that they are halting development following the 1.0 release.

Here’s the original post: https://groups.google.com/forum/#!topic/theano-users/7Poq8BZutbY

Dear users and developers,

After almost ten years of development, we have the regret to announce
that we will put an end to our Theano development after the 1.0 release,
which is due in the next few weeks. We will continue minimal maintenance
to keep it working for one year, but we will stop actively implementing
new features. Theano will continue to be available afterwards, as per
our engagement towards open source software, but MILA does not commit to
spend time on maintenance or support after that time frame.

The software ecosystem supporting deep learning research has been
evolving quickly, and has now reached a healthy state: open-source
software is the norm; a variety of frameworks are available, satisfying
needs spanning from exploring novel ideas to deploying them into
production; and strong industrial players are backing different software
stacks in a stimulating competition.

We are proud that most of the innovations Theano introduced across the
years have now been adopted and perfected by other frameworks. Being
able to express models as mathematical expressions, rewriting
computation graphs for better performance and memory usage, transparent
execution on GPU, higher-order automatic differentiation, for instance,
have all become mainstream ideas.

In that context, we came to the conclusion that supporting Theano is no
longer the best way we can enable the emergence and application of novel
research ideas. Even with the increasing support of external
contributions from industry and academia, maintaining an older code base
and keeping up with competitors has come in the way of innovation.

MILA is still committed to supporting researchers and enabling the
implementation and exploration of innovative (and sometimes wild)
research ideas, and we will keep working towards this goal through other
means, and making significant open source contributions to other projects.

Thanks to all of you who for helping develop Theano, and making it
better by contributing bug reports, profiles, use cases, documentation,
and support.

— Yoshua Bengio,
Head of MILA



Deep Learning and Machine Learning September 2017 Coupons

September 13, 2017

Since I am still busy hacking away at my next course, we are going to do another HUGE sale. ALL courses on Udemy are now $12. Take this opportunity to grab as many courses as you can because you never know when the next sale is going to be!

As usual, I’m providing $12 coupons for all my courses in the links below. Please use these links and share them with your friends!

You can also just type in the coupon code “SEP123”.

The promo goes until September 20. Don’t wait!

At the end of this post, I’m going to provide you with some additional links to get machine learning prerequisites (calculus, linear algebra, Python, etc…) for $12 too!

But that’s not all… I’m the Lazy Programmer, not just the Lazy Data Scientist – I’ve got $12 coupons for iOS development, Android development, Ruby on Rails, Python, Big Data / Hadoop / Spark, React.js, Angular, and MORE. All important skillsets on ANY engineering team. Got any friends or coworkers in mobile / backend / big data development? Let them know!

If you don’t know what order to take the courses in, please check here: https://deeplearningcourses.com/course_order

 
Here are the links for my courses:

Deep Learning Prerequisites: Linear Regression in Python
https://www.udemy.com/data-science-linear-regression-in-python/?couponCode=SEP123

Deep Learning Prerequisites: Logistic Regression in Python
https://www.udemy.com/data-science-logistic-regression-in-python/?couponCode=SEP123

Deep Learning in Python
https://www.udemy.com/data-science-deep-learning-in-python/?couponCode=SEP123

Practical Deep Learning in Theano and TensorFlow
https://www.udemy.com/data-science-deep-learning-in-theano-tensorflow/?couponCode=SEP123

Deep Learning: Convolutional Neural Networks in Python
https://www.udemy.com/deep-learning-convolutional-neural-networks-theano-tensorflow/?couponCode=SEP123

Unsupervised Deep Learning in Python
https://www.udemy.com/unsupervised-deep-learning-in-python/?couponCode=SEP123

Deep Learning: Recurrent Neural Networks in Python
https://www.udemy.com/deep-learning-recurrent-neural-networks-in-python/?couponCode=SEP123

Advanced Natural Language Processing: Deep Learning in Python
https://www.udemy.com/natural-language-processing-with-deep-learning-in-python/?couponCode=SEP123

Advanced AI: Deep Reinforcement Learning in Python
https://www.udemy.com/deep-reinforcement-learning-in-python/?couponCode=SEP123

Deep Learning: GANs and Variational Autoencoders
https://www.udemy.com/deep-learning-gans-and-variational-autoencoders/?couponCode=SEP123

Easy Natural Language Processing in Python
https://www.udemy.com/data-science-natural-language-processing-in-python/?couponCode=SEP123

Cluster Analysis and Unsupervised Machine Learning in Python
https://www.udemy.com/cluster-analysis-unsupervised-machine-learning-python/?couponCode=SEP123

Unsupervised Machine Learning: Hidden Markov Models in Python
https://www.udemy.com/unsupervised-machine-learning-hidden-markov-models-in-python/?couponCode=SEP123

Data Science: Supervised Machine Learning in Python
https://www.udemy.com/data-science-supervised-machine-learning-in-python/?couponCode=SEP123

Bayesian Machine Learning in Python: A/B Testing
https://www.udemy.com/bayesian-machine-learning-in-python-ab-testing/?couponCode=SEP123

Ensemble Machine Learning in Python: Random Forest and AdaBoost
https://www.udemy.com/machine-learning-in-python-random-forest-adaboost/?couponCode=SEP123

Artificial Intelligence: Reinforcement Learning in Python
https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python/?couponCode=SEP123

SQL for Newbs and Marketers
https://www.udemy.com/sql-for-marketers-data-analytics-data-science-big-data/?couponCode=SEP123

 

 

PREREQUISITE COURSE COUPONS

And last but not least, $12 coupons for some helpful prerequisite courses. You NEED to know this stuff before you study machine learning:

General (site-wide): http://bit.ly/2oCY14Z
Python http://bit.ly/2pbXxXz
Calc 1 http://bit.ly/2okPUib
Calc 2 http://bit.ly/2oXnhpX
Calc 3 http://bit.ly/2pVU0gQ
Linalg 1 http://bit.ly/2oBBir1
Linalg 2 http://bit.ly/2q5SGEE
Probability (option 1) http://bit.ly/2prFQ7o
Probability (option 2) http://bit.ly/2p8kcC0
Probability (option 3) http://bit.ly/2oXa2pb
Probability (option 4) http://bit.ly/2oXbZSK

 

 

 
OTHER UDEMY COURSE COUPONS

As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!

And I’ve got sales for everything:

iOS courses:
https://lazyprogrammer.me/ios

Android courses:
https://lazyprogrammer.me/android

Ruby on Rails courses:
https://lazyprogrammer.me/ruby-on-rails

Python courses:
https://lazyprogrammer.me/python

Big Data (Spark + Hadoop) courses:
https://lazyprogrammer.me/big-data-hadoop-spark-sql

Javascript, ReactJS, AngularJS courses:
https://lazyprogrammer.me/js

 

 

 

EVEN MORE COOL STUFF

Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!

Remember, these links will self-destruct on September 20 (7 days). Act NOW!
