Where to get cheap web hosting: GoDaddy vs. BlueHost vs. etc…

November 23, 2015

My brand-spanking-new blog software told me that this title was too long: GoDaddy vs. BlueHost vs. DreamHost vs. HostGator vs. Namecheap

Read all the way to the end to find out which host I chose and why it took me so long to choose, and for some nice discount buttons.

You may have noticed that there’ve been some subtle changes on this site lately.

In actuality, the site has gone through a complete overhaul. I tried my hardest to make my own WordPress theme look exactly like the old theme (which was hosted on Tumblr), with a few enhancements.

So as you probably already know, WordPress needs to be hosted on a paid server with PHP installed. The typical players are GoDaddy et al. (the ones listed above, except for Namecheap). For the somewhat technical: your site needs to live on actual boxes in a data center somewhere.

I moved all of my domain names from GoDaddy to Namecheap a while ago, when I read about some of their shady business practices (they initially supported SOPA).

So I’m browsing around, basically just price-shopping. I know all these companies have bad reviews for poor customer service. I try to do my due diligence by noting the percentage of downtime and cPanel features. But really I just want a good price on hosting so I can test out new sites cheaply and efficiently.

Most of these guys have a 1-3 year discount rate, something like $6.99/month if you sign up for 3 years. Then it goes up to $11.99/month after that.

I’m not going to list the exact prices here because they’re always changing, but roughly $5+ per month during the discount period and roughly $10+ per month afterward is typical. The problem is, most of these sites don’t post their non-discount rate.

As a sidenote, I’ve noticed lately that a lot of sites, including banks (which you’d think would be more professional), have a lot of footnotes. Like this.1

And then, nothing on the page actually explains what it refers to. It’s like saying, “read the fine print if you want to understand how we’re going to fuck you”, and then just excluding that fine print, in hopes you’ll just forget about it.

Alright, lemme talk to their support guys to tally up everyone’s non-discounted rates. This helped me narrow down my options a little bit. HostGator’s support was very unhelpful (I partly wrote this post to tell you not to use HostGator). After some time doing this, it hit me… Namecheap offers hosting too.

I’ve been using Namecheap for years for my domains, and I’ve found their user interface and support to be above par. I look up their rate for shared web hosting and, to my surprise, it’s $19.88 for the first year. After this discount period, it goes up to $78.88 per year, which works out to $6.57 / month, way better than anything the other hosts I mentioned above offer.

If you use the coupon code COOLDAYS you can get 20% off, which means you’ll only pay about $16 for the first year. That’s basically just over $1 / month. I think (not 100% sure) the coupon code only lasts until the end of November, so get that shit now.

And since Namecheap does a poor job of advertising their hosting (which is why I almost signed up for an inferior service instead of theirs), I’m going to post some buttons here in case you’re looking for a new domain, web hosting, or free WhoisGuard (so people can’t stalk you on the net; by default, your full name, address, and email are shown when someone looks up your domain).

Namecheap.com

Namecheap.com

Full disclosure, these are referral links, so buying from these links helps me too and helps keep this site up. =)

Namecheap.com

It’s entirely possible I haven’t covered a really awesome, cheap web host in this post, so if I missed something, please let me know in the comments below.

#bluehost #cheap web hosting #coupon code #dreamhost #godaddy #hostgator #namecheap #web hosting



Tutorial: Principal Components Analysis (PCA)

November 20, 2015

I remember learning about principal components analysis for the very first time. It seemed very confusing, and I didn’t know what it had to do with eigenvalues and eigenvectors (I’m not even sure I remembered what eigenvalues and eigenvectors were at the time).

I assure you that in hindsight, PCA, despite its very scientific-sounding name, is not that difficult to understand at all. There are two “categories” into which I think it makes sense to split the discussion of PCA.

1) What does it do? A written and visual description of how PCA is used.

2) The math behind PCA. A few equations showing that the things mentioned in (1) are true.

What does PCA do?

Linear transformation

PCA finds a matrix \( Q \) that, when multiplied by your original data matrix \( X \), gives you a linearly transformed data matrix \( Z \), where:

$$ Z = XQ $$

In other words, for a single sample vector \( x \), we can obtain its transformation \( z = Q^Tx \).

The \( Q \) and \( x \) appear in a different order here because when we load a data matrix, it’s usually \( N \times D \), so each sample vector is \( 1 \times D \), but when we talk about vectors by themselves we usually think of them as \( D \times 1 \).
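As a quick sanity check of the shapes and conventions, here’s a minimal numpy sketch (the sizes N and D and the matrix Q here are made up just for illustration; how PCA actually chooses Q comes later):

import numpy as np

N, D = 100, 5                       # made-up sizes for illustration
X = np.random.randn(N, D)           # data matrix, one sample per row
Q = np.random.randn(D, D)           # stand-in for the D x D transformation matrix

Z = X.dot(Q)                        # transform the whole data matrix: Z = XQ, still N x D

x = X[0]                            # a single sample, treated as a D-dimensional vector
z = Q.T.dot(x)                      # transform a single sample: z = Q^T x

print(np.allclose(z, Z[0]))         # True: the two conventions give the same result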

The interesting thing about PCA is how it chooses \( Q \).

Dimensionality reduction

When we think of machine learning models, we often study them in the context of 1-3 dimensions. But if you take a dataset like MNIST, where each image is 28×28 pixels, then each input has 784 dimensions. And 28×28 is a tiny image: the new iPhone 6 Plus has a resolution of 1080 × 1920 pixels, so each image would have 2,073,600 \( \approx \) 2 million dimensions.

PCA reduces dimensionality by moving as much “information” as possible into as few dimensions as possible. We measure information via unpredictability, i.e. variance. The end result is that our transformed matrix \( Z \) has most of its variance in the first column, less variance in the second column, even less variance in the third column, etc.

You will see later how our choice of \( Q \) accomplishes this.
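In the meantime, you can already see this variance ordering with scikit-learn. Here’s a small sketch (the toy data is made up; explained_variance_ratio_ reports the fraction of the total variance captured by each new dimension):

import numpy as np
from sklearn.decomposition import PCA

# toy data: earlier columns are given much larger variance than later ones
X = np.random.randn(200, 5) * np.array([10.0, 5.0, 2.0, 1.0, 0.5])

pca = PCA()
Z = pca.fit_transform(X)

print(np.var(Z, axis=0))                 # column variances of Z, in decreasing order
print(pca.explained_variance_ratio_)     # the same information, as fractions of the total variance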

De-correlation

\( D \approx 2000000 \) is a much bigger problem than the ones you learn about in the classroom. But often there are correlations in the data that make many of the dimensions redundant. This is related to the problem of linear regression – if we can accurately predict one dimension of the inputs, let’s say, \( x_3 \), in terms of another, \( x_5 \), then one of those dimensions is redundant.

The final result of removing these correlations is that every dimension of \( Z \) is uncorrelated with the others.

(Figure: scatterplot of the original data with the rotated v1-v2 axes drawn on top of the original x-y axes.)

The image above shows a scatterplot of the original data on the original x-y axes. The arrows on the plot show that the new v1-v2 axes are a) rotated (multiplication by an orthogonal matrix is a rotation), b) still perpendicular, and c) positioned so that, relative to the new axes, the data is normally distributed and the distributions on each dimension are independent and uncorrelated.

Visualization

Once we’ve transformed \( X \) into \( Z \), and noting that most of the “information” from \( X \) is in the first few dimensions of \( Z \) (let’s say 90% of the variance is in the first 2 dimensions), we can do a 2-D scatterplot of \( Z \) to see the structure of our data. Humans have a hard time visualizing anything in more than 2 dimensions.

We can consider the first 2 dimensions to represent the real underlying trend in the data, and the other dimensions just small perturbations or noise.
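A minimal sketch of that kind of plot (the data here is just random noise, so don’t expect interesting structure; with real data you would typically also color the points by class or cluster label):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.randn(500, 50)       # pretend this is your high-dimensional data
pca = PCA()
Z = pca.fit_transform(X)

# scatterplot of the first 2 dimensions of Z, which hold the most variance
plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel('1st principal component')
plt.ylabel('2nd principal component')
plt.show()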

Pre-processing

You may have heard that too many parameters in your model can lead to overfitting. So if you are planning to use a logistic regression model, and the dimensionality of each input is 2 million, then the number of weights in your model will also be 2 million. This is not good news if you have only a few thousand samples. One rule of thumb is that we would like to have 10x as many samples as parameters in our model.

So instead of going out and finding 20 million samples, we can use PCA to reduce the dimensionality of our data to say, 20, and then we only need 200 samples for our model.
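As a rough sketch of what that might look like with scikit-learn (the data, the choice of 20 components, and the classifier are all just placeholders for illustration):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# hypothetical data: 2000 samples, 500 dimensions, binary labels
X = np.random.randn(2000, 500)
Y = np.random.randint(0, 2, size=2000)

model = Pipeline([
    ('pca', PCA(n_components=20)),      # keep only the first 20 principal components
    ('logreg', LogisticRegression()),   # the classifier now only has ~20 weights to learn
])
model.fit(X, Y)
print(model.score(X, Y))                # training accuracy (meaningless on random labels; just shows the workflow)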

You can also use PCA to pre-process data before using an unsupervised learning algorithm, like k-means clustering. PCA, by the way, is also an unsupervised algorithm.

The math behind PCA

Everybody’s favorite part!

Ok, so this requires both (a) some statistics knowledge (knowing how to find the covariance matrix), and (b) some linear algebra knowledge (knowing what eigenvalues and eigenvectors are, and basic matrix manipulation).

Covariance

I stated above that in the rotated v1-v2 coordinate system, the data on the v1 axis was not correlated with the data on the v2 axis.

Intuitively, when you have a 2-D Gaussian distribution, the data looks like an ellipse. If the axes of that ellipse are aligned with the x and y axes, then the x and y components are independent and uncorrelated.

The image below shows what happens with different values of the correlation between \( X_i \) and \( X_j \) (called ‘c’ in the image):

(Figure: ellipses showing 2-D Gaussian data for different values of the correlation c.)

(If you really know your statistics, then you will recall that independence implies 0 correlation, but 0 correlation does not imply independence, unless the distribution is a joint Gaussian.)

So, in the first image, since the ellipse is not aligned with the x-y axes, the distributions p(x) and p(y) are not independent. But on the rotated v1-v2 axes, the distributions p(v1) and p(v2) are independent and uncorrelated.

If this all sounds crazy, just remember this one fact: if 2 variables \( X_i \) and \( X_j \) are uncorrelated, their corresponding element in the covariance matrix, \( \Sigma_{ij} = 0 \). And since the covariance matrix is symmetric (for our purposes), \( \Sigma_{ji} = 0 \) also.

For every variable to be uncorrelated to every other variable, all the elements of \( \Sigma \) will be 0, except those along the diagonal, \( \Sigma_{ii} = \sigma_i^2 \), which is just the regular variance for each dimension.

How do we calculate the covariance matrix from the data? Simple!

$$ \Sigma_X = \frac{1}{N} (X - \mu_X)^T(X - \mu_X) \tag{1}$$

In non-matrix form this is:

$$ \Sigma_{ij} = \frac{1}{N} \sum_{n=1}^{N} (x^{(n)}_i - \mu_i)(x^{(n)}_j - \mu_j) $$

I’ve labeled the equation because we’re going to return to it. You can verify that \( \Sigma \) is \( D \times D \) since those are the outer dimensions of the matrix multiply.

Note the mathematical sleight of hand I used above. \( X \) is \( N \times D \) and the mean \( \mu_X \) is \( 1 \times D \), so technically you can’t subtract them under the rules of matrix algebra, but let’s assume that the above notation means each row of \( X \) has \( \mu_X \) subtracted from it.
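Here’s a quick numpy sketch of equation (1), checked against numpy’s built-in covariance function (note that np.cov divides by N-1 by default, so we pass bias=True to match the 1/N above):

import numpy as np

N, D = 1000, 4
X = np.random.randn(N, D)

mu = X.mean(axis=0)                    # 1 x D vector of column means
Xc = X - mu                            # subtract the mean from every row (the "sleight of hand" above)
Sigma = Xc.T.dot(Xc) / N               # equation (1): the D x D covariance matrix

print(np.allclose(Sigma, np.cov(X.T, bias=True)))   # True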

Eigenvalues and Eigenvectors

This is where the real magic of PCA happens.

For no particular reason at all, suppose we compute the eigenvalues and eigenvectors of \( \Sigma_X \).

If you don’t know what eigenvalues/eigenvectors are: remember that usually, when we multiply a vector by a matrix, it changes the direction of the vector.

So for example, the vector \( (1,2) \) is pointing in the same direction as \( (2,4) = 2(1,2) \). But if we were to multiply \( (1,2) \) by a matrix:

$$ \begin{pmatrix} 5 \\ 11 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} $$

The result is a vector pointing in a different direction. Eigenvalues \( \lambda \) and eigenvectors \( v \) have the special property that multiplying \( \Sigma_Xv \) gives the same result as multiplying the eigenvector by a constant, the eigenvalue; i.e. the eigenvector does not change direction, it only gets shorter or longer by a factor of \( \lambda \). In other words:

$$ \Sigma_Xv = \lambda v $$
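You can check this property numerically with np.linalg.eig (the matrix here is just the one from the example above, used purely for illustration):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # the matrix from the example above
lambdas, vecs = np.linalg.eig(A)         # eigenvalues, and eigenvectors as columns

v = vecs[:, 0]                           # first eigenvector
print(A.dot(v))                          # multiplying by the matrix...
print(lambdas[0] * v)                    # ...gives the same result as scaling by the eigenvalue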

How many pairs of these eigenvectors and eigenvalues can we find? It just so happens that there are \( D \) such pairs. We’ll say, each eigenvalue \( \lambda_i \) has a corresponding eigenvector \( v_i \).

Now again, for no particular reason at all, we are going to combine all the \( v_i \) into a matrix, let’s call it \( V \).

We could line up the \( v_i \) in any random order, but instead we are going to choose a very particular order – we are going to line them up such that the corresponding eigenvalues \( \lambda_i \) are in descending order. In other words:

$$ \lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_D $$

So:

$$ V = \begin{pmatrix}
| & | & | & \cdots & | \\
v_1 & v_2 & v_3 & \cdots & v_D \\
| & | & | & \cdots & |
\end{pmatrix}
$$

We are also going to combine the eigenvalues into a diagonal matrix like this:

$$ \Lambda = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_D
\end{pmatrix}
$$

As a sidenote, don’t worry about how we find the eigenvalues and eigenvectors. That is a huge problem in and of itself. The method you used in high school (solving a polynomial to get the eigenvalues, then plugging each eigenvalue into the eigenvalue equation to get its eigenvector) doesn’t really translate to computer code. There exist some very involved algorithms to solve this problem, but we’ll just use numpy.

In any case, it is easy to prove to yourself, with a simple 2-D example, that:

$$ \Sigma_X V = V \Lambda \tag{2}$$
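Here’s a minimal numpy sketch that does exactly that check on a made-up 2-D dataset. np.linalg.eigh is numpy’s eigensolver for symmetric matrices; it returns the eigenvalues in ascending order, so we flip them around to get the descending order we want:

import numpy as np

# made-up 2-D data with correlated columns
X = np.random.randn(500, 2).dot(np.array([[2.0, 1.0], [0.0, 1.0]]))
Xc = X - X.mean(axis=0)
Sigma = Xc.T.dot(Xc) / len(X)

lambdas, V = np.linalg.eigh(Sigma)    # eigh is for symmetric matrices; eigenvalues come back in ascending order
idx = np.argsort(-lambdas)            # indices that sort the eigenvalues in descending order
lambdas, V = lambdas[idx], V[:, idx]  # reorder the eigenvalues and their eigenvector columns
Lambda = np.diag(lambdas)

print(np.allclose(Sigma.dot(V), V.dot(Lambda)))   # equation (2): Sigma V = V Lambda
print(np.allclose(V.T.dot(V), np.eye(2)))         # V is orthonormal, which is exactly the next point below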

Just one more ingredient. We want to show that \( V \) is orthonormal. In terms of the individual eigenvectors this means that \( {v_i}^T v_j = 0 \) if \( i \ne j \), and \( {v_i}^T v_i = 1 \). In matrix form this means \( V^T V = I \), or in other words, \( V^T = V^{-1} \).

Proof of orthogonality: Consider \( \lambda_j {v_i}^T v_j \) for \( i \ne j \). \( \lambda_j {v_i}^T v_j = {v_i}^T (\lambda_j v_j) = {v_i}^T \Sigma_X v_j = (\Sigma_X^T v_i)^T v_j \). But we know that \( \Sigma_X \) is symmetric, so this is \( (\Sigma_X v_i)^T v_j = (\lambda_i v_i)^T v_j = \lambda_i {v_i}^T v_j \). Putting the two ends together, \( (\lambda_i - \lambda_j) {v_i}^T v_j = 0 \), and since \( \lambda_i \ne \lambda_j \), this can only be true if \( {v_i}^T v_j = 0 \).

Normalizing the eigenvectors is easy, since an eigenvector is only defined up to a scaling factor: just rescale it so that its length is 1.

Finally, using \( (1) \), consider the covariance of the transformed data, \( Z \).

$$ \begin{align*}
\Sigma_Z &= \frac{1}{N} (Z - \mu_Z)^T (Z - \mu_Z) \\
&= \frac{1}{N} (XQ - \mu_XQ)^T (XQ - \mu_XQ) \\
&= \frac{1}{N} Q^T(X - \mu_X)^T (X - \mu_X)Q \\
&= Q^T \Sigma_X Q
\end{align*}
$$

Now, suppose we choose \( Q = V \). Then, by plugging in \( (2) \), we get:

$$ \begin{align*}
\Sigma_Z &= V^T \Sigma_X V \\
&= V^T V \Lambda \\
&= \Lambda
\end{align*}
$$

So what does this tell us? Since \( \Lambda \) is a diagonal matrix, there are no correlations in the transformed data. The variance of each dimension of \( Z \) is equal to the corresponding eigenvalue. In addition, because we sorted the eigenvalues in descending order, the first dimension of \( Z \) has the most variance, the second dimension has the second-most variance, etc. So most of the information is kept in the leading columns, as promised.

You can verify this in Python:

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

pca = PCA()
X = np.random.random((100,10)) # generate an N = 100, D = 10 random data matrix
Z = pca.fit_transform(X)

# visualize the covariance of Z
plt.imshow(np.cov(Z.T))
plt.show()

(Figure: the covariance matrix of the output of PCA, plotted with imshow.)

Observe that the off-diagonals are 0 and that the variances are in decreasing order.

You can also confirm that \( Q^T Q = I \).

QTQ = pca.components_.T.dot(pca.components_)  # pca.components_ stores the eigenvectors as rows, i.e. Q^T; for a square orthogonal Q, Q Q^T = Q^T Q = I
plt.imshow(QTQ)
plt.show()

print(np.around(QTQ, decimals=2))

(Figure: QTQ plotted with imshow; it is the identity matrix.)

Remember to leave a comment below if you have any questions on the above material. If you want to go more in depth on this and other data science topics (both with the math and the code), check out some of my data science video courses online.

#anomaly detection #covariance #dimensionality reduction #eigenvalues #eigenvectors #linear algebra #machine learning #pca #principal components analysis #statistics



Go machine learning

November 17, 2015

Autoresponder algorithm converts “Monday or Wednesday” to “Tuesday”.



SMTP -> ERROR: Failed to connect to server: Connection timed out (110)

November 17, 2015

I get this error frequently when I use the PHPMailer library, i.e. for a WordPress site.

The strange thing is, this will work on your localhost but not on the remote server.

It is not a problem with PHPMailer, or WordPress, or your mail server.

It is a problem with your hosting configuration (I’m using Namecheap for hosting but I assume you have cPanel installed).

How to fix:

  1. Log into cPanel, and under the “Mail” section, choose “MX Entry”.
  2. Add MX records as shown below. (I’m using Zoho as my mail server, and their addresses are mx.zoho.com and mx2.zoho.com)

(Screenshot: the cPanel MX Entry page with the Zoho MX records mx.zoho.com and mx2.zoho.com added.)

#hosting #php #phpmailer #smtp #wordpress



Linear regression video course in Python

November 11, 2015

Hi all!

Do you ever get tired of reading walls of text, and just want a nice video or 10 to explain to you the magic of linear regression and how to program it with Python and numpy?

Look no further, that video course is here.

UPDATE: Please go here for the latest and fullest version of the course: https://deeplearningcourses.com/c/data-science-linear-regression-in-python/

#linear regression #numpy #python #statistics



Logistic Regression in Python video course

November 11, 2015

Hi all!

Do you ever get tired of reading walls of text, and just want a nice video or 10 to explain to you the magic of logistic regression and how to program it with Python?

Look no further, that video course is here.

UPDATE: Please go here for the latest and fullest version of this course: https://deeplearningcourses.com/c/data-science-logistic-regression-in-python/

#big data #data science #logistic regression #neural networks #numpy #python


