What’s with all those single-letter variable names in machine learning code?

In this article, I want to discuss a common beginner question, which is:

“What’s with all those single-letter variable names in machine learning code?”

This article is part of a series I started on Common Beginner Questions, so check that out if you’d like to see more.


The short answer

The short answer to this question is many-fold. To summarize:

1) We follow conventions. When you’re on a team, you follow your team’s conventions. In ML, single-letter variable names that mirror the math happen to be the convention.

2) It directly follows the math. If you have a math equation like \( w^T x+b \), then a direct translation into NumPy would look like “w @ x + b”, where the code variables match the names of the math variables. That is easy to follow. It is not easy to follow when you rename everything to look like “weights @ inputs + bias”. This makes things harder, not easier.

3) Don’t use analogies that were meant for learning purposes in the lectures. For example, don’t call something a “neuron” in your code. Don’t call something a “slot machine” or “animal type” or “digit score”. Again, this makes things more confusing.
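To make point 2 concrete, here is a minimal sketch of the two naming styles side by side. The numeric values are made up purely for illustration, and the long names (weights, inputs, bias) are the ones from the example above, not taken from any particular library.

```python
import numpy as np

# Hypothetical values, chosen only so the arithmetic is easy to check by hand.
w = np.array([0.5, -1.0, 2.0])  # weight vector: w in the math
x = np.array([1.0, 2.0, 3.0])   # input vector: x in the math
b = 0.25                        # bias term: b in the math

# Math: w^T x + b. The code reads symbol-for-symbol like the equation.
y = w @ x + b

# The same computation with "descriptive" names. The result is identical,
# but the reader now has to map each name back to the equation.
weights, inputs, bias = w, x, b
output = weights @ inputs + bias

print(y, output)
```

Both lines compute the same number. The only difference is how much translation the reader has to do between the equation and the code.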


EXAMPLES (from the “real world”):

From Keras’ own documentation

From Theano’s documentation

From JAX’s documentation

From TensorFlow source code

But hey, if you think you have superior skills relative to Google engineers because you don’t use single-letter variable names, then you are welcome to think that.


Broad ideas about beginners vs. professionals

Here’s a common pattern I see among software engineers and programmers. There are 2 types:

Type 1) Beginners / students who just graduated from a bootcamp / students who just graduated from college (for brevity, I’ll simply call these “beginners” in this article)

Type 2) Seasoned professionals / those who have experience working on large teams, working on significant-sized projects (for brevity, I’ll call these “professionals”)


The “beginner” approach is often:

  • to be very gung-ho about fixing everything on their first day
  • to have very ambitious ideas about overhauling legacy systems
  • to parrot many of the “rules” they learned in school

Such “rules” include:

  • don’t use single-letter variable names
  • comment your code
  • use spaces over tabs / tabs over spaces / 2-space indents / 4-space indents (you can see there is already a problem because these are all inconsistent with each other!)
  • don’t do premature optimization (unfortunately, as beginners, they have no idea what constitutes “premature optimization” in the first place, because they haven’t spent enough time learning the system first)

The last point reminds me of people who comment in online forums, who often parrot the phrase “correlation does not imply causation”.

As statisticians, we think “yes, yes, we all know that, but you’re missing the point of the discussion”. Statisticians already have this idea solidly implanted in their minds. There’s no need to repeat it at every opportunity.


The professional, more mature approach differs in the following ways:

They are not gung-ho about fixing everything immediately. In large systems, when you change one thing, it can potentially affect many other things you haven’t even thought of.

This is an example of how beginners “don’t know what they don’t know”. It’s like pulling out a Jenga piece and the whole tower falling down as a result.

Professionals also understand that things might seem weird at first but that there is probably a good reason that they are that way.

Real systems are complex and sometimes compromises have to be made. Professionals get this.


Professionals also temper their ambitions to overhaul legacy systems. They are better at estimating how much time things take, compared to beginners. Beginners often feel they can do anything.

Professionals, thanks to experience, actually know what that “anything” turns into once committed.


Finally, professionals generally do not parrot rules like “don’t use single-letter variable names”. Instead, they understand context and convention.

Obviously, calling your API key “x” doesn’t make sense, but calling your model input “x” when that’s also what it’s referred to in the math does make sense.

Obviously, commenting your code makes sense, but if you are taking a course in which the “commentary” for the code consists of the actual video lecture itself, you shouldn’t expect the same comments to be repeated in the code.

Sometimes, what is appropriate in one context is not appropriate in another. Beginners don’t understand this, and try to apply the same rules in all contexts. They cannot adapt.

Professionals understand that it is more important to conform to the conventions and processes of their team, and to be predictable. This is efficient.

Things are easier to understand when everyone does things the same way everywhere.

As a simple example, if your whole team uses 2-space indents, but you start using tabs, you’re going to mess up the repo for everyone else.

If your whole team uses conventional math symbols (e.g. x for inputs, z for latent variables) and you start writing variable names like latent_cluster_identity_probability, it’s going to look very weird (especially when everyone already knows what “z” means).
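As a rough illustration of the “z” convention, here is a sketch that computes the posterior p(z = k | x) for a tiny one-dimensional Gaussian mixture. The choice of a mixture model, the function name p_z_given_x, and all parameter values are my own assumptions for the example. The point is only that the code lines up with the usual textbook formula, p(z = k | x) = pi_k N(x | mu_k, sigma_k^2) / sum_j pi_j N(x | mu_j, sigma_j^2).

```python
import numpy as np

# Hypothetical 1-D mixture with K = 2 components; all numbers are made up.
pi = np.array([0.5, 0.5])     # mixing weights: pi_k in the math
mu = np.array([-1.0, 1.0])    # component means: mu_k
sigma = np.array([1.0, 1.0])  # component standard deviations: sigma_k


def p_z_given_x(x):
    """Posterior p(z = k | x) for every component k, via Bayes' rule."""
    # Gaussian density N(x | mu_k, sigma_k^2), evaluated for all k at once.
    p_x_given_z = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    joint = pi * p_x_given_z    # p(x, z = k)
    return joint / joint.sum()  # normalize over k


print(p_z_given_x(0.0))
```

Anyone who knows the formula can read the function line by line. Renaming the result to something like latent_cluster_identity_probability would not add information for that reader, and it would make the link to the equation harder to see.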