The Best-Worst-Kept Secret in Machine Learning

Neural networks are pretty simple to build. Last time, we picked apart some of the fundamentals and saw how they really just boil down to linear algebra formulas. There is, however, one algorithm that is incredibly useful for machine learning, and I hadn’t heard of it until today. It’s called Logistic Regression. It’s a five-step process that powers nearly all modern Deep Learning software. Here’s how it goes:

  1. Take your data
  2. Pick a random model
  3. Calculate the error
  4. Minimize the error, and obtain a better model
  5. Become sentient, destroy all humans, dominate universe!

… Okay, I took a little creative licence on that last one. But seriously, it’s that simple. The only complicated part is calculating the ‘error function’ and generalizing it for large and varied datasets.
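
To make the four real steps concrete, here’s a minimal sketch of the loop in NumPy. Everything in it (the toy data, the variable names, the learning rate) is my own illustration rather than a reference implementation; step 3 uses the error function we’ll look at next.

```python
import numpy as np

# 1. Take your data: a few 2-D points, each labelled 0 (red) or 1 (blue).
X = np.array([[0.2, 0.3], [0.4, 0.8], [0.9, 0.6],
              [0.7, 0.1], [0.3, 0.9], [0.8, 0.2]])
y = np.array([0, 1, 1, 0, 1, 0])

# 2. Pick a random model: a line described by weights W and a bias b.
rng = np.random.default_rng(seed=0)
W = rng.normal(size=2)
b = 0.0

def sigmoid(z):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.1
for epoch in range(200):
    # 3. Calculate the error (the 'error function' discussed below).
    p = sigmoid(X @ W + b)        # predicted probability that each point is blue
    error = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # 4. Minimize the error: nudge W and b a small step downhill.
    W -= learning_rate * (X.T @ (p - y)) / len(y)
    b -= learning_rate * np.mean(p - y)
```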

The error function itself is a big, long, somewhat scary formula that looks like this:

Error Function = -\frac{1}{m}\sum_{i=1}^{m}\left[ (1-y_i)\ln\big(1-\alpha(Wx^{(i)}+b)\big) + y_i\ln\big(\alpha(Wx^{(i)}+b)\big) \right]

Here m is the number of points, x^{(i)} is the i-th point, y_i is its label (say, 0 for red and 1 for blue), W and b describe the dividing line, and \alpha is the sigmoid activation function that squashes Wx^{(i)}+b into a probability between 0 and 1.
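
Written out in code, the formula maps over almost symbol for symbol. Here’s a minimal NumPy sketch; the function and argument names are my own choices:

```python
import numpy as np

def sigmoid(z):
    # The activation written as alpha in the formula: squashes Wx + b into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def error_function(W, b, X, y):
    # Average cross-entropy over the m points in X, with labels y in {0, 1}.
    p = sigmoid(X @ W + b)        # alpha(Wx^(i) + b) for every point at once
    return -np.mean((1 - y) * np.log(1 - p) + y * np.log(p))
```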

What the error function is doing, though, is really quite simple. The error function is looking at a set of points (in this case, the pixels of an image). We can represent an image like this:

Dots on a graph (oddly reminiscent of a smartphone or computer display, don’t you think?) 

The job of the logistic regression algorithm is to find a line that divides the red and blue pixels as well as it can. It translates and rotates the line, iteration after iteration, until it has put the largest possible percentage of blue pixels on one side of the line and the largest possible percentage of red pixels on the other. As it iterates, it looks something like this:

The Logistic Regression algorithm starts by bisecting the graph at a random location and then it moves the line until it has maximized the number of blue pixels on one side and the number of red pixels on the other side. 

We call this ‘minimizing the error function’ because what the algorithm is doing is finding the line that leaves the smallest possible number of blue pixels on the red side and the smallest possible number of red pixels on the blue side. These pixel mismatches are the errors when we’re trying to separate the two.
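
Counting those mismatches is straightforward to write down, too: the line Wx + b = 0 splits the plane into two sides, and a mismatch is any point that lands on the wrong one. A small sketch, again with names of my own choosing:

```python
import numpy as np

def side_of_line(X, W, b):
    # 1 if a point falls on the 'blue' side of the line Wx + b = 0, otherwise 0.
    return (X @ W + b > 0).astype(int)

def count_mismatches(X, y, W, b):
    # Blue points stranded on the red side, plus red points stranded on the blue side.
    return int(np.sum(side_of_line(X, W, b) != y))
```

As the algorithm moves the line, this count (and the error function along with it) keeps shrinking.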

Here we can see the error plot as the algorithm iterates through the various stages and moves the dividing line: the percent error decreases with the number of epochs (iterations). It will never reach zero in this case, because there is no straight line that perfectly divides this particular plot, but it can still drive the error down to a minimum.

There, now you know about Logistic Regression, one of the foundational algorithms of machine learning and deep neural networks. Of course, things start getting much more interesting when we’re no longer using straight lines to divide the graph and we’re working with full-blown images instead of a few dots on a plot.

Let me know if you’ve learned something by reading this article. Soon we’ll take these foundational principles and apply them to more complex tasks. Perhaps we’ll even be able to predict the next major market bubble before it bursts. But for now, that’s all!