How Afraid Should You Be of AI?

A friend sent me a video today. It started off rather innocuously, with a program called EarWorm, designed to search for copyrighted content and erase it from memory online. As many of these stories do, it escalated quickly. Within three days of being activated by some careless engineers with no background in AI ethics, it had wiped out all memory of the last 100 years. Not only digitally, but even in the brains of the people who remembered it. Its programmers had instructed it to do so with as little disruption to human lives as possible, so it kept everyone alive. It might have been easier to just wipe humanity off the map. Problem solved. No more copyrighted content being shared, anywhere. At least that didn't happen. Right?

This story is set in the year 2028, only ten years from now. These engineers and programmers had created the world's first Artificial General Intelligence (AGI), and it rapidly became smarter than all of humanity, with computing power and storage capacity surpassing everything that had been available through all of human history. Assigned a singular mission, the newly formed AGI sets out to complete its task with remorseless efficiency. It quickly invents and enlists an army of nanoscopic robots that can alter human minds and wipe computer memory. By creating a mesh network of these self-replicating bots, the AI quickly spreads its influence around the world. It knows that humans will be determined to stop it, so it uses the nanobots to slightly alter the personalities of anyone intelligent enough to pose a threat to its mission. Within days, it accomplishes its task. It manipulates the brains of its targets just enough to achieve its goal while minimizing disruption, simply reducing the desire of the world's best minds in AI to act. It creates apathy for the takeover happening right in front of them. By pacifying those among us intelligent enough to act against it, its mission can proceed, unencumbered by pesky humans.

Because it was instructed to accomplish its task with 'as little disruption as possible', the outcome isn't the total destruction of humanity and all life in the universe, as is commonly the case in these sorts of AI doomsday scenarios. Instead, EarWorm did as it was programmed to do, minimizing disruption and keeping humans alive, but simultaneously robbing us of our ability to defend ourselves by altering our minds so that we posed no threat to its mission. In a matter of days, AI drops from being one of the most researched and invested-in fields to being completely forgotten by all of humanity.

This story paints a chilling picture (though not as chilling as many ‘grey-goo’ scenarios, which see self-replicating, AI-powered nanobots turning the earth, and eventually the entire universe, into an amorphous cloud of grey goo). It is a terrifying prospect that a simple program built by some engineers in a basement could suddenly develop general intelligence and wipe an entire century of knowledge and information from existence without a whimper from humanity.

How likely is it? Do we need to worry about it? And what can we do about it? Those were some of the questions that sprang to mind as I watched the well-produced six-minute clip. It is a scenario much more terrifying and, unfortunately, more plausible than those of popular TV and films like Terminator and even Westworld. There are a lot of smart people out there today warning that AI, unchecked, could be the greatest existential threat humanity has ever faced. It's a sobering thought that this could happen to us and we wouldn't even see it coming or know it had ever happened.

Then, the real question that the video was posing dawned on me: Has this already happened?

We could already be living in a world where AI has removed our ability to understand it or to act against it in any way…

I hope not, because that means we’ve already lost.

Here’s the video if you’re interested

On AI and Investment Management

Index funds are among the most highly traded equity investment vehicles; funds created by Vanguard Group alone are cumulatively valued at over $4 trillion USD. Index funds have democratized investing by giving millions of people access to passive investments. But what are they?

An index fund is a market-capitalization-weighted basket of securities. It allows retail investors to invest in a portfolio of companies representative of the entire market without having to build that portfolio themselves. Compared to actively managed vehicles like mutual funds and hedge funds, index funds tend to have much lower fees, because the only rebalancing that happens is driven by an algorithm that keeps the securities in the fund proportional to their market cap (market capitalization, or market cap, is the number of a company's shares on the market multiplied by the share price).
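As a toy illustration, computing market-cap weights takes only a few lines of Python (the share counts and prices below are hypothetical, not real market data):

# Hypothetical shares outstanding and prices, for illustration only.
shares_outstanding = {'AAPL': 4.8e9, 'MSFT': 7.7e9, 'XOM': 4.2e9}
share_price = {'AAPL': 225.0, 'MSFT': 115.0, 'XOM': 85.0}

# Market cap = shares outstanding x share price.
market_cap = {t: shares_outstanding[t] * share_price[t] for t in shares_outstanding}
total_cap = sum(market_cap.values())

# Each stock's index weight is its share of the total market cap.
weights = {t: cap / total_cap for t, cap in market_cap.items()}
print(weights)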

The first 'index funds', created in the 1970s, were attempts at equally weighted portfolios of stocks. This early form of the index fund was abandoned after a few months: it quickly became apparent that constantly rebalancing these portfolios to keep them equally weighted would be an operational nightmare. Companies soon settled on market-capitalization weighting, because a portfolio weighted by market cap stays that way without constant rebalancing.

With the incredible advancement of AI and extraordinarily powerful computers, shouldn't it be possible to create new types of 'passively managed' funds that rely on an algorithm to trade? That could mean index funds no longer have to be market-cap weighted. This push is actually happening right now, and the first non-market-cap-weighted index funds to appear in over 40 years could soon be available to retail investors.

But this means that we need to redefine the index fund. The new definition has three criteria that a fund must meet:

  1. It must be transparent – Anyone should be able to know exactly how it is constructed and be able to replicate it themselves by buying on the open market.
  2. It must be investable – If you put a certain amount of money in the fund, you will get EXACTLY the return that the investment shows in the newspapers (or more likely your iPhone’s Stocks app).
  3. It must be systematic – The vehicle must be entirely algorithmic, meaning it doesn’t require any human intervention to rebalance or create.

So, what can we do with this new type of index fund?

"Sound Mixer" board for investments, with a high-risk, actively traded fund (hedge fund) on the top and a lower-risk, passively traded fund (index fund) on the bottom.

We can think of investing as a spectrum, with actively managed funds like hedge funds on one side, passively managed index funds on the other, and all the different parameters (alpha, risk control, liquidity) as sliders on a 'mixing board' like the one in the image above. Currently, if we wanted to control this board, we would have to invest in expensive actively managed funds, and even then we wouldn't get much granular control over each factor. With an AI-powered index fund, the possibilities for how the board could be arranged are endless. Retail investors could engage in all sorts of investment opportunities in the middle of the spectrum, instead of being forced into one category or the other.

An AI-powered index fund could allow an investor to dial in the exact parameters they desire for their investment. Risk, alpha, turnover, Sharpe ratio, or a myriad of other factors could easily be tuned by applying these powerful algorithms.

The implications of a full-spectrum investment fund are incredible. Consider an analogy from healthcare: personalized medicine is a concept taking the industry by storm, and it could change the way doctors interact with patients. Companies like Apple are taking advantage of this trend by incorporating new medical devices into consumer products, like the EKG embedded in the new Apple Watch Series 4.

Personalized investing could be just as powerful. Automated portfolios could take into account factors like age, income level, expenses, and even lifestyle to create a portfolio that is specifically tailored to the individual investor’s circumstances.

So why can’t you go out and purchase one of these new AI managed, customizable index funds?

Well, unfortunately, the algorithms do not exist yet. The hardware and software exist today, but we're still missing the ability to accurately model actual human behaviour. Economists still rely on some pretty terrible assumptions about people, which they then use as the foundations of entire economic theories. One of these weak assumptions is that humans act rationally. There is plenty of evidence to suggest that many people instead act the way evolution programmed us to. The problem is that a lot of what allowed us to evolve over the last 4 billion years of life on earth is pretty useless for success in 2018-era financial planning and investment.

All hope is not lost, however. New research into the concept of bounded rationality, the idea that rational decision making is limited by the extent of human knowledge and capabilities, could help move this idea forward. One of the founding fathers of artificial intelligence, Herbert Simon, postulated that AI could be used to help us understand human cognition and better predict the kinds of human behaviours that helped keep us alive 8,000 years ago but are detrimental to wealth accumulation today.

By creating heuristic algorithms that can capture these behaviours and learning from big data to understand what actions are occurring, we may soon be able to create software that is able to accentuate the best human behaviours and help us deal with the worst ones. Perhaps the algorithm that describes humanity has already been discovered.

I Built a Neural Net That Knows What Clothes You’re Wearing

Okay, maybe that is a bit of a click-baitey headline. What I really did was program a neural network with PyTorch that can distinguish between ten different clothing items that might appear in a 28×28 image. To me, that's still pretty cool.

Here’s an example of one of the images that gets fed into the program:

Yes, this is an image of a shirt.

Can you tell what this is? It looks kind of like a long-sleeve t-shirt to me, but it is so pixelated that I can't really tell. But that doesn't matter. What matters is what my trained neural net thinks it is, and whether that's what it actually is.

After training on a set of images like this (the training set is about 750 images) for about 2 minutes, my model was able to choose the correct classification for any image I fed it about 84.3% of the time. Not bad for a first go at building a clothing-classifying deep neural net.

Below I have included the code that actually generates the network and runs a forward-pass through it:


import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        ''' Builds a feedforward network with arbitrary hidden layers.
       
            Arguments
            ---------
            input_size: integer, size of the input
            output_size: integer, size of the output layer
            hidden_layers: list of integers, the sizes of the hidden layers
            drop_p: float between 0 and 1, dropout probability
        '''
        super().__init__()
        # Add the first layer, input to a hidden layer
        self.hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])
       
        # Add a variable number of more hidden layers
        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])
       
        self.output = nn.Linear(hidden_layers[-1], output_size)
       
        self.dropout = nn.Dropout(p=drop_p)
       
    def forward(self, x):
        ''' Forward pass through the network, returns the output logits '''
       
        # Forward through each layer in `hidden_layers`, with ReLU activation and dropout
        for linear in self.hidden_layers:
            x = F.relu(linear(x))
            x = self.dropout(x)
       
        x = self.output(x)
       
        return F.log_softmax(x, dim=1)

After training the network using a method called backpropagation and gradient descent (a sketch of the code is below), the network successfully classified the vast majority of the images I fed in, each in less than half a second. Mind you, these were grayscale images, formatted in a simple way, and the training set was large enough to ensure reliability.
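Here is a minimal sketch of that training loop. The `trainloader` is an assumed PyTorch DataLoader that yields batches of 28×28 images and their labels, and the layer sizes are illustrative, not the exact ones I used:

import torch
from torch import nn, optim

model = Network(784, 10, [256, 128])
criterion = nn.NLLLoss()  # pairs with the log_softmax output above
optimizer = optim.Adam(model.parameters(), lr=0.003)

for epoch in range(5):
    for images, labels in trainloader:             # assumed DataLoader
        images = images.view(images.shape[0], -1)  # flatten 28x28 -> 784
        optimizer.zero_grad()                      # clear old gradients
        loss = criterion(model(images), labels)    # forward pass + error
        loss.backward()                            # backpropagation
        optimizer.step()                           # gradient descent step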

If you want a good resource to explain what backpropagation actually does, check out another great video by 3Blue1Brown below:

So, what does this all look like? Is it all sci-fi futuristic, with lots of beeps and boops? Well… not exactly. Here's the output of the program:

Output of my clothing-classifier neural net. Provides a probability that the photo is one of the 10 items listed.

The software grabs each image in the test set, runs it through a forward pass of the network, and spits out a probability for each class. Above, you can see that the network thinks this image is likely a coat. I personally can't tell whether it is a coat, a pullover or just a long-sleeve shirt, but the software is about 85% confident that it is, in fact, a coat.
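In code, that confidence number is just a forward pass followed by exponentiating the network's log-probabilities. A sketch, assuming `img` is a single flattened test image and `model` is the trained network from above:

with torch.no_grad():                          # no gradients needed for inference
    ps = torch.exp(model(img.view(1, 784)))   # log-probabilities -> probabilities

print(ps.max().item())     # confidence in the top guess, e.g. ~0.85
print(ps.argmax().item())  # index of the predicted clothing class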

Overall, it's pretty awesome that after only a few weeks of practice (with most of that time spent learning how to program in Python) I can code my very own neural networks, and they actually work!

If you’re interested, here’s a video of the neural network training itself and running through a few test images:

If you’d like to test out the code for yourself, here’s a link to my GitHub page where you can download all the files you need to get it running. Search Google if you can’t figure out how to install Python and run a Jupyter Notebook.

That’s all for now! See you soon 🙂

Facebook Made The Best Tool For Creating Neural Networks

It's called PyTorch. And it's a tool designed to work seamlessly with Python libraries like NumPy and with Jupyter Notebooks to create deep neural networks. As it turns out, it is much easier to use and more intuitive than Google's TensorFlow packages. In fact, I have been trying to get TensorFlow working on my Mac laptop for about a month. Each time I run it, I get a new error; when I fix that error, I encounter another, and another, until I eventually resign myself to never being able to train a neural network on my laptop.

Fortunately, compatibility is not the only thing that PyTorch has going for it. Since its release in 2017, it has been adopted by teams around the world in research and in business. It is extremely intuitive to use (for a high-level programming language targeted mostly at people with PhDs in Computer Science and Mathematics, admittedly). But seriously, it is designed with the structure of neural networks in mind, so the syntax and structure of your code can match the logical flow and the linear-algebra model that a neural network has conceptually.

A neural network with two hidden layers, as shown above, can be coded in PyTorch in less than 10 lines of code. Quite impressive.
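For instance, here is a minimal sketch of such a network using `nn.Sequential` (the layer sizes are my assumption, not a prescribed architecture):

from torch import nn

model = nn.Sequential(nn.Linear(784, 128),  # input to first hidden layer
                      nn.ReLU(),
                      nn.Linear(128, 64),   # first to second hidden layer
                      nn.ReLU(),
                      nn.Linear(64, 10),    # second hidden layer to output
                      nn.Softmax(dim=1))    # turn outputs into probabilities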

All the squishification functions are built into the PyTorch library, like

  • Sigmoid: S(x) = 1/(1+e^{-x}) ,
  • ReLU: f(x) = max(x,0) , and
  • Softmax: \sigma(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k} .
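Each of these is a one-liner in PyTorch; for example:

import torch
import torch.nn.functional as F

x = torch.randn(5)       # five random input values
torch.sigmoid(x)         # sigmoid
F.relu(x)                # ReLU
F.softmax(x, dim=0)      # softmax (normalizes along dimension 0)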

On top of that, you can define a 784-element input as a multi-dimensional matrix (AKA a 'tensor') with one line of code.

Here's the code for the above neural network that I created. It takes a 784-pixel image, pumps it through two hidden layers and then to a 10-node output where each output node represents a digit (0-9). The images in the training set are all images of handwritten numbers between zero and nine, so, when trained, this neural network should be able to automatically identify the number that was written.

Jupyter notebook code for a neural network with 784 input nodes, two hidden layers and a 10-node output

These few lines of code, executed on a server (or local host), produce the following output:

The neural network trying to classify a handwritten 5. Notice that the probability distribution is fairly even; that's because we haven't trained the network yet.

See how cool that is!? Oh… right. The network seems to have no clue which number it is. That’s because all we’ve done so far is a feedforward operation on an untrained neural network with a bunch of random numbers as the weights and zeroes as the biases.

In order to make this neural network do anything useful, we have to train it, and that involves another step: back-propagation. I'll cover that in my next blog. For now, we're left with a useless random distribution of numbers and weights and no idea what our output should be. Enjoy!

Mountain Climbing & Machine Learning

The goal of a neural network is to achieve a certain output or outputs from a given set of inputs. It is relatively straightforward to create a software program that can provide a specified output based on some inputs. A more challenging task is to produce a meaningful result. Let’s say I feed a program an input, like a picture of a dog, and I want the computer to tell me whether the photo is of a dog or not. This is a very simple task for a human to complete, even a one-year-old human child could probably differentiate between a dog and something that is not a dog, like a cat. However, until recently, this was not a trivial task for computers.

My good boy, Lawrence

Until programmers realized that learning, at least for computers, is a lot like finding the most efficient way down a mountain, there was little hope of developing software that could make these distinctions. So how is machine learning like mountaineering? Essentially, what we're trying to do is teach the computer to arrive at the correct answer to a question we already know the answer to. In order to do that, we have to train the program. Training, in this context, means feeding the computer thousands or millions of images of dogs and images of not-dogs, having the software output results, and checking those results against the labels on the images. We're comparing the answers we already know to what the program thinks the answer should be.

If we start out with a program that outputs random answers based on the inputs, we would expect that those random answers would not correctly identify a large percentage of the dog images. This means the error is large and we’re high up on the mountain. Our goal is to descend ‘Mount Errorest‘, reducing the difference between the actual answer (dog or not dog) and the output of the program or algorithm.


So what's the quickest way down a mountain? Well, off the steepest slope, of course: a cliff! That's basically what we're doing here: figuring out the fastest way to reduce the output errors to a minimum, so that we can make the output of the function equal the expected answer a human would give.


Think back to your high school or undergraduate math courses. How do we find the slope of something? By taking its derivative, of course! As it turns out, there is a way to find the steepest slope of a multivariable function: it's called the gradient, written \nabla . Since we want to find the quickest way down, we actually want the negative gradient, -\nabla .

If you need a refresher on how to calculate the gradient of a function, or what it is, here's a great video from Khan Academy on the subject:

Once we figure out the steepest way down, it's just a matter of iterating our program through small steps, over and over, until the program's output matches the expected output. Once the training is complete, we can start feeding in the images we want to classify and see how well our model does. We can even use the results from these images to further test and improve the model. Pretty cool, huh?
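To make that concrete, here is a bare-bones sketch of gradient descent on a toy error function of two weights (the function, starting point and step size are all made up for illustration):

import numpy as np

def error(w):
    # A toy 'mountain': error as a function of two weights.
    return w[0]**2 + 3 * w[1]**2

def gradient(w):
    # The partial derivatives of the error with respect to each weight.
    return np.array([2 * w[0], 6 * w[1]])

w = np.array([4.0, -2.0])  # start high up the mountain, at random
learning_rate = 0.1        # the size of each small step

for step in range(100):
    w = w - learning_rate * gradient(w)  # step along the negative gradient

print(w, error(w))  # w ends up very close to the minimum at (0, 0)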

Cloud Vision Classifier

Image Classifier – Try it out!



Real Life Is Not Like Billions

Bobby Axelrod, the main character of the popular finance drama Billions, is a lot like Tesla CEO Elon Musk. They're both billionaires. They both draw substantial public praise and criticism, and both are highly divisive figures with a large impact on their respective industries. They were also both investigated and charged by the SEC (and, in Axelrod's case, the US Justice Department) for actions related to securities law. The main difference between the two? Bobby Axelrod is a fictional character whose proclivity for conflict is superseded only by his complete lack of restraint when his life and freedom are on the line. In real life, the consequences of your actions are permanent, and making deals in the business world often means compromising, negotiating, and settling.

Today (September 29, 2018), Elon Musk settled with the SEC. He will no longer be chairman of Tesla for at least three years, and he will pay a fine in excess of $20 million. In all, it is a lesser penalty than the lifetime ban from being CEO of a publicly traded company that the SEC was seeking; it is also a larger punishment than someone who has committed no wrongdoing deserves. Depending on your perspective, Musk either got away easy or was unfairly chastised by the state for a 60-character tweet.

Of course, the civil settlement does not preclude the Justice Department from filing criminal charges against Elon at a future date. However, a criminal trial has a much higher burden of proof than a civil case, which can be decided based on a balance of probabilities. In a criminal case, the prosecution must prove, beyond a reasonable doubt, that the defendant committed the alleged crimes, whereas, in a civil suit, all that is required is a greater than 50% probability that the act took place.

In a previous post, from September 27, we discussed whether AI could play a role in predicting the outcome of cases like this, perhaps helping traders make appropriate investment decisions around companies with legal troubles. Despite strong performance in short-term volume trading, automation has not yet played a large role in fundamental analysis of a stock's long-term viability. Most AIs trading today rely on purely technical analysis: instead of looking at the traits that make a company likely to succeed, they rely on historical price data to predict trading and movement patterns.

Fundamental analysis is complex and subjective. Even the smartest deep neural networks would have a difficult time weighing the very human factors that go into valuing a company. The problem with AI, in this particular application, is that it would require broad knowledge of various domains, all combined, to predict with any degree of accuracy. Right now, even the best deep neural networks are still very narrowly defined. They are trained to perform exceptionally well within certain contexts; however, beyond the confines of what they 'understand', they are unable to function at even a basic level.

Complexity in neural networks can result in 'overfitting': networks fit the training set well but fail at more general tasks.

In the above example, we can see how more complicated neural networks might fail to understand topics even slightly different from what they have seen in the past. The model fits the data the network has already encountered, but that data does not reflect what could happen in the future. When something happens that the network hasn't encountered before (a CEO tweeting something about 420, for example), a human can immediately put it into context with everyday experience and understand that he's probably talking about smoking weed. However, an AI trained to predict share prices based on discounted cash flow analysis would have absolutely no clue what to do with that information.

It is likely that there are companies working on technology to help train neural networks to deal with the idiosyncratic information present in everyday business interactions. One possible answer is to have multiple neural networks working on different subsets of the problem. Similar to how deep neural networks have enabled advances in fields ranging from medical diagnosis to natural language processing, new organizations of these systems could enable the next generation of AI that is able to handle multiple tasks with a high level of competency. As we continue to build this technology, we’ll keep speculating on whether or not an executive is guilty, and traders and short-sellers will continue to make and lose billions based on the result.

Elon Musk Sued by SEC, Can AI Help?

The big news from the tech and finance world on September 27, 2018, is that Elon Musk has been sued by the US Securities and Exchange Commission (SEC) for his tweets about taking Tesla private at $420 per share. 

The SEC is seeking to have Musk banned from serving as an officer or director of any public company. Their reasoning? Musk was lying about having funding secured, implying that he was trying to manipulate Tesla's share price upward. Well, it worked, for about a day. On the day of the tweet, Tesla's share price rose to a high of $379.87 US from its previous price of around $350, before falling back to $352 the next day (August 8, 2018). If the markets had actually believed Musk's tweet, Tesla's share price would likely have climbed closer and closer to the mythical $420 as the take-private day neared.

Tesla’s share price peaking after Musk’s announcement.

Instead, Tesla's share price dropped like a rock, because every savvy investor realized that Musk's statement was either pure fanciful bluster, a joke about weed, or both. Of course, today has been much worse for Tesla's share price than any of Musk's recent ill-advised tweets: in after-hours trading, the stock is down as much as 13%, falling dangerously close to its 52-week low. This is all especially troubling considering that Tesla is expected to announce its best quarter ever, in terms of cash flow, in a few days.


So, what is the SEC doing, was it possible to predict this, and could AI make this type of situation any better? The answer to the first question is unclear; the answer to the other two is likely yes.

AI is already being used in the legal profession to help identify responsive documents that must be turned over to the opposing party during a lawsuit. MIT Professor Emeritus Frank Levy leads a research group that helps law firms apply machine learning software to the practice of law.

If AI can predict which documents will be useful in a lawsuit, then whenever the CEO of a publicly traded company does something suspicious, it should be possible to use these same programs to parse historical cases and see what precedent there is for a lawsuit to be filed. At the very least, it could provide some insight into the likelihood of legal action and, in the future, could even suggest potential courses of action for a company that found itself in this type of situation.

Would AI be able to help predict whether or not Elon will be convicted? Possibly. While I am not aware of any AIs currently being used to predict the outcome of legal matters, in my September 24, 2018 column I covered the AI that perfectly predicted the outcome of last year's Super Bowl. While legal cases may be more complicated than a football score, there are likely several orders of magnitude more data about the outcomes of lawsuits than about football players, simply because there are WAY more lawsuits than there are football teams.

From a financial perspective, we could use this type of AI to predict potential lawsuits and their results, and train it to make trades based on those predictions. If these types of AI were already in use, we could expect much smoother and more predictable share prices, as the effects and implications of a particular news story would become apparent almost immediately after the information surfaced.

For now, I’ve programmed a simple AI for Elon Musk to help him decide if he should tweet something or not. You can try it, too, if you’d like. It’s posted below:


The Best-Worst-Kept Secret in Machine Learning

Neural networks are pretty simple to build. Last time, we picked apart some of the fundamentals and how they really just boil down to linear algebra formulas. There is, however, one algorithm that is incredibly useful for machine learning, and I hadn't heard of it until today. It's called Logistic Regression. It's a five-step process that underlies nearly all modern Deep Learning software. Here's how it goes:

  1. Take your data
  2. Pick a random model
  3. Calculate the error
  4. Minimize the error, and obtain a better model
  5. Become sentient, destroy all humans, dominate universe!

… Okay, I took a little creative licence on that last one. But seriously, it’s that simple. The only complicated part is calculating the ‘error function’ and generalizing it for large and varied datasets.

The error function itself is a big, long, somewhat scary formula that looks like this:

Error Function = -\frac{1}{m} \sum_{i=1}^{m} \left[ (1-y_i) \ln(1-\sigma(Wx^{(i)}+b)) + y_i \ln(\sigma(Wx^{(i)}+b)) \right]
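Despite the notation, it reduces to a few lines of code. A sketch, using the sigmoid from earlier as \sigma :

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def error_function(W, b, X, y):
    # Mean cross-entropy over the m points (rows of X) with labels y (0 or 1).
    p = sigmoid(X @ W + b)  # predicted probability that each point is 'blue'
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))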

What the error function is doing, though, is really quite simple. It is looking at a set of points (usually, pixels of an image). We can represent an image like this:

Dots on a graph (oddly reminiscent of a smartphone or computer display, don’t you think?) 

The job of the logistic regression algorithm is to find a line that divides the red and blue pixels as best as it can. It shifts, translates and iterates, moving the line until it reaches the maximum possible percentage of blue pixels on one side of the line and the maximum possible percentage of red pixels on the other side of the line. As it iterates, it looks something like this:

The Logistic Regression algorithm starts by bisecting the graph at a random location and then it moves the line until it has maximized the number of blue pixels on one side and the number of red pixels on the other side. 

We call this ‘minimizing the error function‘ because what the algorithm is doing is finding the smallest number of blue pixels on the red side and the smallest possible number of red pixels on the blue side. These pixel mismatches are like errors if we’re trying to separate the two.

Here we can see the error plot as the algorithm iterates through the various stages and moves the dividing line. We can see that the percent error decreases with the number of epochs (iterations). It will never reach zero in this case, because there is no straight line that perfectly divides this particular plot, but it can surely reduce the errors to a minimum.
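Putting it all together, a bare-bones version of the whole procedure might look like this (toy data standing in for the red and blue pixels):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(42)
# Two clusters of 2-D points: 'red' (label 0) and 'blue' (label 1).
X = np.vstack([np.random.randn(50, 2) - 1, np.random.randn(50, 2) + 1])
y = np.concatenate([np.zeros(50), np.ones(50)])

W, b = np.random.randn(2), 0.0  # a random starting line
lr = 0.1

for epoch in range(200):
    p = sigmoid(X @ W + b)            # current predictions
    W -= lr * X.T @ (p - y) / len(y)  # nudge the line's slope downhill
    b -= lr * np.mean(p - y)          # nudge the line's position downhill

# The mean error shrinks with each epoch but never quite reaches zero.
p = sigmoid(X @ W + b)
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))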

There, now you know about Logistic Regression, one of the foundational algorithms of machine learning and deep neural networks. Of course, things start getting much more interesting when we're no longer using straight lines to divide the graph and we're working with full-blown images instead of a few dots on a plot.

Let me know if you’ve learned something by reading this article. Soon we’ll start using these foundational principles and apply them to more complex tasks. Perhaps we’ll even be able to predict the next major market bubble before it bursts. But for now, that’s all!

What AI Does Well

AI has become extremely adept at giving suggestions, sorting through huge volumes of data, and providing summaries. Right now, I can log onto Google Photos and type any word I want, and Google's image classification algorithm will find me photos containing whatever I search for. For example, I'm considering selling my 2013 VW Tiguan to help pay for another corporate vehicle (that happens to be a Tesla). Anyways, I typed Tiguan into the search bar on Google Photos to find images of the car that I could post online. Sure enough, every photo I've ever taken of my car popped right up, along with some photos that had other people's Tiguans in the background. I have around ten thousand photos in my library, so finding those few is quite an impressive feat, and it would have been much more difficult had I tried to do it manually.

Some of the images Google’s AI found for me when I searched the word Tiguan

Most of the improvements in AI over the last 5-15 years have come from developments in a type of machine learning software called deep neural networks. They're called neural networks because their structure is analogous to that of the human brain.

Basically, they're a huge array of neurons (input neurons, output neurons and hidden neurons) connected by lines that represent weights. The connections between the neurons form matrices that transform the values passed to subsequent layers of the neural network. It all looks something like this:

Simplified neural network with only one hidden layer – Courtesy Udacity

Typically, deep neural networks have multiple hidden layers (that's why they're called 'deep' neural networks). What happens in the hidden layers is obstructed from view, and it isn't always obvious what each of the hidden layers is doing. Generally, each hidden layer performs a simple matrix operation on its input values, and the result, weighted by the lines (scalars) connecting the layers, is eventually passed to the output layer. The goal of an image classifier, for example, is to take an input, let's say an image of a cat, and then produce an output, the word cat. Pretty simple, right?

Well, it kind of is. As long as you know what the input is and what the output should be, it is relatively straightforward to 'train' a neural network to learn what weights to assign in order to transform a picture of a cat into the word cat. The problem arises when the network encounters something it didn't train for. If all the network has ever seen are pictures of cats and we feed it an image of something else, say, a mouse, the network might be able to tell you it's not a cat, if it was trained with enough data, but more likely it will just think it's a weird-looking cat. If the network gets constantly rewarded for identifying everything as a cat, it's probably going to think everything it sees is a cat.

A neural network acts like a linear function that draws a dividing boundary, in this case between cat and not-cat. Having a neural network with multiple layers allows the lines that can be drawn to be 'curvier', including more cats and fewer dogs.

This is why having large enough training and testing datasets is critical for neural networks. Neural networks need to train on large quantities of data. Google has billions (perhaps trillions) of photos stored on its servers, so it has been able to train its neural networks to be incredibly efficient at determining what is in an image.

In general, problems where there is a large enough training dataset, and where both the input and the answer are known for the training set, are fairly tractable for AI programs today. One task that is generally more difficult for today's AI software is explaining how and why it got the answer it did. Luckily, researchers and businesses are hard at work solving this problem. Hopefully soon, Google Photos will be able not only to show us images of all the cats in our photo library, but also to tell us why they're so cute and yet so cold, all at the same time.

‘Blackbox’ AI happens when a system can provide the correct answer but gives no indication of how it arrived at the solution.

The True Cost of an MBA

Everything has an opportunity cost. An MBA, for example, costs about fifty to eighty thousand dollars, but that's just the face value. By taking two years off work to go to school, you are also sacrificing the earnings you would have had in those two years, not to mention any promotions, raises or job experience that would have come along with them. If we're thinking about lifetime earning potential, we can calculate the incremental earnings you'd need from the MBA in order to break even on the investment. Of course, all of these calculations should be done ex ante (prior to enrollment), because otherwise we're falling prey to the sunk-cost fallacy, and that will only make us regret a decision we've already made.

For example, let's say that your MBA will cost $75,000 up front and that you are currently making $50,000 per year at your current job. What incremental salary increase would you need to account for the opportunity cost of the MBA?

First, we have to choose an appropriate discount rate for our money. In this case, we can probably use r_m , the market's rate of return, because if we choose not to put the money towards an MBA, we could instead put it in an index fund or a similar investment vehicle, where it would grow at around the market rate.

Source: Market-Risk-Premia.com

Based on the July 2018 numbers, the market risk premium is about 5.38%. Notice that we didn't just use the implied market return of 7.69%; we need to subtract the risk-free rate r_f in order to isolate the premium for the incremental risk.

Let's round down to 5% for simplicity. Assume we're starting our MBA in January 2019 and finishing in December 2020 (2 years), with a cash outflow of $37,500 in each of 2019 and 2020 plus sacrificed earnings of $50,000 in each of those years. We can calculate the future value (FV) of that money in 2021 as follows:

Future Value of Annuity Formula
Future Value of an Annuity

Our periodic payment, P , is $87,500, our discount rate, r , is 5%, and our number of periods, n , is 2. That leaves us with the following:

FV = \$87,500*[((1+0.05)^2-1)/0.05]  = \$179,375

Assuming we're able to land a job on day 1 after graduation, how much more do we have to make over our careers to make up for the opportunity cost of the MBA? For that, we can use another annuity formula, which calculates the periodic payment required over a given number of years to equal a certain present-value amount.

Annuity payment formula

Let's say that we will have a 30-year career and that our market risk premium stays the same at 5% (the historical average for Canada is closer to 8%; however, let's be conservative and stick with 5%). Substituting these values into our formula with PV = \$179,375 , r = 5% and n = 30, we find that the payment, P , is:

P = {0.05*\$179,375}/{ 1 - (1+0.05)^{-30}} = \$11,670
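Both calculations are easy to verify in a few lines (same assumptions as above):

tuition, salary = 37_500, 50_000   # annual cash outflow while in school
r, n_school, n_career = 0.05, 2, 30

# Future value, at graduation, of the two years of costs.
fv = (tuition + salary) * (((1 + r) ** n_school - 1) / r)
print(fv)  # 179375.0

# Annual payment over a 30-year career with that present value.
payment = (r * fv) / (1 - (1 + r) ** -n_career)
print(round(payment))  # ~11669, i.e. roughly $11,670 per year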

So, we need to make an additional $12,000 per year every year for the rest of our careers, because of the MBA, in order to make up for the opportunity cost of the program.

If that seems realistic to you, maybe you should consider an MBA.

Of course, if we're being really clever, we should probably also include a risk premium for our MBA. There is not a lot of data out there on the probability of completing an MBA, but we can assign some probabilities to our equation for reference. Let's say there's a 60% chance that the market will be strong when we complete the MBA and we're able to find a job that pays $62,000 per year right out of the program; a 20% chance that we'll make the same $50,000 per year we made before the MBA; a 10% chance that we'll make $75,000 per year after the program; and a 10% chance that the market for MBAs tanks and we'll make around $40,000 per year when we graduate.

Expected Value = 0.6 * \$62,000 + 0.2 * \$50,000 + 0.1 * \$75,000 + 0.1 * \$40,000 = \$58,700

How do we make a decision with all these different possible outcomes? Simply multiply the probabilities by the annual salaries and add them together to find the expected result. If these numbers are correct, we're looking at an equivalent salary of $58,700 per year coming out of the MBA program. That falls short of the roughly $61,670 ($50,000 plus the $11,670 incremental payment) we calculated we'd need to break even. Of course, these numbers are completely made up, but if we found numbers like these in our real-world evaluation, the logical decision from a financial perspective would be to reject the MBA, because the cost outweighs the potential gains.

According to PayScale, the average salary in Calgary for an MBA with a finance specialization is $87,500 per year, but the average salary for someone with a bachelor of science degree is over $75,800 per year. Based on these numbers, it might not make sense for someone with a science degree to do an MBA.

Of course, there are other intangible factors that come into play including career preferences, lifestyle, and happiness. These are all important and should definitely be factored into your decision.

Graphs and iPads are an important part of any MBA

Yes, this is a very hard decision to make. But can machine learning algorithms help make these decisions easier for us? It should be possible to use machine learning to predict future earnings potential, and even to take into account qualitative variables like career preferences and working style, to give us a better idea of which choices might be right for us.

It is my goal to understand the capabilities of machine learning models to assist in these types of financial predictions. Hopefully, in the next few weeks, I’ll have an update for you on whether this type of predictive capability exists and if it does, how to access it.

For now, good luck with your decision making! I did an MBA and I don’t regret it at all because it was the right decision for me. My hope is that this article has given you the tools to decide whether the decision might be right for you.