What AI Does Well

AI has become extremely adept at giving suggestions, sorting through huge volumes of data and providing summaries. Right now, I can log onto Google Photos, type any word I want, and Google’s image classification algorithm will find me photos that contain whatever I searched for. For example, I’m considering selling my 2013 VW Tiguan to help pay for another corporate vehicle (that happens to be a Tesla). Anyway, I typed Tiguan into the search bar on Google Photos to find images of the car that I could post online. Sure enough, every photo that I’ve ever taken of my car popped right up, along with some photos that had other people’s Tiguans in the background. I have around ten thousand photos in my library, so finding those few is quite an impressive feat, and it would have been much more difficult had I tried to do it manually.

Some of the images Google’s AI found for me when I searched the word Tiguan

Most of the improvements in AI over the last 5-15 years have come from developments in a type of machine learning software called deep neural networks. They’re called neural networks because their structure is loosely modeled on the networks of neurons in the human brain.

Basically, they’re a huge array of neurons (input neurons, output neurons and hidden neurons) connected by lines that represent weights. The weights on the connections between two adjacent layers can be written as a matrix, and that matrix determines how the values in one layer are turned into the values in the next. It all looks something like this:

Simplified neural network with only one hidden layer – **Courtesy Udacity**

Typically, deep neural networks have multiple hidden layers (it’s why they’re called ‘deep’ neural networks). What happens in the hidden layers is hidden from view, and it isn’t always obvious what each of them is doing. Generally, each hidden layer performs a simple matrix operation: it multiplies the values from the previous layer by the weights on the connecting lines, adds them up, and usually passes each sum through a simple nonlinear function. The result is handed from layer to layer until it reaches the output layer. The goal of an image classifier, for example, is to take an input, let’s say an image of a cat, and then produce an output, the word cat. Pretty simple, right?
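To make the layer-by-layer idea a bit more concrete, here is a minimal sketch of a network with one hidden layer, written in plain Python. Everything in it is made up for illustration: the three "pixel" inputs, the weights and biases, the choice of a sigmoid as the nonlinear function, and the names `layer` and `forward`. A real image classifier has millions of weights and learns them from data rather than having them typed in by hand.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: a weighted sum of the inputs for each neuron, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Pass the inputs through the hidden layer, then through the output layer."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)


# Hypothetical, hand-typed weights: 3 inputs -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

pixels = [0.9, 0.1, 0.4]  # stand-in for a (very) tiny image
score = forward(pixels, hidden_weights, hidden_biases, output_weights, output_biases)[0]
print(f"cat score: {score:.3f}")  # a number between 0 and 1
```

Each row of `hidden_weights` holds the weights on the lines running into one hidden neuron, which is why the whole layer can be thought of as a matrix.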

Well, it kind of is. As long as you know what the input is and what the output should be, it is relatively straightforward to ‘train’ a neural network to work out what weights to assign in order to transform a picture of a cat into the word cat. The problem arises when the network encounters something that it didn’t train for. If all the network has ever seen are pictures of cats and we feed it an image of something else, say, a mouse, it might be able to tell you it’s not a cat if it was trained with enough data, but more likely it will just think it’s a weird-looking cat. If the network is constantly rewarded for identifying everything as a cat, it’s probably going to call whatever it sees a cat.
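Here is a small sketch of that failure mode, under some loud assumptions: the "network" is a single neuron trained with plain gradient descent, and the two features (ear pointiness and whisker length) and all of the numbers are invented for the example. The first model only ever sees cats; the second also sees a few mice labeled as not-cats.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, features):
    """Return the model's 'cat' score, between 0 and 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(examples, labels, epochs=2000, learning_rate=0.5):
    """Nudge the weights after every example to shrink the prediction error."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Invented features: [ear pointiness, whisker length], each between 0 and 1.
cats = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7]]
mice = [[0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
new_mouse = [0.25, 0.2]  # not in either training set

# Model 1: every training example is a cat, so "cat" is always the right answer.
w, b = train(cats, [1, 1, 1])
cat_only_score = predict(w, b, new_mouse)

# Model 2: trained on cats (label 1) and mice (label 0).
w, b = train(cats + mice, [1, 1, 1, 0, 0, 0])
balanced_score = predict(w, b, new_mouse)

print(f"trained on cats only:     {cat_only_score:.3f}")
print(f"trained on cats and mice: {balanced_score:.3f}")
```

The cats-only model scores the mouse as a very likely cat, because nothing in its training ever pushed a score down. The second model has seen counterexamples, so it has a reason to put the mouse on the other side.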

At its simplest, a neural network acts like a linear function that draws a boundary between two groups, in this case, cat vs. not cat. With a single layer, that boundary is a straight line. Adding more layers, each with a nonlinear step, allows the lines that can be drawn to be ‘curvier’, so the boundary can include more cats and fewer dogs.
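A classic toy illustration of this, swapped in here in place of cats and dogs, is the XOR pattern: four points where opposite corners share a label, so no single straight line can separate them. The sketch below assumes a simple on/off step function as the nonlinear piece, uses hand-picked weights for the two-layer version, and brute-forces a grid of weights for the single neuron. The function names are just for this example.

```python
from itertools import product


def step(x):
    """Fire (1) if the weighted sum is positive, otherwise stay off (0)."""
    return 1 if x > 0 else 0


def single_neuron(w1, w2, b, x1, x2):
    """One neuron: a straight-line boundary through the plane."""
    return step(w1 * x1 + w2 * x2 + b)


def two_layer(x1, x2):
    """Two hidden neurons feeding one output neuron, with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)   # on if at least one input is on
    h_and = step(x1 + x2 - 1.5)  # on only if both inputs are on
    return step(h_or - h_and - 0.5)


# XOR: the label is 1 when exactly one input is on.
points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every weight and bias from -4.0 to 4.0 in steps of 0.5 for the single neuron.
grid = [i / 2 for i in range(-8, 9)]
linear_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(single_neuron(w1, w2, b, *p) == label for p, label in points.items())
]

print(len(linear_solutions))  # prints 0: no straight line gets all four right
print(all(two_layer(*p) == label for p, label in points.items()))  # prints True
```

The step function matters here. Without some nonlinear piece between the layers, stacking them would collapse back into a single linear function, and the boundary would be a straight line again.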

This is why having large enough training and testing datasets is critical for neural networks: they need to train on large quantities of data. Google has billions (perhaps trillions) of photos stored on its servers, so it has been able to train its neural networks to be incredibly good at determining what is in an image.

In general, problems where there is a large enough training dataset, and where both the input and the answer are known for that dataset, are fairly tractable for AI programs today. One task that is generally more difficult for today’s AI software is explaining how and why it got the answer it did. Luckily, researchers and businesses are hard at work solving this problem. Hopefully soon, Google Photos will not only be able to show us images of all the cats in our photo library, but also tell us why they’re so cute and yet so cold all at the same time.
