Deep learning is something of a buzzword nowadays. It crops up everywhere, from product recommendations on websites to image recognition and fraud protection in banks. But what is deep learning? What does it mean?
Deep learning is something of a rebranding of an older concept called neural networks. The idea behind neural networks was that we could mimic the structure of the brain to teach computer programs to learn. We have lots of weighted connections between variables, just like synapses in the brain. Over time we strengthen the connections that are important, as we’re presented with new evidence.
The image below shows the concept behind a simple neural network. This particular network has 3 inputs. Let’s say that the 3 inputs are the time of a credit card transaction, the time zone in which the transaction took place, and the amount of money transferred.
Our network has a layer of weights in the middle; this is what we call a “hidden layer”. Our hidden layer has 4 “nodes”, and each node combines the inputs in a different way.
From the output of our neural network we want the probability that the transaction is fraudulent. Our network has two outputs: one represents the chance that the transaction is fraudulent, and the other the chance that it is genuine.
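To make that concrete, here’s a minimal sketch in Python with NumPy of a single forward pass through such a network. The weights here are random placeholders, standing in for the values that training would learn:

```python
import numpy as np

def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Inputs: transaction time (hour), time zone (UTC offset), amount ($).
x = np.array([2.0, -5.0, 950.0])

# Placeholder weights; training would replace these with learned values.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))   # hidden layer: 4 nodes x 3 inputs
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(2, 4))   # output layer: 2 outputs x 4 nodes
b2 = np.zeros(2)

hidden = np.maximum(0.0, W1 @ x + b1)   # each hidden node combines all 3 inputs
probs = softmax(W2 @ hidden + b2)       # [P(fraudulent), P(genuine)]
print(probs)
```

The two outputs always sum to 1, so they can be read directly as the two probabilities.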
To train the network we need to gather a large amount of data containing times, time zones, and amounts for the inputs, along with whether or not each transaction was fraudulent.
Once the network is trained we can take information about a new transaction and make an educated guess about whether or not it is fraudulent.
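Here’s a hedged sketch of what “training” means in practice, on entirely made-up toy data: we repeatedly nudge the weights so the network’s output moves towards the known labels, which is the “strengthening connections” idea from earlier expressed as gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data, entirely made up for illustration:
# columns stand in for [time, time zone, amount], already normalised.
X = rng.normal(size=(200, 3))
y = (X[:, 2] > 0.5).astype(float)   # pretend: 1 = fraudulent, 0 = genuine

# Small network: 3 inputs -> 4 hidden nodes -> 1 output probability.
W1 = rng.normal(scale=0.5, size=(3, 4)); b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for step in range(1000):
    # Forward pass: compute P(fraudulent) for every training example.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)[:, 0]

    # Backward pass: gradients of the cross-entropy loss.
    dz2 = (p - y)[:, None] / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dh = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1_grad = X.T @ dh, dh.sum(axis=0)

    # Nudge the weights: "strengthening" the connections that matter.
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1_grad
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2

accuracy = ((p > 0.5) == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")

# Once trained, scoring a new transaction is just another forward pass.
new_x = np.array([[0.0, 0.0, 2.0]])   # unusually large amount
p_fraud = sigmoid(np.tanh(new_x @ W1 + b1) @ W2 + b2)[0, 0]
```

Real systems use libraries like TensorFlow rather than hand-written gradients, but the loop is the same shape: forward pass, measure the error, adjust the weights, repeat.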
The network that we specified in the previous section wouldn’t work very well. We only have one hidden layer, and the information that we have coming in is probably not even enough for a human to deduce whether a transaction was suspicious.
Also, the idea that we’re trying to mimic synapses is today seen as old hat and a little sci-fi. What’s actually going on is that we’re using well-known mathematical techniques such as matrix multiplication. We still use the term “weights”, but these generally refer to matrix elements rather than synapse strengths.
What could we do to improve? We could add more inputs. Perhaps the average transaction amount for that credit card? What about the latitude and longitude of the current transaction vs the latitude, longitude and time of the last transaction? If someone spends $100 in one city, and 1 minute later spends another $100 in a city 100 miles away, then something is probably wrong.
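That intuition can even be captured directly as a feature. Here’s a minimal sketch in Python using the haversine formula for distance between two coordinates; the 500 mph threshold is a made-up plausibility cut-off for illustration:

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles (haversine formula)."""
    r = 3959.0  # Earth's radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def implied_speed_mph(prev_txn, curr_txn):
    """Speed the cardholder would have needed to travel between two transactions."""
    dist = miles_between(prev_txn["lat"], prev_txn["lon"],
                         curr_txn["lat"], curr_txn["lon"])
    hours = (curr_txn["time"] - prev_txn["time"]) / 3600.0
    return dist / hours if hours > 0 else float("inf")

# $100 in one city, another $100 one minute later about 100 miles away:
prev = {"lat": 40.7128, "lon": -74.0060, "time": 0}    # New York
curr = {"lat": 41.7658, "lon": -72.6734, "time": 60}   # Hartford, ~100 miles away
speed = implied_speed_mph(prev, curr)
suspicious = speed > 500.0   # made-up threshold: faster than a commercial jet
print(f"{speed:.0f} mph implied -> suspicious: {suspicious}")
```

A hand-written rule like this works for one pattern; the point of the network is to learn many such patterns from the raw inputs.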
Now we have 9 inputs, so we’re going to need more nodes in our hidden layer to manage all these combinations, right?
Well, that used to be the paradigm, but it turns out that keeping a single hidden layer, or a small number of hidden layers, packed with nodes to manage the combinations doesn’t work too well. That is what is known as a “wide” network.
So this is where deep learning comes in. It turns out that it works much better to have lots of hidden layers with a small number of nodes each than to have a few hidden layers with lots of nodes.
It’s quite a simple difference, but the improvement in performance of deep neural networks over wide ones is huge.
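To make the contrast concrete, here’s a tiny sketch in Python counting the parameters in a wide network versus a deep one built from our 9 inputs; the layer sizes are made up for illustration:

```python
def parameter_count(layer_sizes):
    """Total weights + biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

wide = [9, 128, 2]               # one big hidden layer
deep = [9, 16, 16, 16, 16, 2]    # several small hidden layers

print(parameter_count(wide))   # 1538
print(parameter_count(deep))   # 1010
```

The deep stack here actually uses fewer parameters than the wide network, yet in practice each layer can build on the combinations found by the previous one, which is where the performance gains come from.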
To use our network with 9 inputs to generate a good indication of whether something is fraudulent or not, we’d probably have to build and train a network that looked like the image below.
There is still no guarantee that our network will work well. We still need to ensure that we have good training data, the more the better. In the age of “big data” this is becoming much less of a problem.
Even given good data, we still need to pick the features that we use as inputs. That can be a hard task in itself, sometimes just as hard as building and testing the network.
If you’d like to know more, there is a great website by Google with an interactive network builder. The website uses TensorFlow, which I used to build intelli.bet. There are some concepts we haven’t covered, such as activation functions, but it’s fun to play around with.