Neural networks are at the core of the AI revolution. What are neural networks?
In fact, the proper name is “artificial neural networks,” because the real neural networks are the biological ones found in brains. Artificial neural networks are algorithms that are roughly inspired by their biological counterparts. Their basic components, called (artificial) neurons, can become active or inactive. In some neural networks, the neurons are like lamps in the sense that they can be “off” or “on”. In most modern neural networks, the neurons are like lamps with a dimmer: they can be “off” or “on”, but also in any intermediate state.
One of the simplest neural networks, the Perceptron, was proposed by the psychologist Frank Rosenblatt in 1957. On July 8, 1958, the New York Times reported on an electronic implementation of the Perceptron purchased by the Navy. The report stated:
“The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” It is amazing to see that, sixty years ago, the expectations about neural networks (and AI) sounded much like those of today.
The Perceptron consists of a single neuron that can be “on” or “off”. Its functioning bears some resemblance to a spider centred in its web. The neuron acts like the spider that senses and combines inputs through the filaments that emanate from the boundary of the web towards the centre. The Perceptron processes labelled instances, each consisting of a sequence of numbers. In one of our previous blogs we gave the example of clients of a bank that had to be classified as “reliable” or “unreliable”. Here we use the example of recognising the hand-written digits “0” to “9”. Instances consist of a table or matrix of “pixels”. Each pixel is black if it is covered by ink and white otherwise.
The left figure above illustrates such an instance labelled “8”. The illustrated matrix contains 8 (columns) times 9 (rows) pixels. The black and white squares represent the pixels and are just a convenient representation for humans. For computer algorithms, such as Perceptrons, these are simply represented by zeros (white) and ones (black). Even the matrix format is a visual convenience for humans.
The right figure shows the input that the Perceptron receives. For visual convenience, the numbers are arranged in the same format as the rows and columns of the table, but the Perceptron simply receives an array of 72 (8 times 9) zeros and ones. Inspired by biological neurons that also receive inputs (from other neurons), the Perceptron weighs the importance of each input. Given a classification task, important inputs are assigned a higher weight than unimportant inputs.
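This flattening step is easy to make concrete. Below is a minimal Python sketch of turning an 8-by-9 pixel matrix into the 72-number array the Perceptron actually receives; the matrix contents here are illustrative, not the digit from the figure.

```python
# Build an illustrative 9-row by 8-column pixel matrix (all white),
# then mark one pixel black to stand in for a bit of ink.
matrix = [[0] * 8 for _ in range(9)]  # 9 rows x 8 columns
matrix[0][3] = 1                      # an example black pixel

# The Perceptron does not see rows and columns: the matrix is
# flattened, row by row, into a single array of 72 zeros and ones.
inputs = [pixel for row in matrix for pixel in row]

print(len(inputs))  # 72
print(inputs[3])    # 1: the black pixel survives the flattening
```

The row-by-row order is a convention; any fixed ordering works, as long as the weights are arranged in the same order as the inputs.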
Let us suppose that we want to build a Perceptron that decides if its input corresponds to a hand-written “8”. Whenever an “8” is presented at its input, it turns “on”. For any other digit it turns “off”.
What values should the weights have to create such an “8-detector”? The Perceptron switches “on” whenever the summed weighted inputs exceed a given threshold value. A simple procedure would be to assign high weights (let’s say 1) to the inputs (pixels) that are typically black in a hand-written “8”, and zero weights to the inputs that are typically white.
Such weights also form a sequence of numbers, and they too can be arranged as an 8 times 9 table, as in the figure. In fact, with our selection of weight values, the weights look very much like the inputs illustrated in the figure. In other words, the weights form a template of a typical hand-written “8”. If the template corresponds exactly to the input (as in our example), all inputs that are equal to 1 are multiplied by their corresponding weights (which are all 1 as well). The remaining inputs are 0 and are multiplied by zero.
The total weighted sum would correspond to the number of black pixels (or ones) in the input: 15. If this number exceeds a given threshold, the Perceptron switches “on”, signalling the presence of an “8” in the input. What would be an appropriate value for the threshold? Imagine the digit “3” that looks very much like the “8” in the figure. It only misses a few pixels (let’s say 4) in comparison to the “8”. Presenting the “3” to the input of the Perceptron would lead to a weighted sum of 11 (15 − 4). So, to avoid confusion between an “8” and a “3”, the threshold should be larger than 11, but no larger than 15, or the “8” itself would no longer be detected.
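The whole “8-detector” fits in a few lines of Python. The sketch below assumes the numbers from the example above: a template of 15 weighted pixels, a rival “3” that covers 4 fewer of them, and a threshold of 12; the pixel positions are stand-ins, not the actual figure.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 ("on") if the weighted input sum exceeds the threshold, else 0 ("off")."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Suppose the hand-written "8" covers 15 of the 72 (8 x 9) pixels.
# For simplicity we place those 15 pixels first in the flattened array.
eight_pixels = [1] * 15 + [0] * 57   # stand-in for the "8" from the figure
weights      = [1] * 15 + [0] * 57   # template: 1 where an "8" is black, 0 elsewhere

# The look-alike "3" misses 4 of those pixels, so its weighted sum is 11.
three_pixels = [1] * 11 + [0] * 61

threshold = 12  # any value above 11 and below 15 separates the two digits
print(perceptron_output(eight_pixels, weights, threshold))  # 1: the "8" is detected
print(perceptron_output(three_pixels, weights, threshold))  # 0: the "3" is rejected
```

Note that the decision depends only on the weighted sum, so any arrangement of the pixels works as long as inputs and weights use the same ordering.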
Automatic learning in the Perceptron
The manual setting of the weights of the Perceptron is quite a tedious endeavour. For many practical applications, it is very difficult to come up with the proper set of weight values. Fortunately, Frank Rosenblatt proposed an algorithm for setting the weights automatically. He called this rule the delta learning rule. The essence of the learning algorithm is to start with random weights and present the Perceptron with a training instance. Whenever the input is an “8” and the Perceptron is “off” (or the input is not an “8” and the Perceptron is “on”), the rule changes the weights and the threshold value slightly in a direction that reduces the risk of future errors. After presenting many training instances, the Perceptron has automatically adapted its weights to values that yield a good “8”-detection performance.
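The learning procedure described above can be sketched in a few lines of Python. This is a simplified illustration, not Rosenblatt’s original formulation: the toy four-pixel instances, the learning rate, and the number of passes are all assumptions chosen to keep the example small.

```python
import random

def classify(inputs, weights, threshold):
    """1 ("on") if the weighted input sum exceeds the threshold, else 0 ("off")."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def train(instances, n_inputs, lr=0.1, epochs=50):
    """Start from random weights; nudge weights and threshold after each mistake."""
    random.seed(0)                                        # reproducible randomness
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in instances:
            error = target - classify(inputs, weights, threshold)
            if error != 0:  # only wrong answers trigger an update
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                threshold -= lr * error  # a missed detection lowers the threshold
    return weights, threshold

# Toy training set: four-pixel "images"; label 1 plays the role of "is an 8".
instances = [([1, 1, 0, 0], 1), ([1, 1, 1, 0], 1),
             ([0, 0, 1, 1], 0), ([0, 1, 1, 1], 0)]
weights, threshold = train(instances, n_inputs=4)
```

After training, `classify` labels all four toy instances correctly. For data like this, which a single weighted sum can separate, the Perceptron is guaranteed to converge; Minsky and Papert’s later analysis concerned the cases where no such separation exists.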
The automatic learning capability of the Perceptron ignited enthusiastic reactions from the scientific (and non-scientific) community. It was generally considered to be the first step towards “brain-like” computers. The formulation of the Perceptron as a neural network definitely contributed to its popularity.
The end of the first neural network wave
The invention of the Perceptron and the enthusiasm it generated are now generally viewed as the first neural network wave. After the initial enthusiasm, the limitations of Perceptrons became apparent. One of the founding fathers of AI, Marvin Minsky (together with Seymour Papert), wrote a thorough analysis of the limitations of Perceptrons. Their findings were at least partially responsible for a decline in interest in neural networks in the 1970s. Instead, AI research into expert systems and knowledge systems flourished, until about 1989, when the second wave of neural networks occurred.