After the demise of the perceptron around 1969, AI research shifted its focus to alternative methods based on formal logic (e.g., IF-THEN rules that describe knowledge). It took until about 1985 before neural networks regained popularity.
This is part II of our “Neural networks” series. Read part I here: Neural networks: the core of the AI revolution – Part I: Perceptrons
The main weakness of the perceptron was that it could only learn to classify simple patterns. Although it was generally recognised that more complex patterns could be classified by combining several perceptrons into a multilayered structure, it turned out to be very difficult to come up with a method to automatically tune the weight values of such a combined perceptron. The delta learning rule, which automatically sets the weights of a single perceptron, could not be extended into a “generalised” delta learning rule that performed the same task for a multilayer variant.
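To make the delta rule concrete, here is a minimal sketch in Python (our illustration, not a historical implementation): a single perceptron with a threshold output learns the linearly separable AND function by nudging each weight in proportion to the prediction error.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    """Delta rule: adjust each weight in proportion to the prediction error."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1.0 if xi @ w + b > 0 else 0.0
            error = target - pred    # delta: difference between target and output
            w += lr * error * xi     # update proportional to error and input
            b += lr * error
    return w, b

# Learn the linearly separable AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b = train_perceptron(X, y)
```

For a linearly separable task like AND, this loop finds a correct set of weights; for patterns that are not linearly separable, it never will, which is exactly the limitation described above.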
The left illustration below shows a simple perceptron with two inputs and an output. Typically, the number of inputs is larger (as was the case in the 8-classification task in our previous blog), but we confined it to two for illustration purposes. The right illustration depicts a “multilayer” perceptron with two inputs and one output, which combines three perceptrons (or “neurons”) into a so-called “hidden layer.” Each arrow is a weight: a free parameter that has to be tuned to its appropriate value. Multilayer perceptrons can have more than one hidden layer. By increasing the number of neurons in the hidden layer, or by adding additional hidden layers, the multilayer perceptron becomes more powerful, allowing it to learn more difficult tasks.
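The right-hand figure can be summarised in a few lines of code. The sketch below (illustrative only; the weight values are random rather than trained) passes two inputs through a hidden layer of three neurons to a single output, using a sigmoid activation as was common at the time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two inputs -> hidden layer of three neurons -> one output.
# Every arrow in the figure corresponds to one entry of W1 or W2.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # 2 x 3 = 6 input-to-hidden weights
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # 3 hidden-to-output weights
b2 = np.zeros(1)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)      # activations of the hidden layer
    return sigmoid(hidden @ W2 + b2)   # activation of the output neuron

output = forward(np.array([0.5, -1.0]))
```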
The successful formulation and implementation of the generalised delta rule (better known today as backpropagation) initiated the second neural network hype. With the generalised learning rule, multilayer perceptrons could be trained in much the same way as simple perceptrons. The combination of perceptrons in a multilayer variant enabled the automatic learning of more difficult classification tasks.
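As an illustration of the generalised delta rule, the sketch below trains a small two-input, three-hidden-neuron network on the XOR function, a classic pattern that a single perceptron cannot learn. The layout and hyperparameters (learning rate, number of iterations) are our own choices for the example, not taken from the original work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1)); b2 = np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# Error before training, for comparison
mse_before = float(np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2))

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)               # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)             # forward pass: output
    d_out = (out - y) * out * (1 - out)    # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)     # error propagated back to hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

mse_after = float(np.mean((out - y) ** 2))
```

The backward pass is the key novelty: the output error is propagated through the hidden layer so that every weight, including the input-to-hidden ones, receives its own update.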
Successful applications of multilayer perceptrons
Whereas the perceptron could only deal with very simple tasks (e.g., distinguishing very dissimilar images or patterns), the multilayer perceptron was able to learn more complex tasks (e.g., distinguishing images or patterns with subtle differences). We list three prominent examples of successful applications of multilayer perceptrons below.
- Text-to-speech translation: given a text, a trained multilayer perceptron could generate the speech in a “read out loud” fashion.
- Sonar classification: given sonar patterns of underwater obstacles, a trained multilayer perceptron could distinguish (for instance) rock from metal on the basis of their sonar reflection patterns.
- Evaluating loan applications: given a client profile, a trained multilayer perceptron could automatically decide on acceptance.
In 1987, the New York Times reported about these and other successes of multilayer perceptrons and cited Pentti Kanerva stating:
“I’m convinced that this will be the next large-scale computer revolution.”
Feverish hype and instabilities
Like the first wave, the second wave of neural networks was surrounded by feverish hype and high expectations. The reasoning was that adding more layers to a multilayer perceptron would improve its learning ability considerably. Really challenging tasks, such as natural image classification (e.g., deciding whether a picture contains a cat or a dog), might then become feasible.
Unfortunately, it quickly turned out that adding more than two layers to realize “deep” networks led to numerical instabilities in the training procedure of multilayer perceptrons. As a result, deeper networks could not be trained, and more challenging tasks, such as natural image recognition, remained out of reach.
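These instabilities are nowadays largely explained by what is known as the vanishing gradient problem: backpropagation multiplies the error signal by one sigmoid-derivative factor (at most 0.25) per layer, so the signal shrinks roughly exponentially with depth. The sketch below (our illustration, with arbitrary random weights) tracks that gradient factor through 20 sigmoid layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
depth, width = 20, 10
a = rng.normal(size=width)

# Push an input through many sigmoid layers and accumulate the
# gradient of the output with respect to the input via the chain rule.
grad = np.eye(width)
for _ in range(depth):
    W = rng.normal(size=(width, width)) / np.sqrt(width)
    a = sigmoid(W @ a)
    # This layer's Jacobian: sigmoid'(z) = a * (1 - a), at most 0.25
    grad = (np.diag(a * (1 - a)) @ W) @ grad

print(np.abs(grad).max())  # shrinks rapidly as depth grows
```

With every extra layer the surviving gradient gets smaller, so the weights of the early layers barely move during training: precisely the instability that blocked deeper multilayer perceptrons.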
On top of this failure, many scientists were frustrated by the somewhat mysterious procedures surrounding the training of multilayer perceptrons. For instance, it was not clear how many hidden neurons and layers were needed to solve a given task; it took considerable experience to find proper settings for a neural network. Statisticians were amongst the fiercest critics of neural networks. They claimed that the methods lacked a proper statistical foundation and had far too many free parameters (the weights in the network) to make sense as a statistical model. In statistics (and in science in general) the adage is to use the simplest model possible to make predictions. Multilayer perceptrons seemed to violate this adage by using many more parameters than necessary.
The end of the second neural network wave
In the early nineties of the previous century, multilayer perceptrons were outperformed in prediction accuracy by so-called support vector machines. The enthusiasm for multilayer perceptrons waned quickly, and it was generally assumed that neural networks were history.