10 years ago, in May 2015, we published the first working very deep gradient-based feedforward neural networks (FNNs) with hundreds of layers (previous FNNs had a maximum of a few dozen layers).

To overcome the vanishing gradient problem, our Highway Networks used the residual connections first introduced in 1991 by @HochreiterSepp to achieve constant error flow in recurrent NNs (RNNs), gated through multiplicative gates similar to the forget gates (Gers et al., 1999) of our very deep LSTM RNNs. Highway NNs were made possible through the work of my former PhD students @rupspace and Klaus Greff. Setting the Highway NN gates to 1.0 effectively gives us the ResNet, published 7 months later.

Deep learning is all about NN depth. LSTMs brought essentially unlimited depth to recurrent NNs; Highway Nets brought it to feedforward NNs.
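A minimal sketch of the idea, assuming the general Highway formulation y = T(x)·H(x) + C(x)·x from the 2015 paper (the layer names, the coupled-gate simplification C = 1 − T, and the negative gate-bias initialization value are illustrative, not the original implementation):

```python
import torch
import torch.nn as nn


class HighwayLayer(nn.Module):
    """One Highway layer: y = T(x) * H(x) + C(x) * x.

    H is a nonlinear transform; T and C are learned sigmoid gates
    (multiplicative gates, as in LSTM forget gates). In the common
    simplification the carry gate is coupled: C = 1 - T.
    """

    def __init__(self, dim: int, coupled: bool = True):
        super().__init__()
        self.h = nn.Linear(dim, dim)      # transform H
        self.t = nn.Linear(dim, dim)      # transform gate T
        self.coupled = coupled
        if not coupled:
            self.c = nn.Linear(dim, dim)  # independent carry gate C
        # Bias the transform gate toward carrying the input at init
        # (a negative bias, as suggested in the Highway Networks paper),
        # so error signals can flow through hundreds of layers.
        nn.init.constant_(self.t.bias, -2.0)

    def forward(self, x):
        h = torch.relu(self.h(x))
        t = torch.sigmoid(self.t(x))
        c = (1.0 - t) if self.coupled else torch.sigmoid(self.c(x))
        return t * h + c * x


class ResidualLayer(nn.Module):
    """The same layer with both gates fixed to 1.0: y = H(x) + x,
    i.e. the ungated residual (ResNet-style) special case."""

    def __init__(self, dim: int):
        super().__init__()
        self.h = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.h(x)) + x
```

With the gates fixed to 1.0 the multiplicative gating disappears and only the additive skip connection y = H(x) + x remains, which is the residual form.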