Juergen Schmidhuber:

20 years later everybody is talking about Deep Learning! A first milestone of deep learning research was the 1991 diploma thesis of Sepp Hochreiter [1], my very first student, who is now a professor in Linz. His work formally showed that deep neural networks are hard to train because they suffer from the now famous "vanishing gradient" problem: in typical deep or recurrent networks, back-propagated error signals (e.g., [3]) vanish rapidly, decaying exponentially in the number of layers (or they explode). All subsequent deep learning research of the 1990s and 2000s was motivated by this insight.
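The effect is easy to reproduce numerically. The following is a minimal sketch (my own illustration, not code from the thesis): backpropagate an error signal through a chain of sigmoid layers and watch its norm collapse. The layer width, depth, and weight scale are arbitrary choices for the demonstration.

```python
import numpy as np

# Illustrative only: show the exponential decay of a back-propagated
# error signal through a deep chain of sigmoid layers.
rng = np.random.default_rng(0)
n, depth = 20, 50
W = rng.normal(0, 1.0 / np.sqrt(n), size=(depth, n, n))  # modest random weights

# Forward pass, storing activations for the backward pass.
x = rng.normal(size=n)
activations = [x]
for k in range(depth):
    x = 1.0 / (1.0 + np.exp(-(W[k] @ x)))  # sigmoid layer
    activations.append(x)

# Backward pass: delta <- W^T (delta * sigmoid'(a)).
# sigmoid'(a) = a * (1 - a) <= 0.25, so the signal shrinks each step.
delta = np.ones(n)
norms = []
for k in reversed(range(depth)):
    a = activations[k + 1]
    delta = W[k].T @ (delta * a * (1 - a))
    norms.append(np.linalg.norm(delta))

print(f"|error| after  1 layer : {norms[0]:.3e}")
print(f"|error| after {depth} layers: {norms[-1]:.3e}")
```

With these settings the norm of the error signal drops by many orders of magnitude over 50 layers; scaling the weights up instead makes it explode rather than vanish.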

The thesis is in German, but don't worry, all basic results are documented in the universal language of mathematics [1]. (Google Translate actually does a reasonable job on it.) 10 years later, an additional survey came out in English [2].

Our first deep learner of 1991 partially overcame the fundamental deep learning problem through unsupervised pre-training for a hierarchy of recurrent neural networks [4].

References

[1] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Technische Universität München, 1991. http://www.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf

(Alternative PDFs under http://www.bioinf.jku.at/publications/older.html - scroll all the way down.)

[2] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, eds., A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001. ftp://ftp.idsia.ch/pub/juergen/gradientflow.pdf

[3] Paul J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974.

[4] J. Schmidhuber. Deep Learning since 1991. http://www.idsia.ch/~juergen/deeplearning.html

- 2013-09-15 05:21:11Z