The History and Near Future of Deep Learning
David Kammeyer, Kammeyer Development
Big Data Beers, 15.9.2015
What’s the Big Deal?
Solving Problems that are Easy for Humans, Hard for Computers
• Visual Recognition, including OCR
• Speech Recognition
• Natural Language Processing (Translation, Sentiment Analysis)
Where did this all come from?
1957: The Perceptron
Frank Rosenblatt @ Cornell, MIT, ONR
How the Perceptron Works
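(For reference, a minimal sketch of the mechanism this slide describes, in Python with NumPy; the learning rate, epoch count, and toy AND data are illustrative assumptions, not from the talk.)

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=20):
    """Classic Rosenblatt perceptron: a threshold unit with error-driven updates."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # step activation
            w += lr * (yi - pred) * xi          # weights change only on mistakes
            b += lr * (yi - pred)
    return w, b

# Linearly separable toy problem: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([(1 if xi @ w + b > 0 else 0) for xi in X])  # [0, 0, 0, 1]
```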
Limitations and Winter #1
Single-layer perceptrons cannot learn the XOR function, or any function whose classes are not linearly separable.
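(A quick illustration, reusing perceptron_train from the sketch above: no single line separates the XOR classes, so training never gets all four cases right.)

```python
import numpy as np  # reuses perceptron_train from the sketch above

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels
w, b = perceptron_train(X, y, epochs=1000)
print([(1 if xi @ w + b > 0 else 0) for xi in X])  # never all four correct
```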
Multilayer Perceptrons
1989: Cybenko’s Universal Approximation Theorem for Single-Hidden-Layer Perceptrons
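(Stated informally: for any continuous f on the unit cube and any ε > 0, there exist N, weights w_i, biases b_i, and coefficients α_i such that a single hidden layer of sigmoidal units σ approximates f uniformly.)

$$\left| f(x) \;-\; \sum_{i=1}^{N} \alpha_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \right| \;<\; \varepsilon \qquad \text{for all } x \in [0,1]^n$$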
Backpropagation
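(A minimal sketch of backpropagation for a one-hidden-layer sigmoid network in NumPy; the architecture, squared-error loss, and hyperparameters are illustrative assumptions. Note that the XOR problem above becomes learnable this way.)

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR data: the problem a single-layer perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer of 4 units, random init
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 1.0
for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule, layer by layer (squared-error loss)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # close to [0, 1, 1, 0]
```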
Training Methods and Winter #2
• Just because a single-hidden-layer net can represent a function doesn’t mean it can learn it in practice (more layers may be needed for learning to succeed)
• SVMs provided better learning guarantees
The Renaissance
Convolutional Neural Networks
LeCun, 1993
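(The core operation, sketched in NumPy for a single channel; real convnets stack many such learned filters with nonlinearities and pooling. The edge-detector filter and toy image here are illustrative, not from the talk.)

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter applied to a toy image (left half dark, right half bright)
image = np.zeros((6, 6)); image[:, 3:] = 1.0
kernel = np.array([[1.0, -1.0]])
print(conv2d(image, kernel))  # nonzero only at the edge column
```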
ImageNet 2012
A. Krizhevsky’s AlexNet wins the ImageNet Competition
Image Captioning
Karpathy, 2015
What Changed?
GPUs
• 40x speedup relative to CPUs allows the training of much larger models than before
Very Deep Models
• Allows for Hierarchical Representation of Knowledge
Big Data
Newer Techniques
RNN, LSTM, Deep Q-Learning, New Activation Functions, Max Pooling
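(Two of the listed ingredients, sketched in NumPy: ReLU, a newer activation whose non-saturating gradient eases training of deep nets, and non-overlapping 2×2 max pooling, which downsamples feature maps. The example input is illustrative.)

```python
import numpy as np

def relu(x):
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return np.maximum(0.0, x)

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling over a single feature map."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[ 1.0, -2.0,  3.0,  0.0],
                 [ 4.0,  0.5, -1.0,  2.0],
                 [ 0.0,  1.0,  2.0, -3.0],
                 [-1.0,  2.0,  0.0,  1.0]])
print(max_pool_2x2(relu(fmap)))  # [[4. 3.] [2. 2.]]
```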
What’s Next?
Faster Processing
• Faster GPUs
• FPGAs
• ASICs
More Recurrence, Bidirectional Hierarchies
• LSTM and RNN models now define the state of the art.
• The next step is deep recurrent models that capture conceptual hierarchies.
• These will require new learning algorithms.
Hierarchical Representations in the Brain
Attentional Models
Allow the network to sequentially focus attention on a particular part of the input
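(The soft-attention idea in a few lines of NumPy: score each input position against a query, softmax the scores into weights, and read out a weighted sum. The dot-product scoring function here is one common choice among several, and the toy data is illustrative.)

```python
import numpy as np

def soft_attention(query, inputs):
    """Weight each input position by its softmax-normalized match to the query."""
    scores = inputs @ query                      # dot-product relevance scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax -> attention weights
    return weights @ inputs, weights             # weighted readout over positions

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 8))   # 5 input positions, 8-dim features
query = inputs[2] * 2.0            # a query resembling position 2
context, weights = soft_attention(query, inputs)
print(weights.round(2))            # attention mass concentrates on position 2
```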
Simulated (or Real) Worlds
• Lots of Data Needed to Train Large Models
• We’re going to have to Generate it, or Capture it from the Real World
More Researchers
Thanks!