Statistical Learning Theory and Applications (9.520), Fall 2014, Class 1


Page 1

Statistical Learning Theory and Applications

Class Times: Monday and Wednesday, 1pm-2:30pm
Units: 3-0-9 H,G
Location: 46-5193
Instructors: C. Ciliberto, G. Evangelopoulos, C. Frogner, T. Poggio, L. Rosasco

Web site: http://www.mit.edu/~9.520/

Office Hours: Friday 2-3pm in 46-5156, CBCL lounge (by appointment)
Email Contact: [email protected]

9.520 in 2014

Page 2

Class 3 (Wed, Sept 10): Mathcamps
• Functional analysis (~45 mins)
• Probability (~45 mins)

Class web site: http://www.mit.edu/~9.520/

Functional Analysis: linear and Euclidean spaces; scalar product, orthogonality; orthonormal bases; norms and semi-norms; Cauchy sequences and complete spaces; Hilbert spaces; function spaces and linear functionals; Riesz representation theorem; convex functions; functional calculus.

Probability Theory: Random Variables (and related concepts), Law of Large Numbers, Probabilistic Convergence, Concentration Inequalities.

Linear Algebra & Multivariate Calculus: basic notions and definitions (matrix and vector norms; positive, symmetric, invertible matrices; linear systems; condition number); extremal problems, differentials, gradients.

Page 3

Course description

The class covers foundations and recent advances of Machine Learning in the framework of Statistical Learning Theory.
• Classical methods such as Regularization Networks and Support Vector Machines.
• State-of-the-art techniques based on the concepts of geometry, sparsity, online learning algorithms, feature selection, structured prediction and multitask learning.
• A new final part of the course on connections between Radial Basis Functions and deep learning networks, as well as new techniques for learning to learn good input representations.

The goal of this class is to provide students with the knowledge needed to use and develop effective machine learning solutions to challenging problems.


Page 4

Rules of the game:
• problem sets (2)
• final project: you have to give us a title + abstract before November 26th
• participation
• Grading is based on Psets (27.5% + 27.5%) + Final Project (32.5%) + Participation (12.5%)

Slides are on the Web site (most classes are on the blackboard).
Staff mailing list is [email protected]
Student list will be [email protected]; please fill the form!
Send us an email if you want to be added to the mailing list.

Class web site: http://www.mit.edu/~9.520/

Problem Set 1: Mon 29 Sept (Class 8)
Problem Set 2: Wed 05 Nov (Class 19)
Final Project Decision: Wed 26 Nov (Class 24)

Page 5

Final Project (this year is different)

The final project should be a Wikipedia entry, and only in exceptional cases a research project.

For the Wikipedia article we suggest posting 1-2 (short) pages using the standard Wikipedia format (of course).

For the research project (either Application or Theory) you should use the template on the Web site.

Page 6

Projects 2013

• Learning to rank papers/grants: replacing review panels
• Simulations of associative memories for object recognition: bounds on # items stored, noiseless coding, sparseness
• The surprising usefulness of sloppy arithmetic: study of bits and their tradeoff in hierarchical architectures
• Implement and test least squares algorithms in GURLS

Research Projects

Page 7

Project: posting/editing an article on Wikipedia (past examples below)

● Kernel methods for vector output: http://en.wikipedia.org/wiki/Kernel_methods_for_vector_output
● Principal component regression: http://en.wikipedia.org/wiki/Principal_component_regression
● Reproducing kernel Hilbert space: http://en.wikipedia.org/wiki/Reproducing_kernel_Hilbert_space
● Proximal gradient methods for learning: http://en.wikipedia.org/wiki/Proximal_gradient_methods_for_learning
● Regularization by spectral filtering: https://en.wikipedia.org/wiki/Regularization_by_spectral_filtering
● Online learning and stochastic gradient descent: http://en.wikipedia.org/wiki/Online_machine_learning
● Kernel embedding of distributions: http://en.wikipedia.org/wiki/Kernel_embedding_of_distributions
● Vapnik–Chervonenkis theory: https://en.wikipedia.org/wiki/VC_theory
● Deep learning: http://en.wikipedia.org/wiki/Deep_learning
● Early stopping and regularization: http://en.wikipedia.org/wiki/Early_stopping
● Statistical learning theory: http://en.wikipedia.org/wiki/Statistical_learning_theory
● Representer theorem: http://en.wikipedia.org/wiki/Representer_theorem
● Regularization perspectives on support vector machines: http://en.wikipedia.org/wiki/Regularization_perspectives_on_support_vector_machines
● Semi-supervised learning: http://en.wikipedia.org/wiki/Semi_supervised_learning
● Bayesian interpretation of regularization: http://en.wikipedia.org/wiki/Bayesian_interpretation_of_regularization
● Regularized least squares (RLS): http://en.wikipedia.org/wiki/User:Bdeen/sandbox

Page 8

Summary of today's overview

• Motivations for this course: a golden age for new AI and the key role of Machine Learning
• Statistical Learning Theory
• Success stories from past research in Machine Learning: examples of engineering applications
• A new phase in machine learning: computer science and neuroscience, learning and the brain, CBMM: the beginning of a new era in Machine Learning?

Page 9

Summary of today's overview

• Motivations for this course: a golden age for new AI and the key role of Machine Learning
• Statistical Learning Theory
• Success stories from past research in Machine Learning: examples of engineering applications
• A new phase in machine learning: computer science and neuroscience, learning and the brain, CBMM: the beginning of a new era in Machine Learning?

Page 10

The problem of intelligence is one of the great problems in science, probably the greatest.

Research on intelligence:
• is a great intellectual mission
• will help develop intelligent machines

These advances will be critical to our society's
• future prosperity
• education, health, security

The problem of intelligence: how it arises in the brain and how to replicate it in machines

Page 11

At the core of the problem of Intelligence is the problem of Learning.

Learning is the gateway to understanding the brain and to making intelligent machines.

The problem of learning: a focus for
• math
• computer algorithms
• neuroscience

Page 12

• Learning is becoming the lingua franca of Computer Science.
• Learning is at the center of recent successes in AI over the last 15 years.
• The next 10 years will be a golden age for technology based on learning: Google, MobilEye, Siri, etc.
• The next 50 years will be a golden age for the science and engineering of intelligence. Theories of learning and their tools will be a key part of this.

Theory of Learning

Page 13

Summary of today's overview

• Motivations for this course: a golden age for new AI and the key role of Machine Learning
• Statistical Learning Theory
• Success stories from past research in Machine Learning: examples of engineering applications
• A new phase in machine learning: computer science and neuroscience, learning and the brain, CBMM: the beginning of a new era in Machine Learning?

Page 14

[Diagram: three interacting areas]
• LEARNING THEORY + ALGORITHMS: theorems on foundations of learning; predictive algorithms
• COMPUTATIONAL NEUROSCIENCE (models + experiments): how visual cortex works
• ENGINEERING APPLICATIONS: bioinformatics; computer vision; computer graphics, speech synthesis, creating a virtual actor

Learning: Math, Engineering, Neuroscience

Page 15

[Diagram: as on Page 14 (learning theory + algorithms; computational neuroscience; engineering applications)]

Statistical Learning Theory

Page 16

[Diagram: INPUT x → f → OUTPUT y]

Given a set of l examples (data) S = {(x_1, y_1), ..., (x_l, y_l)}

Question: find a function f such that f(x) is a good predictor of y for a future input x (fitting the data is not enough!)

Statistical Learning Theory: supervised learning

Page 17

[Figure: example data points, e.g. (92,10,…), (41,11,…), (19,3,…), (1,13,…), (4,24,…), (7,33,…), (4,71,…)]

Regression

Classification

Statistical Learning Theory: supervised learning

Page 18

[Figure: a function f, data points sampled from f, and an approximation of f, plotted as y versus x]

Generalization: estimating the value of the function where there are no data (good generalization means predicting the function well; what matters is that the empirical or validation error be a good proxy for the prediction error).

Statistical Learning Theory: prediction, not curve fitting
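To fix notation for what follows (this formalization is not spelled out on this slide, but it matches the loss V and the distribution µ used later in the deck), the prediction error and its empirical proxy are usually written as

\[
I[f] = \int_{X \times Y} V(f(x), y)\, d\mu(x, y),
\qquad
I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i),
\]

and good generalization means that the empirical error I_S[f_S] of the learned function f_S is, with high probability, close to its expected (prediction) error I[f_S].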

Page 19

Statistical Learning Theory: part of mainstream math, not just statistics

(Valiant, Vapnik, Smale, Devore...)

Page 20

The learning problem: summary so far

There is an unknown probability distribution on the product space Z = X × Y, written µ(z) = µ(x, y). We assume that X is a compact domain in Euclidean space and Y a bounded subset of R. The training set S = {(x_1, y_1), ..., (x_n, y_n)} = {z_1, ..., z_n} consists of n samples drawn i.i.d. from µ.

H is the hypothesis space, a space of functions f : X → Y.

A learning algorithm is a map L : Z^n → H that looks at S and selects from H a function f_S : x → y such that f_S(x) ≈ y in a predictive way.

(Tomaso Poggio, The Learning Problem and Regularization)

Statistical Learning Theory: supervised learning

Page 21

Statistical Learning Theory

Page 22

Consider a prototypical learning algorithm: ERM (empirical risk minimization)

min_{f ∈ H} (1/l) Σ_{i=1}^{l} V(f, z_i)

What are the conditions ensuring generalization?

It turns out that choosing an appropriately simple hypothesis space H (for instance, a compact set of continuous functions) can guarantee generalization.

Statistical Learning Theory: supervised learning
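As a minimal, self-contained sketch of ERM (not from the slides: the toy data, the function names, and the choice of a linear hypothesis space with square loss V(f, z) = (f(x) - y)² are all assumptions made for illustration):

```python
import numpy as np

def erm_linear(X, y):
    """ERM with square loss over linear hypotheses f(x) = <w, x>:
    minimize (1/l) * sum_i (<w, x_i> - y_i)^2 via the normal equations."""
    l = X.shape[0]
    return np.linalg.solve(X.T @ X / l, X.T @ y / l)

# toy data drawn i.i.d. from a simple distribution
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w = erm_linear(X, y)
print("w =", w, " empirical risk =", np.mean((X @ w - y) ** 2))
```

Whether such a minimizer generalizes is exactly the question the slide raises: it depends on the hypothesis space being simple enough, not on how well the data are fit.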

Page 23

J. S. Hadamard, 1865-1963

A problem is well-posed if its solution exists, is unique, and is stable, e.g. depends continuously on the data (here, the examples).

Statistical Learning Theory: the learning problem should be well-posed

Page 24

Mukherjee, Niyogi, Poggio, Rifkin, Nature, March 2004

An algorithm is stable if the removal of any one training sample from any large set of samples results, almost always, in a small change in the learned function.

For ERM the following theorem holds for classification and regression:

ERM on H generalizes if and only if the hypothesis space H is uGC, and if and only if ERM on H is CVloo stable.

Statistical Learning Theory: theorems extending the foundations of learning theory
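A small numerical sketch of the stability idea (again not from the slides; the toy data, the regularized least-squares learner and the helper names are invented): it estimates how much the loss at a training point changes when that point is removed from the training set, which is the quantity that CVloo stability requires to be small.

```python
import numpy as np

def fit_rls(X, y, lam=0.1):
    """Regularized least squares: argmin_w (1/n) ||Xw - y||^2 + lam ||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def cv_loo_stability(X, y, lam=0.1):
    """Average change in the square loss at (x_i, y_i) when that sample is left out."""
    n = X.shape[0]
    w_full = fit_rls(X, y, lam)
    deltas = []
    for i in range(n):
        mask = np.arange(n) != i
        w_loo = fit_rls(X[mask], y[mask], lam)
        deltas.append(abs((X[i] @ w_full - y[i]) ** 2 - (X[i] @ w_loo - y[i]) ** 2))
    return float(np.mean(deltas))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=50)
print("estimated CVloo stability:", cv_loo_stability(X, y))
```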

Page 25

Conditions for generalization in learning theory have deep, almost philosophical, implications: they can be regarded as equivalent conditions that guarantee a theory to be predictive (that is, scientific):

‣ the theory must be chosen from a small set
‣ the theory should not change much with new data... most of the time

Statistical Learning Theory: theorems extending the foundations of learning theory

Page 26

min_{f ∈ H} (1/l) Σ_{i=1}^{l} V(f(x_i), y_i) + λ ||f||²_K

The equation includes splines, Radial Basis Functions and SVMs (depending on the choice of V), and implies

f(x) = Σ_{i=1}^{l} c_i K(x, x_i)

For a review, see Poggio and Smale, 2003; see also Schoelkopf and Smola, 2002; Bousquet, Boucheron and Lugosi; Cucker and Smale; Zhou and Smale...

Statistical Learning Theory: classical algorithms: Kernel Machines, e.g. Regularization in RKHS
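To make the "implies" explicit for one concrete choice of V, the square loss, here is the standard computation (a textbook fact, not text taken from this slide). By the representer theorem the minimizer of the Tikhonov functional is a kernel expansion over the training points, and substituting that form turns the problem into a linear system for the coefficients:

\[
\min_{f \in \mathcal{H}} \frac{1}{l} \sum_{i=1}^{l} \big(f(x_i) - y_i\big)^2 + \lambda \|f\|_K^2
\quad \Longrightarrow \quad
f(x) = \sum_{i=1}^{l} c_i K(x, x_i),
\qquad
(K + \lambda l I)\, c = y,
\]

where K is the l × l kernel matrix with K_ij = K(x_i, x_j). Other choices of V give the SVM (hinge loss) or spline/RBF regression, as the slide notes.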

Page 27

The regularization functional has a Bayesian interpretation: the data term is a model of the noise and the stabilizer is a prior on the hypothesis space of functions f. That is, Bayes rule, P(f | data) ∝ P(data | f) P(f), leads to the regularization functional, whose minimizer is the MAP estimate.

Statistical Learning Theory: classical algorithms: Regularization
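One standard way to spell out this Bayesian reading (a sketch under explicit assumptions, not text from the slide): assume Gaussian noise on the outputs and a prior on f that penalizes the RKHS norm,

\[
P(\text{data} \mid f) \propto \exp\!\Big(-\frac{1}{2\sigma^2} \sum_{i=1}^{l} \big(y_i - f(x_i)\big)^2\Big),
\qquad
P(f) \propto \exp\!\Big(-\frac{1}{2\sigma_f^2} \|f\|_K^2\Big).
\]

Then maximizing the posterior P(f | data) ∝ P(data | f) P(f) is the same as minimizing its negative log, which is exactly the regularization functional (1/l) Σ_i (y_i - f(x_i))² + λ ||f||²_K with λ determined by the ratio σ² / (l σ_f²).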

Page 28

Classical learning algorithms: Kernel Machines (e.g. Regularization in RKHS)

min_{f ∈ H} (1/l) Σ_{i=1}^{l} V(f(x_i), y_i) + λ ||f||²_K   implies   f(x) = Σ_{i=1}^{l} c_i K(x, x_i)

Remark (for later use): kernel machines correspond to shallow networks.

[Diagram: a shallow (one-hidden-layer) network with units X_1, ..., X_l and output f]

Statistical Learning Theory: classical algorithms: Regularization
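A minimal runnable sketch of such a kernel machine for the square loss with a Gaussian kernel (everything here, from the kernel width to the toy data, is an assumption made for illustration; the course's reference implementation is GURLS): the coefficients solve (K + λlI)c = y, and the learned function is exactly the "shallow network" f(x) = Σ_i c_i K(x, x_i).

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def kernel_rls_fit(X, y, lam=0.1, sigma=1.0):
    """Square-loss regularization in RKHS: solve (K + lam * l * I) c = y."""
    l = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * l * np.eye(l), y)

def kernel_rls_predict(X_train, c, X_test, sigma=1.0):
    """The 'shallow network' form of the solution: f(x) = sum_i c_i K(x, x_i)."""
    return gaussian_kernel(X_test, X_train, sigma) @ c

# toy 1-D regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)
c = kernel_rls_fit(X, y, lam=0.01, sigma=0.5)
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print(kernel_rls_predict(X, c, X_test, sigma=0.5))
```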

Page 29

Two connected and overlapping strands in learning theory:

• Bayes, hierarchical models, graphical models…
• Statistical learning theory, regularization

Statistical Learning Theory: note

Page 30

Summary of today's overview

• Motivations for this course: a golden age for new AI and the key role of Machine Learning
• Statistical Learning Theory
• Success stories from past research in Machine Learning: examples of engineering applications
• A new phase in machine learning: computer science and neuroscience, learning and the brain, CBMM: the beginning of a new era in Machine Learning?

Page 31

Supervised learning

Since the introduction of supervised learning techniques 20 years ago, AI has made significant (and not well known) advances in a few domains:

• Vision
• Graphics and morphing
• Natural language / knowledge retrieval (Watson and Jeopardy)
• Speech recognition (Nuance)
• Games (Go, chess, ...)

Page 32

Page 33

[Diagram: as on Page 14 (learning theory + algorithms; computational neuroscience: how visual cortex works)]

Sung & Poggio 1995

Engineering of Learning!

Page 34

Page 35

[Diagram: as on Page 14]

Face detection is now available in digital cameras (commercial systems)

Engineering of Learning!

Page 36

[Diagram: as on Page 14]

Papageorgiou & Poggio, 1997, 2000; also Kanade & Schneiderman

Engineering of Learning!

Page 37

[Diagram: as on Page 14]

Papageorgiou & Poggio, 1997, 2000; also Kanade & Schneiderman

Engineering of Learning!

Page 38

Page 39

[Diagram: as on Page 14]

Pedestrian and car detection are also "solved" (commercial systems, MobilEye)

Engineering of Learning!

Page 40

Page 41

http://www.volvocars.com/us/all-cars/volvo-s60/pages/5-things.aspx?p=5

Page 42

Computer Vision:
• Face detection
• Pedestrian detection
• Scene understanding
• Video categorization

Decoding the Neural Code, Bioinformatics, Graphics, Text Classification, Artificial Markets, …

[Diagram: INPUT → OUTPUT]

Engineering of Learning!

Page 43

Learning: read-out from the brain!

Page 44

The end station of the ventral stream in visual cortex is IT

Page 45

77 objects, 8 classes

Chou Hung, Gabriel Kreiman, James DiCarlo, Tomaso Poggio, Science, Nov 4, 2005

Reading-out the neural code in AIT

Page 46

Recording at each recording site during passive viewing

[Figure: stimulus presentation timeline (100 ms, 100 ms) along a time axis]

• 77 visual objects
• 10 presentation repetitions per object
• presentation order randomized and counter-balanced

Page 47

Example of one AIT cell

Page 48

[Diagram: INPUT x → f → OUTPUT y]

From a set of data (vectors of activity of n neurons (x) and object labels (y)), find (by training) a classifier, e.g. a function f, such that f(x) is a good predictor of the object label y for a future neuronal activity x.

Learning: read-out from the brain!

Page 49

Decoding the neural code … using a classifier

[Figure: neuronal population activity x → classifier → label y ∈ {1, …, 8}; learning from (x, y) pairs]
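A toy version of this read-out (synthetic data standing in for the recorded population activity; this is not the analysis code of Hung et al. 2005, and all names and parameters are invented): train a one-vs-all regularized least-squares classifier on (x, y) pairs, with x a vector of responses from ~200 sites and y one of 8 object classes, and measure decoding accuracy on held-out trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_classes, trials_per_class = 200, 8, 10

# synthetic "population activity": each class has its own mean response pattern
class_means = rng.normal(size=(n_classes, n_sites))
X = np.vstack([m + 0.8 * rng.normal(size=(trials_per_class, n_sites)) for m in class_means])
y = np.repeat(np.arange(n_classes), trials_per_class)

# random split of trials into train / test halves
perm = rng.permutation(len(y))
train, test = perm[: len(y) // 2], perm[len(y) // 2:]

# one-vs-all regularized least squares: W = (X^T X + lam I)^{-1} X^T Y_onehot
lam = 1.0
Y_onehot = np.eye(n_classes)[y[train]]
W = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(n_sites), X[train].T @ Y_onehot)

pred = np.argmax(X[test] @ W, axis=1)
print("decoding accuracy:", np.mean(pred == y[test]))
```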

Page 50

Categorization

• Toy

• Body

• Human Face

• Monkey Face

• Vehicle

• Food

• Box

• Cat/Dog

Video speed: 1 frame/sec
Actual presentation rate: 5 objects/sec

[Figure labels: neuronal population activity; classifier prediction]

Hung, Kreiman, Poggio, DiCarlo. Science 2005

We can decode the brain's code and read out from neuronal populations: reliable object categorization (>90% correct) using ~200 arbitrary AIT "neurons".

Page 51

We can decode the brain's code and read out from neuronal populations: reliable object categorization using ~100 arbitrary AIT sites

Mean single-trial performance:
• [100-300 ms] interval
• 50 ms bin size

Page 52

[Diagram: as on Page 14, with ENGINEERING APPLICATIONS listing bioinformatics; computer vision; computer graphics, speech synthesis; neuroinformatics, read-out]

Engineering of Learning!

Page 53

[Image] ⇒ Bear (0° view)
[Image] ⇒ Bear (45° view)

Learning: image analysis!

Page 54

UNCONVENTIONAL GRAPHICS

Θ = 0° view ⇒ [Image]
Θ = 45° view ⇒ [Image]

Learning: image synthesis!

Page 55

Blanz and Vetter, MPI SigGraph ‘99

Learning: image synthesis!

Page 56

Blanz and Vetter, MPI SigGraph ‘99

Learning: image synthesis!

Page 57

A- more in a moment

Tony Ezzat, Geiger, Poggio, SigGraph 2002

Mary101

Page 58

[System diagram: Phone Stream, Phonetic Models, Trajectory Synthesis, MMM, Image Prototypes]

1. Learning
The system learns, from 4 minutes of video, the face appearance (Morphable Model) and the speech dynamics of the person.

2. Run Time
For any speech input the system provides as output a synthetic video stream.

Page 59

Page 60

B-Dido

Page 61

C-Hikaru

Page 62

D-Denglijun

Page 63

E-Marylin

Page 65

G-Katie

Page 66

H-Rehema

Page 67

I-Rehemax

Page 68

L-real-synth

A Turing test: what is real and what is synthetic?

Page 69

Tony Ezzat, Geiger, Poggio, SigGraph 2002

A Turing test: what is real and what is synthetic?

Page 70

Summary of today's overview

• Motivations for this course: a golden age for new AI and the key role of Machine Learning
• Statistical Learning Theory
• Success stories from past research in Machine Learning: examples of engineering applications
• A new phase in machine learning: computer science and neuroscience, learning and the brain, CBMM: the beginning of a new era in Machine Learning?

Page 71

The Center for Brains, Minds and Machines
Ignite Presentation

Page 72

Page 73

Vision
Accumulated knowledge and technology, now in place, enables a rapid leap in our scientific understanding of intelligence and our ability to replicate intelligence in engineered systems.

Mission
We aim to create a new field by bringing together computer scientists, cognitive scientists and neuroscientists to work in close collaboration. The new field, the Science and Engineering of Intelligence, is dedicated to developing a computationally centered understanding of human intelligence and to establishing an engineering practice based on that understanding.

Page 74

Vision for CBMM

Convergence of progress: a key opportunity

[Diagram: Machine Learning, Computer Science, Cognitive Science, Neuroscience and Computational Neuroscience converging on the Science + Technology of Intelligence]

Page 75

[CBMM partner institutions and people]

MIT: Boyden, Desimone, Kaelbling, Kanwisher, Katz, Poggio, Sassanfar, Saxe, Schulz, Tenenbaum, Ullman, Wilson, Rosasco, Winston
Harvard: Blum, Kreiman, Mahadevan, Nakayama, Sompolinsky, Spelke, Valiant
Cornell: Hirsh
Hunter: Epstein, ...
Wellesley: Hildreth, Conway, ...
Puerto Rico: Bykhovaskaia, Vega, ...
Howard: Manaye, ...
Allen Institute: Koch
Rockefeller: Freiwald
UCLA: Yuille
Stanford: Goodman
City U. HK: Smale
Hebrew U.: Shashua
IIT: Metta, Rosasco, Sandini
MPI: Buelthoff
Weizmann: Ullman
Google: Norvig
IBM: Lemnios
Microsoft: Blake
Genoa U.: Verri
NCBS: Raghavan
Schlumberger, GE, Siemens
Orcam: Shashua
MobilEye: Shashua
Rethink Robotics: Brooks
Boston Dynamics: Raibert
DeepMind: Hassabis
A*STAR: Tan

Page 76

Thrust 1: Development of Intelligence
Thrust 2: Circuits for Intelligence
Thrust 3: Visual Intelligence
Thrust 4: Social Intelligence
Thrust 5: Theory of Intelligence (Tomaso Poggio)

Josh Tenenbaum, Gabriel Kreiman, Shimon Ullman, Nancy Kanwisher

Page 77

Page 78

The core CBMM challenge: measuring progress

The core challenge is to develop computational models from experiments that answer questions about images and videos such as
• what is there / who is there / what is the person doing
and eventually more difficult questions such as
• who is doing what to whom?
• what happens next?
at the computational, psychophysical and neural levels.

Page 79

The "who" question: face recognition from experiments to theory

[Diagram: CBMM Challenge; Visual Intelligence; Social Intelligence; Neural Circuits of Intelligence; Model: ML, AL, AM]

Page 80

The first phase (and successes) of ML: supervised learning: n → ∞

Remark: a paradigm shift in machine learning?

The next phase of ML: unsupervised learning of invariant representations for learning: n → 1