
Page 1: READINGS IN DEEP LEARNING

READINGS IN DEEP LEARNING

4 Sep 2013

Page 2: READINGS IN DEEP LEARNING

ADMINISTRIVIA

• New course numbers (11-785/786) are assigned
  – Should be up on the hub shortly

• Lab assignment 1 up
  – Due date: 2 weeks from today

• Google group: is everyone on?
• Website issues..
  – WordPress not yet an option (CMU CS setup)
  – Piazza?

Page 3: READINGS IN DEEP LEARNING

Poll for next 2 classes

• Monday, Sep 9
  – The perceptron: A probabilistic model for information storage and organization in the brain
    • Rosenblatt
    • Not really about the logistic perceptron, more about the probabilistic interpretation of learning in connectionist networks
  – Organization of behavior
    • Donald Hebb
    • About the Hebbian learning rule

Page 4: READINGS IN DEEP LEARNING

Poll for next 2 classes

• Wed, Sep 11
  – Optimal unsupervised learning in a single-layer linear feedforward neural network
    • Terence Sanger
    • Generalized Hebbian learning rule
  – The Widrow-Hoff learning rule
    • Widrow and Hoff
    • Will be presented by Pallavi Baljekar

Page 5: READINGS IN DEEP LEARNING

Notices

• Success of the course depends on good presentations
• Please send in your slides 1-2 days before the presentation
  – So that we can ensure they are OK

• You are encouraged to discuss your papers with us/your classmates while preparing for them
  – Use the Google group for discussion

Page 6: READINGS IN DEEP LEARNING

A new project

• Distributed large scale training of NNs..

• Looking for volunteers

Page 7: READINGS IN DEEP LEARNING

The Problem: Distributed data

• Training enormous networks
  – Billions of units

• From large amounts of data
  – Billions or trillions of instances
  – Data may be localized..
  – Or distributed

Page 8: READINGS IN DEEP LEARNING

The problem: Distributed computing

• A single computer will not suffice
  – Need many processors
  – Tens or hundreds or thousands of computers
    • Of possibly varying types and capacity

Page 9: READINGS IN DEEP LEARNING

Challenge

• Getting the data to the computers
  – Tons of data to many computers
    • Bandwidth problems
    • Timing issues
  – Synchronizing the learning

Page 10: READINGS IN DEEP LEARNING

Logistical Challenges

• How to transfer vast amounts of data to processors

• Which processor gets how much data..
  – Not all processors equally fast
  – Not all data take equal amounts of time to process

• .. and which data
  – Data locality

Page 11: READINGS IN DEEP LEARNING

Learning Challenges

• How to transfer parameters to processors
  – Networks are large: billions or trillions of parameters
  – Each processor must have the latest copy of the parameters

• How to receive updates from processors
  – Each processor learns on local data
  – Updates from all processors must be pooled

Page 12: READINGS IN DEEP LEARNING

Learning Challenges

• Synchronizing processor updates
  – Some processors slower than others
  – Inefficient to wait for the slower ones
    • In order to update parameters at all processors
• Requires asynchronous updates
  – Each processor updates when done
  – Problem: different processors now have different sets of parameters
    • Other processors may have updated the parameters already
• Requires algorithmic changes (a minimal sketch follows this list)
  – How to update asynchronously
  – Which updates to trust
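The asynchronous scheme above can be illustrated with a small sketch. The names (ParameterServer, pull_params, push_gradient) and the fixed learning rate are illustrative assumptions, not the project's actual design:

```python
# Minimal sketch of asynchronous learning: updates are applied as they arrive,
# without waiting for slower workers. All names and the fixed learning rate
# are assumptions for illustration only.
import numpy as np

class ParameterServer:
    def __init__(self, num_params, lr=0.01):
        self.params = np.zeros(num_params)  # shared model parameters
        self.version = 0                    # counts applied updates
        self.lr = lr

    def pull_params(self):
        """Worker fetches the latest parameters plus their version stamp."""
        return self.params.copy(), self.version

    def push_gradient(self, grad):
        """Apply a worker's gradient immediately; no global barrier."""
        self.params -= self.lr * grad
        self.version += 1

def worker_step(server, local_batch, grad_fn):
    """One asynchronous worker step: pull, compute a gradient on local data, push.
    By the time the gradient arrives, the server's parameters may already have
    moved on -- exactly the 'different sets of parameters' problem above."""
    params, _ = server.pull_params()
    grad = grad_fn(params, local_batch)
    server.push_gradient(grad)
```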

Page 13: READINGS IN DEEP LEARNING

Current Solutions

• Faster processors
• GPUs
  – GPU programming required
• Large simple clusters
  – Simple distributed programming
• Large heterogeneous clusters
  – Techniques for asynchronous learning

Page 14: READINGS IN DEEP LEARNING

Current Solutions

• Still assume data distribution is not a major problem
• Assume relatively fast connectivity
  – Gigabit Ethernet
• Fundamentally cluster-computing based
  – Local area network

Page 15: READINGS IN DEEP LEARNING

New project

• Distributed learning

• Wide area network
  – Computers distributed across the world

Page 16: READINGS IN DEEP LEARNING

New project

• Supervisor/Worker architecture

• One or more supervisors
  – May be a hierarchy

• A large number of workers
• Supervisors in charge of resource and task allocation, gathering and redistributing updates, and synchronization (a minimal sketch follows)
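One way to picture the supervisor/worker split is as a message loop. The message names and fields below are hypothetical; the real project would sit on a wide-area network with real serialization, scheduling, and fault handling:

```python
# Hypothetical supervisor/worker messages; names and fields are illustrative.
from dataclasses import dataclass

@dataclass
class TaskAssignment:        # supervisor -> worker
    shard_ids: list          # which data shards this worker should train on
    params_version: int      # parameter version it should start from

@dataclass
class UpdateReport:          # worker -> supervisor
    worker_id: str
    params_version: int      # version the update was computed against
    delta: dict              # parameter changes (possibly only a subset)

class Supervisor:
    def __init__(self, workers, params):
        self.workers = workers
        self.params = params     # global parameter dict: name -> array
        self.version = 0

    def allocate(self, shards):
        """Hand out data shards round-robin; a real policy would weigh
        capacity and locality, as the slides above suggest."""
        return {w: TaskAssignment(shards[i::len(self.workers)], self.version)
                for i, w in enumerate(self.workers)}

    def gather(self, report):
        """Fold a worker's (possibly partial) update into the global model."""
        for name, change in report.delta.items():
            self.params[name] = self.params[name] + change
        self.version += 1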

Page 17: READINGS IN DEEP LEARNING

New project

• Challenges

• Data allocation
  – Optimal policy for data distribution
    • Minimal latency
    • Maximum locality

Page 18: READINGS IN DEEP LEARNING

New project

• Challenges

• Computation allocation
  – Optimal policy for learning
    • Compute load proportional to compute capacity (see the sketch below)
    • Reallocation of data/tasks as required
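"Compute load proportional to compute capacity" can be as simple as splitting instances in proportion to a measured throughput score. The capacity numbers and rounding policy here are illustrative assumptions:

```python
# Sketch: split N training instances across workers in proportion to a
# relative speed score. Capacities and rounding policy are illustrative.
def allocate_instances(num_instances, capacities):
    """capacities: dict worker -> instances/sec (any relative speed score)."""
    total = sum(capacities.values())
    shares = {w: int(num_instances * c / total) for w, c in capacities.items()}
    # Give instances lost to integer rounding to the fastest worker.
    leftover = num_instances - sum(shares.values())
    fastest = max(capacities, key=capacities.get)
    shares[fastest] += leftover
    return shares

# Example: a GPU box gets roughly 10x the data of each slower CPU node.
print(allocate_instances(1_000_000, {"gpu-0": 1000, "cpu-0": 100, "cpu-1": 100}))
```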

Page 19: READINGS IN DEEP LEARNING

New project

• Challenges

• Parameter allocation
  – Do we have to distribute all parameters?
  – Can learning be local?

Page 20: READINGS IN DEEP LEARNING

New project

• Challenges

• Trustable updates
  – Different processors/LANs have different speeds
  – How do we trust their updates?
    • Do we incorporate or reject? (a staleness-based rule is sketched below)
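One hypothetical answer to "incorporate or reject?" is to trust an update less the more parameter versions have elapsed since the worker pulled its copy. The decay rule and cutoff below are assumptions, not a prescribed policy:

```python
# Hypothetical trust rule: down-weight stale updates, reject very stale ones.
def update_weight(current_version, update_version, max_staleness=20):
    staleness = current_version - update_version
    if staleness > max_staleness:
        return 0.0                      # too stale: reject outright
    return 1.0 / (1.0 + staleness)      # otherwise: down-weight smoothly
```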

Page 21: READINGS IN DEEP LEARNING

New project

• Optimal resynchronization: how much do we transmit? (a delta-transmission sketch follows)
  – Should not have to retransmit everything
  – Entropy coding?
  – Bit-level optimization?
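One plausible reading of "should not have to retransmit everything" is to send only the parameters that changed meaningfully, as sparse (index, delta) pairs; entropy coding of those deltas, as the slide asks, would be a further compression step. The threshold and encoding are illustrative assumptions:

```python
# Sketch: resynchronize by transmitting only significantly changed parameters.
import numpy as np

def encode_update(old_params, new_params, threshold=1e-4):
    delta = new_params - old_params
    idx = np.nonzero(np.abs(delta) > threshold)[0]
    return idx, delta[idx]          # sparse update, much smaller than the full vector

def apply_update(params, idx, values):
    params = params.copy()
    params[idx] += values
    return params
```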

Page 22: READINGS IN DEEP LEARNING

Possibilities

• Massively parallel learning
• Never-ending learning
• Multimodal learning
• GAIA..

Page 23: READINGS IN DEEP LEARNING

Asking for Volunteers

• Will be an open-source project
• Write to Anders

Page 24: READINGS IN DEEP LEARNING

Today

• Bain’s theory: Lars Mahler
  – Linguist, mathematician, philosopher
  – One of the earliest people to propose a connectionist architecture
  – Anticipated many of the modern ideas

• McCulloch and Pitts: Kartik Goyal
  – Early model of the neuron: threshold gates
  – Earliest model to consider excitation and inhibition