READINGS IN DEEP LEARNING
4 Sep 2013
ADMINISTRIVIA
• New course numbers (11-785/786) are assigned
– Should be up on the hub shortly
• Lab assignment 1 is up
– Due date: 2 weeks from today
• Google group: is everyone on?
• Website issues
– WordPress not yet an option (CMU CS setup)
– Piazza?
Poll for next 2 classes
• Monday, Sep 9
– The perceptron: A probabilistic model for information storage and organization in the brain
• Rosenblatt
• Not really about the logistic perceptron; more about the probabilistic interpretation of learning in connectionist networks
– Organization of behavior
• Donald Hebb
• About the Hebbian learning rule
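Hebb's rule is compact enough to sketch concretely. The snippet below is an illustrative addition (not from the reading): a linear unit whose weights grow with the co-activation of input and output. The learning rate `eta` and the starting weights are arbitrary choices.

```python
import numpy as np

def hebbian_update(w, x, eta=0.1):
    """One Hebbian step for a linear unit: strengthen each weight in
    proportion to the co-activation of its input x_i and the output y."""
    y = w @ x                  # the unit's linear response
    return w + eta * y * x     # "cells that fire together wire together"

# Repeated exposure to the same input grows the weights along that
# input's direction (and leaves weights of silent inputs untouched).
w = np.array([0.1, 0.1, 0.1])  # arbitrary small starting weights
x = np.array([1.0, 0.5, 0.0])
for _ in range(5):
    w = hebbian_update(w, x)
```

Note that pure Hebbian growth is unbounded, one motivation for normalized variants such as Sanger's generalized Hebbian rule in Wednesday's reading.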
Poll for next 2 classes
• Wed, Sep 11
– Optimal unsupervised learning in a single-layer linear feedforward neural network
• Terence Sanger
• Generalized Hebbian learning rule
– The Widrow-Hoff learning rule
• Widrow and Hoff
• Will be presented by Pallavi Baljekar
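The Widrow-Hoff (LMS) rule itself fits in a few lines. This toy example is my own sketch, with an arbitrary learning rate and a noise-free linear target: each step nudges the weights against the gradient of the squared error.

```python
import numpy as np

def lms_step(w, x, d, eta=0.05):
    """Widrow-Hoff (LMS) update: move the weights a small step against
    the gradient of the squared error between desired output d and
    the unit's actual output w . x."""
    err = d - w @ x
    return w + eta * err * x

# Fit a 2-input linear unit to targets produced by a known weight vector.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)
    w = lms_step(w, x, w_true @ x)  # w converges toward w_true
```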
Notices
• The success of the course depends on good presentations
• Please send in your slides 1-2 days before the presentation
– So that we can ensure they are OK
• You are encouraged to discuss your papers with us and your classmates while preparing
– Use the Google group for discussion
A new project
• Distributed, large-scale training of NNs
• Looking for volunteers
The Problem: Distributed data
• Training enormous networks
– Billions of units
• From large amounts of data
– Billions or trillions of instances
– Data may be localized, or distributed
The problem: Distributed computing
• A single computer will not suffice
– Need many processors
– Tens, hundreds, or thousands of computers
• Of possibly varying types and capacities
Challenge
• Getting the data to the computers
– Tons of data to many computers
• Bandwidth problems
• Timing issues
– Synchronizing the learning
Logistical Challenges
• How to transfer vast amounts of data to processors
• Which processor gets how much data
– Not all processors equally fast
– Not all data take equal amounts of time to process
• … and which data
– Data locality
Learning Challenges
• How to transfer parameters to processors
– Networks are large: billions or trillions of parameters
– Each processor must have the latest copy of parameters
• How to receive updates from processors
– Each processor learns on local data
– Updates from all processors must be pooled
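As a minimal sketch of that pooling step (assuming simple gradient averaging; the project's actual pooling scheme is an open question):

```python
import numpy as np

def pool_updates(local_grads):
    """Pool one update per worker by averaging: each gradient was
    computed independently on that worker's local data shard."""
    return np.mean(local_grads, axis=0)

# Three workers, one gradient each; the supervisor applies the mean.
grads = [np.array([1.0, 0.0]),
         np.array([0.0, 1.0]),
         np.array([2.0, 2.0])]
pooled = pool_updates(grads)
```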
Learning Challenges
• Synchronizing processor updates
– Some processors slower than others
– Inefficient to wait for slower ones
• In order to update parameters at all processors
• Requires asynchronous updates
– Each processor updates when done
– Problem: different processors now have different sets of parameters
• Other processors may have updated parameters already
• Requires algorithmic changes
– How to update asynchronously
– Which updates to trust
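One hypothetical way to decide which updates to trust is to dampen a gradient by its staleness, i.e. by how many versions the global parameters have advanced since the worker read them. The 1/(1 + staleness) schedule below is purely illustrative, not the project's method:

```python
import numpy as np

def apply_async_update(params, grad, grad_version, current_version, eta=0.1):
    """Apply a worker's gradient, scaled down by how stale it is:
    a gradient computed against an old parameter copy gets a smaller
    step, since other workers have updated params in the meantime."""
    staleness = current_version - grad_version
    scale = 1.0 / (1 + staleness)      # illustrative damping schedule
    return params - eta * scale * grad

params = np.array([1.0, 1.0])
grad = np.array([1.0, 0.0])
fresh = apply_async_update(params, grad, grad_version=5, current_version=5)
stale = apply_async_update(params, grad, grad_version=2, current_version=5)
# The stale update moves the parameters less than the fresh one does.
```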
Current Solutions
• Faster processors
• GPUs
– GPU programming required
• Large simple clusters
– Simple distributed programming
• Large heterogeneous clusters
– Techniques for asynchronous learning
Current Solutions
• Still assume data distribution is not a major problem
• Assume relatively fast connectivity
– Gigabit Ethernet
• Fundamentally cluster-computing based
– Local area network
New project
• Distributed learning
• Wide area network
– Computers distributed across the world
New project
• Supervisor/Worker architecture
• One or more supervisors
– May be a hierarchy
• A large number of workers
• Supervisors in charge of resource and task allocation, gathering and redistributing updates, and synchronization
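A single-process sketch of that loop, assuming synchronous rounds and least-squares workers purely for illustration (the real project would run workers on separate machines over a WAN):

```python
import numpy as np

def worker_step(params, shard):
    """A worker computes the gradient of a squared loss on its local shard."""
    X, y = shard
    return X.T @ (X @ params - y) / len(y)

def supervisor_round(params, shards, eta=0.1):
    """The supervisor broadcasts params, gathers one gradient per worker,
    averages them, and redistributes the updated params."""
    grads = [worker_step(params, s) for s in shards]
    return params - eta * np.mean(grads, axis=0)

# Four workers, each holding a private shard of a shared linear problem.
rng = np.random.default_rng(1)
w_true = np.array([1.0, 2.0])
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ w_true))

params = np.zeros(2)
for _ in range(200):
    params = supervisor_round(params, shards)  # converges toward w_true
```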
New project
• Challenges
• Data allocation
– Optimal policy for data distribution
• Minimal latency
• Maximum locality
New project
• Challenges
• Computation allocation
– Optimal policy for learning
• Compute load proportional to compute capacity
• Reallocation of data/tasks as required
New project
• Challenges
• Parameter allocation
– Do we have to distribute all parameters?
– Can learning be local?
New project
• Challenges
• Trustable updates
– Different processors/LANs have different speeds
– How do we trust their updates?
• Do we incorporate or reject?
New project
• Optimal resynchronization: how much do we transmit?
– Should not have to retransmit everything
– Entropy coding?
– Bit-level optimization?
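As one rough sketch of avoiding full retransmission (a thresholded delta; an illustrative assumption, not a committed design, and entropy coding of the deltas could follow):

```python
import numpy as np

def sparse_delta(old, new, tol=1e-3):
    """Encode a parameter resync as (indices, values) of only those
    entries that changed by more than tol since the last sync."""
    idx = np.flatnonzero(np.abs(new - old) > tol)
    return idx, new[idx]

def apply_delta(params, idx, vals):
    """Apply a received delta to a stale local copy."""
    out = params.copy()
    out[idx] = vals
    return out

old = np.array([0.5, 0.5, 0.5, 0.5])
new = np.array([0.5, 0.9, 0.5, 0.4])
idx, vals = sparse_delta(old, new)     # only entries 1 and 3 changed
synced = apply_delta(old, idx, vals)   # local copy matches new params
```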
Possibilities
• Massively parallel learning
• Never-ending learning
• Multimodal learning
• GAIA..
Asking for Volunteers
• Will be an open-source project
• Write to Anders
Today
• Bain’s theory: Lars Mahler
– Linguist, mathematician, philosopher
– One of the earliest people to propose a connectionist architecture
– Anticipated many modern ideas
• McCulloch and Pitts: Kartik Goyal
– Early model of the neuron: threshold gates
– Earliest model to consider excitation and inhibition
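The McCulloch-Pitts unit is simple enough to state as code. This sketch assumes the classic formulation with absolute inhibition, where any active inhibitory input vetoes firing:

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts threshold gate: fires (returns 1) iff no
    inhibitory input is active and the count of active excitatory
    inputs reaches the threshold."""
    if any(inhibitory):                    # absolute inhibition
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# With threshold 2 and two excitatory lines, the unit computes AND.
and_out = mp_neuron([1, 1], [], 2)    # fires
half = mp_neuron([1, 0], [], 2)       # does not fire
veto = mp_neuron([1, 1], [1], 2)      # inhibition vetoes firing
```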