READINGS IN DEEP LEARNING
4 Sep 2013
ADMINISTRIVIA
• New course numbers (11-785/786) are assigned
– Should be up on the hub shortly
• Lab assignment 1 is up
– Due date: 2 weeks from today
• Google group: is everyone on?
• Website issues
– WordPress not yet an option (CMU CS setup)
– Piazza?
Poll for next 2 classes
• Monday, Sep 9
– The perceptron: A probabilistic model for information storage and organization in the brain
• Rosenblatt
• Not really about the logistic perceptron; more about the probabilistic interpretation of learning in connectionist networks
– Organization of behavior
• Donald Hebb
• About the Hebbian learning rule
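Hebb's rule is compact enough to sketch concretely. The snippet below is an illustrative addition (not from the reading): a linear unit whose weights grow with the co-activation of input and output. The learning rate `eta` and the starting weights are arbitrary choices.

```python
import numpy as np

def hebbian_update(w, x, eta=0.1):
    """One Hebbian step for a linear unit: strengthen each weight in
    proportion to the co-activation of its input x_i and the output y."""
    y = w @ x                  # the unit's linear response
    return w + eta * y * x     # "cells that fire together wire together"

# Repeated exposure to the same input grows the weights along that
# input's direction (and leaves weights of silent inputs untouched).
w = np.array([0.1, 0.1, 0.1])  # arbitrary small starting weights
x = np.array([1.0, 0.5, 0.0])
for _ in range(5):
    w = hebbian_update(w, x)
```

Note that pure Hebbian growth is unbounded, one motivation for normalized variants such as Sanger's generalized Hebbian rule in Wednesday's reading.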
Poll for next 2 classes
• Wed, Sep 11
– Optimal unsupervised learning in a single-layer linear feedforward neural network
• Terence Sanger
• Generalized Hebbian learning rule
– The Widrow-Hoff learning rule
• Widrow and Hoff
• Will be presented by Pallavi Baljekar
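The Widrow-Hoff (LMS) rule itself fits in a few lines. This toy example is my own sketch, with an arbitrary learning rate and a noise-free linear target: each step nudges the weights against the gradient of the squared error.

```python
import numpy as np

def lms_step(w, x, d, eta=0.05):
    """Widrow-Hoff (LMS) update: move the weights a small step against
    the gradient of the squared error between desired output d and
    the unit's actual output w . x."""
    err = d - w @ x
    return w + eta * err * x

# Fit a 2-input linear unit to targets produced by a known weight vector.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)
    w = lms_step(w, x, w_true @ x)  # w converges toward w_true
```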
Notices
• The success of the course depends on good presentations
• Please send in your slides 1-2 days before the presentation
– So that we can ensure they are OK
• You are encouraged to discuss your papers with us and your classmates while preparing
– Use the Google group for discussion
A new project
• Distributed, large-scale training of NNs
• Looking for volunteers
The Problem: Distributed data
• Training enormous networks
– Billions of units
• From large amounts of data
– Billions or trillions of instances
– Data may be localized, or distributed
The problem: Distributed computing
• A single computer will not suffice
– Need many processors
– Tens, hundreds, or thousands of computers
• Of possibly varying types and capacities
Challenge
• Getting the data to the computers
– Tons of data to many computers
• Bandwidth problems
• Timing issues
– Synchronizing the learning
Logistical Challenges
• How to transfer vast amounts of data to processors
• Which processor gets how much data
– Not all processors equally fast
– Not all data take equal amounts of time to process
• … and which data
– Data locality
Learning Challenges
• How to transfer parameters to processors
– Networks are large: billions or trillions of parameters
– Each processor must have the latest copy of parameters
• How to receive updates from processors
– Each processor learns on local data
– Updates from all processors must be pooled
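As a minimal sketch of that pooling step (assuming simple gradient averaging; the project's actual pooling scheme is an open question):

```python
import numpy as np

def pool_updates(local_grads):
    """Pool one update per worker by averaging: each gradient was
    computed independently on that worker's local data shard."""
    return np.mean(local_grads, axis=0)

# Three workers, one gradient each; the supervisor applies the mean.
grads = [np.array([1.0, 0.0]),
         np.array([0.0, 1.0]),
         np.array([2.0, 2.0])]
pooled = pool_updates(grads)
```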
Learning Challenges
• Synchronizing processor updates
– Some processors slower than others
– Inefficient to wait for slower ones
• In order to update parameters at all processors
• Requires asynchronous updates
– Each processor updates when done
– Problem: different processors now have different sets of parameters
• Other processors may have updated parameters already
• Requires algorithmic changes
– How to update asynchronously
– Which updates to trust
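One hypothetical way to decide which updates to trust is to dampen a gradient by its staleness, i.e. by how many versions the global parameters have advanced since the worker read them. The 1/(1 + staleness) schedule below is purely illustrative, not the project's method:

```python
import numpy as np

def apply_async_update(params, grad, grad_version, current_version, eta=0.1):
    """Apply a worker's gradient, scaled down by how stale it is:
    a gradient computed against an old parameter copy gets a smaller
    step, since other workers have updated params in the meantime."""
    staleness = current_version - grad_version
    scale = 1.0 / (1 + staleness)      # illustrative damping schedule
    return params - eta * scale * grad

params = np.array([1.0, 1.0])
grad = np.array([1.0, 0.0])
fresh = apply_async_update(params, grad, grad_version=5, current_version=5)
stale = apply_async_update(params, grad, grad_version=2, current_version=5)
# The stale update moves the parameters less than the fresh one does.
```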
Current Solutions
• Faster processors
• GPUs
– GPU programming required
• Large simple clusters
– Simple distributed programming
• Large heterogeneous clusters
– Techniques for asynchronous learning
Current Solutions
• Still assume data distribution is not a major problem
• Assume relatively fast connectivity
– Gigabit Ethernet
• Fundamentally cluster-computing based
– Local area network
New project
• Distributed learning
• Wide area network
– Computers distributed across the world
New project
• Supervisor/Worker architecture
• One or more supervisors
– May be a hierarchy
• A large number of workers
• Supervisors in charge of resource and task allocation, gathering and redistributing updates, and synchronization
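A single-process sketch of that loop, assuming synchronous rounds and least-squares workers purely for illustration (the real project would run workers on separate machines over a WAN):

```python
import numpy as np

def worker_step(params, shard):
    """A worker computes the gradient of a squared loss on its local shard."""
    X, y = shard
    return X.T @ (X @ params - y) / len(y)

def supervisor_round(params, shards, eta=0.1):
    """The supervisor broadcasts params, gathers one gradient per worker,
    averages them, and redistributes the updated params."""
    grads = [worker_step(params, s) for s in shards]
    return params - eta * np.mean(grads, axis=0)

# Four workers, each holding a private shard of a shared linear problem.
rng = np.random.default_rng(1)
w_true = np.array([1.0, 2.0])
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ w_true))

params = np.zeros(2)
for _ in range(200):
    params = supervisor_round(params, shards)  # converges toward w_true
```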
New project
• Challenges
• Data allocation
– Optimal policy for data distribution
• Minimal latency
• Maximum locality
New project
• Challenges
• Computation allocation
– Optimal policy for learning
• Compute load proportional to compute capacity
• Reallocation of data/tasks as required
New project
• Challenges
• Parameter allocation
– Do we have to distribute all parameters?
– Can learning be local?
New project
• Challenges
• Trustable updates
– Different processors/LANs have different speeds
– How do we trust their updates?
• Do we incorporate or reject?
New project
• Optimal resynchronization: how much do we transmit?
– Should not have to retransmit everything
– Entropy coding?
– Bit-level optimization?
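As one rough sketch of avoiding full retransmission (a thresholded delta; an illustrative assumption, not a committed design, and entropy coding of the deltas could follow):

```python
import numpy as np

def sparse_delta(old, new, tol=1e-3):
    """Encode a parameter resync as (indices, values) of only those
    entries that changed by more than tol since the last sync."""
    idx = np.flatnonzero(np.abs(new - old) > tol)
    return idx, new[idx]

def apply_delta(params, idx, vals):
    """Apply a received delta to a stale local copy."""
    out = params.copy()
    out[idx] = vals
    return out

old = np.array([0.5, 0.5, 0.5, 0.5])
new = np.array([0.5, 0.9, 0.5, 0.4])
idx, vals = sparse_delta(old, new)     # only entries 1 and 3 changed
synced = apply_delta(old, idx, vals)   # local copy matches new params
```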
Possibilities
• Massively parallel learning
• Never-ending learning
• Multimodal learning
• GAIA..
Asking for Volunteers
• Will be an open-source project
• Write to Anders
Today
• Bain’s theory: Lars Mahler
– Linguist, mathematician, philosopher
– One of the earliest people to propose a connectionist architecture
– Anticipated many modern ideas
• McCulloch and Pitts: Kartik Goyal
– Early model of the neuron: threshold gates
– Earliest model to consider excitation and inhibition
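The McCulloch-Pitts unit is simple enough to state as code. This sketch assumes the classic formulation with absolute inhibition, where any active inhibitory input vetoes firing:

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts threshold gate: fires (returns 1) iff no
    inhibitory input is active and the count of active excitatory
    inputs reaches the threshold."""
    if any(inhibitory):                    # absolute inhibition
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# With threshold 2 and two excitatory lines, the unit computes AND.
and_out = mp_neuron([1, 1], [], 2)    # fires
half = mp_neuron([1, 0], [], 2)       # does not fire
veto = mp_neuron([1, 1], [1], 2)      # inhibition vetoes firing
```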