SpiNNaker: a million-core ARM-powered neural HPC
The Advanced Processor Technologies Group
Cameron
School of Computer Science, The University of Manchester, UK
Outline
Motivation
SpiNNaker Architecture
Machines
Software
State of the Nation
Conclusions and Futures
Motivation
Ubiquity of parallelism
The human brain is the best example
Grand challenges
UK: “GC5: Architecture of Brain and Mind”
Can we learn from the brain?
As a processor technologies group
The Biological Brain
Brains demonstrate:
massive parallelism (10^11 neurons)
massive connectivity (10^15 synapses)
excellent power-efficiency
low-performance components (~100 Hz)
low-speed communication (~metres/s)
adaptivity – tolerant of component failure
autonomous learning
Simplified Structure
Taking Inspiration
The Grand Challenges work both ways
By mimicking the brain, can we understand it?
Use it to perform experiments that would be 'unethical' on living subjects
Improve treatment regimes
Can we learn lessons from the biology?
Apply them to parallel computing, e.g.:
Energy efficiency
Fault-tolerance
Artificial Neural Nets
Taxonomy
Three generations of neural modelling
Granularity of Simulation
Network Scaling
Large-scale ANNs require lots of neurons
SpiNNaker's aim is 1 billion biologically plausible neurons
Large-scale ANNs require lots of bandwidth
In the brain, connectivity is discrete physical 'wiring'
Resulting in: 10^9 neurons × 10 Hz × 10^3 synapses = 10^13 (10 trillion) network events/s
Often the limiting factor for large simulations
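The event rate on this slide is a straight product of three round figures; a short Python sketch makes the arithmetic explicit. The constant and function names are illustrative, and the 10 Hz mean rate and 10^3 fan-out are the slide's round figures rather than measurements.

```python
# Back-of-envelope spike-event rate for a brain-scale model,
# using the round figures quoted on the Network Scaling slide.
NEURONS = 10**9              # target: one billion neurons
MEAN_RATE_HZ = 10            # assumed average firing rate per neuron
SYNAPSES_PER_NEURON = 10**3  # fan-out of each neuron


def synaptic_events_per_second(neurons: int, rate_hz: int, fan_out: int) -> int:
    """Every spike must be delivered to each of the neuron's synapses."""
    return neurons * rate_hz * fan_out


events = synaptic_events_per_second(NEURONS, MEAN_RATE_HZ, SYNAPSES_PER_NEURON)
spikes = NEURONS * MEAN_RATE_HZ  # spikes emitted per second, before fan-out
print(f"{events:.0e} network events / s")  # 1e+13 network events / s
print(f"{spikes:.0e} spikes / s")          # 1e+10 spikes / s
```

Because spikes travel as source-addressed multicast packets, a source neuron emits roughly one packet per spike (about 10^10 packets/s at these figures); the ×10^3 fan-out is realised by router replication and synaptic lookup on the receiving cores rather than by 10^3 separate packets.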
Biology vs Electronics
Luckily biology is 'slow' & electronics 'fast'
This is exploited in SpiNNaker
Model multiple neurons/synapses on a core
SpiNNaker models neurons in software on ARM
Number per core depends on the fidelity required
Route spikes using AER (Address-Event Representation)
SpiNNaker has a rich interconnection fabric
Supports a very large number of small packets (spikes)
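In AER a spike carries no payload: its identity is its source address, and its timing is simply when it arrives. A minimal Python sketch of packing such an address into a 32-bit routing key follows; the 8/8/5/11-bit split between chip x, chip y, core and neuron is an assumption for illustration, not a format stated in these slides.

```python
# Hypothetical 32-bit AER key layout (assumed for illustration):
# | chip x: 8 bits | chip y: 8 bits | core: 5 bits | neuron: 11 bits |
X_SHIFT, Y_SHIFT, CORE_SHIFT = 24, 16, 11
BYTE_MASK = 0xFF
CORE_MASK = (1 << 5) - 1       # up to 32 cores (18 are used)
NEURON_MASK = (1 << 11) - 1    # up to 2048 neurons per core


def make_key(x: int, y: int, core: int, neuron: int) -> int:
    """Pack a spike's source address into a single routing key."""
    if not (0 <= x <= BYTE_MASK and 0 <= y <= BYTE_MASK):
        raise ValueError("chip coordinates must fit in 8 bits")
    if not 0 <= core <= CORE_MASK:
        raise ValueError("core id must fit in 5 bits")
    if not 0 <= neuron <= NEURON_MASK:
        raise ValueError("neuron id must fit in 11 bits")
    return (x << X_SHIFT) | (y << Y_SHIFT) | (core << CORE_SHIFT) | neuron


def split_key(key: int) -> tuple[int, int, int, int]:
    """Recover (x, y, core, neuron) from a routing key."""
    return ((key >> X_SHIFT) & BYTE_MASK,
            (key >> Y_SHIFT) & BYTE_MASK,
            (key >> CORE_SHIFT) & CORE_MASK,
            key & NEURON_MASK)


key = make_key(x=2, y=3, core=4, neuron=17)
print(hex(key))        # 0x2032011
print(split_key(key))  # (2, 3, 4, 17)
```

Keeping all neurons of one core under a common key prefix means a single router entry (key plus mask over the top 21 bits) can cover a whole core's population.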
SpiNNaker Principles
Energy frugality
Low-power processors – ARM, embedded
GALS (Globally Asynchronous, Locally Synchronous)
Event-Driven
Redundancy
18 cores per chip
6 links per chip
Real-time modelling
SpiNNaker Project
Multi-core SpiNNaker nodes
18 ARM968 cores
Programmable
Interconnects
Scalable up to 2^16 nodes in a system
over a million processors
>10^8 MIPS total
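The headline figures can be cross-checked against each other. This Python sketch assumes the per-chip numbers quoted on later slides (18 cores, 16 of them running the model, ~4 GIPS peak, ~1000 neurons per application core); the variable names are illustrative.

```python
# Cross-check of machine-scale figures quoted in these slides.
NODES = 2**16                  # chips in the largest system
CORES_PER_NODE = 18            # ARM968 cores per chip
APP_CORES_PER_NODE = 16        # cores running the model (1 monitor, 1 spare)
PEAK_MIPS_PER_NODE = 4_000     # ~4 GIPS peak per chip
NEURONS_PER_APP_CORE = 1_000   # typical neural load per application core

total_cores = NODES * CORES_PER_NODE
total_mips = NODES * PEAK_MIPS_PER_NODE
total_neurons = NODES * APP_CORES_PER_NODE * NEURONS_PER_APP_CORE

print(total_cores)              # 1179648   -> "over a million processors"
print(f"{total_mips:.1e}")      # 2.6e+08   -> ">10^8 MIPS"
print(f"{total_neurons:.2e}")   # 1.05e+09  -> ~1 billion neurons
```

The neuron count lands close to the 10^9 target from the Network Scaling slide, which suggests the ~1000 neurons per core figure and the 2^16-chip limit were chosen to fit together.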
Flattened Topology
System On a Chip
18 processor nodes
2 asynchronous NoCs:
Comms NoC: packetised via the router
System NoC: shared resources
Asynchronous external links
In-package RAM
Ethernet
Processor Node
ARM968E-S: synthesizable
Fixed point
“Efficient”
32 KB instruction & 64 KB data memory
Local peripherals
Custom DMAC
JTAG
Fabricated CMP: UMC 130 nm; die area 101.64 mm²
>100 million transistors
Power consumption: 1 W at 1.2 V, 180 MHz
Peak performance ~4 GIPS
Chip Design Considerations 1
Choice of process technology: UMC 130 nm 1.2 V 1P8M Fusion process
Standard Performance & Low Leakage libraries
Mature, competitively priced
Physical layout: async logic crafted with commercial EDA tools
Customized macrocells for key asynchronous circuits
Chip Design Considerations 2
Power optimization:
Low-power embedded processors
Relatively low frequency – 180 MHz
32-bit fixed-point arithmetic
Mobile DDR SDRAM
Idle processor cores put into sleep mode
Architecture- and logic-level clock gating
Power-aware synthesis throughout the design flow
Chip Design Considerations 3
GALS clocked islands: 2 * cores @ 180 MHz, router @ 180 MHz, system peripherals @ 100 MHz, SDRAM @ 166 MHz
Fault tolerance and monitoring:
Redundancy: 18 cores, 6 bidirectional I/O links, 2 PLLs
Run-time diagnostics, temperature monitoring and reconfiguration
Diagnostic comms carried alongside application traffic
Comms NoC – parity and framing error detection
DMA – optional CRC
Emergency routing – for inter-chip comms failures
Emergency Routing
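The emergency-routing idea can be sketched in a few lines of Python. This assumes a skewed coordinate system for the hexagonal mesh, with the six links numbered 0 to 5 anticlockwise (E, NE, N, W, SW, S), and ignores wrap-around at the machine edges; the function names are hypothetical. The point it illustrates is that the two links either side of a blocked link form a triangle reaching the same neighbouring chip.

```python
# Six inter-chip link directions as (dx, dy), numbered anticlockwise
# (assumed numbering for this sketch).
LINKS = {
    0: ("E", (1, 0)),
    1: ("NE", (1, 1)),
    2: ("N", (0, 1)),
    3: ("W", (-1, 0)),
    4: ("SW", (-1, -1)),
    5: ("S", (0, -1)),
}


def step(pos, link):
    """Chip reached by leaving `pos` on `link` (no torus wrap-around)."""
    dx, dy = LINKS[link][1]
    return (pos[0] + dx, pos[1] + dy)


def emergency_detour(link):
    """If `link` is blocked, leave on the next link anticlockwise; the
    intermediate chip then forwards on the link clockwise of the blocked one."""
    return (link + 1) % 6, (link - 1) % 6


def deliver(pos, link, blocked):
    """Return the chips visited when sending from `pos` on `link`."""
    if link not in blocked:
        return [step(pos, link)]
    first, second = emergency_detour(link)
    mid = step(pos, first)
    return [mid, step(mid, second)]


print(deliver((0, 0), 0, blocked=set()))  # [(1, 0)]
print(deliver((0, 0), 0, blocked={0}))    # [(1, 1), (1, 0)]
```

In this numbering the vectors for links k+1 and k-1 always sum to the vector for link k, which is why a two-hop detour around a single failed link lands on the intended neighbour.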
SpiNNaker Board
3rd Generation SpiNNaker board
4th Generation Board
Hexagonal PCB structure
SpiNNaker Machines
Software on SpiNNaker
SpiNNaker is primarily for ANNs, but is not limited to them:
Finite Element Analysis
Ray Tracing
Heat Diffusion
All require mapping from the problem space/graph onto the machine itself
As the machine scales, this mapping problem becomes significant
Mapping Paths
[Figure: mapping flow through Splitter, Place and Route stages, with two descriptions (Desc.) as input; example populations A, B, C]
Two mapping paths are being developed for large machines
PACMAN – for mapping models to hardware
Partition And Configuration MANager
Machine and Model Libraries
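The splitting and placement steps can be sketched as a greedy pass that cuts each population into slices of at most N neurons and assigns the slices to consecutive core ids. This is a simplification under stated assumptions: the function name and population sizes are hypothetical, and a real mapper such as PACMAN also has to weigh memory, CPU time and chip geometry.

```python
import math


def split_and_place(populations, max_per_core, first_core=1):
    """Split each population into slices of at most `max_per_core` neurons
    and place the slices on consecutive core ids.

    Returns {population: [(core_id, first_neuron, end_neuron), ...]},
    where end_neuron is exclusive.
    """
    placement = {}
    core = first_core
    for name, size in populations.items():
        slices = []
        for i in range(math.ceil(size / max_per_core)):
            lo = i * max_per_core
            hi = min(lo + max_per_core, size)
            slices.append((core, lo, hi))
            core += 1
        placement[name] = slices
    return placement


# Hypothetical sizes, chosen only so that A needs two cores at 10 neurons/core.
pops = {"A": 16, "B": 8, "C": 4, "D": 10}
placement = split_and_place(pops, max_per_core=10)
for name, slices in placement.items():
    print(name, slices)
# A [(1, 0, 10), (2, 10, 16)]
# B [(3, 0, 8)]
# C [(4, 0, 4)]
# D [(5, 0, 10)]
```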
Neural Simulation
Processors: 16 application + 1 monitor (+1 spare)
Each application processor simulates ~1000 neurons
SDRAM holds synaptic data
Brought to the core by DMA across the System NoC
Spikes coded as packets
Bespoke router with multicast & point-to-point routing tables, and an emergency routing mechanism
Source-addressed multicast (MC) 'spike' packets over the Comms NoC
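A minimal Python model of a multicast routing-table lookup follows. It assumes key/mask matching where the first matching entry wins, an output vector of 6 link bits plus 18 core bits, and default routing that sends an unmatched packet straight on through the link opposite its arrival link. All names, and the example key layout (low 11 bits = neuron index), are hypothetical.

```python
from dataclasses import dataclass

N_LINKS = 6    # inter-chip links 0-5; assumed: link i is opposite link i + 3
N_CORES = 18   # local processor cores


@dataclass(frozen=True)
class Entry:
    key: int
    mask: int
    route: int  # bit i (0-5): send on link i; bit 6 + c: deliver to core c


def lookup(table, key, arrival_link):
    """Return the output bitmask for a multicast packet."""
    for entry in table:  # first matching entry wins
        if (key & entry.mask) == entry.key:
            return entry.route
    # Default routing: no entry matched, so the packet carries straight on,
    # leaving by the link opposite the one it arrived on.
    return 1 << ((arrival_link + N_LINKS // 2) % N_LINKS)


def decode(route):
    """Split an output bitmask into (links, cores). More than one bit set
    means the router replicates the packet (multicast)."""
    links = [i for i in range(N_LINKS) if (route >> i) & 1]
    cores = [c for c in range(N_CORES) if (route >> (N_LINKS + c)) & 1]
    return links, cores


# Hypothetical entry: any neuron under key prefix 0x02032000 (low 11 bits
# masked off) is sent out on link 0 and also delivered to local core 1.
table = [Entry(key=0x02032000, mask=0xFFFFF800,
               route=(1 << 0) | (1 << (N_LINKS + 1)))]

print(decode(lookup(table, 0x02032011, arrival_link=3)))  # ([0], [1])
print(decode(lookup(table, 0x05000000, arrival_link=3)))  # ([0], [])
```

A single masked entry covers every neuron on a source core, so the table size scales with the number of source cores rather than the number of neurons.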
Software Operation
API provides hardware abstraction for modelling
3 main events the ANN software deals with:
Packet Received
Buffer and request DMA
DMA Event
Read / update synapse table
Timer Event
Calculate and update neurons
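The three-event structure can be sketched as a toy Python class with one handler per event. Everything here is hypothetical (class and method names, the leaky integrate-and-fire parameters, the dictionary standing in for SDRAM), and floats are used for readability where the hardware uses fixed-point arithmetic.

```python
from collections import deque


class NeuralCore:
    """Toy event-driven model of one application core (hypothetical names)."""

    def __init__(self, sdram_rows, n_neurons, decay=0.9, threshold=1.0):
        self.sdram = sdram_rows         # spike key -> [(target, weight), ...]
        self.v = [0.0] * n_neurons      # membrane potentials
        self.i_syn = [0.0] * n_neurons  # input gathered since the last tick
        self.decay = decay
        self.threshold = threshold
        self.pending = deque()          # buffered spikes awaiting DMA

    def on_packet_received(self, key):
        """Packet Received: buffer the spike and request its synaptic row."""
        self.pending.append(key)

    def service_dma(self):
        """Stand-in for the DMA controller fetching each requested row."""
        while self.pending:
            key = self.pending.popleft()
            self.on_dma_done(self.sdram.get(key, []))

    def on_dma_done(self, row):
        """DMA Event: walk the synapse row, accumulating weighted input."""
        for target, weight in row:
            self.i_syn[target] += weight

    def on_timer(self):
        """Timer Event: leaky integrate-and-fire update; returns fired ids."""
        fired = []
        for n in range(len(self.v)):
            self.v[n] = self.v[n] * self.decay + self.i_syn[n]
            self.i_syn[n] = 0.0
            if self.v[n] >= self.threshold:
                fired.append(n)
                self.v[n] = 0.0  # reset after a spike
        return fired


core = NeuralCore({0x10: [(0, 0.6), (1, 0.3)]}, n_neurons=2)
core.on_packet_received(0x10)
core.on_packet_received(0x10)
core.service_dma()
fired = core.on_timer()
print(fired)  # [0]
```

Neuron 0 receives 0.6 twice and crosses the 1.0 threshold; neuron 1 accumulates 0.6 and stays below it. On the real chip the DMA completes asynchronously, which is why the packet and DMA handlers are separate events.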
Example ANN Problem
Constraint: 10 neurons/core
(due to memory, CPU etc.)
A splits across 2 cores, mapped to cores 1 & 2
B, C and D map to cores 3-5
[Figure: populations A, B, C and D connected by 1:1, All:All, 25% and 50% projections]
Routes: 1↔3 & 2↔3 (A↔B), 1↔5 & 2↔5 (A↔D), 3↔5 (B↔D), 4↔5 (C↔D).
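Given the placement above (A on cores 1 and 2; B, C and D on cores 3 to 5), the core-level routes follow mechanically from the population-level projections: every core holding part of the presynaptic population needs a route to every core holding part of the postsynaptic one. The Python sketch below assumes the projection list implied by the routes on this slide and treats each route as directed from pre to post; the function name is hypothetical.

```python
from itertools import product

# Placement from the slide: A split over cores 1 and 2; B, C, D on 3-5.
placement = {"A": [1, 2], "B": [3], "C": [4], "D": [5]}
# Population-level projections implied by the routes listed on the slide.
projections = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D")]


def core_routes(placement, projections):
    """Expand each projection into (source core, destination core) pairs."""
    routes = set()
    for pre, post in projections:
        for src, dst in product(placement[pre], placement[post]):
            routes.add((src, dst))
    return sorted(routes)


print(core_routes(placement, projections))
# [(1, 3), (1, 5), (2, 3), (2, 5), (3, 5), (4, 5)]
```

Splitting a population multiplies its routes: A alone accounts for four of the six, which is one reason mapping cost grows quickly as models and machines scale.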
PyNN Integration
PACMAN
Real-Time I/O 1
Real-Time I/O 2
Real-Time I/O 3
Current Project Status
Full 18-core chip: arrived May 2011
4th Gen card: 48 chips, 864 processors (June 2012)
Neuron models: LIF, Izhikevich, MLP
Synapse models: STDP, NMDA
Networks: PyNN, NEF (Nengo) -> SpiNNaker
Various small tools to build router tables, etc.
Plans: 10^4-core machine (Q4 2012), 10^5 (1H 2013), 50,000-chip 10^6-core machine (Q3 2013)
Conclusions
SpiNNaker MPSoC
Power-efficiency
Scalable communications
Programmability
Fault-tolerance
SpiNNaker machine
Massively parallel, programmable platform
Aims to help neuroscientists explore and understand information-processing mechanisms in the brain
Other parallel applications too
Manchester Team
Any Questions?
http://apt.cs.man.ac.uk/projects/SpiNNaker/
Search the Web and YouTube for “SpiNNaker Manchester|chip”