SpiNNaker - a million-core ARM-powered neural HPC

The Advanced Processor Technologies Group

Cameron Patterson

cameron.patterson@cs.man.ac.uk

School of Computer Science, The University of Manchester, UK

Outline

Motivation

SpiNNaker Architecture

Machines

Software

State of the Nation

Conclusions and Futures

Motivation

Ubiquity of parallelism

The human brain is the best example

Grand challenges

UK: “GC5: Architecture of Brain and Mind”

Can we learn from the brain?

As a processor technologies group

The Biological Brain

Brains demonstrate:

massive parallelism (10¹¹ neurons)

massive connectivity (10¹⁵ synapses)

excellent power-efficiency

low-performance components (~100 Hz)

low-speed communication (~ metres / s)

adaptivity – tolerant of component failure

autonomous learning

Simplified Structure

Taking Inspiration

The Grand Challenges - work both ways

By mimicking the brain, can we understand it?

Use it to perform 'unethical' experiments

Improve treatment regimes

Can we learn lessons from the biology?

Apply to parallel computing e.g.

Energy efficiency

Fault-tolerance

Artificial Neural Nets

Taxonomy

Three generations of neural modelling

Granularity of Simulation


Network Scaling

Large-scale ANNs require lots of neurons

SpiNNaker's aim is 1 billion (10⁹) plausible neurons

Large-scale ANNs require lots of bandwidth

In the brain, discrete 'wiring'

Resulting in: 10⁹ neurons × 10 Hz × 10³ synapses = 10¹³ (10 trillion) network events / s

Often the limiting factor for large simulations
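The arithmetic above can be checked in a few lines. All figures are the slide's design targets, not measurements:

```python
# Back-of-envelope check of the network load quoted above.
# All figures are the slide's design targets, not measurements.
neurons = 10**9        # aim: one billion neurons
mean_rate_hz = 10      # plausible mean firing rate
fan_out = 10**3        # synapses per neuron

synaptic_events_per_s = neurons * mean_rate_hz * fan_out
print(synaptic_events_per_s == 10**13)  # True: 10 trillion events/s
```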

Biology vs Electronics

Luckily biology is 'slow' & electronics 'fast'

This is exploited in SpiNNaker

Model multiple neurons/synapses on a core

SpiNNaker models neurons in software on ARM

Quantity depends on fidelity required

Route spikes using AER (Address-Event Representation)

SpiNNaker has a rich interconnection fabric

Supports very large numbers of small packets (spikes)
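The AER idea — a spike packet carries only its source neuron's ID, and each router fans it out by table lookup — can be sketched as a toy model. The dict-based table and link names below are illustrative stand-ins; the real SpiNNaker router matches key/mask entries in a ternary CAM, not an exact dict:

```python
# Toy model of source-addressed (AER) multicast routing: a spike packet
# carries only the ID of the neuron that fired, and each router copies
# it to a set of output links found by table lookup. The dict and link
# names are illustrative; the real router matches key/mask entries in
# a ternary CAM rather than doing an exact lookup.

def route(routing_table, source_key):
    """Return the output links a spike with this source key is copied to."""
    # Unmatched keys are simply dropped here; the hardware instead
    # default-routes them straight through the chip.
    return routing_table.get(source_key, set())

table = {
    0x0042: {"north", "east", "core3"},  # neuron 0x42 fans out three ways
    0x0043: {"core1"},                   # neuron 0x43 stays local
}

print(route(table, 0x0042) == {"north", "east", "core3"})  # True
```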

SpiNNaker Principles

Energy frugality

Low-power processors – ARM, embedded

GALS (Globally Asynchronous, Locally Synchronous)

Event-Driven

Redundancy

18 cores per chip

6 links per chip

Real-time modelling

SpiNNaker Project

Multi-core SpiNNaker nodes

18 ARM968 cores

Programmable

Interconnects

Scalable up to 2¹⁶ nodes in a system

over a million processors

>10⁸ MIPS total
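A quick sanity check of these scaling figures. The 180 MIPS/core value is an assumption derived from the 180 MHz clock, not a benchmark:

```python
# Sanity check of the scaling figures above: 2^16 nodes, 18 cores each.
# The 180 MIPS/core figure is assumed from the 180 MHz clock (roughly
# one instruction per cycle), not a measured benchmark.
nodes = 2**16
cores_per_node = 18
total_cores = nodes * cores_per_node
print(total_cores)            # 1179648 -> "over a million processors"

assumed_mips_per_core = 180
total_mips = total_cores * assumed_mips_per_core
print(total_mips > 10**8)     # True: >10^8 MIPS in aggregate
```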

Flattened Topology

System On a Chip

Async Ext. Links

2 Async NoCs

Comms – packetised via Router

System – shared resource

In package RAM

Ethernet

18 Proc. Nodes

Processor Node

ARM968E-S

Synthesizable

Fixed Point

“Efficient”

32 KB instruction & 64 KB data memory

Local peripherals

Custom DMAC

JTAG

Fabricated CMP

UMC 130nm

Die area: 101.64 mm²

>100 million transistors

Power consumption: 1 W at 1.2 V, 180 MHz

Peak performance: ~4 GIPS

Chip Design Considerations 1

Choice of process technology

UMC 130nm 1.2V 1P8M Fusion process

Standard Performance & Low Leakage libraries

Mature, competitively priced

Physical Layout

Async logic crafted with commercial EDA tools

Customized macrocells for key asynchronous circuits

Chip Design Considerations 2

Power Optimization

Low-power embedded processors

Relatively low frequency – 180 MHz

32-bit fixed-point arithmetic

Mobile DDR SDRAM

Idle processor cores put into sleep mode

Architecture- and logic-level clock gating

Power-aware synthesis throughout the design flow

Chip Design Considerations 3

GALS Clocked Islands: 2 × cores @ 180 MHz, router @ 180 MHz, system peripherals @ 100 MHz, SDRAM @ 166 MHz

Fault Tolerance and Monitoring

Redundancy: 18 cores, 6 bidir. I/O links, 2 PLLs

Runtime diagnostics, temperatures and reconfiguration

Diagnostic comms alongside application traffic

Comms NoC – parity and framing error detection

DMA – optional CRC

Emergency Routing – for inter-chip comms failures
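The optional DMA CRC can be illustrated with a checksum round-trip. The real DMA controller computes the CRC in hardware as synaptic blocks move between SDRAM and local memory; `zlib.crc32` merely stands in for it here:

```python
import zlib

# Illustration of the idea behind the DMA controller's optional CRC:
# append a checksum when a synaptic block is written to SDRAM, verify
# it after the DMA read brings the block back into local memory.
# (The real DMAC does this in hardware; zlib.crc32 just stands in.)

def store_block(data: bytes) -> bytes:
    """Append a 4-byte CRC to the block before it goes out to SDRAM."""
    return data + zlib.crc32(data).to_bytes(4, "little")

def load_block(block: bytes) -> bytes:
    """Strip and check the CRC after the DMA transfer completes."""
    data, crc = block[:-4], int.from_bytes(block[-4:], "little")
    if zlib.crc32(data) != crc:
        raise IOError("CRC mismatch: corrupted transfer")
    return data

block = store_block(b"synaptic row 42")
print(load_block(block) == b"synaptic row 42")  # True
```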

Emergency Routing


SpiNNaker Board

3rd Generation SpiNNaker board

4th Generation Board

Hexagonal PCB structure

SpiNNaker Machines


Software on SpiNNaker

SpiNNaker primarily for ANNs

Not limited to this:

Finite Element Analysis

Ray Tracing

Heat Diffusion

All require mapping from the problem space/graph to the machine itself

As machines scale, this mapping problem becomes significant

Mapping Paths

[Diagram: model descriptions feed a Splitter, then Place and Route stages]

2 paths being developed for large machines

PACMAN – for mapping models to hardware

Partition And Configuration MANager

Machine and Model Libraries

Neural Simulation

Processors: 16 application + 1 monitor (+1 spare)

Simulate ~1000 neurons/proc.

SDRAM holds synaptic data

Brought to core by DMA across System NoC

Spikes coded as packets

Bespoke router with multicast & point-to-point routing tables and emergency routing mechanism

Source-addressed MC 'spike' packets over Comms NoC

Software Operation

API provides h/w abstraction for modelling

3 main events ANN software deals with:

Packet Received

Buffer and request DMA

DMA Event

Read / update synapse table

Timer Event

Calculate and update neurons
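The three-event structure above can be sketched as a toy dispatcher. The real API is an event-driven C runtime on the ARM cores; every name below (the handlers, `dispatch`, the log) is illustrative only:

```python
# Python sketch of the three-event structure described above.
# The real API is an event-driven C runtime on the ARM cores;
# all names here are illustrative only.

log = []

def on_packet_received(source_key):
    # Buffer the spike and request a DMA of its synaptic row from SDRAM.
    log.append(("dma_request", source_key))

def on_dma_done(source_key):
    # Synaptic row is now in local memory: read / update the synapse table.
    log.append(("synapses_updated", source_key))

def on_timer_tick(tick):
    # Periodic event: advance every neuron's state by one timestep.
    log.append(("neurons_updated", tick))

handlers = {"packet": on_packet_received,
            "dma": on_dma_done,
            "timer": on_timer_tick}

def dispatch(event, arg):
    handlers[event](arg)

# One spike's life cycle, followed by a timer tick:
dispatch("packet", 0x42)
dispatch("dma", 0x42)
dispatch("timer", 1)
print(log)
```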

Example ANN Problem

Constraint: 10 neurons / core

(due to mem/cpu etc.)

A splits into 2 cores

Mapped to cores 1 & 2

B, C and D map to cores 3-5

B, C and D map to 3-5

[Diagram: populations A, B, C and D connected 1:1 and All:All, with population sizes and connection percentages]

Routes: 1↔3 & 2↔3 (A↔B), 1↔5 & 2↔5 (A↔D), 3↔5 (B↔D), 4↔5 (C↔D).
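A minimal partitioner reproducing this example might look as follows. It illustrates the kind of splitting PACMAN performs, but is not PACMAN itself; the 10-neurons-per-core constraint follows the slide, while the population sizes are assumed so that A needs two cores:

```python
import math

# Mini-partitioner for the example above: split each population into as
# many cores as the 10-neurons-per-core constraint demands, then assign
# cores in order. Illustrative only, not PACMAN; population sizes are
# assumed so that A splits across two cores, matching the slide.

MAX_NEURONS_PER_CORE = 10

def partition(populations):
    """populations: dict name -> neuron count. Returns name -> [core ids]."""
    placement, next_core = {}, 1
    for name, size in populations.items():
        n_cores = math.ceil(size / MAX_NEURONS_PER_CORE)
        placement[name] = list(range(next_core, next_core + n_cores))
        next_core += n_cores
    return placement

# A (20 neurons, assumed) splits across cores 1 & 2; B, C, D take 3-5.
print(partition({"A": 20, "B": 10, "C": 10, "D": 10}))
# {'A': [1, 2], 'B': [3], 'C': [4], 'D': [5]}
```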

PyNN Integration

PACMAN

Real-Time I/O 1

Real-Time I/O 2

Real-Time I/O 3


Current Project Status

Full 18-core chip: arrived May 2011

4th Gen Card: 48 chips, 864 processors (June 2012)

Neuron models: LIF, Izhikevich, MLP

Synapse models: STDP, NMDA

Networks: PyNN, NEF (Nengo) -> SpiNNaker

Various small tools to build router tables, etc.

Plans: 10⁴-core machine (Q4 2012), 10⁵ (1H 2013), 50,000-chip 10⁶-core machine (Q3 2013).


Conclusions

SpiNNaker MPSoC

Power-efficiency

Scalable communications

Programmability

Fault-tolerance

SpiNNaker machine

Massively-parallel, programmable platform

Aim: to help neuroscientists explore and understand information-processing mechanisms in the brain

Other parallel applications too

Manchester Team

Any Questions?

http://apt.cs.man.ac.uk/projects/SpiNNaker/

Search the Web and YouTube for “SpiNNaker Manchester|chip”
