WildFire: A Scalable Path for SMPs

Erik Hagersten and Michael Koster

Presented by Andrew Waterman

ECE259 Spring 2008

Insight and Motivation

• SMP abandoned for more scalable cc-NUMA
• But SMP bandwidth has scaled faster than CPU speed
• cc-NUMA is scalable but more complicated
  – Program/OS specialization necessary
  – Communication to remote memory is slow
  – May not be optimal for real access patterns
• SMP (UMA) is simpler
  – More straightforward programming model
  – Simpler scheduling, memory management
  – No slow remote memory access
• Why not leverage SMP to the extent that it scales?

Multiple SMP (MSMP)

• Connect few, large SMPs (nodes) together
• Distributed shared memory
  – Weren't we just NUMA-bashing?
• Several CPUs per node => many local memory refs
• Few nodes => unscalable coherence protocol OK

WildFire Hardware

• MSMP with 112 UltraSPARCs
  – Four unmodified Sun E6000 SMPs
    • GigaPlane bus (3.2 GB/s within a node)
    • 16 2-CPU or I/O cards per node
    • WildFire Interface (WFI) is just another I/O board (cool!)
  – SMPs (UMA) connected via WFI == cc-NUMA (!)
    • But this is OK... few, large nodes
• Full cache coherence, both intra- & inter-node

WildFire from 30,000 ft

(emphasis on a single SMP node)

WildFire Software

• ABI-compatible with Sun SMPs
  – It's the software, stupid!
• Slightly (allegedly) modified Solaris 2.6
• Threads in same process grouped onto same node
• Hierarchical Affinity Scheduler (HAS)
• Coherent Memory Replication (CMR)
  – OK, so this isn't purely a software technique

Coherent Memory Replication (CMR)

• S-COMA with fixed home locations for each block
  – For those keeping score, that means it's not COMA
• Local physical pages “shadow” remote physical pages
  – Keep frequently-read pages close: less avg. latency
• Implementation: hardware counters
  – CMR page allocation handled within OS
  – Coherence still in hardware at block granularity
• Enabled/disabled at page granularity
• CMR memory allocation adjusts with mem. pressure
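
To make the counter-driven policy concrete, here is a toy model of what the OS side of CMR might look like. The slides only say that hardware counters drive the decision, that the OS allocates shadow pages, and that the allocation adjusts with memory pressure. Everything else here (the `CmrPolicy` class, the miss threshold, the shadow-page cap, the coldest-first eviction order) is a hypothetical illustration, not the actual Solaris implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CmrPolicy:
    """Toy OS-side CMR policy. All names and numbers are hypothetical."""
    miss_threshold: int = 64    # ASSUMED: remote misses before a page is shadowed
    max_shadow_pages: int = 4   # ASSUMED: cap that shrinks under memory pressure
    miss_counts: dict = field(default_factory=dict)  # page -> remote miss count
    shadowed: set = field(default_factory=set)       # pages with a local shadow

    def record_remote_miss(self, page: int) -> bool:
        """Count one remote miss; return True if the page is shadowed afterwards."""
        if page in self.shadowed:
            return True
        self.miss_counts[page] = self.miss_counts.get(page, 0) + 1
        hot = self.miss_counts[page] >= self.miss_threshold
        if hot and len(self.shadowed) < self.max_shadow_pages:
            self.shadowed.add(page)  # replication is decided per page
        return page in self.shadowed

    def apply_memory_pressure(self, new_cap: int) -> None:
        """Shrink the shadow pool, dropping the coldest shadow pages first."""
        self.max_shadow_pages = new_cap
        while len(self.shadowed) > new_cap:
            # Ties are broken by page number so the sketch is deterministic.
            coldest = min(self.shadowed, key=lambda p: (self.miss_counts[p], p))
            self.shadowed.remove(coldest)
```

Note that the model covers only the page-granularity decision. Per the slide, coherence of the shadowed blocks stays in hardware at block granularity.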

Hierarchical Affinity Scheduling (HAS)

• Exploit locality by scheduling a process on the last node on which it executed

• Only reschedule onto another node when load imbalance exceeds a threshold

• Works particularly well when combined with CMR
  – Frequently-accessed remote pages still shadowed locally after a context switch
    • Lagniappe locality
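
The node-selection rule above can be sketched in a few lines. This is only an illustration of the stated policy (stay on the last node unless the imbalance exceeds a threshold). The function name `pick_node`, the use of a per-node load list, and the threshold value are assumptions, not details from the paper.

```python
def pick_node(last_node: int, node_loads: list, imbalance_threshold: int) -> int:
    """Choose the node for a process under a HAS-like policy (hypothetical).

    node_loads[n] is an ASSUMED load metric, e.g. runnable threads on node n.
    The process stays on last_node unless that node is more than
    imbalance_threshold busier than the least-loaded node.
    """
    lightest = min(range(len(node_loads)), key=lambda n: node_loads[n])
    if node_loads[last_node] - node_loads[lightest] > imbalance_threshold:
        return lightest   # imbalance too large: give up affinity
    return last_node      # keep locality (and any CMR shadow pages)


# Mild imbalance: stay put. Severe imbalance: migrate.
stay = pick_node(last_node=0, node_loads=[5, 3], imbalance_threshold=4)
move = pick_node(last_node=0, node_loads=[9, 3], imbalance_threshold=4)
```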

WildFire Implementation

A single Sun E6000 with WFI. Recall: WildFire Interface is just one of 16 standard cards on the GigaPlane bus.

WildFire Implementation

• Network Interface Address Controller (NIAC) + Network Interface Data Controller (NIDC) == WFI

• NIAC interfaces with GigaPlane bus and handles inter-node coherence

• Four NIDCs talk to point-to-point interconnect between nodes
  – Three ports per NIDC (one for each remote node)
  – 800MB/s in each direction with each remote node
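
A quick back-of-the-envelope check of what those link numbers imply per node, using only figures from the slides (800 MB/s per direction per remote node, three remote nodes, 3.2 GB/s GigaPlane). It assumes all three links can be saturated at once and that 3.2 GB/s means 3200 MB/s. Neither assumption is stated in the slides.

```python
LINK_MB_PER_S = 800        # per direction, per remote node (from the slide)
REMOTE_NODES = 3           # four-node system => three remote nodes
GIGAPLANE_MB_PER_S = 3200  # intra-node bus, taking 3.2 GB/s as 3200 MB/s


def aggregate_link_bandwidth(link_mb_per_s: int = LINK_MB_PER_S,
                             remote_nodes: int = REMOTE_NODES) -> int:
    """Peak one-direction bandwidth leaving a node if every link is busy."""
    return link_mb_per_s * remote_nodes


outbound = aggregate_link_bandwidth()   # 2400 MB/s
ratio = outbound / GIGAPLANE_MB_PER_S   # 0.75 of the GigaPlane figure
```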

WildFire Cache Coherence

• Intra-node coherence: bus + snoopy
• Inter-node (global) coherence: directory
  – Directory state kept at a block's home node
    • Directory cache (SRAM) backed by memory
  – Home node determined by high-order address bits
  – MOSI
  – Nothing special since scalability not an issue
• Blocking directory, 3-stage WB => no corner cases
• NIAC sits on bus and asserts “ignore” signal for requests that global coherence must attend to
  – NIAC intervenes if block's state is inadequate or resides in remote memory
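
As a rough sketch of the two decisions on this slide (which node is home, and when the NIAC steps in), the following follows the bullet wording literally. The address width, the function names, and the simplified "adequate state" test are assumptions for illustration. The real NIAC logic is certainly more involved.

```python
from enum import Enum


class State(Enum):
    """MOSI states, as named on the slide."""
    M = "modified"
    O = "owned"
    S = "shared"
    I = "invalid"


NUM_NODE_BITS = 2     # four nodes, per the hardware slide
PHYS_ADDR_BITS = 41   # ASSUMPTION: address width picked for illustration


def home_node(addr: int) -> int:
    """Home node = the high-order bits of the physical address."""
    return addr >> (PHYS_ADDR_BITS - NUM_NODE_BITS)


def state_is_adequate(state: State, is_write: bool) -> bool:
    """Simplified check: writes need M, reads need any valid copy."""
    if is_write:
        return state is State.M
    return state is not State.I


def niac_asserts_ignore(local_node: int, addr: int,
                        state: State, is_write: bool) -> bool:
    """Literal reading of the slide: intervene if the block's home is remote
    or the node's current state for the block is inadequate."""
    remote_home = home_node(addr) != local_node
    return remote_home or not state_is_adequate(state, is_write)
```

Here `state` stands for the node-level permission on the block. Requests that pass both checks are left to the ordinary snoopy protocol on the bus.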

WildFire Cache Coherence

• Coherent Memory Replication complicates matters
  – A local shadow page has a different physical address from its corresponding remote page
  – If a block's state is insufficient, must look up global address in order for WFI to issue remote request
    • Stored in LPA2GA SRAM
    • Also cache the reverse lookup (GA2LPA)
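
The two lookups amount to a page-granularity translation in each direction, with the block offset carried through unchanged. Below is a minimal software sketch. The `ShadowMap` class, its method names, and the 8 KB page size are assumptions for illustration, and the real tables live in SRAM on the WFI rather than in dictionaries.

```python
PAGE_SIZE = 8192  # ASSUMPTION: 8 KB pages, chosen for illustration


class ShadowMap:
    """Toy LPA2GA / GA2LPA tables keyed by page number (hypothetical)."""

    def __init__(self):
        self.lpa2ga = {}  # local shadow page number -> global page number
        self.ga2lpa = {}  # global page number -> local shadow page number

    def map_shadow(self, local_page: int, global_page: int) -> None:
        """Record that local_page shadows global_page."""
        self.lpa2ga[local_page] = global_page
        self.ga2lpa[global_page] = local_page

    def to_global(self, lpa: int) -> int:
        """Translate a local shadow address so a remote request can be issued."""
        page, offset = divmod(lpa, PAGE_SIZE)
        return self.lpa2ga[page] * PAGE_SIZE + offset

    def to_local(self, ga: int):
        """Reverse lookup for incoming requests; None if the page is not shadowed."""
        page, offset = divmod(ga, PAGE_SIZE)
        local_page = self.ga2lpa.get(page)
        if local_page is None:
            return None
        return local_page * PAGE_SIZE + offset
```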

Coherence Example

WildFire Memory Latency

• WildFire compared to SGI Origin (2x R10K per node) and Sequent NUMA-Q (4x Xeon per node)

• WF's remote mem. latency mediocre (2.5x Origin, similar to NUMA-Q), but less relevant because remote accesses less frequent (1/14 as many as Origin, 1/7 as many as NUMA-Q)

Evaluating WildFire

• Clever performance evaluation methodology: isolate the effect of WildFire itself by comparing a single-node, 16-CPU system with a two-node, 8-CPU/node system
  – Pure SMP vs. WildFire
• Also compare with NUMA-fat
  – Basically WF with no OS support, i.e. no CMR, no HAS, no locality-aware memory allocation, no kernel replication
• And compare with NUMA-thin
  – NUMA-fat but with small (2 CPU) nodes
• Finally, turn off HAS and CMR to evaluate their contribution to WF's performance

Evaluating WildFire

• WF with HAS+CMR comes within 13% of pure SMP
• Speedup(HAS+CMR) >> Speedup(HAS)*Speedup(CMR)
• Locality-aware allocation and large nodes are important

Evaluating WildFire

• Performance trends correlate with locality of reference
• HAS + CMR + Kernel Replication + Initial Allocation improve locality of access from 50% (i.e. uniform distribution between two nodes) to 87%
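
To see why the jump from 50% to 87% locality matters, it helps to plug the fractions into a simple weighted-average latency model. The 50% and 87% figures are from the slide. The latencies are placeholders in arbitrary units (remote assumed to cost 5x local), not measured WildFire numbers.

```python
def avg_latency(local_fraction: float, local_cost: float, remote_cost: float) -> float:
    """Average memory latency as a locality-weighted mean of local and remote cost."""
    return local_fraction * local_cost + (1.0 - local_fraction) * remote_cost


LOCAL_COST = 1.0    # arbitrary units
REMOTE_COST = 5.0   # ASSUMED ratio, for illustration only

before = avg_latency(0.50, LOCAL_COST, REMOTE_COST)  # uniform across two nodes
after = avg_latency(0.87, LOCAL_COST, REMOTE_COST)   # with HAS + CMR + the rest
improvement = before / after
```

Under these placeholder costs the average drops from 3.0 to roughly 1.52 units, i.e. about half. The actual gain depends on the real local-to-remote latency ratio.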

Summary

• WildFire = a few large SMP nodes + directory-based coherence between nodes + fast point-to-point interconnect + clever scheduling and replication techniques

• Pretty good performance (unfortunately, no numbers for 112 CPUs)

• Good idea?
  – I think so, but I doubt much room for scalability
    • Then again, that wasn't the point
• Criticisms?
  – Authors are very proud of their slow directory protocol
  – Kernel modifications may not be so slight

Questions?
