A Lock-Free Hash Table - Purdue University · 2 | ©2008 Azul Systems, Inc. Azul Systems •Designs...

A Lock-Free Hash Table

2009 Transaction Memory Workshop

Azul's Experiences with Hardware Transactional Memory

Dr. Cliff ClickChief JVM Architect & Distinguished Engineerblogs.azulsystems.com/cliffAzul SystemsJanuary 31, 2009

www.azulsystems.com

Azul Systems

• Designs our own chips (fab'ed by TSMC)• Builds our own systems• Targeted for running business Java• Large core count - 54 cores per die

─ Up to 16 die are cache-coherent─ Very weak memory model meets Java spec w/fences

• “UMA” - Flat medium memory speeds─ Business Java is irregular computation─ Have supercomputer-level bandwidth

• Modest per-cpu caches─ 54*(16K+16K) = 1.728Meg fast L1 cache─ 6*2M = 12M L2 cache─ Groups of 9 CPUs share L2

www.azulsystems.com

Azul Systems

• Cores are classic in-order 64-bit 3-address RISCs• Each core can sustain 2 cache-missing ops

─ Plus each L2 can sustain 24 prefetches─ 2300+ outstanding memory references at any time

• Some special ops for Java─ Read & Write barriers for GC─ Array addressing and range checks─ Fast virtual calls

• But core clock rate not real high• So task-level parallelism is the name of the game

www.azulsystems.com

The Bottleneck is not the Platform

• JVM scales linear to 864 CPUs─ Have bandwidth to feed them all as well

• Lite micro-kernel OS─ Easily supports >100K runnable threads

• Heaps >500Gig─ Sustained allocation rates >40Gig/sec

How do we enable users to write programs with hundreds of runnable threads?

www.azulsystems.com

The Bottleneck is not the Platform

• “Big Thread” programs tend to fall into 2 main camps─ Parallel data “science” (or really “financial modeling”) apps─ Web-tier app-server thread-pool + worklist apps

• Data-parallel apps tend to scale nice─ After a (short) round of tweaking─ Although JDK concurrency libs often an issue at > 64 cpus.

• Web-tier apps are more common─ And scale less well─ Internal locking of shared structures─ e.g. legacy uses of Hashtable

• Frequently see <10 cpus without tweaking• After app tuning see frequently see <50 cpus

www.azulsystems.com

Legacy Locking is an Issue

• Lots of Old Java out there• 3rd-party apps being linked in

─ Source not available (company defunct?)

• Intended to run on 1 cpu (and maybe now 2 cpu) machines• But lots of the locking is useless

─ Data contention is rare─ At least on the users' data─ Lock contention is far more common─ e.g. Hashtable─ But also shared large sync'd HashMaps─ Lots of other Container classes

www.azulsystems.com

No “atomic” keyword

• No desire to enter the “language wars”• Costumers want old code to “just run faster”

─ Dusty-deck acceleration

• So Azul built TM support to accelerate Java locks─ Speed up “synchronized” keyword─ No “atomic” keyword

• Uncontended locks are by far the most common─ And uncontended locks already fast ─ Azul's CAS and Fences can “hit in cache”

• Contended locks already fast as can be─ Lite microkernel OS; very fast context switch times

www.azulsystems.com

Hardware TM Support

• Built in our first chips• Leverage L1

─ Extra bits per cache-line─ “Speculatively-read”, “speculatively-written”

• No changes to L2 at all─ No other CPU is aware that this CPU is attempting a XTN

• SPECULATE instruction─ Flips CPU speculate mode; starts tracking reads & writes

• ABORT instruction, or abort on eviction from L1─ Mark spec-lines as “invalid”

• COMMIT instruction─ Clear spec-bits

www.azulsystems.com

Hardware TM Support

• No hardware register rollback─ Software responsible for all recovery on Abort─ Possible because only accelerating Java locks

• Limited by size of L1• Limited by associativity of L1

─ And L2, since shared inclusive L2

• No graceful fallback on Abort─ No attempt at STM support

• No abort on TLB miss or function call or ...• Either an XTN fits in cache or it doesn't• Software heuristics determine when to use the HTM

www.azulsystems.com

Software TM Support

• Just “synchronized” keywords• HotSpot “thin locks” for uncontended locks

─ Just use the fast CAS, no HTM

• Contended locks attempt HTM─ Success-time/fail-time ratios kept─ After some initial attempts

─ If ratio is bad, switch to OS lock─ Periodically re-measure success/fail ratio

www.azulsystems.com

Experiences

• Some apps get 2x speedup (e.g. Trade6)• Most get <10% (e.g. Calypso)• Lots of teething problems with heuristic

─ Easy to get 10-20% slowdown for constant fail/retry

• Currently turned on always & shipping• Always see a handful of locks using the HTM• But rarely the “right” locks to get more CPUs busy

• Failure is almost always to due conflictand not capacity.

www.azulsystems.com

XTN Size

• Limited by size & associativity of L1 & L2• But this appears to be generous:• XTN's of many thousands of instructions happen

─ Which include 100's of cache-hitting loads

• Most interesting XTNs fail for conflictand not capacity.

www.azulsystems.com

Issues

• Users don't write “TM friendly” code• Neither do library writers:

─ e.g. “modcount” - bumped per mod to Hashtable─ Large shared Hashtable, most updates are unrelated─ Update itself works well in the HTM─ But updates to shared “modcount” blow out HTM─ “modcount” is mostly unused & useless field update

• Same issue with many performance counters─ Locked writes to a shared variable kills TM

• Many times a small rewrite makes HTM possible─ But blows the “dusty deck” goal

www.azulsystems.com

Issues

• Hard part is to get customer past “dusty deck” thinking• Once a code rewrite is “on the table”

─ Customer goes whole-hog─ And rewrites w/fine-grained locking

• Generally only need to “crack” a few locks• Generally fairly easy, once exact locks are known• Code then scales on all platforms, not just Azul• Locks have known performance issues

─ Better than unknown TM rollback/retry issues

www.azulsystems.com

Hardware Abort is “Too Good”

• Hardware abort rolls back ALL memory writes─ No “breadcrumbs” on failure

• Hard to record fail reason!─ Must pass fail code out in a reserved register─ Even on success path

• First CPUs did not report hardware failure─ Cannot tell capacity issues from associativity issues from...

• Failure reason is required for a robust heuristic

www.azulsystems.com

Summary

http://blogs.azulsystems.com/cliff/

• Modest gains• Rewrites of “dumb” code could help a lot• Azul did some of this for common JDK pieces

─ Just too much of it Out There• Future work:

─ Robust customer friendly tool for finding XTN “unfriendly” code

─ Let customer solve the “why does HTM fail?”

Thank You

WWW.AZULSYSTEMS.COM

#1 Platform for Business Critical Java™

A Lock-Free Hash Table - Purdue University · 2 | ©2008 Azul Systems, Inc. Azul Systems •Designs...

Documents

Java SE State Of The Union - QCon San Francisco · 2020-05-18 · Java SE State Of The Union Gil Tene, CTO & co-Founder, Azul Systems @giltene ©2016 Azul Systems, Inc. Agenda

A JVM Does What? - Azul Systems, Inc....©2011 Azul Systems, Inc. 3 Agenda • What is a Java Virtual Machine • What does it do for Java? • Compilers and optimization • Garbage

Understanding Zulu Garbage Collection by Matt Schuetze, Director of Product Management at Azul Systems

How NOT to Measure Latency - Azul Systems · 2020-01-11 · ©2013 Azul Systems, Inc. “Low Latency” Trading aka “soft” real time Goal A: Be fast enough to make some good plays

Low Latency Java in the Real World - Azul Systems, Inc.©2015 Azul Systems, Inc. ©2015 LMAX Exchange Ltd. Real World Results Zing helped LMAX tame GC-related latency outliers Highly-engineered

Understanding Hardware Transactional Memory - QCon...Understanding Hardware Transactional Memory Gil Tene, CTO & co-Founder, Azul Systems @giltene ©2016 Azul Systems, Inc. Agenda

Azul Systems - Our corporate overview

JVM: Memory Management Details · 2020. 1. 11. · Tools: JProfiler Biggest Retained Sets ©2011 Azul Systems, Inc. 40 Tools: JProfiler Difference Between 2/12 ©2011 Azul Systems,

Granjero azul

Understanding Java Garbage Collection - Azul Systems · ©2014 Azul Systems, Inc. Presented at Detroit JUG, November 2014 Understanding Java Garbage Collection A Shallow Dive into

Exteriores COLOR GUIDE · 2020. 10. 16. · E211 AZUL CAL E214 AZUL ARLÉS E561 GRIS CÉLTICO E549 AZUL BRISA DE MAR E553 AZUL MAR E557 AZUL LAGUNA E562 GRIS NÓRDICO E550 AZUL MARINERO

Danubio Azul

Division Azul

SPRAYSKIRT/COVER FIT CHART - Wilderness Systems … · · 2017-09-12SPRAYSKIRT/COVER FIT CHART ... (ocean) - CHESAPEAKE LIGHT ... aZul odyssey 41/23 aZul Olympia (tandem) 34/19

Faster Objects and Arrays Closing the [last?] inherent C ... · Faster Objects and Arrays ©2015 Azul Systems, Inc. About me: Gil Tene co-founder, CTO @Azul Systems ... off-heap solutions

PRODUCT Montaña Azul Montaña Azul

The C4 Collector - Azul Systems · ©2011 Azul Systems, Inc. 99 Generational Collection for modern servers… •A modern server: 100s of GB of memory, 10s of cores, each allocating

Azul Systems, Inc. - Zulu User's GuideZuluInstallationGuide AzulSystems 9 1.LogintoMicrosoftAzureandnavigatetotheAzureGalleryofvirtualmachines. 2.SelectandcreateyourAzurevirtualmachine

Costa azul

What’s new in the JVM in Java SE 8 · ©2011 Azul Systems, Inc. What’s new in the JVM in Java SE 8 Gil Tene, CTO & co-Founder, Azul Systems