Inferring Locks for Atomic Sections

Sigmund Cherem, Cornell University (summer intern at Microsoft Research)
Trishul Chilimbi, Microsoft Research
Sumit Gulwani, Microsoft Research

What Is This Talk About?

• Multi-cores are widely available
• Developing concurrent software is not trivial
  • Many challenges: parallelization, synchronization, isolation
  • Manual locking is error-prone and non-compositional
• Recent proposal: atomic sections
  • Raise the level of abstraction and are compositional
  • Optimistic (transactional) implementations [Herlihy, Moss ISCA’93; Hammond et al. ISCA’04] [Shavit, Touitou PODC’95; Dice et al. DISC’06; Fraser, Harris TOPLAS’07]
  • Limitations: non-reversible operations, overhead
• This talk: compiler support for atomic sections via pessimistic concurrency


Static Lock Inference Framework

• Compiler support for atomic sections based on pessimistic concurrency
• Prevent conflicts using locks, with no deadlocks
• Goal: reduce contention while avoiding deadlocks

[Diagram: concurrent program with atomic sections (runs on STM) → Lock Inference Compiler → same program with locks implementing the atomic sections]

• Atomic sections specify “where”, but not “how”
• Lightweight runtime support (locking library)
• Automatically supports non-reversible operations


Moving List Elements

    move (list* to, list* from) {
      atomic {
        elem* x = to->head;
        elem* y = from->head;
        from->head = null;
        …
        while (x->next != null) {
          x = x->next;
        }
        x->next = y;
      }
    }

[Diagram: lists to and from with their head pointers; cursors x and y]




Attempt 1: Global Lock

    move (list* to, list* from) {
      acquire( GLOBAL );
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      …
      while (x->next != null) {
        x = x->next;
      }
      x->next = y;
      release( GLOBAL );
    }

• The global lock protects the entire memory
• Problem with Attempt 1: no parallelism with any other atomic section


Attempt 2: Fine-Grain Locks

    move (list* to, list* from) {
      acquire( &(to->head) );
      elem* x = to->head;
      acquire( &(from->head) );
      elem* y = from->head;
      from->head = null;
      …
      acquire( &(x->next) );
      while (x->next != null) {
        x = x->next;
        acquire( &(x->next) );
      }
      x->next = y;
      releaseAll();
    }

• A fine-grain lock protects an individual memory address


Attempt 2: Fine-Grain Locks

Problem with Attempt 2: may lead to deadlock

    move(a, b)             ||   move(b, a)
    acq( &(a->head) );          acq( &(b->head) );
    // deadlock here: each thread now waits for the lock the other holds
    acq( &(b->head) );          acq( &(a->head) );






Attempt 3: Fine-Grain Locks at Entry

    move (list* to, list* from) {
      acquireAll({ });
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      …
      acquire( &(x->next) );
      while (x->next != null) {
        x = x->next;
      }
      x->next = y;
      releaseAll();
    }

Challenge #1: Protect locations ahead of time (at the entry of the atomic section), i.e., find which addresses will be used inside it
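The slides do not show how acquireAll is implemented. One common way to take a set of locks at entry without deadlock is to acquire them in a fixed global order; the sketch below illustrates that idea only. The `LockTable` class, the string lock names, and `run_demo` are hypothetical and are not the talk's locking library.

```python
import threading


class LockTable:
    """Hypothetical runtime library: one mutex per named location."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()
        self._held = threading.local()

    def _lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())

    def acquire_all(self, names):
        # Assumption: sorting gives every thread the same acquisition order,
        # so no cycle of waiting threads can form.
        ordered = sorted(set(names))
        for name in ordered:
            self._lock_for(name).acquire()
        self._held.names = ordered
        return ordered

    def release_all(self):
        for name in reversed(getattr(self._held, "names", [])):
            self._lock_for(name).release()
        self._held.names = []


def move(table, lists, to, frm):
    # All locks are taken at entry, before any shared access.
    table.acquire_all([f"{to}.head", f"{frm}.head"])
    try:
        lists[to].extend(lists[frm])
        lists[frm].clear()
    finally:
        table.release_all()


def run_demo(rounds=200):
    table = LockTable()
    lists = {"a": [1, 2], "b": [3]}

    def worker(to, frm):
        for _ in range(rounds):
            move(table, lists, to, frm)

    # move(a, b) and move(b, a) run concurrently, as on the deadlock slide.
    threads = [
        threading.Thread(target=worker, args=("a", "b")),
        threading.Thread(target=worker, args=("b", "a")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    stuck = any(t.is_alive() for t in threads)
    return lists, stuck
```

With ordered acquisition, both calls request `a.head` before `b.head`, so the interleaving from Attempt 2 cannot leave each thread holding the lock the other needs.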


Protect when Entering Atomic Block

• Acquire a lock for each shared location accessed within the atomic section, expressed in terms of expressions valid at the entry of the atomic block
• Find corresponding expressions

    atomic {
      list* x = y[5];
      list* d = x;
      d->head = NULL;
    }

    acquire( &(d->head) )  →  acquire( &(x->head) )  →  acquire( &(y[5]->head) )

Contribution #1: Identifying appropriate fine-grain locks at entry (via inter-procedural backward data-flow analysis)
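The rewriting in the example above can be sketched as a backward walk over the assignments of a straight-line block. This is a minimal illustration only: `lift_to_entry` is a hypothetical helper, it uses textual substitution on simple access paths, and it ignores branches, loops, aliasing, and calls, all of which the inter-procedural analysis in the talk has to handle.

```python
import re


def lift_to_entry(assignments, accessed):
    """Rewrite `accessed` using only names valid at block entry.

    assignments: list of (variable, right-hand side) in program order.
    Returns every intermediate expression, ending with the entry form.
    """
    steps = [accessed]
    expr = accessed
    # Walk the block backward, replacing each assigned variable
    # by the expression it was assigned from.
    for var, rhs in reversed(assignments):
        pattern = r"\b" + re.escape(var) + r"\b"
        rewritten = re.sub(pattern, lambda m: rhs, expr)
        if rewritten != expr:
            steps.append(rewritten)
            expr = rewritten
    return steps


block = [("x", "y[5]"), ("d", "x")]
print(lift_to_entry(block, "d->head"))
```

The last element of the returned list is the expression to lock at entry; for the block on the slide it is `y[5]->head`.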


Attempt 3: Fine-Grain at Entry

    move (list* to, list* from) {
      acquireAll({ &(to->head), &(from->head), &(to->head->next) });
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      …
      while (x->next != null) {
        x = x->next;
      }
      x->next = y;
      releaseAll();
    }

Problem with Attempt 3: can’t protect an unbounded number of locations


Attempt 4: Multi-Grain Locks at Entry

    move (list* to, list* from) {
      acquireAll({ });
      elem* x = to->head;
      elem* y = from->head;
      from->head = null;
      …
      while (x->next != null) {
        x = x->next;
      }
      x->next = y;
      releaseAll();
    }

• A coarse-grain lock protects a set of memory locations

Challenge #2: Mixing locks of multiple granularities while avoiding deadlocks


Defining Multi-Grain Locks

• A fine-grain lock protects a single location
• A coarse-grain lock protects a set of locations
  • Any traditional heap abstraction can be used to define coarse-grain locks
  • E.g. types, points-to sets, shape abstractions
• Our compiler framework is parameterized
  • Clients can specify the kind of locks they want to use


Mixing Locks of Multiple Granularities

• Borrow the database locking protocol based on intention locks [Gray ’76]

[Diagram: lock hierarchy, from top to bottom: global lock, coarse-grain locks, fine-grain locks, memory locations; annotation: “Can’t be held concurrently”]

Contribution #2: We allow mixing locks of multiple granularities and avoid deadlocks
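As a reminder of how intention locks work in the database literature, here is a minimal sketch of the mode compatibility check and of the top-down locking plan for one target lock. It uses the standard IS, IX, S, and X modes and leaves out SIX; the function names and the lock names in the example are hypothetical, and how the talk's runtime library implements the protocol is not shown in the slides.

```python
# COMPATIBLE[held] is the set of modes another thread may still
# acquire on the same lock while `held` is held.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def compatible(held, requested):
    return requested in COMPATIBLE[held]


def lock_plan(path, mode):
    """path: lock names from the root down to the target lock.

    Ancestors get an intention mode; only the target gets S or X.
    """
    if mode not in ("S", "X"):
        raise ValueError("mode must be 'S' or 'X'")
    intention = "IS" if mode == "S" else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


def plans_conflict(plan_a, plan_b):
    """True if two threads cannot hold both plans at the same time."""
    held = dict(plan_a)
    return any(
        node in held and not compatible(held[node], mode)
        for node, mode in plan_b
    )


# Hypothetical lock names: a global root, one coarse lock, two fine locks.
w1 = lock_plan(["GLOBAL", "pts1", "&(to->head)"], "X")
w2 = lock_plan(["GLOBAL", "pts1", "&(from->head)"], "X")
coarse_read = lock_plan(["GLOBAL", "pts1"], "S")
print(plans_conflict(w1, w2), plans_conflict(coarse_read, w1))
```

Two writers on different fine-grain locks only share intention modes on their ancestors, so they can run in parallel, while a thread holding the coarse lock in S mode blocks any writer underneath it.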


Soundness Results

• Sound locking structure provided
  • Locations protected by a child are also protected by its parent
  • Map of expressions to locks
  • Bounded (for termination)
• Soundness Theorem
  • Compiler chooses a set of locks protecting all memory accesses within the atomic block

[Diagram: example lock structure with nodes &(to->head->next), &(to->head->next->next), …, &(*->next), and *]

Contribution #3: Framework is sound (for any sound lock structure instantiation)
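One way to read the three requirements above is as a small bounded lock structure. In the sketch below, access paths with at most `max_derefs` dereferences keep their own expression lock, longer paths fall back to a coarse lock named after their last field, and every lock has `*` as an ancestor. The bound, the naming scheme, and the parent relation are assumptions made for illustration; they are not taken from the talk.

```python
def lock_for(expr, max_derefs=2):
    """Map an access path such as 'to->head->next' to a lock name."""
    parts = expr.split("->")
    if len(parts) < 2:
        raise ValueError("expected an access path with at least one '->'")
    if len(parts) - 1 <= max_derefs:
        return "&(" + expr + ")"          # fine-grain expression lock
    return "&(*->" + parts[-1] + ")"      # coarse lock for this field


def parent(lock):
    """Parent in the hypothetical hierarchy; '*' is the root."""
    if lock == "*":
        return None
    if lock.startswith("&(*->"):
        return "*"
    field = lock[2:-1].split("->")[-1]
    return "&(*->" + field + ")"


def protects(ancestor, lock):
    """True if `ancestor` is `lock` itself or lies above it."""
    while lock is not None:
        if lock == ancestor:
            return True
        lock = parent(lock)
    return False


# The loop in move() reaches to->head->next, to->head->next->next, ...
paths = ["to->head" + "->next" * n for n in range(1, 6)]
print(sorted({lock_for(p) for p in paths}))
```

Because only finitely many lock names can result, an analysis that collects locks for the unbounded walk in `move` reaches a fixed point, and the coarse lock covers every location the longer paths could name.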


Experimental Evaluation

• Lock structure instance: 3-level locks + effects
  • Global lock
  • Points-to set locks [Steensgaard ’96]
  • Expression locks (limited in size)
  • Each lock is taken with a read-only (ro) or read-write (rw) effect
• Experiments
  • Concurrent data structures: rb-tree, hashtable
  • Concurrent get (read-only), put, and remove operations
  • 1.86 GHz Intel Xeon dual quad-core machine


Scalability Results

[Plots: execution time (sec), 0 to 70, versus number of threads, 1 to 8. Series: global lock; TL2 STM [Dice et al. DISC’06]; only coarse-grain locks; coarse + fine-grain locks]


TH (rb-tree + hash w/rehash): 80% gets

[Plot: execution time (sec), 0 to 70, versus number of threads, 1 to 8]

• Compiler didn’t use fine-grain locks
• Scalability comparable to STM
• Global lock (exclusive) doesn’t scale


TH (rb-tree + hash w/rehash): 80% puts

[Plot: execution time (sec), 0 to 70, versus number of threads]

• 2 coarse-grain (exclusive) locks are better than a single global lock
• High contention from re-hashing degrades STM performance


simple-hashtable: 80% gets

[Plot: execution time (sec), 0 to 45, versus number of threads]

• Compiler didn’t use fine-grain locks for gets
• STM allows put and get concurrently


simple-hashtable: 80% puts

[Plot: execution time (sec), 0 to 45, versus number of threads]

• Compiler uses fine-grain locks for puts


Differences with Recent Work

• No programmer annotations (other than atomic)
  • Autolocker [McCloskey et al. POPL’06] requires programmer annotations to choose the appropriate granularity
• Moving fine-grain lock acquisitions to the entry of atomic
  • Acquiring fine-grain locks right before first use [Hindman, Grossman MSPC’06] is not fully pessimistic
  • It may generate deadlocks and need rollbacks
• Multi-grain locks without deadlocks
  • Several pessimistic approaches use coarse-grained locks only [Hicks et al. ’06; Halpert et al. ’07; Emmi et al. ’07]


Conclusions and Future Work

• Lock inference framework for atomic sections
  • Multi-grain locks to reduce contention and avoid deadlocks
  • Soundness: accesses are protected, atomicity preserved
  • Validation: resulting performance depends on the application
  • Locks are preferable for non-reversible operations or high contention
• Future directions
  • Better locking hierarchy instantiations (e.g. ownership)
  • Optimizations (e.g. delay lock acquisitions)
  • Hybrid systems (e.g. compiler support to optimize STMs)

?
