A presentation given at the Programming Languages Meetup in San Francisco (Jun 10, 2014). Computation is about communicating state machines, but the message is lost in the endless debates on threads vs. events and iterators vs. reactive approaches. Lightweight coroutine and thread options are available in all major mainstream languages; they combine the ease of sequential, thread-style programming with the performance of event-oriented code. You can have it all.
Communicating State Machines
Sriram Srinivasan
sriram@malhar.net
www.malhar.net/sriram
Programming Languages Meetup, San Francisco, June 10, 2014
• Fundamental building block of computation
• Communicating State Machines model
• Synchronous and Asynchronous composition
• Hierarchical State Machines specification
• (Edward A. Lee and Pravin Varaiya, Structure and Interpretation of Signals and Systems, LeeVaraiya.org)
State machines
• Distributed Systems
• Hardware interfaces
• Components of a memory hierarchy
• Stream producers and consumers
• Parsers and Lexers
• Filesystem and tree walker.
• Networking stack and Socket consumer
• Bidirectional communication
CSMs are Ubiquitous
C++ Boost.Statechart
struct Active : sc::simple_state< Active, StopWatch, Stopped >
{
  public:
    typedef sc::transition< EvReset, Active > reactions;

    Active() : elapsedTime_( 0.0 ) {}
    double ElapsedTime() const { return elapsedTime_; }
    double & ElapsedTime() { return elapsedTime_; }
  private:
    double elapsedTime_;
};

struct Running : sc::simple_state< Running, Active >
{
  public:
    typedef sc::transition< EvStartStop, Stopped > reactions;

    Running() : startTime_( std::time( 0 ) ) {}
    ~Running()
    {
      context< Active >().ElapsedTime() += std::difftime( std::time( 0 ), startTime_ );
    }
  private:
    std::time_t startTime_;
};

Ugh!
• Language used in TinyOS to program wireless motes
nesC
• Components with bidirectional interfaces
• Separate configuration to stitch together components
nesC Bidirectional Interfaces
interface StdControl {
  command result_t init();
}

interface Timer {
  command result_t start(char type, uint32_t interval);
  command result_t stop();
  event result_t fired();
}

interface Send {
  command result_t send(TOS_Msg *msg, uint16_t length);
  event result_t sendDone(TOS_Msg *msg, result_t success);
}

interface Device {
  command result_t getData();
  event result_t dataReady(uint16_t data);
}
nesC Implementation
module ChirpM {
  provides interface StdControl;
  uses interface Device;
  uses interface Timer;
  uses interface Send;
}
implementation {
  uint16_t sensorReading;

  command result_t StdControl.init() {
    return call Timer.start(TIMER_REPEAT, 1000);
  }

  event result_t Timer.fired() {
    call Device.getData();
    return SUCCESS;
  }

  event result_t Device.dataReady(uint16_t data) {
    sensorReading = data;
    ... send message with data in it ...
    return SUCCESS;
  }
}
[Diagram: ChirpM provides StdControl and uses Timer, Device, and Send]
nesC configuration
configuration ChirpC {
  provides interface StdControl;
}
implementation {
  components ChirpM, BarometerC, RadioAnnouncerC;

  StdControl = ChirpM.StdControl
  ChirpM.Timer -> HWTimer.Timer
  ChirpM.Device -> Barometer
  ChirpM.Send -> RadioAnnouncerC
}
[Diagram: ChirpC exports ChirpM's StdControl and wires ChirpM's Timer, Device, and Send to HWTimer, Barometer, and RadioAnnouncerC]
Why aren’t more systems structured this way?
Synchronous Communication
Stacks as State Machines
void readMsgs(socket) {
  numMsgsRead = 0
  while (true) {
    msg = readMsg(socket)
    dispatch(msg)
    log(numMsgsRead++)
  }
}

void readMsg(socket) {
  len := readLen(socket)
  readBody(len)
}

void readLen(socket) {
  byte[4] len
  for i = 0 .. 4 {
    len[i] = readByte(socket)
  }
  return len
}
• Thread of control
• Control plane = Call Chain (each frame remembers its pc)
• Sequential flow of control defines hidden states
• Functions define major states
• Data plane = Vars local in each frame
• Blocking semantics == synchronous (lock-step) communication
• readByte and dispatch interact with network
• Easy API; that’s why Posix and most db calls are synchronous
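The point that the call chain is the control plane and the frame locals are the data plane can be made concrete with a runnable sketch. This is Python 3 rather than the slide's pseudocode, and it assumes a 4-byte big-endian length prefix and a file-like stream; `read_exact` and `frame` are hypothetical helpers, not part of the original.

```python
import io
import struct

def read_exact(stream, n):
    """Block until exactly n bytes have been read; raise EOFError if the stream ends."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended")
        buf += chunk
    return buf

def read_len(stream):
    # Hidden state: "in the middle of the length prefix".
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return length

def read_msg(stream):
    # Hidden state: "length known, reading body"; the length lives in this frame.
    length = read_len(stream)
    return read_exact(stream, length)

def read_msgs(stream, dispatch):
    # Major state: the message loop. Returns the count when the stream ends.
    num_read = 0
    while True:
        try:
            msg = read_msg(stream)
        except EOFError:
            return num_read
        dispatch(msg)
        num_read += 1

def frame(payload):
    """Hypothetical helper: prefix a payload with its 4-byte length."""
    return struct.pack(">I", len(payload)) + payload
```

No state variable records "where we are" in the protocol; the program counter in each blocked frame does that job, which is what makes the blocking style easy to write.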
State machine in Erlang
bark() ->
    io:format("Dog says: BARK! BARK!~n"),
    receive
        pet ->
            wag_tail();
        _ ->
            io:format("Dog is confused~n"),
            bark()
    after 2000 ->
        bark()
    end.

wag_tail() ->
    io:format("Dog wags its tail~n"),
    receive
        pet ->
            sit();
        _ ->
            io:format("Dog is confused~n"),
            wag_tail()
    after 30000 ->
        bark()
    end.

sit() ->
    io:format("Dog is sitting. Gooooood boy!~n"),
    receive
        squirrel ->
            bark();
        _ ->
            io:format("Dog is confused~n"),
            sit()
    end.
• Tail-call optimization renders state change trivial: each state is a function, and a transition is a tail call
Credit: http://learnyousomeerlang.com/finite-state-machines
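For comparison, here is a rough Python 3 sketch of the same dog machine. Python has no tail-call optimization, so each state function returns the next state and a small driver loop (`run`, a hypothetical name) performs the transition. Modeling the Erlang `after` clauses as an explicit "timeout" event is an assumption made for the sketch.

```python
def bark(event):
    if event == "pet":
        return wag_tail
    return bark              # confused, or timed out: keep barking

def wag_tail(event):
    if event == "pet":
        return sit
    if event == "timeout":
        return bark
    return wag_tail          # confused: keep wagging

def sit(event):
    if event == "squirrel":
        return bark
    return sit               # confused: stay seated

def run(events, state=bark):
    """Feed events to the machine; return the names of the states visited."""
    trace = [state.__name__]
    for event in events:
        state = state(event)     # the "tail call", done by the driver
        trace.append(state.__name__)
    return trace
```

For example, `run(["pet", "pet", "squirrel"])` walks bark, wag_tail, sit, and back to bark.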
• Problem: Obtain leaves from a tree one at a time
• Two interacting state machines:
• Producer: tree, Consumer: user code that acts on the leaves.
• Pull solution: Iterators
• Convenient for clients
• for leaf in tree: print leaf.name
• Push solution: Functional approach
• Tree pushes data to visitors or user-defined functions
• tree.visit( myfunc )
• Ideally: Duals of each other
• In practice: Duel with each other
Leaves from a Tree
Pull Solution: Iterators
class Node:
    …
    def __iter__(self):
        return Iter(self)

class Iter:
    def __init__(self, root):
        self.nxt = root.first_leaf()
        self.prev = None

    def next(self):
        nxt = self.nxt
        if nxt:  # First time entry into iterator
            self.nxt = None
            self.prev = nxt
            return nxt

        prev = self.prev
        if prev.sibling:
            nxt = prev.sibling.first_leaf()
        else:
            # explore cousins .. children of parent's siblings
            parent = prev.parent
            while parent:
                uncle = parent.sibling
                if uncle:
                    nxt = uncle.first_leaf()
                    break
                else:
                    parent = parent.parent  # continue loop
        if nxt:
            self.prev = nxt  # for next iter
            return nxt
        else:
            raise StopIteration
• Consumer code drives iteration
• Producer code (iterable) needs to save state between iterations
Push solution
class Node:
    …
    def leaves(self, callback):
        if self.is_leaf():
            callback(self)
        else:
            for c in self.children:
                c.leaves(callback)

        if self.sibling:
            self.sibling.leaves(callback)

def cb(node):
    print node.name

tree.leaves(cb)
• Consumer side:
• Callback hell
• Visitor pattern is an abomination
• Does not have flow-control between events
• Producer side:
• drives iteration
• stack for storing recursive state
• Allows async consumers to deliver events
Push: Consumer-side trouble
exports.processJob = function(options, next) {
  db.getUser(options.userId, function(err, user) {
    if (err) return next(err);
    db.updateAccount(user.accountId, options.total, function(err) {
      if (err) return next(err);
      http.post(options.url, function(err) {
        if (err) return next(err);
        next();
      });
    });
  });
};
def sameFringe(treeA, treeB):
    itreeA = iter(treeA)
    itreeB = iter(treeB)
    while 1:
        nodeA = itreeA.next()
        nodeB = itreeB.next()
        if nodeA.name != nodeB.name:
            return False
        ….
Callback Hell
Sequential chain of events verbose to express
Inversion of control
Concurrent Traversals trivial in Pull approach
Generators/Coroutines
Generators: Concurrent Stacks
def odds():
    i = 1
    while True:
        yield i
        i += 2

o = odds()

print o.next()
print o.next()
print o.next()

# Print infinite stream
for n in odds():
    print n
class Tree:
    def leaves(self):
        if self.is_leaf():
            yield self
        else:
            for c in self.children:
                for leaf in c.leaves():
                    yield leaf

for leaf in tree.leaves():
    print leaf.name
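With a generator-based `leaves()`, the concurrent traversal needed for the same-fringe comparison falls out almost for free. The following is a minimal Python 3 sketch; the `Node` class, its constructor, and the `_MISSING` sentinel are assumptions made for illustration and are not the deck's own tree type.

```python
from itertools import zip_longest

class Node:
    """Hypothetical tree node: a node with no children is a leaf."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def leaves(self):
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()

_MISSING = object()   # marks the shorter fringe running out first

def same_fringe(tree_a, tree_b):
    """True if both trees yield the same leaf names in the same order."""
    pairs = zip_longest(tree_a.leaves(), tree_b.leaves(), fillvalue=_MISSING)
    for a, b in pairs:
        if a is _MISSING or b is _MISSING or a.name != b.name:
            return False
    return True
```

Each `leaves()` call keeps its own suspended recursion stack, so the two traversals advance in lock-step and the comparison stops at the first mismatch without walking the rest of either tree.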
• Generators/coroutines are simply a compiler transformation of threaded code into event-driven code on the same kernel thread
• Flow of control alternates between consumer and producer
• Cheap user-level tasks with explicit cooperative scheduling
• Scheduler calls next()
• Task calls yield() whenever necessary
• Wrapped in an abstraction called Fiber
• Ruby: Fiber.yield, JavaScript: function*, yield/yield*
• Symmetric vs Asymmetric coroutines
• Lazy streams — Infinite streams on demand
Generators
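The "scheduler calls next(), task calls yield" arrangement can be sketched in a few lines of Python 3. This is an illustrative round-robin scheduler only; `countdown` and `run_round_robin` are hypothetical names, and a real fiber library would add blocking on I/O and wake-ups.

```python
from collections import deque

def countdown(name, n, log):
    """A task: does one unit of work, then yields control back to the scheduler."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield                     # explicit cooperative yield

def run_round_robin(tasks):
    """The scheduler: resume each ready task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run the task up to its next yield
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # still alive; back of the queue
```

Running `countdown("a", 2, log)` and `countdown("b", 3, log)` together interleaves their steps on a single kernel thread, with no preemption and no locks.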
Asynchrony and Multiprocessing
• All threads have the same fixed stack size, set at creation time and usually sized for the worst case
• Kernel Thread context switching is expensive (in μs)
• Preemption at any time ==> Save all registers: 16 general purpose registers, PC, SP, segment registers, 16 XMM registers, FP coprocessor state, X AVX registers, all MSRs
• TLB flushes, cache invalidation, crossing kernel protection boundary
• Even cooperative yields are expensive.
• A kernel thread is a precious resource. Can’t block it.
• No, not for IO-bound code, says Paul Tyma
Why can’t we just use Kernel Threads?
> ulimit -s 8192
Threads vs Events Debate
• But horrible user-programming model
• libuv, libasync, EventMachine (Ruby), netty (Java)
• User-code must not block, not call other I/O operations
Event-driven I/O is faster
Netty inversion of control
io.netty.handler.codec.DecoderException: java.lang.RuntimeException: No packet with id 78
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:263)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:109)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:524)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved(DefaultChannelPipeline.java:518)
    at io.netty.channel.DefaultChannelPipeline.remove0(DefaultChannelPipeline.java:348)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:319)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:296)
    at org.spigotmc.netty.LegacyDecoder.decode(LegacyDecoder.java:38)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:149)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: No packet with id 78
    at org.spigotmc.netty.Protocol$ProtocolDirection.createPacket(Protocol.java:272)
    at org.spigotmc.netty.PacketDecoder.decode(PacketDecoder.java:44)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
Let’s compromise. Let’s both be unhappy.
• All I/O handled by special I/O event loop in separate thread
• Can’t do I/O in callback
• Cannot block
• Handed off to a task on a separate thread pool
• Task cannot block there either; limited threads in thread pool
• Hand-rolled continuations
Current mainstream
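The shape of this compromise, an I/O event loop that must never block plus a bounded pool for blocking work, can be sketched with Python 3's asyncio. This is only an analogy to the Java frameworks discussed here; `blocking_lookup` is a hypothetical stand-in for a blocking call, and `await` hides the continuation that callback-style code would have to hand-roll.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_lookup(key):
    """Hypothetical blocking call (e.g. a synchronous database driver)."""
    time.sleep(0.05)
    return key.upper()

async def handle(key, pool):
    loop = asyncio.get_running_loop()
    # Hand the blocking work to the pool; the event loop stays free meanwhile.
    result = await loop.run_in_executor(pool, blocking_lookup, key)
    return f"{key} -> {result}"

async def main(keys):
    # Only two pool threads: more than two blocked calls queue up behind them.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(*(handle(k, pool) for k in keys))
```

The small `max_workers` value shows the limitation named above: every blocked call pins a pool thread, so the pool size caps how many blocking operations can be in flight.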
Netty
public class WriteTimeOutHandler extends ChannelOutboundHandlerAdapter {
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        ctx.write(msg, promise);

        if (!promise.isDone()) {
            ctx.executor().schedule(new WriteTimeoutTask(promise), 30, TimeUnit.SECONDS);
        }
    }
}
Ugh.
Functional Reactive Programming
getDataFromNetwork()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .subscribe({ println "onNext => " + it })
• Reactive Extensions for .NET, RxJava, Scala
• Asynchronous stream. A chain of transformers ending with a callback.
• Effectively with the same kinds of restrictions:
• No blocking, worry about thread context (“can I write to a socket”)
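To show what "a chain of transformers ending with a callback" means mechanically, here is a toy push-based stream in Python 3. It is not the Rx API: `Stream` and `from_iterable` are hypothetical names, there is no error or completion signal, and `take` simply drops later items instead of unsubscribing, so it suits finite sources only.

```python
class Stream:
    """A push-based stream; `source` is a function that accepts an on_next callback."""
    def __init__(self, source):
        self._source = source

    def map(self, fn):
        return Stream(lambda on_next: self._source(lambda x: on_next(fn(x))))

    def skip(self, n):
        def source(on_next):
            seen = 0
            def step(x):
                nonlocal seen
                if seen >= n:
                    on_next(x)
                else:
                    seen += 1          # swallow the first n items
            self._source(step)
        return Stream(source)

    def take(self, n):
        def source(on_next):
            taken = 0
            def step(x):
                nonlocal taken
                if taken < n:
                    taken += 1
                    on_next(x)         # forward only the first n items
            self._source(step)
        return Stream(source)

    def subscribe(self, on_next):
        # Nothing flows until a consumer subscribes; then the producer drives.
        self._source(on_next)

def from_iterable(items):
    def source(on_next):
        for item in items:
            on_next(item)
    return Stream(source)
```

A chain such as `from_iterable(range(20)).skip(10).take(5).map(...).subscribe(callback)` reads sequentially, but every stage is a callback invoked on the producer's stack, which is where the no-blocking restriction comes from.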
• Pretty sequential code.
• Millions of Threads.
• Block when we want to.
• Receive and Send to other SMs anywhere.
• Receive from multiple sources
• Speed and lightness of Event-driven solutions
Can we have it all?
• kilim.malhar.net
• Bytecode transformer for coroutines/generators and lightweight tasks
• s/Thread/Task/
• s/run()/execute() throws Pausable/
• All functions that may block annotated as “throws Pausable”
• Use typed mailboxes to communicate
• Bytecode transformation of Java code.
• Offline or at class load time
Ta da! Kilim
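The task-plus-mailbox style can be approximated in Python 3 with asyncio, which gives a feel for the programming model without using Kilim itself. This is an analogy, not Kilim's API: `asyncio.Queue` stands in for a mailbox, a coroutine for a Task, and `await` marks the points where a Kilim method would be declared `throws Pausable`.

```python
import asyncio

async def worker(inbox, outbox):
    n = await inbox.get()        # pauses this task only, not the kernel thread
    await outbox.put(n * n)

async def main(num_tasks=1000):
    outbox = asyncio.Queue()
    inboxes = [asyncio.Queue() for _ in range(num_tasks)]
    tasks = [asyncio.create_task(worker(inbox, outbox)) for inbox in inboxes]

    for i, inbox in enumerate(inboxes):
        await inbox.put(i)       # one message per worker

    total = 0
    for _ in range(num_tasks):
        total += await outbox.get()

    await asyncio.gather(*tasks)
    return total
```

Each worker reads like sequential, blocking code, yet a thousand of them share one kernel thread.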
Kilim Performance vs. Threads
Kilim Server Performance vs. Jetty
• Lightweight threads — C layout, small dynamic stacks
• Multiplex on channel I/O — CSP’s alt operator.
• Fast context switching — three registers to save and restore (PC, SP and DX)
• Syntactic lightness
• Language and idioms fit in my L1 cache
• Closures, Duck-typing
• 0-sized channels == true synchronous lock-step
• What I want: Some aspects of Swift/Rust!
What I like about Go
Go
package main

func main() {
    ch := make(chan int)

    go func() { // producer
        i := 1
        for {
            ch <- i
            i += 2
        }
    }()

    for { // consumer
        println(<-ch)
    }
}
Go
func main() { // Listen and accept loop
    tcpaddr, err := net.ResolveTCPAddr("tcp", "localhost:9999")
    check(err)
    tcp_acceptor, err := net.ListenTCP("tcp", tcpaddr)
    check(err)
    fmt.Println("Listening on ", tcp_acceptor.Addr())

    for true {
        tcp_conn, err := tcp_acceptor.AcceptTCP()
        check(err)
        go serve(tcp_conn)
    }
}

func serve(conn *net.TCPConn) {
    for true {
        dec := gob.NewDecoder(conn)
        //var msg Msg
        var data string
        //err := dec.Decode(&msg)
        err := dec.Decode(&data)
        check(err)
        println("Server: Rcvd ", data)
        //println("Server: Rcvd ", msg.Data, "from", msg.From)
        ….
    }
• Compiler transformation of ‘go’ blocks into event-driven code
• All blocking calls must be made directly inside a go block
• Channel receives and sends cannot be made in a called function
• In general, all approaches relying only on compiler transformations leak abstractions. Need Go/Erlang like deep runtime support
Clojure core.async
• Threaded style is easy to write and understand
• Actors are not internally concurrent; no internal data races.
• Undesirable combination: Aliasing + Mutability
• Either aliased+immutable — clojure approach
• Unaliased+mutable: Kilim, Rust, Go approach.
• Isolate actor state, and exchange messages. Rust’s linear type system is wonderful.
• Go mantra: share memory by communicating; don't communicate by sharing memory
• No more threads vs. events debates. You can have it all
• Erlang, Go, Rust, Kilim for Java, Akka for Scala, F#
Takeaways
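The "isolate actor state, and exchange messages" takeaway can be illustrated with a very small actor in Python 3. This sketch uses a kernel thread and `queue.Queue` purely for brevity (a lightweight task would be the better fit for the deck's argument); `CounterActor` and its message names are hypothetical.

```python
import queue
import threading

class CounterActor:
    """Owns a counter that only its own thread ever touches."""
    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        count = 0                          # unaliased, mutable, private state
        while True:
            msg, reply = self._inbox.get()
            if msg == "incr":
                count += 1
            elif msg == "get":
                reply.put(count)           # answer by message, not shared memory
            elif msg == "stop":
                return

    def send(self, msg):
        """Fire-and-forget message."""
        self._inbox.put((msg, None))

    def ask(self, msg):
        """Send a message and block until the actor replies."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((msg, reply))
        return reply.get()

    def stop(self):
        self.send("stop")
        self._thread.join()
```

Many threads can call `send("incr")` concurrently without a lock around the counter, because the counter is never aliased: messages are processed one at a time by the single thread that owns it.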