Optimizing Erlang Code for Speed



Considers the optimizations that allow an intelligent network management solution written in Erlang to reach microsecond latencies and GB/s throughput


Optimizing Erlang code for speed
Revelations from a real-world project based on Erlang on Xen


Maxim Kharchenko
CTO, Cloudozer LLP
mk@cloudozer.com

The road map

● Erlang on Xen intro

● Speed-related notes

– Arguments are registers

– ETS tables are (mostly) ok

– Do not overuse records

– GC is key to speed

– gen_server vs. barebone process

– NIFs: more pain than gain

– Fast counters

● Q&A


Erlang on Xen 101

● A new Erlang runtime that runs without an OS

● Conceived in 2009

● Highly compatible with Erlang/OTP

● Built from scratch, not a “port”

● Optimised for low startup latency

● Not open source (yet)

● The public build service is free

Go to erlangonxen.org


Zerg demo: zerg.erlangonxen.org


Arguments are registers

● Many arguments do not make a function any slower

● Do not reshuffle arguments:

    animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
        feed(Cat, Dog, Horse, Pig, Cow, State);
    animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
        pet(Cat, Dog, Horse, Pig, Cow, State);

    %% SLOW: the extra leading argument shifts every other
    %% argument into a different register before the call
    animal(Cat, Dog, Horse, Pig, Cow, State) ->
        feed(goat, Cat, Dog, Horse, Pig, Cow, State);
    ...
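One rough way to see the effect is to compile both shapes to BEAM assembly with erlc -S and compare the move instructions emitted before each call. The module below is a hypothetical illustration (args_demo, count_same and count_shuffled are not from the project): both functions compute Acc + N * Step, but the second routes every iteration through a helper that takes its arguments in a different order. The exact instructions depend on the compiler version, so treat this as a sketch for experimenting rather than a benchmark.

```erlang
-module(args_demo).
-export([count_same/3, count_shuffled/3]).

%% Fast shape: the tail call passes N, Acc and Step back in the same
%% positions, so each value can stay in its argument register.
count_same(0, Acc, _Step) -> Acc;
count_same(N, Acc, Step) -> count_same(N - 1, Acc + Step, Step).

%% Slow shape: same result, but each hop goes through a helper whose
%% parameters are in a different order, so every call must first move
%% values between registers.
count_shuffled(N, Acc, Step) -> shuffled(Step, N, Acc).

shuffled(_Step, 0, Acc) -> Acc;
shuffled(Step, N, Acc) -> count_shuffled(N - 1, Acc + Step, Step).
```

Running erlc -S args_demo.erl writes args_demo.S, where the two loops can be compared side by side.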


ETS tables are (mostly) ok

● A lookup in a small ETS table costs about as much as 10 function calls

● Do not use ets:tab2list() inside tight loops

● Treat ETS as a database, not as a pool of global variables

● 1-2 ETS lookups on the fast path are ok

● Beware that ets:lookup() and similar functions copy the data onto the caller's heap, as message passing does
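A minimal sketch of the "database, not global variables" pattern, assuming a hypothetical routing table keyed by IP address (the table name routes and the functions route/2 and route_slow/2 are illustrative only). The fast path does a single keyed ets:lookup/2, which copies only the matching object; the slow variant calls ets:tab2list/1 and so copies the entire table onto the caller's heap on every request.

```erlang
-module(ets_demo).
-export([demo/0]).

%% Fast path: one keyed lookup; only the matching object is copied.
route(Tab, Key) ->
    case ets:lookup(Tab, Key) of
        [{Key, Port}] -> {ok, Port};
        [] -> error
    end.

%% Anti-pattern: tab2list/1 copies the whole table on every call,
%% then scans it linearly.
route_slow(Tab, Key) ->
    case lists:keyfind(Key, 1, ets:tab2list(Tab)) of
        {Key, Port} -> {ok, Port};
        false -> error
    end.

demo() ->
    Tab = ets:new(routes, [set, protected, {read_concurrency, true}]),
    true = ets:insert(Tab, [{<<"10.0.0.1">>, 1}, {<<"10.0.0.2">>, 2}]),
    Fast = route(Tab, <<"10.0.0.2">>),
    Slow = route_slow(Tab, <<"10.0.0.2">>),
    Missing = route(Tab, <<"10.0.0.9">>),
    true = ets:delete(Tab),
    {Fast, Slow, Missing}.
```

Both variants return the same answer; the difference is purely in how much data each request copies.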


Do not overuse records

● setelement() creates a copy of the tuple

● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple

● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code

    %% from 9p.erl
    mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
    mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
    mixer({rwrite,_,_}, _, initial) -> start_attaching;
    mixer({rerror,_,_}, _, initial) -> auth_failed;
    mixer({rlerror,_,_}, _, initial) -> auth_failed;
    mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) -> {attach_more,Fid,AName,qid_type(Qid)};
    mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
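To make the heap footprint visible, a record update can be spelled out as an explicit tuple. The sketch below uses a hypothetical #state{} record with four fields; update_tuple/4 visibly builds exactly one new 5-element tuple (the record tag plus four fields), and its result is identical to the record-syntax update.

```erlang
-module(rec_demo).
-export([demo/0]).

-record(state, {foo, bar, baz, qux}).

%% Record syntax: convenient, but hides how the tuple is rebuilt.
update_rec(State, Foo, Bar, Baz) ->
    State#state{foo = Foo, bar = Bar, baz = Baz}.

%% Explicit tuple: the single allocation of a new 5-element tuple
%% is plain to see in the source.
update_tuple({state, _Foo, _Bar, _Baz, Qux}, Foo, Bar, Baz) ->
    {state, Foo, Bar, Baz, Qux}.

demo() ->
    S0 = #state{foo = 1, bar = 2, baz = 3, qux = 4},
    S1 = update_rec(S0, a, b, c),
    S2 = update_tuple(S0, a, b, c),
    {S1 =:= S2, S2}.
```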


Garbage collection is key to speed

● Heap is a list of chunks

● The 'new heap' is near the head of the list, the 'old heap' near its tail

● A GC run takes 10μs on average

● GC may run thousands of times per second

● How to tackle GC-related issues:

– (Priority 1) Call erlang:garbage_collect() at strategic points

– (Priority 2) For the fastest code avoid GC completely – restart the fast process regularly

– (Priority 3) Use the fullsweep_after option
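A minimal sketch of priorities 1 and 3, assuming a hypothetical worker that answers numeric requests: the process is spawned with the fullsweep_after option via spawn_opt/2, and it calls erlang:garbage_collect() after every GC_EVERY requests, at a point where the reply has already been sent. The batch size of 1000 and the fullsweep_after value of 20 are illustrative, not tuned values from the project. Priority 2 (restarting the process) is not shown, since it also requires handing the new pid to clients.

```erlang
-module(gc_demo).
-export([start/0, request/2, stop/1]).

%% Illustrative batch size: force a collection after this many requests.
-define(GC_EVERY, 1000).

%% Priority 3: fullsweep_after sets how many generational (minor)
%% collections may run before a full sweep of the whole heap.
start() ->
    spawn_opt(fun() -> loop(0) end, [{fullsweep_after, 20}]).

request(Pid, X) ->
    Ref = make_ref(),
    Pid ! {req, self(), Ref, X},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop(N) ->
    receive
        {req, From, Ref, X} ->
            From ! {Ref, X * 2},
            maybe_gc(N + 1);
        stop ->
            ok
    end.

%% Priority 1: collect at a point the code chooses (the reply is already
%% sent) rather than leaving every collection to land mid-request.
maybe_gc(N) when N >= ?GC_EVERY ->
    erlang:garbage_collect(),
    loop(0);
maybe_gc(N) ->
    loop(N).
```

The runtime still collects on its own when the heap fills up; forcing a collection at a quiet point only shifts some of that work away from the fast path.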


gen_server vs. barebone process

● Message passing using gen_server:call() is 2x slower than Pid ! Msg

● For speedy code prefer barebone processes to gen_servers

● OTP Design Principles are about high availability, not high performance
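A barebone process implements only the messaging it needs. The sketch below (a hypothetical module bare_server holding a counter) hand-rolls a synchronous call with make_ref/0 and a selective receive, plus a one-way cast that is a single Pid ! Msg. Unlike gen_server:call/2 it does not monitor the server, so a dead server is noticed only through the timeout; that kind of safety net is part of what the OTP behaviours provide.

```erlang
-module(bare_server).
-export([start/0, call/2, cast/2]).

start() ->
    spawn(fun() -> loop(0) end).

%% Synchronous request: a unique ref tags the reply so the selective
%% receive matches only the answer to this call.
call(Pid, Req) ->
    Ref = make_ref(),
    Pid ! {call, self(), Ref, Req},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        exit(timeout)
    end.

%% Fire-and-forget: a single send, no reply.
cast(Pid, Req) ->
    Pid ! {cast, Req},
    ok.

%% Server loop; messages matching no clause stay in the mailbox.
loop(Count) ->
    receive
        {call, From, Ref, read} ->
            From ! {Ref, Count},
            loop(Count);
        {cast, {add, N}} ->
            loop(Count + N)
    end.
```

Because messages between two processes arrive in the order they were sent, a read issued after some casts observes all of them.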


NIFs: more pain than gain

● A new principle of Erlang development: do not use NIFs

● In exchange for a small performance boost, NIFs undermine key properties of Erlang: reliability and soft real-time guarantees

● Most of the time Erlang code can be made as fast as C

● Most performance problems in Erlang systems are traceable to NIFs, or to external C libraries, which behave similarly

● Erlang on Xen does not have NIFs and we do not plan to add them
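As an illustration of keeping byte-crunching code in Erlang rather than C, here is a sketch of the Internet checksum (the RFC 1071 ones'-complement sum over 16-bit words) written with binary pattern matching. It is a hypothetical example, not code from the project, and no performance figure is claimed for it; it simply shows the kind of loop that is often pushed into a NIF but can be expressed in plain Erlang.

```erlang
-module(csum).
-export([ip_checksum/1]).

%% Internet checksum: ones'-complement of the ones'-complement sum of
%% the data taken as 16-bit big-endian words (RFC 1071).
ip_checksum(Bin) when is_binary(Bin) ->
    fold(sum(Bin, 0)).

%% Add up 16-bit words; an odd trailing byte is padded with a zero byte.
sum(<<W:16, Rest/binary>>, Acc) -> sum(Rest, Acc + W);
sum(<<B:8>>, Acc) -> Acc + (B bsl 8);
sum(<<>>, Acc) -> Acc.

%% Fold carries back into the low 16 bits, then take the complement.
fold(Acc) when Acc > 16#ffff -> fold((Acc band 16#ffff) + (Acc bsr 16));
fold(Acc) -> (bnot Acc) band 16#ffff.
```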


Fast counters

● 32-bit or 64-bit unsigned integer counters that wrap around on overflow: trivial in C, not easy in Erlang

● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower

● Use two variables for a counter?

    foo(C1, 16#ffffff, ...) -> foo(C1 + 1, 0, ...);
    foo(C1, C2, ...) -> foo(C1, C2 + 1, ...);
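The two-variable idea can be fleshed out into a 32-bit wrapping counter that never leaves the fixnum range. In this sketch (the module split_counter and its functions are hypothetical) the low part holds 24 bits, matching the 16#ffffff bound above, and the high part holds the remaining 8 bits; both parts travel as loop arguments, so an increment allocates nothing on the heap. Combining the halves into one number can need 32 bits, so value/2 is meant for the slow path only.

```erlang
-module(split_counter).
-export([count/3, value/2]).

%% Low part: 24 bits; high part: 8 bits; together a 32-bit counter.
%% Each part stays well inside a signed 29-bit fixnum.
-define(LO_MAX, 16#ffffff).
-define(HI_MAX, 16#ff).

%% Apply N increments to the counter held as {Hi, Lo}.
count(0, Hi, Lo) -> {Hi, Lo};
count(N, ?HI_MAX, ?LO_MAX) -> count(N - 1, 0, 0);     % 32-bit wraparound
count(N, Hi, ?LO_MAX) -> count(N - 1, Hi + 1, 0);     % carry into high part
count(N, Hi, Lo) -> count(N - 1, Hi, Lo + 1).

%% Reassemble the full 32-bit value (may be a bignum on a 29-bit runtime).
value(Hi, Lo) -> (Hi bsl 24) bor Lo.
```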


● Erlang on Xen has a new experimental feature, fast counters:

    erlang:new_counter(Bits) -> Ref
    erlang:increment_counter(Ref, Incr)
    erlang:read_counter(Ref)
    erlang:release_counter(Ref)
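For readers on standard Erlang/OTP rather than Erlang on Xen: OTP 21.2 and later ship a counters module with a similar purpose. It is a different API from the Erlang on Xen functions listed above and not what the project used. Its counters are 64-bit signed integers that wrap around on overflow, and an unsigned view can be recovered with a mask, as this sketch shows. There is no release call; a counter array is reclaimed once its reference is garbage collected.

```erlang
-module(otp_counters_demo).
-export([demo/0]).

demo() ->
    %% One counter slot (index 1), backed by atomics.
    Ref = counters:new(1, [atomics]),
    ok = counters:add(Ref, 1, 5),
    ok = counters:add(Ref, 1, 3),
    Eight = counters:get(Ref, 1),
    %% Push the counter to the signed 64-bit maximum and step past it.
    ok = counters:put(Ref, 1, 16#7fffffffffffffff),
    ok = counters:add(Ref, 1, 1),
    Wrapped = counters:get(Ref, 1),
    %% Masking gives the value as an unsigned 64-bit integer.
    Unsigned = Wrapped band 16#ffffffffffffffff,
    {Eight, Wrapped, Unsigned}.
```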



