Optimizing Erlang Code for Speed

Considers optimizations that allow an intelligent network management solution written in Erlang to reach microsecond latencies and GB/s throughput

Optimizing Erlang code for speed

Revelations from a real-world project based on Erlang on Xen

ErlangDripro2014

Maxim Kharchenko, CTO, Cloudozer LLP, mk@cloudozer.com

The road map

● Erlang on Xen intro

● Speed-related notes

– Arguments are registers

– ETS tables are (mostly) ok

– Do not overuse records

– GC is key to speed

– gen_server vs. barebone process

– NIFs: more pain than gain

– Fast counters

● Q&A

Erlang on Xen 101

● A new Erlang runtime that runs without an OS

● Conceived in 2009

● Highly-compatible with Erlang/OTP

● Built from scratch, not a “port”

● Optimised for low startup latency

● Not open source (yet)

● The public build service is free

Go to erlangonxen.org

Zerg demo: zerg.erlangonxen.org

Arguments are registers

● Many arguments do not make a function any slower

● Do not reshuffle arguments:

animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Cat, Dog, Horse, Pig, Cow, State);
animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
    pet(Cat, Dog, Horse, Pig, Cow, State);
...

%% SLOW
animal(Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Goat, Cat, Dog, Horse, Pig, Cow, State);
...
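A minimal sketch of the same point with fewer arguments. The module and function names (arg_order, feed/3, pet/3, feed_rev/3) are hypothetical and only illustrate the pattern: the first function forwards its arguments in the order it received them, the second reshuffles them before the call.

```erlang
-module(arg_order).
-export([animal/3, animal_slow/3]).

%% Arguments keep their positions in the tail calls, so each value can
%% stay in the argument register it arrived in.
animal(cat, Food, State) -> feed(cat, Food, State);
animal(dog, Food, State) -> pet(dog, Food, State);
animal(Other, Food, State) -> feed(Other, Food, State).

%% Same result, but every argument changes position in the call,
%% so the values have to be moved between registers first.
animal_slow(Kind, Food, State) ->
    feed_rev(State, Food, Kind).

%% Hypothetical helpers that just record what happened.
feed(Kind, Food, State) -> [{fed, Kind, Food} | State].
pet(Kind, _Food, State) -> [{petted, Kind} | State].
feed_rev(State, Food, Kind) -> [{fed, Kind, Food} | State].
```

Both variants compute the same thing; the difference is only in how much register shuffling the call sites need.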

ETS tables are (mostly) ok

● A small ETS table lookup = 10x function activations

● Do not use ets:tab2list() inside tight loops

● Treat ETS as a database; not a pool of global variables

● 1-2 ETS lookups on the fast path are ok

● Beware that ets:lookup() and similar calls copy the data onto the heap of the caller, much like message passing
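The ETS notes above can be sketched as follows. The table name and its contents are made up for illustration; only standard ets calls are used. A keyed lookup copies one row, while ets:tab2list/1 copies the whole table to the caller.

```erlang
-module(ets_fast_path).
-export([demo/0]).

demo() ->
    %% Hypothetical table of three rows.
    Tab = ets:new(routes, [set]),
    true = ets:insert(Tab, [{1, port_a}, {2, port_b}, {3, port_c}]),
    %% Fast path: a single keyed lookup copies only the matching tuple.
    [{2, Port}] = ets:lookup(Tab, 2),
    %% If one field is enough, ets:lookup_element/3 copies just that field.
    Port = ets:lookup_element(Tab, 2, 2),
    %% Slow path: ets:tab2list/1 copies every row onto the caller's heap.
    All = ets:tab2list(Tab),
    true = ets:delete(Tab),
    {Port, length(All)}.
```

With three rows the difference is negligible; inside a tight loop over a large table the full copy is what hurts.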

Do not overuse records

● setelement() creates a copy of the tuple

● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple

● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code

%% from 9p.erl
mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
mixer({rwrite,_,_}, _, initial) -> start_attaching;
mixer({rerror,_,_}, _, initial) -> auth_failed;
mixer({rlerror,_,_}, _, initial) -> auth_failed;
mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
    {attach_more,Fid,AName,qid_type(Qid)};
mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
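A small sketch contrasting the three ways of updating the same state. The record and function names are hypothetical. How many intermediate copies the record syntax produces depends on the compiler and runtime; the explicit tuple makes the single allocation visible in the source.

```erlang
-module(tuple_state).
-export([demo/0]).

-record(state, {foo, bar, baz}).

%% Record syntax: concise, but the heap cost is hidden from the reader.
update_record(S, Foo, Bar, Baz) ->
    S#state{foo = Foo, bar = Bar, baz = Baz}.

%% Explicit tuple: exactly one new tuple is built, and the code shows it.
update_tuple({state, _, _, _}, Foo, Bar, Baz) ->
    {state, Foo, Bar, Baz}.

%% Chained setelement/3: each call returns a fresh copy of the tuple.
update_setelement(S, Foo, Bar, Baz) ->
    setelement(4, setelement(3, setelement(2, S, Foo), Bar), Baz).

demo() ->
    S0 = #state{foo = 0, bar = 0, baz = 0},
    A = update_record(S0, 1, 2, 3),
    %% All three variants must produce the same term.
    A = update_tuple(S0, 1, 2, 3),
    A = update_setelement(S0, 1, 2, 3),
    A.
```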

Garbage collection is key to speed

● Heap is a list of chunks

● The 'new heap' is close to the head of the list, the 'old heap' is close to its tail

● A GC run takes 10μs on average

● GC may run thousands of times per second

● How to tackle GC-related issues:

– (Priority 1) Call erlang:garbage_collect() at strategic points

– (Priority 2) For the fastest code avoid GC completely – restart the fast process regularly

– (Priority 3) Use fullsweep_after option
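Two of the options above, sketched with standard calls: an explicit erlang:garbage_collect() at a quiet point, and the fullsweep_after spawn option. The worker, its batch of work, and the batch count are hypothetical; where the "strategic point" lies in a real system has to be found by measurement.

```erlang
-module(gc_tuning).
-export([demo/0]).

demo() ->
    Parent = self(),
    %% fullsweep_after = 0 makes every collection a full sweep, so no
    %% garbage lingers on the old heap of this process.
    Pid = spawn_opt(fun() -> worker(Parent, 3, 0) end,
                    [{fullsweep_after, 0}]),
    receive
        {Pid, done, Total} -> Total
    after 5000 -> timeout
    end.

worker(Parent, 0, Acc) ->
    Parent ! {self(), done, Acc};
worker(Parent, N, Acc) ->
    %% Hypothetical batch of work that produces short-lived garbage.
    Sum = lists:sum(lists:seq(1, 1000)),
    %% Strategic point: the batch is finished and little data is live,
    %% so an explicit collection here has little to copy.
    true = erlang:garbage_collect(),
    worker(Parent, N - 1, Acc + Sum).
```

The second option from the list, restarting the fast process regularly, needs no special API: the process simply exits after a fixed amount of work and a fresh one takes over with an empty heap.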

gen_server vs barebone process

● Message passing using gen_server:call() is 2x slower than Pid ! Msg

● For speedy code prefer barebone processes to gen_servers

● Design Principles are about high availability, not high performance
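A barebone process in this sense can be as small as the sketch below. The module name and message shapes are hypothetical. Note what is given up compared to gen_server: no supervision hooks, no monitoring of the server in the synchronous call, no code-change support.

```erlang
-module(bare_counter).
-export([start/0, add/2, read/1, stop/1]).

start() ->
    spawn(fun() -> loop(0) end).

%% Fire-and-forget update: a plain send, no reply is awaited.
add(Pid, N) ->
    Pid ! {add, N},
    ok.

%% Synchronous read: the unique reference matches the reply to the request.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive
        {Ref, Total} -> Total
    after 1000 -> timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The whole server: one receive loop carrying the state as an argument.
loop(Total) ->
    receive
        {add, N} -> loop(Total + N);
        {read, From, Ref} -> From ! {Ref, Total}, loop(Total);
        stop -> ok
    end.
```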

NIFs: more pain than gain

● A new principle of Erlang development: do not use NIFs

● For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees

● Most of the time Erlang code can be made as fast as C

● Most performance problems in Erlang are traceable to NIFs, or to external C libraries, which are similar

● Erlang on Xen does not have NIFs and we do not plan to add them

Fast counters

● 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang

● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower

● Use two variables for a counter?

foo(C1, 16#ffffff, ...) ->
    foo(C1+1, 0, ...);
foo(C1, C2, ...) ->
    foo(C1, C2+1, ...);
...

● Erlang on Xen has a new experimental feature – fast counters:

erlang:new_counter(Bits) -> Ref
erlang:increment_counter(Ref, Incr)
erlang:read_counter(Ref)
erlang:release_counter(Ref)
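The two-variable idea from the list above, written out as a runnable sketch in plain Erlang (it does not use the experimental Erlang on Xen counter API). The module name, the 24-bit split, and the wrap-around at 48 bits are assumptions made for illustration; both halves always stay small integers.

```erlang
-module(split_counter).
-export([count/3, value/2]).

-define(MAX, 16#ffffff).   %% each half holds 24 bits

%% Run N increments, carrying the two halves as separate arguments
%% so that no intermediate term is allocated on the heap.
count(0, High, Low) -> {High, Low};
count(N, ?MAX, ?MAX) -> count(N - 1, 0, 0);          %% 48-bit wrap-around
count(N, High, ?MAX) -> count(N - 1, High + 1, 0);   %% carry into high half
count(N, High, Low) -> count(N - 1, High, Low + 1).

%% Combine the halves only when the value is actually read; on a runtime
%% with small fixnums this is the one place a bignum may appear.
value(High, Low) -> (High bsl 24) bor Low.
```

The price is an extra argument on every function of the hot loop and a combined value that is only materialised on demand.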

Questions?
