

Optimizing Erlang Code for Speed


Covers optimizations that make it possible to reach microsecond latencies and GB/s throughput in an intelligent network management solution written in Erlang.


Page 1: Optimizing Erlang Code for Speed

Optimizing Erlang code for speed

Revelations from a real-world project based on Erlang on Xen

ErlangDripro2014

Maxim Kharchenko

CTO, Cloudozer

Page 2: Optimizing Erlang Code for Speed

The road map

● Erlang on Xen intro

● Speed-related notes

– Arguments are registers

– ETS tables are (mostly) ok

– Do not overuse records

– GC is key to speed

– gen_server vs. barebone process

– NIFs: more pain than gain

– Fast counters

● Q&A

Page 3: Optimizing Erlang Code for Speed


Erlang on Xen 101

● A new Erlang runtime that runs without an OS

● Conceived in 2009

● Highly compatible with Erlang/OTP

● Built from scratch, not a “port”

● Optimised for low startup latency

● Not open source (yet)

● The public build service is free

Go to erlangonxen.org

Page 4: Optimizing Erlang Code for Speed


Zerg demo: zerg.erlangonxen.org

Page 6: Optimizing Erlang Code for Speed


Arguments are registers

● Many arguments do not make a function any slower

● Do not reshuffle arguments:

animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Cat, Dog, Horse, Pig, Cow, State);

animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
    pet(Cat, Dog, Horse, Pig, Cow, State);

...

%% SLOW
animal(Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Goat, Cat, Dog, Horse, Pig, Cow, State);
...
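The point about reshuffling can be tried on a smaller scale. The sketch below is not from the talk: the module name `arg_order` and the helper `step/3` are hypothetical stand-ins for `feed/6` and `pet/6`. It assumes the usual BEAM convention that the N-th argument travels in the N-th register, so a call that keeps argument positions needs no moves, while a call that permutes them does.

```erlang
-module(arg_order).
-export([keep_order/3, reshuffle/3, step/3]).

%% Hypothetical worker; stands in for feed/6 or pet/6 from the slide.
step(A, B, C) -> {A, B, C}.

%% Arguments keep their positions, so they already sit in the
%% registers that step/3 expects.
keep_order(A, B, C) -> step(A, B, C).

%% Every argument changes position, so each one has to be moved
%% to a different register before the call.
reshuffle(A, B, C) -> step(C, A, B).
```

On a stock Erlang/OTP install, `erlc -S arg_order.erl` writes the generated assembly to `arg_order.S`, where the extra `move` instructions in `reshuffle/3` should be visible.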

Page 7: Optimizing Erlang Code for Speed


ETS tables are (mostly) ok

● A lookup in a small ETS table costs about as much as 10 function activations

● Do not use ets:tab2list() inside tight loops

● Treat ETS as a database, not a pool of global variables

● 1-2 ETS lookups on the fast path are ok

● Beware that ets:lookup() and similar calls copy the data onto the heap of the caller, much like message passing
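One way to read the ETS advice above is to do a single lookup before the hot loop and work on the local copy afterwards. A minimal sketch follows; the module name `ets_once`, the table name `config`, and the key `scale` are made up for illustration.

```erlang
-module(ets_once).
-export([demo/0]).

demo() ->
    Tab = ets:new(config, [set]),
    true = ets:insert(Tab, {scale, 3}),
    %% One lookup on the fast path; the stored tuple is copied
    %% onto the heap of the calling process here.
    [{scale, Scale}] = ets:lookup(Tab, scale),
    %% The loop uses the local copy and never touches ETS again.
    Result = [X * Scale || X <- [1, 2, 3]],
    true = ets:delete(Tab),
    Result.
```

Because every lookup copies its result, the cost grows with the size of the stored term, which is another reason to keep large terms out of the fast path.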

Page 8: Optimizing Erlang Code for Speed


Do not overuse records

● setelement() creates a copy of the tuple

● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple

● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code

%% from 9p.erl
mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
mixer({rwrite,_,_}, _, initial) -> start_attaching;
mixer({rerror,_,_}, _, initial) -> auth_failed;
mixer({rlerror,_,_}, _, initial) -> auth_failed;
mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) -> {attach_more,Fid,AName,qid_type(Qid)};
mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
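To make the heap footprint argument concrete, the sketch below puts a record update next to the equivalent explicit tuple. The module name `rec_vs_tuple` and the field values are hypothetical; the `#state{}` record mirrors the one named on the slide. How many intermediate copies the record form really produces depends on the compiler and runtime, which is the uncertainty the slide marks with "(?)".

```erlang
-module(rec_vs_tuple).
-export([new/0, with_record/1, with_tuple/1]).

-record(state, {foo = 0, bar = 0, baz = 0}).

new() -> #state{}.

%% Record syntax: the tuple being rebuilt is hidden behind the update.
with_record(State) ->
    State#state{foo = 1, bar = 2, baz = 3}.

%% Explicit tuple: exactly one new 4-element tuple is visibly built.
with_tuple({state, _Foo, _Bar, _Baz}) ->
    {state, 1, 2, 3}.
```

Both functions return the same term, so the explicit form can replace the record form in a hot section without changing callers.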

Page 9: Optimizing Erlang Code for Speed


Garbage collection is key to speed

● The heap is a list of chunks

● The 'new heap' is close to its head; the 'old heap' is close to its tail

● A GC run takes 10μs on average

● GC may run thousands of times per second

● How to tackle GC-related issues:

– (Priority 1) Call erlang:garbage_collect() at strategic points

– (Priority 2) For the fastest code avoid GC completely – restart the fast process regularly

– (Priority 3) Use the fullsweep_after option
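Two of the remedies above, an explicit collection at a quiet moment and the fullsweep_after option, might look as follows. This is a sketch, not the project's code: the module name `gc_hints`, the message shapes, and the option values are placeholders. `fullsweep_after` and `min_heap_size` are standard `spawn_opt` options in Erlang/OTP; that Erlang on Xen treats them the same way is an assumption.

```erlang
-module(gc_hints).
-export([start/0, loop/1]).

%% Start a worker with GC-related spawn options.
%% The numbers are placeholders, not tuned values.
start() ->
    spawn_opt(?MODULE, loop, [0],
              [{fullsweep_after, 10}, {min_heap_size, 10000}]).

loop(N) ->
    receive
        {work, From} ->
            From ! {done, N + 1},
            loop(N + 1);
        {idle, From} ->
            %% A quiet moment: collect now rather than in mid-burst.
            true = erlang:garbage_collect(),
            From ! collected,
            loop(N);
        stop ->
            ok
    end.
```

A large `min_heap_size` also moves a process toward the "avoid GC completely" end of the list, since a heap that never fills never triggers a collection before the process is restarted.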

Page 10: Optimizing Erlang Code for Speed


gen_server vs barebone process

● Message passing using gen_server:call() is 2x slower than Pid ! Msg

● For speedy code prefer barebone processes to gen_servers

● Design Principles are about high availability, not high performance
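For comparison, a barebone process with a synchronous request can be as small as the sketch below. The module name `bare_counter` and the message format are invented for this example.

```erlang
-module(bare_counter).
-export([start/0, bump/1]).

start() -> spawn(fun() -> loop(0) end).

%% Synchronous request/reply without the gen_server machinery.
bump(Pid) ->
    Ref = make_ref(),
    Pid ! {bump, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 1000 ->
        exit(timeout)
    end.

loop(N) ->
    receive
        {bump, From, Ref} ->
            From ! {Ref, N + 1},
            loop(N + 1)
    end.
```

The trade-off is visible in `bump/1`: there is no monitor on the server, so a crashed server is noticed only when the timeout fires. That missing safety net is part of what gen_server:call() pays for.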

Page 11: Optimizing Erlang Code for Speed


NIFs: more pain than gain

● A new principle of Erlang development: do not use NIFs

● For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees

● Most of the time Erlang code can be made as fast as C

● Most performance problems in Erlang can be traced to NIFs or to external C libraries, which behave similarly

● Erlang on Xen does not have NIFs and we do not plan to add them

Page 12: Optimizing Erlang Code for Speed


Fast counters

● 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang

● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower

● Use two variables for a counter?

foo(C1, 16#ffffff, ...) ->
    foo(C1+1, 0, ...);

foo(C1, C2, ...) ->
    foo(C1, C2+1, ...);

...

● Erlang on Xen has a new experimental feature – fast counters:

erlang:new_counter(Bits) -> Ref
erlang:increment_counter(Ref, Incr)
erlang:read_counter(Ref)
erlang:release_counter(Ref)
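The two-variable counter from the slide can be written out in portable Erlang. In this sketch the module name `split_counter` and both function names are hypothetical; the 24-bit low part follows the `16#ffffff` in the slide, and the high part is simply left to grow, so wrapping at 32 or 64 bits is not shown.

```erlang
-module(split_counter).
-export([count/1, value/2]).

%% The low part wraps after 24 bits, as in the slide's 16#ffffff.
-define(LOW_MAX, 16#ffffff).

%% Count N events and return the two halves as {High, Low}.
count(N) when is_integer(N), N >= 0 ->
    count(N, 0, 0).

%% Both halves travel as function arguments, so the hot loop
%% allocates nothing on the heap.
count(0, Hi, Lo) -> {Hi, Lo};
count(N, Hi, ?LOW_MAX) -> count(N - 1, Hi + 1, 0);
count(N, Hi, Lo) -> count(N - 1, Hi, Lo + 1).

%% Combining the halves may produce a bignum on a runtime with
%% small fixnums, so do it off the fast path.
value(Hi, Lo) -> (Hi bsl 24) + Lo.
```

Each half stays well inside the 29-bit fixnum range mentioned above, which is what keeps the increment cheap.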

Page 13: Optimizing Erlang Code for Speed


Questions?
