Considers optimizations that make it possible to reach microsecond latencies and GB/s throughput in an intelligent network management solution written in Erlang
Optimizing Erlang code for speed
Revelations from a real-world project based on Erlang on Xen
ErlangDripro2014
Maxim Kharchenko
CTO, Cloudozer
The road map
● Erlang on Xen intro
● Speed-related notes
– Arguments are registers
– ETS tables are (mostly) ok
– Do not overuse records
– GC is key to speed
– gen_server vs. barebone process
– NIFs: more pain than gain
– Fast counters
● Q&A
Erlang on Xen 101
● A new Erlang runtime that runs without an OS
● Conceived in 2009
● Highly-compatible with Erlang/OTP
● Built from scratch, not a “port”
● Optimised for low startup latency
● Not open source (yet)
● The public build service is free
Go to erlangonxen.org
Zerg demo: zerg.erlangonxen.org
Arguments are registers
● Many arguments do not make a function any slower
● Do not reshuffle arguments:
    animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
        feed(Cat, Dog, Horse, Pig, Cow, State);
    animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
        pet(Cat, Dog, Horse, Pig, Cow, State);
    ...

    %% SLOW
    animal(Cat, Dog, Horse, Pig, Cow, State) ->
        feed(Goat, Cat, Dog, Horse, Pig, Cow, State);
    ...
ETS tables are (mostly) ok
● A small ETS table lookup costs about as much as 10 function activations (calls)
● Do not use ets:tab2list() inside tight loops
● Treat ETS as a database; not a pool of global variables
● 1-2 ETS lookups on the fast path are ok
● Beware that ets:lookup() and similar calls copy the data onto the caller's heap, much like message passing
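The difference between a keyed lookup and a full-table copy can be sketched as follows. This is a minimal illustration, not code from the project: the module name `ets_fastpath`, the table contents, and both function names are assumptions made for the example.

```erlang
-module(ets_fastpath).
-export([init/0, port_for/2, port_for_slow/2]).

%% Create a small table and fill it with hypothetical address-to-port
%% entries. The calling process owns the table.
init() ->
    Tab = ets:new(routes, [set]),
    true = ets:insert(Tab, [{{10,0,0,1}, 1},
                            {{10,0,0,2}, 2},
                            {{10,0,0,3}, 3}]),
    Tab.

%% Fast path: one keyed lookup. Only the matching tuple is copied
%% onto the caller's heap.
port_for(Tab, Addr) ->
    case ets:lookup(Tab, Addr) of
        [{_, Port}] -> {ok, Port};
        []          -> not_found
    end.

%% Anti-pattern: ets:tab2list/1 copies every object in the table onto
%% the caller's heap on each call, then scans the list linearly.
port_for_slow(Tab, Addr) ->
    case lists:keyfind(Addr, 1, ets:tab2list(Tab)) of
        {_, Port} -> {ok, Port};
        false     -> not_found
    end.
```

Both functions return the same answers; the second one simply generates far more garbage per call, which is why it does not belong inside a tight loop.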
Do not overuse records
● setelement() creates a copy of the tuple
● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple
● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code
    %% from 9p.erl
    mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
    mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
    mixer({rwrite,_,_}, _, initial) -> start_attaching;
    mixer({rerror,_,_}, _, initial) -> auth_failed;
    mixer({rlerror,_,_}, _, initial) -> auth_failed;
    mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) -> {attach_more,Fid,AName,qid_type(Qid)};
    mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
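To make the heap footprint of a state update visible, the record update can be written out as an explicit tuple. The sketch below is an assumption-laden illustration: the `state` record, its fields, and the module name `tuple_update` are invented for the example, and whether the chained form really allocates three tuples depends on the compiler and runtime.

```erlang
-module(tuple_update).
-export([update_chained/4, update_once/4]).

%% Hypothetical state record; as a tuple it is {state, Foo, Bar, Baz, Hits}.
-record(state, {foo, bar, baz, hits = 0}).

%% Three separate record updates. Each step may build a fresh
%% 5-element tuple, leaving the intermediate ones as garbage.
update_chained(S0, Foo, Bar, Baz) ->
    S1 = S0#state{foo = Foo},
    S2 = S1#state{bar = Bar},
    S2#state{baz = Baz}.

%% Explicit tuple: the single allocation is visible in the source.
update_once({state, _, _, _, Hits}, Foo, Bar, Baz) ->
    {state, Foo, Bar, Baz, Hits}.
```

The explicit form trades readability for predictability, so it is best confined to the performance-critical sections.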
Garbage collection is key to speed
● Heap is a list of chunks
● 'new heap' is close to its head, 'old heap' is close to its tail
● A GC run takes 10μs on average
● GC may run thousands of times per second
● How to tackle GC-related issues:
– (Priority 1) Call erlang:garbage_collect() at strategic points
– (Priority 2) For the fastest code avoid GC completely – restart the fast process regularly
– (Priority 3) Use fullsweep_after option
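Two of the techniques above, an explicit collection at a quiet moment and the fullsweep_after option, can be sketched with standard calls (`spawn_opt/4` and `erlang:garbage_collect/0`). The module name `gc_points`, the message shapes, and the choice of `{fullsweep_after, 0}` are assumptions for illustration; suitable values depend on the workload.

```erlang
-module(gc_points).
-export([start/0, loop/1]).

%% Spawn a worker with fullsweep_after set to 0, so that every
%% collection it performs is a full sweep.
start() ->
    spawn_opt(?MODULE, loop, [0], [{fullsweep_after, 0}]).

loop(Count) ->
    receive
        {work, Bin} when is_binary(Bin) ->
            _ = handle(Bin),
            loop(Count + 1);
        {idle, From} ->
            %% Strategic point: no urgent work is pending, so collect
            %% now instead of in the middle of the fast path.
            true = erlang:garbage_collect(),
            From ! {collected, Count},
            loop(Count);
        stop ->
            ok
    end.

%% Placeholder workload that produces some garbage.
handle(Bin) ->
    lists:sum(binary_to_list(Bin)).
```

Restarting the fast process regularly (Priority 2) is not shown here; it amounts to letting the process exit and spawning a fresh one with an empty heap.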
gen_server vs barebone process
● Message passing using gen_server:call() is 2x slower than Pid ! Msg
● For speedy code prefer barebone processes to gen_servers
● Design Principles are about high availability, not high performance
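A barebone process that still offers a synchronous call can be sketched like this. It is an illustration only: the module name `bare_server`, the `{call, From, Ref, Req}` message format, and the counter state are assumptions, not the project's protocol.

```erlang
-module(bare_server).
-export([start/0, call/2, stop/1]).

%% Start a plain process holding an integer counter as its state.
start() ->
    spawn(fun() -> loop(0) end).

%% Synchronous request built directly on Pid ! Msg. The monitor
%% reference doubles as a unique tag and lets the caller notice a
%% dead server instead of blocking forever.
call(Pid, Req) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Req},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            Reply;
        {'DOWN', Ref, process, Pid, Reason} ->
            exit(Reason)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop(N) ->
    receive
        {call, From, Ref, incr} ->
            From ! {Ref, N + 1},
            loop(N + 1);
        {call, From, Ref, get} ->
            From ! {Ref, N},
            loop(N);
        stop ->
            ok
    end.
```

What is given up compared with gen_server is the supervision integration, timeouts, and code-change handling, which is the availability-versus-speed trade-off the slide describes.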
NIFs: more pain than gain
● A new principle of Erlang development: do not use NIFs
● For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees
● Most of the time Erlang code can be made as fast as C
● Most performance problems in Erlang are traceable to NIFs or to external C libraries, which behave similarly
● Erlang on Xen does not have NIFs and we do not plan to add them
Fast counters
● 32-bit or 64-bit unsigned integer counters with overflow are trivial in C, but not easy in Erlang
● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower
● Use two variables for a counter?
    foo(C1, 16#ffffff, ...) ->
        foo(C1+1, 0, ...);
    foo(C1, C2, ...) ->
        foo(C1, C2+1, ...);
    ...
● Erlang on Xen has a new experimental feature – fast counters:
    erlang:new_counter(Bits) -> Ref
    erlang:increment_counter(Ref, Incr)
    erlang:read_counter(Ref)
    erlang:release_counter(Ref)
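The two-variable idea can be made concrete in portable Erlang, without the experimental Erlang on Xen calls. The sketch below is an assumption for illustration: it emulates a 32-bit wrapping counter as an 8-bit high half plus a 24-bit low half, and the module name `split_counter` and both function names are invented here.

```erlang
-module(split_counter).
-export([bump/3, value/2]).

-define(LO_MAX, 16#ffffff).  %% low half holds 24 bits
-define(HI_MAX, 16#ff).      %% high half holds 8 bits, 32 bits in total

%% Increment the counter N times. Both halves travel as function
%% arguments, so each stays a small integer and no tuple is built
%% until the final result is returned.
bump(Hi, Lo, 0) ->
    {Hi, Lo};
bump(?HI_MAX, ?LO_MAX, N) ->
    bump(0, 0, N - 1);           %% wrap around like an unsigned 32-bit
bump(Hi, ?LO_MAX, N) ->
    bump(Hi + 1, 0, N - 1);      %% carry into the high half
bump(Hi, Lo, N) ->
    bump(Hi, Lo + 1, N - 1).

%% Combine the halves into one integer. With 29-bit FIXNUMs the result
%% may be a BIGNUM, so this is meant for reporting, not the fast path.
value(Hi, Lo) ->
    (Hi bsl 24) bor Lo.
```

In real code the two halves would be extra arguments of the process loop itself, as in the foo/… clauses above, rather than a separate helper.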
Questions?