

Performance analysis of network-bound MPC

Reimo Rebane 1,2

1 University of Tartu, Institute of Computer Science
2 Software Technology and Applications Competence Center

Abstract.

1 Introduction

Today, as more and more applications move towards the cloud, privacy becomes more of an issue. You have to trust a third party to keep your secrets. One possible solution to this problem is to use secure multi-party computation (MPC) schemes. MPC can be used to jointly compute the result of a public function between multiple parties, without revealing more than the corresponding input and output to any of the parties. It has a potential use for a wide range of applications, from voting to privately comparing genetic information, and has been shown to be a useful tool for solving practical large-scale problems [4].

Even though MPC has large potential, it is often thought of as too inefficient for certain tasks. Multi-party computation can be performed in two common ways, both with their own limitations. First, circuit evaluation, using Boolean or arithmetic circuits, which minimizes the communication between the parties but is computationally expensive (CPU-bound). Second, general multi-party computation, which requires a lot of communication between the parties (network-bound). In this paper we are interested in the second case, looking at what the actual volume of the traffic is and how the characteristics of the communication affect the performance of MPC protocols, depending on their structure.

We perform our measurements on the Sharemind secure computation framework [2]. In a standard deployment setting, Sharemind is composed of a virtual machine, consisting of three separate computing nodes, and controller applications, which connect to the virtual machine and request computations. The data is divided between the three computing nodes using an additive secret sharing scheme. The framework also includes the profiling tools needed to perform our measurements. Later, the learned information can be used to estimate the cost of deploying similar MPC systems in a cloud environment.

The rest of the paper is structured as follows: In Section 2, we show how different network engine versions affect the performance of Sharemind protocols. A description of our experiments is given in Section 3, followed by a discussion of the results in Section 4. Section 5 illustrates how the structure of the protocols shapes their traffic. In Section 6 we conclude the paper.


2 Cost of Secure Transport

The secure channels used between the computing nodes have a noticeable impact on the performance of a protocol, although the increase is constant with respect to the size of the input vectors. The Sharemind framework uses the RakNet library (a multiplayer game network engine, http://www.jenkinssoftware.com) as its networking engine. The library provides the possibility to use secure connections with efficient 256-bit elliptic curve key agreement and the ChaCha stream cipher [1], designed for high-performance use. The overhead generated by the secure connections is additional computation time and up to 11 bytes of additional data per packet. The described security was introduced in RakNet version 4.
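To put the per-packet figure in perspective, a back-of-the-envelope upper bound on the extra traffic can be computed as in the sketch below. The payload size per packet is our own assumption for illustration, not a measured RakNet parameter.

    # Rough upper bound on the traffic overhead of the secure channels,
    # using the "up to 11 bytes per packet" figure quoted above.
    SECURITY_OVERHEAD = 11       # bytes per packet, worst case
    PAYLOAD_PER_PACKET = 1400    # assumed usable bytes per packet

    def max_secure_overhead(total_bytes):
        # packets needed to carry the payload (ceiling division)
        packets = -(-total_bytes // PAYLOAD_PER_PACKET)
        return packets * SECURITY_OVERHEAD

    # e.g. 1 MiB of protocol traffic fits in 749 packets, so the secure
    # channel adds at most about 8 KiB on top of it.
    print(max_secure_overhead(2**20))  # 8239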

The implications of using secure connections can be seen in Figure 1, showing the multiplication protocol execution performance on one of the computing nodes. The results are shown for both RakNet 3 and RakNet 4, where the latter version is measured with and without secure connections. The figure shows that the computation time for a single operation decreases as the size of the input vectors increases. However, because the communication layer becomes saturated, the computation time per operation levels out as the vector size increases further. The secure connections have a similar impact on other protocols that rely heavily on communication between the parties.

More precise results for a set of protocols are shown in Table 1. The table shows the computation time for a single operation when working on vectors with one element and when computing with vector sizes where the communication layer becomes saturated. The vector size at the saturation point is also given. For all of the sample protocols in the table, Sharemind with RakNet 4 and secure channels enabled takes more time to compute than with the secure communication disabled.

[Figure: time per operation (ms, log scale) vs. number of parallel operations, shown for RakNet3, RakNet4 and RakNet4 with secure channels]
Fig. 1. Multiplication protocol time per operation


Protocol    RakNet version      Single op.  Saturated pt.  Saturated op.
BitExtr     RakNet 3            85 ms       2731           53 µs
            RakNet 4            84 ms       1507           78 µs
            RakNet 4 sec. ch.   129 ms      1614           107 µs
ShareConv   RakNet 3            10.2 ms     10327          1.3 µs
            RakNet 4            10.1 ms     14161          1.0 µs
            RakNet 4 sec. ch.   14.5 ms     16388          1.3 µs
Mult        RakNet 3            13.1 ms     5555           2.9 µs
            RakNet 4            10.3 ms     4757           3.0 µs
            RakNet 4 sec. ch.   12.6 ms     4719           3.6 µs

Table 1. Overview of the performance of the protocols depending on the RakNet version

3 Experiment Descriptions

3.1 Measurement Techniques

The Sharemind framework uses an execution profiler to measure various aspects of the running virtual machine. The profiler logs the time spent on an operation, the type of the operation and the sizes of the vectors the operation is performed on. Besides getting a total time estimate for an operation, the measurement logs can be further analyzed for a more granular overview of how much time was spent on processing incoming and outgoing packets, on computing randomness and on database transactions. However, additional measurements were needed for our work.

To create a model of how the protocol execution depends on the network parameters, we need better statistics about the network; more precisely, we need to know the bandwidth and latency during protocol runtime. Logging was added for the inbound and outbound traffic between the miner nodes. The traffic in both directions is measured in two aspects: the total bytes sent/received and the message bytes sent/received. While the message bytes contain only the data that we explicitly specified to be sent, the total bytes also include the overhead and the acknowledgement packets. From the additional information, we can compute the cost of a single operation, based on the size of the input vector.
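As an illustration of how these counters translate into a per-operation cost, consider the following sketch. The log record layout is hypothetical and does not reflect Sharemind's actual profiler format.

    # Hypothetical sketch of the per-operation traffic computation
    # described above; field names are assumptions.
    def traffic_per_op(record, vector_size):
        # total_* includes protocol overhead and acknowledgement packets,
        # message_* only the data we explicitly asked to be sent.
        total = record["total_sent"] + record["total_received"]
        payload = record["message_sent"] + record["message_received"]
        return {
            "total_per_op": total / vector_size,
            "payload_per_op": payload / vector_size,
            "overhead_fraction": (total - payload) / total if total else 0.0,
        }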

For latency analysis, the miners continuously log the ping time to the other computing nodes. For additional information, we also log the amount of traffic received and sent, measured over the last second.

3.2 Latency Measurements

In addition to gathering the information described in the last section, we conducted a separate experiment to get an overview of the distribution of latency. A custom protocol was created for this purpose. The protocol generates a random vector of a specific size and sends it to another node, which immediately returns the same vector to the sender. The round trip time and the size of the vector are logged. Note that only two parties are involved in this protocol.
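The following is a minimal sketch of such an echo experiment over a plain TCP connection. It only mirrors the idea of the custom protocol; the real measurement runs inside Sharemind's networking layer, so the transport here is an assumption.

    import os, socket, time

    def echo_server(port=5000):
        # The responding node: return every received vector unchanged.
        srv = socket.create_server(("", port))
        conn, _ = srv.accept()
        while True:
            size = int.from_bytes(conn.recv(4), "big")
            if size == 0:          # connection closed or empty request
                break
            data = b""
            while len(data) < size:
                data += conn.recv(size - len(data))
            conn.sendall(data)

    def measure_round_trip(host, port, size):
        # The measuring node: send a random vector and time the echo.
        with socket.create_connection((host, port)) as s:
            vec = os.urandom(size)
            start = time.monotonic()
            s.sendall(size.to_bytes(4, "big") + vec)
            data = b""
            while len(data) < size:
                data += s.recv(size - len(data))
            return (time.monotonic() - start) * 1000.0  # round trip in ms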

The protocol was run in the following two settings. In the first case, the miners were executing only our protocol; in the second case, the miners were additionally doing computations on other protocols, thus simulating normal work. For both of the settings we varied the batch size of the miners. The batch size is used to configure the maximum size of the vectors that the virtual machine processes at once; larger vectors are broken up into smaller ones. The results of this experiment are discussed in Section 4.1.

3.3 Protocol Performance Measurements

For each individual protocol in Sharemind, we looked at the cost of a single operation when computing on a one-element vector and also on vector sizes where the communication channel becomes saturated. In the first case, the computation and communication overhead per operation is the largest, whereas in the second case it is the lowest. The cost is evaluated in two aspects: time and the amount of traffic generated.

First, we measured the time performance of the protocols, which also gave us the exact vector sizes for the saturation points of the protocols. These vector sizes were then used to measure the traffic and latency at the saturation point. The same measurements were made for the one-value vector. The results of the measurements are discussed in Section 4.2.
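A simple way to locate the saturation point from such measurements is sketched below: time per operation falls as vectors grow and then levels out. The leveling-off threshold is our own illustrative choice, not a value taken from the framework.

    def saturation_point(measurements, tolerance=0.02):
        """measurements: (vector_size, total_time_ms) pairs, sorted by size.
        Returns the first size after which per-op time stops improving."""
        per_op = [(n, t / n) for n, t in measurements]
        for (n, cost), (_, nxt) in zip(per_op, per_op[1:]):
            if nxt > cost * (1.0 - tolerance):  # <2% gain: saturated
                return n
        return per_op[-1][0]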

4 Experiment Results

4.1 Latency Analysis

The results for a batch size of 100000 are illustrated in Figure 2 and Figure 3. Both figures show the latency distribution for vector sizes of 100000 and 1000000. Figure 4 shows the same results for a batch size of 1000. It can be seen that the lower batch size greatly increases the latency for the higher vector size.

Latency depending on the vector size is plotted in Figure 5 for batch sizes of 1000 and 100000. From the figures we see that the ping starts going up rapidly after a vector size of 100000. However, we only get significant results starting from vector sizes of 100000, because the network between our testing machines handles the smaller vectors with no difficulties. For better results, further tests have to be conducted with artificially introduced delay in the network. Also, additional experiments for different batch sizes should be carried out.

The distribution model for the latency remains an open question.

4.2 Protocol Performance Analysis

The Sharemind framework provides secure multi-party computation protocols for the following operations: addition, multiplication, division, conversion of single bit shares to general shares, bit share extraction, bitwise addition, greater-than comparison and equality comparison. Table 2 shows the performance results for all of the protocols.

[Figure: round-trip time density (ms) for vector sizes 100000 and 1000000]
Fig. 2. Ping distribution for batch size of 100000

[Figure: round-trip time density (ms) for vector sizes 100000 and 1000000]
Fig. 3. Ping distribution with background computations for batch size of 100000

[Figure: round-trip time density (ms) for vector sizes 100000 and 1000000]
Fig. 4. Ping distribution with background computations for batch size of 1000

[Figure: median ping (ms) vs. vector size (elements); panels: (a) batch size 1000, (b) batch size 100000]
Fig. 5. Ping estimation depending on vector size


The theoretical complexities of the protocols have been analyzed in [3] and are shown in Table 3. In our case the parameters corresponding to the complexities are as follows: n = 32, ℓ = 5, n′ = 37 and m = 254. For a discussion of how the parameters were found, refer to the cited paper.

            Time                                       Traffic
Protocol    Single op.  Saturated pt.  Saturated op.   Single op.  Saturated op.
Add         NA          68919          0.015 µs        0 B/op      4.69e-07 B/op
BitAdd      NA          14092          11.3 µs
BitExtr     129 ms      1615           107 µs          1209 B/op   322 B/op
ShareConv   14.5 ms     16388          1.3 µs          104 B/op    6.61 B/op
Div         620 ms      1745           517 µs          4501 B/op   1871 B/op
PubDiv      95 ms       3605           50.3 µs         1080 B/op   372 B/op
Equal       91 ms       22776          6.0 µs          560 B/op    25.9 B/op
ShiftR      106.9 ms    3646           39.9 µs         734 B/op    135 B/op
Mult        12.6 ms     4719           3.7 µs          131 B/op    19.8 B/op
PubMult     NA          179390         0.006 µs        0 B/op      3.34e-08 B/op

Table 2. Performance of individual protocols

Protocol    Rounds   Communication
Mult        1        15n
ShareConv   2        5n + 4
Equal       ℓ + 2    22n + 6
ShiftR      ℓ + 3    12(ℓ + 4)n + 16
BitExtr     ℓ + 3    5n² + 12(ℓ + 1)n
PubDiv      ℓ + 4    (108 + 30ℓ)n + 18
Div         4ℓ + 9   2mn + 6mℓ + 39ℓn + 35ℓn′ + 126n + 32n′ + 24

Table 3. Complexities of protocols
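As a sanity check, the expressions in Table 3 can be evaluated with the parameters given above; the resulting element counts are the ones that appear in the Comm. column of Table 5.

    # Parameters from above: n = 32, l = 5 (the paper's ℓ),
    # n2 = 37 (the paper's n′) and m = 254.
    n, l, n2, m = 32, 5, 37, 254

    comm = {
        "Mult":      15 * n,                                    # 480
        "ShareConv": 5 * n + 4,                                 # 164
        "Equal":     22 * n + 6,                                # 710
        "ShiftR":    12 * (l + 4) * n + 16,                     # 3472
        "BitExtr":   5 * n**2 + 12 * (l + 1) * n,               # 7424
        "PubDiv":    (108 + 30 * l) * n + 18,                   # 8274
        "Div":       (2 * m * n + 6 * m * l + 39 * l * n
                      + 35 * l * n2 + 126 * n + 32 * n2 + 24),  # 41831
    }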

4.3 Running Time Results Validation

The individual protocol running times are compared to the theoretical results in Table 4. We can see that the running time per round varies greatly between the protocols and there is no strong relationship between the parameters. This indicates that the rounds take different amounts of time.


Protocol    Rounds   Single op.  Sat. op.  Single op./round  Sat. op./round
Mult        1        12.6 ms     3.7 µs    12.6 ms           3.7 µs
ShareConv   2        14.5 ms     1.3 µs    7.25 ms           0.65 µs
Equal       7        91 ms       6.0 µs    13 ms             0.9 µs
ShiftR      8        106.9 ms    39.9 µs   13.4 ms           5.0 µs
BitExtr     8        129 ms      107 µs    16.1 ms           13.4 µs
PubDiv      9        95 ms       50.3 µs   10.6 ms           5.59 µs
Div         29       620 ms      517 µs    21.4 ms           17.8 µs

Table 4. Running time comparison with theoretical results
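The per-round columns are obtained by simply dividing the measured per-operation times by the round counts of Table 3, for example for the division protocol:

    rounds = 29                       # Div: 4*5 + 9 rounds
    single_op_ms, sat_op_us = 620, 517
    print(single_op_ms / rounds)      # ~21.4 ms per round, single operation
    print(sat_op_us / rounds)         # ~17.8 µs per round at saturation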

4.4 Traffic Cost Results Validation

The traffic measurements are compared to the theoretical results in Table 5. The table illustrates that there is a strong relationship between the communication complexity and the generated traffic per operation. The relationship is stronger when the communication channel is saturated. With a single-element vector, it is more apparent for protocols with a higher communication cost.

Protocol    Comm.   Single op.  Sat. op.  Single op./comm.  Sat. op./comm.
Mult        480     131.4 B     19.8 B    0.27 B            0.041 B
ShareConv   164     103.6 B     6.61 B    0.63 B            0.040 B
Equal       710     560.4 B     25.9 B    0.79 B            0.036 B
ShiftR      3472    733.5 B     135 B     0.21 B            0.038 B
BitExtr     7424    1210 B      322 B     0.16 B            0.043 B
PubDiv      8274    1080 B      372 B     0.13 B            0.044 B
Div         41831   4500 B      1871 B    0.11 B            0.044 B

Table 5. Communication comparison with theoretical results
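The ratio columns can be reproduced directly from the measured values and the evaluated communication complexities; note how the saturated ratio clusters around 0.04 B per communicated element for every protocol.

    measured = {  # protocol: (comm. complexity, single-op B, saturated-op B)
        "Mult": (480, 131.4, 19.8),   "ShareConv": (164, 103.6, 6.61),
        "Equal": (710, 560.4, 25.9),  "ShiftR": (3472, 733.5, 135),
        "BitExtr": (7424, 1210, 322), "PubDiv": (8274, 1080, 372),
        "Div": (41831, 4500, 1871),
    }
    for name, (c, single, sat) in measured.items():
        print(f"{name}: {single / c:.2f} B single, {sat / c:.3f} B saturated")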

5 Protocol Structures

In this section we show how the protocol structure shapes its traffic. This is well illustrated in Figure 7, which shows, for each computing node, the sent traffic of the bit share extraction protocol. The traffic structure varies greatly between the miners, with smooth traffic for the first miner but bursty traffic for the second and third miners. Similar graphs are shown for the other protocols in Figure 6 and Figures 8 to 13. The results indicate that optimizations to some of the protocols might be possible if idle time in the traffic is reduced.

[Figure: traffic out (B, log scale) vs. time (ms) for Miners 1, 2 and 3; legend: Previous, Next]
Fig. 6. Bitwise addition

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 7. Bit share extraction

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 8. Share conversion

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 9. Division

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 10. Public division

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 11. Equal comparison

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 12. Greater than comparison

[Figure: traffic out (B) vs. time (ms), Miners 1-3]
Fig. 13. Multiplication


6 Conclusion

In this paper we have analyzed the network impact of the secure multi-party computation protocols of the Sharemind framework. For each of the protocols we measured its running time and the amount of traffic it generated. We compared the measurements to the theoretical results and found that the communication complexity and the volume of the traffic have a strong relationship. However, no such relationship was found between the running time and the round complexity of the protocols. A latency analysis was performed, where we looked at the distribution of the ping while varying the batch and vector sizes.

For future work, a model could be constructed to predict the running times of the protocols given the latency and the bandwidth of the network. Additionally, we would like to estimate the cost of deploying network-bound multi-party computation systems in the cloud. We might also be able to use the results of this work to find optimization possibilities for the protocols.

References

1. D.J. Bernstein. ChaCha, a variant of Salsa20. http://cr.yp.to/chacha.html, 2008.
2. Dan Bogdanov, Sven Laur, and Jan Willemson. Sharemind: A Framework for Fast Privacy-Preserving Computations. In Sushil Jajodia and Javier Lopez, editors, Computer Security – ESORICS 2008, volume 5283 of Lecture Notes in Computer Science, pages 192–206. Springer Berlin / Heidelberg, 2008.
3. Dan Bogdanov, Margus Niitsoo, Tomas Toft, and Jan Willemson. High-performance secure multi-party computation for data mining applications. 2011. Unpublished.
4. P. Bogetoft, D. Christensen, I. Damgård, M. Geisler, T. Jakobsen, M. Krøigaard, J. Nielsen, J. Nielsen, K. Nielsen, J. Pagter, et al. Secure multiparty computation goes live. In Financial Cryptography and Data Security, pages 325–343, 2009.