Author: Florencia Irena Supervisor: Professor Sri Parameswaran

Introduction: Homomorphic Processor

CryptoBlaze Architecture

Results

To preserve confidentiality, cloud data has to be stored in encrypted form. As a result, cloud processors must be able to compute on encrypted data. These processors can be physically located anywhere and should not be allowed to access the real value of the data (the plaintext). Hence, a special method is required that allows computation on encrypted data without decrypting it, while still producing the correct result in the plaintext domain. This method is called “homomorphic encryption”: a specific sequence of operations on encrypted data is equivalent to an operation in the plaintext domain. The sequences of operations vary between homomorphic algorithms but are, in general, computationally expensive. This motivates the creation of processors that support homomorphic operations at the hardware level. Several architectures, such as HEROIC and FURISC, have been proposed to address this problem, each with its own advantages and disadvantages. Building on these, we are developing a new homomorphic processor that aims to improve on the performance and security of existing designs. Our architecture, named CryptoBlaze, is a multiple-instruction machine that supports homomorphic operations based on the Paillier cryptographic algorithm.

Design Overview

•  Homomorphic addition is supported using the Paillier encryption algorithm: “A” x “B” mod n^2 = “A + B”, where n is the public key and “X” denotes the encryption of X.
•  Non-deterministic encryption: the same data can be encrypted into different values.
•  8 instructions are supported for encrypted data to ensure Turing completeness: data memory load/store, homomorphic addition, homomorphic subtraction, register and public key copy, and branching on negative or non-negative.

Implementation

•  Built on top of the MicroBlaze architecture, with a 5-stage pipeline.
•  Encrypted data size = 4b, where b is the size in bits of the public key n (the security parameter). A single encrypted value is 2b bits, as Paillier encryption works modulo n^2. To support homomorphic subtraction, CryptoBlaze stores data as an “X”:“-X” pair, where “X” is the 2b-bit encryption of the real data and “-X” is the 2b-bit encryption of its negation.
•  32 x 4b eRegister file, a special 4b keyRegister file (to hold the value of n^2), and program and data memory.
•  Data memory load/store goes through a 32-bit AXI bus and requires 4b/32 cycles, during which the processor stalls.
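The Paillier properties listed above can be illustrated with a minimal Python sketch. It assumes the common simplification g = n + 1, toy primes chosen only for illustration, and a signed interpretation in which plaintexts above n/2 represent negative numbers. None of these details, nor the function names, come from the poster, and the way the two halves of the pair are combined for subtraction is likewise an assumption.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two primes (assumes g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Non-deterministic encryption: a fresh random r is drawn on every call."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt, mapping residues above n/2 to negative values (assumed convention)."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def encrypt_pair(n, x):
    """Hypothetical model of the E(X):E(-X) pair held in one 4b-bit register."""
    return encrypt(n, x), encrypt(n, -x)

def hom_add(a, b, n):
    """E(A) * E(B) mod n^2 = E(A + B), applied to both halves of the pair."""
    n2 = n * n
    return a[0] * b[0] % n2, a[1] * b[1] % n2

def hom_sub(a, b, n):
    """E(A - B) = E(A) * E(-B) mod n^2; the negated half is E(-A) * E(B)."""
    n2 = n * n
    return a[0] * b[1] % n2, a[1] * b[0] % n2

n, priv = keygen(1009, 1013)            # toy primes, far too small for real use
a, b = encrypt_pair(n, 20), encrypt_pair(n, 35)
total = decrypt(n, priv, hom_add(a, b, n)[0])
diff = decrypt(n, priv, hom_sub(a, b, n)[0])
print(total, diff)                      # 55 -15
```

Under these assumptions, keeping the encrypted negation alongside each value means that subtraction needs only the same modular multiplication as addition, with no modular inversion of a ciphertext.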

CryptoBlaze Performance: No Decryption on MicroBlaze (Server Side), Using Pre-Computed Array for Branching


Conclusion and Future Work

CryptoBlaze is a multiple-instruction machine based on the non-deterministic Paillier cryptosystem. It supports 8 instructions for encrypted data, including homomorphic addition, subtraction, and branching, which ensures Turing completeness. As homomorphic operations are expensive, CryptoBlaze is designed to work with various ALU sizes to support multicycle operations. An ALU size of 512 bits was found to be the most efficient for large security key sizes. In general, CryptoBlaze performs faster than existing homomorphic processors (HEROIC and FURISC), provided that the server-side decryption used to compute branching decisions is efficient. In terms of security, CryptoBlaze is more robust than HEROIC in keeping data confidential, owing to its non-deterministic encryption. However, the client-server communication link might be abused, and therefore needs protective measures, such as authorisation and request limits, to ensure higher security. Aside from improving security, a new homomorphic architecture could be built using other homomorphic encryption schemes, either partially or fully homomorphic, and compared with CryptoBlaze. Speeding up the decryption process would also improve the performance of CryptoBlaze.

Experimental Setup

Homomorphic Cryptographic Processor Design and Implementation: The CryptoBlaze Architecture

We vary the security parameter size (b) and the ALU size (k) to assess CryptoBlaze performance and determine the most efficient hardware configuration, measuring the maximum frequency and the number of cycles taken to execute benchmark programs on a Xilinx Virtex-6 ML605 FPGA. The client-server communication for branching is modelled by using a MicroBlaze as the server that determines the branching condition; it is connected to CryptoBlaze (the client) via a 32-bit AXI bus. As decryption can be expensive, our testing mainly uses a pre-computed array of branching decisions on the server side, which makes each decision an O(1) operation. We then compare these results with those obtained using actual decryption (a basic, unoptimised version) to analyse the effect of server-side decryption on CryptoBlaze performance.
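The two server-side branching modes compared in this setup can be sketched as follows. This is a hypothetical software model, not the MicroBlaze code used in the experiments: the class names, the replay-by-index scheme for the pre-computed array, the toy primes, and the convention that plaintexts above n/2 are negative are all assumptions.

```python
import math
import secrets

def encrypt(n, m):
    """Toy Paillier encryption with g = n + 1 (assumed simplification)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

class DecryptingServer:
    """Trusted server that holds the private key and decrypts every query."""
    def __init__(self, p, q):
        self.n = p * q
        self._lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self._mu = pow(self._lam, -1, self.n)

    def is_negative(self, c):
        n = self.n
        m = (pow(c, self._lam, n * n) - 1) // n * self._mu % n
        return 1 if m > n // 2 else 0       # assumed sign convention

class PrecomputedServer:
    """Replays branch outcomes recorded beforehand, so each query is O(1)."""
    def __init__(self, outcomes):
        self._outcomes = list(outcomes)
        self._next = 0

    def is_negative(self, c):
        bit = self._outcomes[self._next]    # the ciphertext itself is ignored
        self._next += 1
        return bit

def count_down(server, n, start):
    """Client loop: decrement an encrypted counter until the server reports negative."""
    n2 = n * n
    counter, minus_one = encrypt(n, start), encrypt(n, -1)
    iterations = 0
    while not server.is_negative(counter):
        counter = counter * minus_one % n2  # homomorphic "counter - 1"
        iterations += 1
    return iterations

real = DecryptingServer(1009, 1013)          # toy primes, illustration only
replay = PrecomputedServer([0, 0, 0, 0, 1])  # outcomes recorded for start = 3
print(count_down(real, real.n, 3), count_down(replay, real.n, 3))  # 4 4
```

In this model the replaying server is only valid for a program and input whose branch outcomes are known in advance, which is why it serves as a best-case stand-in for an efficient decryption routine.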

Most Efficient Configuration for Each Security Parameter Size b

b (bits) | Most efficient k (bits) | Max freq (MHz) | Bubble sort latency (ms) | Throughput/area (/s.%area)
32 | 128 | 125 | 24.78 | 576.54
64 | 256 | 125 | 32.88 | 253.21
128 | 512 | 80 | 129.48 | 64.35
256 | 256 | 110 | 456.99 | 10.42
384 | 512 | 60 | 935.90 | 3.56
512 | 512 | 60 | 1663.46 | 1.59
1024 | 128* | 60* | 19757.83* | 0.07*

* = might not be the most efficient configuration, as the build fails for higher ALU sizes (k) because the synthesis tool runs out of memory.

CryptoBlaze Performance: Decryption vs No Decryption on Server Side

Cycles Taken for Computing Factorial of “9”

Server-side branching program | b = 32, k = 32 | b = 32, k = 64 | b = 64, k = 32 | b = 64, k = 64
No decryption (pre-computed array) | 65741 | 41546 | 229757 | 130477
With decryption (basic, inefficient) | 980893237 | 980869042 | 8873259035 | 8873159755
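The cost of server-side decryption can be read directly from the cycle counts above. The short Python calculation below uses only the figures from the table; the variable names are illustrative.

```python
# (b, k) -> (cycles without decryption, cycles with basic decryption)
cycles = {
    (32, 32): (65741, 980893237),
    (32, 64): (41546, 980869042),
    (64, 32): (229757, 8873259035),
    (64, 64): (130477, 8873159755),
}

overhead = {}   # extra cycles attributable to server-side decryption
slowdown = {}   # ratio of the two cycle counts
for (b, k), (no_dec, with_dec) in cycles.items():
    overhead[(b, k)] = with_dec - no_dec
    slowdown[(b, k)] = with_dec / no_dec
    print(f"b={b:3d} k={k:3d} "
          f"overhead={overhead[(b, k)]:>13,d} cycles "
          f"slowdown={slowdown[(b, k)]:,.0f}x")
```

For each value of b the overhead is identical for both ALU sizes (980,827,496 cycles at b = 32 and 8,873,029,278 cycles at b = 64), which is consistent with the decryption cost being incurred on the server and not depending on k. The resulting slowdown is between four and five orders of magnitude.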

CryptoBlaze vs HEROIC: #Instructions

Number of Executed Instructions

Architecture | Fibonacci | Factorial | Bubble Sort
HEROIC | 1617294 | 1011994 | 1882234
CryptoBlaze | 898 | 5657 | 112882
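As a quick check of the scale of the difference, the ratios between the two rows can be computed from the counts above (a plain calculation on the reported figures; the dictionary layout is illustrative).

```python
executed = {
    "Fibonacci":   {"HEROIC": 1617294, "CryptoBlaze": 898},
    "Factorial":   {"HEROIC": 1011994, "CryptoBlaze": 5657},
    "Bubble Sort": {"HEROIC": 1882234, "CryptoBlaze": 112882},
}

# How many HEROIC instructions are executed per CryptoBlaze instruction.
ratios = {
    program: counts["HEROIC"] / counts["CryptoBlaze"]
    for program, counts in executed.items()
}
for program, ratio in ratios.items():
    print(f"{program:12s} {ratio:8.1f}x")
```

The reduction ranges from roughly 1800x for Fibonacci to roughly 17x for bubble sort.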

CryptoBlaze vs HEROIC vs FURISC

HEROIC (partially homomorphic)
+  Encrypted instructions
+  Fast
-  Single instruction
-  Deterministic (large LUT lookups)

FURISC (fully homomorphic)
+  Encrypted instructions
+  Supports all operations
+  Non-deterministic
-  Single instruction
-  Slow due to recryption and bitwise cryptosystem

CryptoBlaze (partially homomorphic)
+  Non-deterministic
+  “Fast”, given efficient server-side decryption
+  Multiple instructions
-  Performance highly dependent on decryption
-  Security: client-server communication abuse and unencrypted instructions

We have successfully built CryptoBlaze for security parameter sizes (b) ranging from 32 to 1024 bits. For each key size, we vary the ALU size (k) to examine the trade-off between the maximum frequency achieved and the number of cycles taken to run a program. Each security parameter size has its own optimum ALU size. In general, an ALU size of 512 bits gives the most efficient configuration for large security key sizes. All built hardware passed the correctness tests. Assuming that the server-side decryption operation used to compute branching is O(1), CryptoBlaze outperforms both HEROIC and FURISC, as it supports multiple instructions (except in the case of 1024-bit b, where builds failed for large ALU sizes). In reality, however, decryption can be expensive and thus becomes the dominant factor in CryptoBlaze performance. In terms of security, CryptoBlaze has some advantages over HEROIC, as it uses non-deterministic encryption, which is more robust against attacks. However, CryptoBlaze does not encrypt program memory, and the client-server communication for branching can be abused to gain information about the data. CryptoBlaze does not have a security advantage over FURISC, but it is expected to perform faster.

•  Homomorphic addition and subtraction are built on a multicycle multiplier and divider (using hardware bit shifts and addition/subtraction), as these operations can be expensive for wide data. The adders/subtractors in the multiplication and division blocks are designed to be multicycle with a customisable ALU size (k).

•  Branching is supported via client-server communication, where CryptoBlaze acts as a client that sends encrypted data to a trusted server. The server is responsible for determining whether the encrypted data corresponds to a negative number and sends a signal (0/1) back to CryptoBlaze, which decides whether to branch based on this signal. The server can identify negative data by Paillier decryption, since it is trusted and has access to the private key.
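The multicycle arithmetic described in the first bullet above can be modelled in software as below. This is a sketch under assumptions, not the CryptoBlaze datapath: the function names are hypothetical, one k-bit chunk is assumed to take one cycle, and the cycle counts ignore control overhead and the modular reduction performed by the divider.

```python
def multicycle_add(x, y, width, k):
    """Add two width-bit values k bits per cycle, rippling the carry between chunks.

    Returns (sum modulo 2**width, number of k-bit ALU passes)."""
    mask = (1 << k) - 1
    result, carry, cycles = 0, 0, 0
    for shift in range(0, width, k):
        chunk = ((x >> shift) & mask) + ((y >> shift) & mask) + carry
        result |= (chunk & mask) << shift
        carry = chunk >> k
        cycles += 1
    return result & ((1 << width) - 1), cycles

def shift_add_multiply(a, b, width, k):
    """Multiply two width-bit values using bit shifts and multicycle additions.

    The 2*width-bit product is accumulated one partial product per set bit of b."""
    product, cycles = 0, 0
    for i in range(width):
        if (b >> i) & 1:
            product, used = multicycle_add(product, a << i, 2 * width, k)
            cycles += used
    return product, cycles

a, b = 0xDEADBEEF, 0xCAFEF00D
for k in (16, 32, 64):
    product, cycles = shift_add_multiply(a, b, 32, k)
    print(f"k={k:3d} correct={product == a * b} modelled cycles={cycles}")
```

In this model a larger k reduces the number of passes per addition; the measured results also show that the achievable clock frequency changes with k, a trade-off the model does not capture.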