22nd IEEE Symposium on Computer Arithmetic

RNS Arithmetic Approach in Lattice-based Cryptography

Accelerating the "Rounding-off" Core Procedure

Jean-Claude Bajard*, Julien Eynard*, Nabil Merkiche*†, Thomas Plantard‡

* Sorbonne Universités, UPMC Univ Paris 06, CNRS, LIP6 UMR 7606, France

† DGA/MI, Rennes, France

‡ University of Wollongong, CCISR, Wollongong, Australia

June 23rd, 2015

Context & Motivation

Lattice-based cryptography (LBC)

post-quantum security

homomorphic encryption properties

average-case to worst-case reductions

scalar products, vector-matrix products, with huge dimensions

Why Residue Number Systems (RNS) ?

natural and easy concurrency for basic operations

easy scalability

natural matching with GPU, multi-core CPU, FPGA features

→ optimization of LBC primitives at the arithmetic level?

here, focus on Babai's round-off algorithm

Outline

Essentials about RNS & lattices

Closest vector problem & Round-off algorithm

Round-off and RNS arithmetic

Considerations about FPGA implementation

Conclusion

Essentials

Residue Number Systems (RNS)
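As a reminder of what "natural concurrency" means here, the following is a minimal sketch of RNS arithmetic. The moduli in `BASE` and all function names are illustrative assumptions, not taken from the slides: an integer is stored as its residues, additions and multiplications act independently on each channel, and the Chinese Remainder Theorem recovers the integer.

```python
from math import prod

BASE = (13, 17, 19, 23)  # hypothetical pairwise-coprime moduli
M = prod(BASE)           # integers in [0, M) have a unique representation


def to_rns(x, base=BASE):
    """Residues of x, one per channel."""
    return tuple(x % m for m in base)


def rns_add(a, b, base=BASE):
    """Channel-wise addition: no carry crosses between channels."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, base))


def rns_mul(a, b, base=BASE):
    """Channel-wise multiplication, again independent per channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, base))


def from_rns(residues, base=BASE):
    """Chinese Remainder Theorem reconstruction of the integer in [0, M)."""
    total = prod(base)
    x = 0
    for r, m in zip(residues, base):
        Mi = total // m
        x += r * pow(Mi, -1, m) * Mi  # pow(..., -1, m): modular inverse (Python 3.8+)
    return x % total


x, y = 1234, 56
assert from_rns(rns_add(to_rns(x), to_rns(y))) == x + y
assert from_rns(rns_mul(to_rns(x), to_rns(y))) == x * y  # valid because x * y < M
```

Comparisons, rounding and division are the operations that do not decompose this way, which is why the round-off step needs special treatment.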

Lattices

(full-rank) lattice $L$: discrete additive subgroup of $\mathbb{R}^\ell$ ⇝ "regular grid"

$L = r_1\mathbb{Z} \oplus \dots \oplus r_\ell\mathbb{Z}$, with $r_1, \dots, r_\ell$ independent vectors of $\mathbb{R}^\ell$

matrix $R = (r_1, \dots, r_\ell)^T$: a basis of $L$ (for $\ell \ge 2$, infinitely many bases)

Closest Vector Problem (CVP): given $c \in \mathbb{Z}^\ell$, compute $v \in L$ such that $\|c - v\| \le \|c - z\|$ for all $z \in L$

Solving the CVP

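with Babai's round-off algorithm, given a basis $R$ of $L$:

change of basis → rounding components → return to canonical basis

$c \times R^{-1} \;\longrightarrow\; \lfloor c \times R^{-1} \rceil \in \mathbb{Z}^\ell \;\longrightarrow\; \lfloor c \times R^{-1} \rceil \times R \in L$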
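The three steps of the round-off can be sketched with exact rational arithmetic. This is an illustrative reference version only, not the RNS algorithm of the talk; the helper names, the basis `R` and the target `c` are assumptions chosen for the example.

```python
from fractions import Fraction


def round_half_up(q):
    """Nearest integer to the rational q, computed as floor(q + 1/2)."""
    return (2 * q.numerator + q.denominator) // (2 * q.denominator)


def inverse(R):
    """Exact inverse of a small invertible integer matrix (Gauss-Jordan elimination)."""
    n = len(R)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(R)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def vec_mat(v, Mx):
    """Row vector times matrix."""
    return [sum(vi * Mx[i][j] for i, vi in enumerate(v)) for j in range(len(Mx[0]))]


def babai_round_off(c, R):
    coords = vec_mat(c, inverse(R))          # change of basis: c * R^-1
    k = [round_half_up(q) for q in coords]   # rounding components
    return vec_mat(k, R)                     # return to canonical basis


R = [[7, 1, -1], [0, 8, 2], [1, -2, 9]]      # hypothetical basis, rows are basis vectors
c = [23, -5, 40]
v = babai_round_off(c, R)
# c - v has coordinates of magnitude at most 1/2 in the basis R
residual = vec_mat([ci - vi for ci, vi in zip(c, v)], inverse(R))
assert all(abs(q) <= Fraction(1, 2) for q in residual)
```

The rational inverse and the rounding are exactly the two ingredients that have no direct RNS counterpart.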

Cryptographic interest of CVP

hard to find a close vector via a "bad" basis $B$ of $L$

hard to compute a "good" basis from a bad one

GGH-like cryptosystem (1997)

public key : bad basis, private key : good basis

plaintext + lattice vector = ciphertext (GGH, 1997)

deciphering : solving CVP (through round-off algorithm)
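A toy two-dimensional instance illustrates the scheme. All matrices, vectors and function names below are assumptions made for this sketch, and the dimension is far too small to be secure. The public basis is the private one multiplied by a unimodular matrix, the ciphertext is the plaintext plus a lattice vector, and deciphering subtracts the vector found by round-off.

```python
from fractions import Fraction


def round_off(c, basis):
    """Babai round-off of c for a 2x2 integer basis whose rows are the basis vectors."""
    (a, b), (e, f) = basis
    det = a * f - b * e
    # coordinates c * basis^-1, using basis^-1 = [[f, -b], [-e, a]] / det
    coords = (Fraction(c[0] * f - c[1] * e, det), Fraction(c[1] * a - c[0] * b, det))
    kx, ky = ((2 * q.numerator + q.denominator) // (2 * q.denominator) for q in coords)
    return [kx * a + ky * e, kx * b + ky * f]


def encrypt(p, k, public_basis):
    """Ciphertext = plaintext p + lattice vector k * B."""
    (a, b), (e, f) = public_basis
    return [p[0] + k[0] * a + k[1] * e, p[1] + k[0] * b + k[1] * f]


def decrypt(c, basis):
    """Subtract the lattice vector found by round-off."""
    v = round_off(c, basis)
    return [c[0] - v[0], c[1] - v[1]]


R = [[7, 1], [-1, 8]]      # hypothetical "good" (nearly orthogonal) private basis
B = [[17, 35], [28, 61]]   # "bad" public basis of the same lattice: B = [[3, 4], [5, 7]] * R
p = [2, -1]                # plaintext with small entries
c = encrypt(p, [5, -3], B)

assert decrypt(c, R) == p  # the good basis recovers the plaintext
assert decrypt(c, B) != p  # round-off with the bad basis lands on another lattice vector
```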

Adapting the round-off to RNS arithmetic

Common simplification step

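$c = \lfloor cR^{-1} \rceil R + p$ with $p \in \mathbb{Z}^\ell \cap \left( \left(-\tfrac{1}{2}, \tfrac{1}{2}\right)^\ell \times R \right)$

→ Babai's condition: $\sigma \rho_R < \tfrac{1}{2}$, with $\|p\|_\infty \le \sigma$ and $\rho_R = \max_{1 \le j \le \ell} \sum_{i=1}^{\ell} |(R^{-1})_{i,j}|$

$\lfloor cR^{-1} \rceil \bmod m_\sigma$ with $m_\sigma \ge 2\sigma + 1$ ⇒ $p = (c - \lfloor cR^{-1} \rceil R) \ \mathrm{mod}_c\ m_\sigma$ (centered remainder)

→ just need to compute $\lfloor cR^{-1} \rceil \bmod m_\sigma$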
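The simplification can be checked on a toy case. The basis, the bound `sigma` and the helper names are assumptions for this sketch. Once Babai's condition holds, the residues of the rounded coordinates modulo a small $m_\sigma$ are enough to recover $p$ through a centered remainder.

```python
from fractions import Fraction


def rho(R_adj, d):
    """rho_R = max over columns j of sum_i |(R^-1)_{i,j}|, with R^-1 = R_adj / d."""
    ell = len(R_adj)
    return max(sum(abs(Fraction(R_adj[i][j], d)) for i in range(ell)) for j in range(ell))


def centered_mod(x, m):
    """Representative of x modulo an odd m in [-(m-1)/2, (m-1)/2]."""
    r = x % m
    return r - m if r > m // 2 else r


def recover_p(c, k_mod, R, m_sigma):
    """p = (c - k R) mod_c m_sigma, using only the residues k_mod = k mod m_sigma."""
    ell = len(c)
    return [centered_mod(c[j] - sum(k_mod[i] * R[i][j] for i in range(ell)), m_sigma)
            for j in range(ell)]


R = [[7, 1], [-1, 8]]                # hypothetical toy basis
R_adj, d = [[8, -1], [1, 7]], 57     # R^-1 = R_adj / d
sigma = 3
assert sigma * rho(R_adj, d) < Fraction(1, 2)   # Babai's condition holds

m_sigma = 2 * sigma + 1
c = [3, -9]                          # c = k R + p with k = [0, -1] and p = [2, -1]
k_mod = [0 % m_sigma, -1 % m_sigma]  # residues of the rounded coordinates
assert recover_p(c, k_mod, R, m_sigma) == [2, -1]
```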

Problems

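$\lfloor cR^{-1} \rceil$: rational expression and round-off function

Solutions

$R^{-1} = \frac{R'}{d}$, $d = \det R \in \mathbb{Z}$ and $R' = \mathrm{Comat}(R)^T \in \mathbb{Z}^{\ell \times \ell}$

$\left\lfloor \frac{a}{b} \right\rceil = \left\lfloor \frac{a}{b} + \frac{1}{2} \right\rfloor = \frac{2a + b - |2a + b|_{2b}}{2b}$

the division by $2b$ is exact: doable in RNS

$\lfloor cR^{-1} \rceil = \frac{2cR' + \mathbf{d} - |2cR' + \mathbf{d}|_{2d}}{2d}$, $\mathbf{d} = (d, \dots, d)$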
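The rounding identity uses integer operations only, which a short sketch makes concrete. The function names and the toy adjugate `R_adj` are assumptions for illustration, and $d > 0$ is assumed.

```python
def round_div(a, b):
    """Nearest integer to a/b (b > 0): (2a + b - |2a + b|_{2b}) / (2b)."""
    t = 2 * a + b
    return (t - t % (2 * b)) // (2 * b)  # the division is exact


def round_off_coords(c, R_adj, d):
    """round(c R^-1) with R^-1 = R_adj / d and d = det R > 0, integers only."""
    ell = len(c)
    return [round_div(sum(c[i] * R_adj[i][j] for i in range(ell)), d) for j in range(ell)]


R_adj, d = [[8, -1], [1, 7]], 57   # adjugate and determinant of the toy basis [[7, 1], [-1, 8]]
assert round_off_coords([3, -9], R_adj, d) == [0, -1]
```

The only non-trivial operation left is the remainder modulo $2d$, since the final division is exact and can be done by multiplying with a modular inverse in each channel.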

New problem

complete modular reduction $|2cR' + \mathbf{d}|_{2d}$ in RNS?

Efficient RNS Montgomery modular reduction

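precomputations: $\overline{R} \in \{0, \dots, 2d - 1\}^{\ell^2}$, $\overline{\mathbf{d}} \in \{0, \dots, 2d - 1\}^{\ell}$

RNS base $B$ with size $M = \prod_{m \in B} m > \|c\overline{R} + \overline{\mathbf{d}}\|_\infty / 2d \sim \|c\|_1$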

What we obtain

RNS reduction gives: $|2cR' + \mathbf{d}|_{2d} + 2d \cdot e$

finally we compute $\frac{2cR' + \mathbf{d} - |2cR' + \mathbf{d}|_{2d} - 2d \cdot e}{2d} = \lfloor cR^{-1} \rceil - e$

how to correct $e$?
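Where the error $e$ comes from can be seen in a small simulation of an RNS Montgomery reduction with a fast (approximate) base conversion. This is a sketch under stated assumptions, not the authors' implementation: the base, the extra channel `t`, the toy determinant and all names are hypothetical, the input is assumed non-negative, and `t` is taken larger than $(n+1) \cdot 2d$ only so that $e$ can be read off directly.

```python
from math import prod


def montgomery_rns(x_res, x_t, N, base, t):
    """Incomplete Montgomery reduction of x modulo N.

    x is given by its residues x_res in `base` and x_t in the extra channel t.
    Returns (|x * M^-1|_N + N * e) mod t for some e in {0, ..., len(base)},
    assuming 0 <= x < M * N and gcd(N, M) = gcd(t, M) = 1.
    """
    M = prod(base)
    # channel-wise in `base`: q = -x * N^-1, so that x + q * N is divisible by M
    q = [(-xi * pow(N, -1, m)) % m for xi, m in zip(x_res, base)]
    # fast base conversion of q to channel t: yields q + alpha * M, 0 <= alpha < len(base)
    q_hat = sum(((qi * pow(M // m, -1, m)) % m) * (M // m) for qi, m in zip(q, base)) % t
    # exact division by M, performed in channel t
    return (x_t + q_hat * N) * pow(M, -1, t) % t


d = 57                       # determinant of the hypothetical toy basis [[7, 1], [-1, 8]]
N = 2 * d
base = (5, 7, 11)            # hypothetical base B, coprime with 2d
M = prod(base)
t = 457                      # hypothetical channel with t > (n + 1) * 2d
col = [8, 1]                 # first column of R' (adjugate of the toy basis)
R_bar = [(2 * M * r) % N for r in col]   # Montgomery representation |2 M R'|_{2d}
d_bar = (M * d) % N                      # |M d|_{2d}


def reduce_coordinate(c):
    """Return (r, y): r is the incomplete reduction of y = 2 c R' + d (first coordinate)."""
    x = sum(ci * ri for ci, ri in zip(c, R_bar)) + d_bar
    y = 2 * sum(ci * ri for ci, ri in zip(c, col)) + d
    r = montgomery_rns([x % m for m in base], x % t, N, base, t)
    return r, y


r, y = reduce_coordinate([3, 9])
e = (r - y % N) // N
assert r % N == y % N and 0 <= e <= len(base)
```

The result is congruent to the wanted residue modulo $2d$ but may exceed it by up to $n$ multiples of $2d$, which is the error the rest of the talk sets out to remove.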

Hybrid representation RNS-Mixed Radix System (previous work)

burdensome RNS-to-MRS conversion (intrinsically sequential)

large RNS base $B'$: $M' > (n + 1) \times 2d \ge |2cR' + \mathbf{d}|_{2d} + 2de$

→ how to do better? (i.e. pure RNS approach)

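New strategy to correct the error vector $e \in \{0, \dots, n\}^\ell$

do not focus on $|2cR' + \mathbf{d}|_{2d} + 2d \cdot e$ but on the whole formula:

$\frac{2cR' + \mathbf{d} - |2cR' + \mathbf{d}|_{2d} - 2d \cdot e}{2d} = \lfloor cR^{-1} \rceil - e$

idea: $\gamma \in \mathbb{Z}$ such that $(\lfloor cR^{-1} \rceil - e) \bmod \gamma = (-e) \bmod \gamma$ ⇝ $e$? ($\gamma$ enabling extraction of the error)

to recover $e$ from $(-e) \bmod \gamma$: easy, take $\gamma > n \ge \|e\|_\infty$

to guarantee $\lfloor cR^{-1} \rceil \equiv 0 \bmod \gamma$ whatever $c$ is... no reason for this to happen!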

Keep going...

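→ compute $\lfloor \gamma cR^{-1} \rceil = \frac{2\gamma cR' + \mathbf{d} - |2\gamma cR' + \mathbf{d}|_{2d}}{2d}$ and see what happens:

1. incomplete reduction $|2\gamma cR' + \mathbf{d}|_{2d} + 2d \cdot e$ gives $\lfloor \gamma cR^{-1} \rceil - e$

2. we can write $\lfloor \gamma cR^{-1} \rceil = \gamma \lfloor cR^{-1} \rceil + \lfloor \gamma pR^{-1} \rceil$

then we obtain:

$\lfloor \gamma cR^{-1} \rceil - e = \gamma \lfloor cR^{-1} \rceil + \lfloor \gamma pR^{-1} \rceil - e$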

New strategy : correcting the global error

$(\lfloor \gamma cR^{-1} \rceil - e) \bmod \gamma$ ⇝ $(\lfloor \gamma pR^{-1} \rceil - e) \bmod \gamma$

$\gamma$ large enough gives: $(\lfloor \gamma pR^{-1} \rceil - e) \bmod \gamma$ ⇝ $\lfloor \gamma pR^{-1} \rceil - e$

recall: $\sigma\rho_R < 1/2 \Leftrightarrow \sigma\rho_R \le \tfrac{1}{2} - \varepsilon$ for correct rounding

→ size of $\gamma$ depends on $\varepsilon$: $\gamma \sim n\varepsilon^{-1}$ ($n = \mathrm{Card}(B)$)
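The global correction can be simulated with plain integers. In this sketch the toy basis, `gamma`, and the function names are assumptions, and the error vector `e` is injected by hand as a stand-in for the incomplete reduction. The centered remainder modulo $\gamma$ isolates $\lfloor \gamma pR^{-1} \rceil - e$, and subtracting it leaves an exact multiple of $\gamma$.

```python
from itertools import product


def round_div(a, b):
    """Nearest integer to a/b for b > 0 (ties rounded up)."""
    t = 2 * a + b
    return (t - t % (2 * b)) // (2 * b)


def centered_mod(x, m):
    """Representative of x modulo an odd m in [-(m-1)/2, (m-1)/2]."""
    r = x % m
    return r - m if r > m // 2 else r


def corrected_round_off(c, R_adj, d, gamma, e):
    """Recover round(c R^-1) from the erroneous values round(gamma c R^-1) - e."""
    ell = len(c)
    out = []
    for j in range(ell):
        exact = round_div(gamma * sum(c[i] * R_adj[i][j] for i in range(ell)), d)
        noisy = exact - e[j]                  # stand-in for the incomplete reduction output
        small = centered_mod(noisy, gamma)    # round(gamma p R^-1) - e when gamma is large enough
        out.append((noisy - small) // gamma)  # exact division by gamma
    return out


# Hypothetical toy data: R = [[7, 1], [-1, 8]], R^-1 = R_adj / d, sigma = 3, n = 3.
R_adj, d = [[8, -1], [1, 7]], 57
gamma = 255            # 2**8 - 1; here epsilon = 1/38, so gamma * epsilon > n + 1/2
c = [3, -9]            # c = k R + p with k = [0, -1] and p = [2, -1]
for e in product(range(4), repeat=2):         # every error vector in {0, ..., n}^2
    assert corrected_round_off(c, R_adj, d, gamma, e) == [0, -1]
```

In an RNS setting the last division would be a multiplication by $\gamma^{-1}$ in the channel $m_\sigma$; integer division is used here only for readability.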

Final full RNS algorithm

Completely in RNS if γ is a 1-modulus RNS base

→ in practice, size of modulus determined by hardware (e.g. 18 bits for some FPGA multipliers, 32/64 bits on CPU, etc.)

Examples of binary size of acceptable $\gamma$'s

for 200 bases $R \leftarrow 4\lceil\sqrt{\ell}\rceil I + \mathrm{rand}(\{-4, \dots, 4\}^{\ell^2})$ with $\|p\|_\infty \le 3$ (GGH challenges), and moduli of $B$ having binary size $\omega$; columns give the binary size of $\gamma$, entries the number of bases

  ℓ    ω    11   12   13   14   15   16   17   18   19   20
 200   18    0   12   46   44   46   32   10    6    2    2
 200   32    6   48   45   47   33   11    6    2    2    0
 300   18    0    0   29   51   68   28   13    4    7    0
 300   32    0   20   55   63   37   12    5    7    1    0
 400   18    0   15  141   33    7    3    0    1    0    0
 400   32    4  134   50    8    3    0    1    0    0    0

Conclusions about new acceleration technique

vs RNS-MRS approach

$\gamma$ depends on basis $R$; worst case: $\gamma \sim \det R$ ⇝ RNS-MRS case

$B'$ replaced by $\gamma$: -50% precomputations, -55/60% elementary modular multiplications (no more RNS-to-MRS conversion)

fast RNS base conversion: straightforward parallelization and scaling

→ $\lfloor cR^{-1} \rceil \bmod m_\sigma$ in $\ell^2 + 2n\ell$ concurrent steps in RNS channels ($n = \mathrm{Card}(B) \sim \log \|c\|_1$)

vs multi-precision arithmetic (theoretical analysis)

precomputations (vs $R^{-1}$ with sufficient precision): ≈ +2% ($\ell = 256$), ≈ +0.5% ($\ell = 1024$) memory overhead

number of word-based multiplications: RNS ∼ Karatsuba, Toom-Cook complexities

straightforward concurrency + single-precision arithmetic

Towards an FPGA implementation ?

Why FPGA

cheap, flexible, natural fitting with concurrency properties of RNS

previously successfully used for RNS finite field arithmetic

Principle of RNS architecture on FPGA

"Rower" unit: computes $\sum_{i=1}^{k} a_i b_i \bmod m_j$ (core computation in fast RNS base conversion, and vector-matrix products)
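In software terms, a Rower is a modular multiply-accumulate loop, one instance per modulus. The sketch below uses assumed names and toy data and is not a model of the actual hardware. It shows the same unit serving a vector-matrix product channel by channel.

```python
def rower(a, b, m):
    """Multiply-accumulate sum_i a_i * b_i in the channel Z/mZ."""
    acc = 0
    for ai, bi in zip(a, b):
        acc = (acc + ai * bi) % m
    return acc


def vec_mat_rns(c, R, base):
    """Vector-matrix product c * R, one independent Rower per modulus."""
    cols = list(zip(*R))
    return [[rower(c, col, m) for col in cols] for m in base]


base = (5, 7, 11)                # hypothetical small moduli
R = [[7, 1], [-1, 8]]            # hypothetical toy basis
c = [3, -9]
# c * R = [30, -69] over the integers; each channel holds it modulo its own m
assert vec_mat_rns(c, R, base) == [[30 % m, -69 % m] for m in base]
```

The channels share no data, so on an FPGA the rows of the returned list would be computed by separate Rowers in parallel.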

Specific features

1 unit for $\gamma$: computation of centered remainder mod $\gamma$

($\gamma = 2^{\theta+1} - 1$ ⇝ comparing to $\lfloor \gamma/2 \rfloor$ = checking the $\theta$-th bit)

1 unit for $m_\sigma$: $m_\sigma$ differs from the other moduli
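The bit test behind the $\gamma$ unit can be sketched as follows. The function name and the value of `theta` are assumptions, and bits are counted from 0, so the tested bit has weight $2^\theta$. For $\gamma = 2^{\theta+1} - 1$ a residue exceeds $\lfloor \gamma/2 \rfloor = 2^\theta - 1$ exactly when that bit is set.

```python
def centered_mod_gamma(x, theta):
    """Centered remainder of x modulo gamma = 2**(theta + 1) - 1."""
    gamma = (1 << (theta + 1)) - 1
    r = x % gamma                      # residue in [0, gamma - 1]
    # r > floor(gamma / 2) = 2**theta - 1 exactly when the bit of weight 2**theta is set
    return r - gamma if (r >> theta) & 1 else r


theta = 7                              # hypothetical size, gamma = 255
gamma = (1 << (theta + 1)) - 1
for x in range(-3 * gamma, 3 * gamma):
    r = centered_mod_gamma(x, theta)
    assert (r - x) % gamma == 0 and -(gamma // 2) <= r <= gamma // 2
```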

Results of analysis for $\ell \in \{64, 128\}$

analysis for worst case: $\det R \in O(2^{\ell \log \ell})$ (Hadamard's bound)

full RNS round-off CVP: $2\ell^2 + 2n\ell + 13\ell + 6$ cycles ⇝ e.g. ≈ 20 µs for $\ell = 64$ on a 468 MHz Kintex-7

memory bottleneck: for $\ell = 64$, ≈ 1.7 Mbit (ok); for $\ell = 128$, ≈ 15.5 Mbit (not enough BRAM)

Conclusion & Future work

Conclusion

optimized CVP algorithm: $c \mapsto \lfloor cR^{-1} \rceil R$ in $2\ell^2 + O(\ell \log \|c\|_1)$ concurrent steps in small rings $\mathbb{Z}/m_i\mathbb{Z}$

implementation on FPGA: memory bottleneck, even for non-cryptographic lattice dimensions

Future work

Beyond this first step...

implementation on several architectures (GPU, multi-core CPU,clusters of FPGA, etc)

identify other bottlenecks in LBC which could be accelerated throughtools from computer arithmetic

Thank You !

Questions?

jean-claude.bajard@lip6.fr

julien.eynard@lip6.fr

nabil.merkiche@intradef.gouv.fr

thomaspl@uow.edu.au

Appendix

GGH like cryptosystem

Efficient RNS Montgomery modular reduction

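requires an RNS base $B$

Montgomery representations: $\overline{R} = |2M \times R'|_{2d}$, $\overline{\mathbf{d}} = |M \times \mathbf{d}|_{2d}$

RNS base $B$ with size $M = \prod_{m \in B} m > \|c\overline{R} + \overline{\mathbf{d}}\|_\infty / 2d \sim \|c\|_1$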
