Institute of Microelectronics, Tsinghua University. Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT Neng Zhang, Bohan Yang, Chen Chen, Shouyi Yin, Shaojun Wei and Leibo Liu Institute of Microelectronics, Tsinghua University, Beijing, China CHES 2020

Highly Efficient Architecture of NewHope-NIST on FPGA using Low … · 2020. 9. 7. · Institute of Microelectronics, Tsinghua University. 7 2.1 Low-Complexity NTT Cost of the pre-processing



Institute of Microelectronics, Tsinghua University.

Highly Efficient Architecture of NewHope-NIST

on FPGA using Low-Complexity NTT/INTT

Neng Zhang, Bohan Yang, Chen Chen,

Shouyi Yin, Shaojun Wei and Leibo Liu

Institute of Microelectronics, Tsinghua University, Beijing, China

CHES 2020


Outline

1. Introduction

2. Low-Complexity NTT/INTT

3. Hardware Architecture

4. Implementation Results


1 Introduction

NewHope: a post-quantum cryptography (PQC) algorithm for key encapsulation mechanisms (KEM)

Variants: NewHope-USENIX, NewHope-Simple, NewHope-NIST

A candidate in the 2nd round of the NIST PQC standardization process, but not in the 3rd round

Low-complexity NTT/INTT can be utilized by other algorithms:

➢ PQC: Crystals-Dilithium, qTesla, Falcon

➢ FHE: LTV, BFV


1 Introduction

Main mathematical objects of NewHope: polynomials over the ring $R_q = \mathbb{Z}_q[x]/(x^N + 1)$

➢ $q$: 12289

➢ $N$: 1024 or 512

➢ $\omega_N$: primitive $N$-th root of unity over $\mathbb{Z}_q$

➢ $\gamma_{2N}$: square root of $\omega_N$

Encryption-based KEM

➢ Key Generation: 2 NTTs

➢ Encryption: 2 NTTs, 1 INTT

➢ Decryption: 1 INTT


1 Introduction

Multiplication over the ring $\mathbb{Z}_q[x]/f(x)$

➢ $f(x)$ is arbitrary: convolution theorem, with $q \equiv 1 \pmod{N}$

➢ $f(x) = x^N + 1$: Negative Wrapped Convolution (NWC), with $q \equiv 1 \pmod{2N}$
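The NWC case can be illustrated with a minimal Python sketch. It assumes the textbook formulation (scale the inputs by powers of $\gamma_{2N}$, transform, multiply pointwise, inverse-transform, scale by $N^{-1}\gamma_{2N}^{-i}$) and uses a direct O(N²) transform for clarity. The helper names `find_gamma`, `ntt_naive`, `nwc_multiply` and `schoolbook_negacyclic` are illustrative, not taken from the talk.

```python
Q = 12289  # NewHope modulus; Q - 1 = 3 * 2**12, so 2N divides Q - 1 for N <= 2048

def find_gamma(n, q=Q):
    """Search for a primitive 2n-th root of unity mod q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:  # g^n = -1 means the order is exactly 2n
            return g
    raise ValueError("no primitive 2n-th root of unity found")

def ntt_naive(a, w, q=Q):
    """Direct O(n^2) transform: out[i] = sum_j a[j] * w^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]

def nwc_multiply(a, b, q=Q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via negative wrapped convolution."""
    n = len(a)
    g = find_gamma(n, q)
    w = g * g % q
    # Pre-processing: scale coefficient j by gamma^j.
    a_s = [a[j] * pow(g, j, q) % q for j in range(n)]
    b_s = [b[j] * pow(g, j, q) % q for j in range(n)]
    c_hat = [x * y % q for x, y in zip(ntt_naive(a_s, w, q), ntt_naive(b_s, w, q))]
    # Inverse transform, then post-processing by n^-1 * gamma^-i.
    w_inv, g_inv, n_inv = pow(w, q - 2, q), pow(g, q - 2, q), pow(n, q - 2, q)
    c_s = ntt_naive(c_hat, w_inv, q)
    return [c_s[i] * n_inv % q * pow(g_inv, i, q) % q for i in range(n)]

def schoolbook_negacyclic(a, b, q=Q):
    """Reference product: terms that wrap past x^n pick up a minus sign."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```

The pre- and post-processing loops in `nwc_multiply` are the extra modular multiplications that the low-complexity NTT/INTT aims to remove.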


1 Introduction

Why do we need low complexity?

➢ Low complexity benefits both area and speed: low area and high speed


2.1 Low-Complexity NTT

Cost of the pre-processing is considerable

Number of modular multiplications of NTT: (N/2) log N (FFT) + N (pre-processing)

Low-Complexity NTT

➢ A low-complexity NTT with twiddle factors computed on-the-fly [1].

➢ Merge the pre-processing into the DIT FFT with twiddle factors pre-computed.

[1] S. Roy, et al., Compact ring-LWE cryptoprocessor. CHES 2014


2.1 Low-Complexity NTT

Derivation of the low-complexity NTT

➢ Inspired by the strategy of the Cooley-Tukey FFT

➢ Follow the divide-and-conquer method of FFT that divides in time domain (DIT)

➢ First, the pre-processing and the FFT are written together as a summation of N items

➢ Second, the summation is split into two groups according to the parity of the index of $a$


2.1 Low-Complexity NTT

Derivation of the low-complexity NTT

➢ Third, the equation is grouped into two parts according to the size of index $i$.

➢ In this way, an N-point NTT can be resolved with two N/2-point NTTs: $\hat{a}_i^{(0)}$ and $\hat{a}_i^{(1)}$ are the N/2-point NTTs of $a_{2j}$ and $a_{2j+1}$

[Figure: recursive decomposition of the N-point NTT into two N/2-point NTTs, four N/4-point NTTs, and so on down to 2-point NTTs]
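The three steps above can be written out as follows. This is a reconstruction under assumed notation, not the slides' own equations: $\hat{a}_i$ denotes the NTT output with the NWC pre-processing included, $\gamma_N = \gamma_{2N}^2$, $\omega_{N/2} = \omega_N^2$, and the identities $\omega_N = \gamma_{2N}^2$ and $\gamma_{2N}^N = -1$ are used.

```latex
\begin{align*}
\hat{a}_i &= \sum_{j=0}^{N-1} a_j\, \gamma_{2N}^{j}\, \omega_N^{ij}, \qquad 0 \le i < N \\
          &= \sum_{j=0}^{N/2-1} a_{2j}\, \gamma_{2N}^{2j}\, \omega_N^{2ij}
           + \sum_{j=0}^{N/2-1} a_{2j+1}\, \gamma_{2N}^{2j+1}\, \omega_N^{i(2j+1)} \\
          &= \underbrace{\sum_{j=0}^{N/2-1} a_{2j}\, \gamma_{N}^{j}\, \omega_{N/2}^{ij}}_{\hat{a}_i^{(0)}}
           + \gamma_{2N}^{2i+1} \underbrace{\sum_{j=0}^{N/2-1} a_{2j+1}\, \gamma_{N}^{j}\, \omega_{N/2}^{ij}}_{\hat{a}_i^{(1)}}
\end{align*}

% Grouping by the size of i, for 0 <= i < N/2 (the half-size sums are periodic in i with period N/2):
\begin{align*}
\hat{a}_i       &= \hat{a}_i^{(0)} + \gamma_{2N}^{2i+1}\, \hat{a}_i^{(1)} \\
\hat{a}_{i+N/2} &= \hat{a}_i^{(0)} - \gamma_{2N}^{2i+1}\, \hat{a}_i^{(1)}
\end{align*}
```

Each half-size sum has the same form as the original with $N$ replaced by $N/2$, so the recursion continues down to 2-point transforms without a separate pre-processing pass.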


2.1 Low-Complexity NTT

Dataflow of an 8-point low-complexity NTT

Butterfly of low-complexity NTT
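A minimal recursive Python sketch of a DIT transform with the pre-processing merged into the butterflies is given here. It is an illustration under assumptions, not the hardware dataflow of the talk: the butterfly is taken to compute `even + gamma^(2i+1) * odd` and `even - gamma^(2i+1) * odd`, and the names `find_gamma`, `ntt_reference` and `lc_ntt` are hypothetical. The result is checked against explicit pre-processing followed by a direct transform.

```python
Q = 12289  # NewHope modulus

def find_gamma(n, q=Q):
    """Search for a primitive 2n-th root of unity mod q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:  # g^n = -1 means the order is exactly 2n
            return g
    raise ValueError("no primitive 2n-th root of unity found")

def ntt_reference(a, gamma, q=Q):
    """Explicit pre-processing by gamma^j, then a direct O(n^2) transform."""
    n = len(a)
    omega = gamma * gamma % q
    scaled = [a[j] * pow(gamma, j, q) % q for j in range(n)]
    return [sum(scaled[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def lc_ntt(a, gamma, q=Q):
    """Recursive DIT transform with the gamma scaling folded into the twiddles.

    gamma must be a primitive 2n-th root of unity for n = len(a).
    Input and output are both in natural order.
    """
    n = len(a)
    if n == 1:
        return [a[0] % q]
    g2 = gamma * gamma % q          # root for the two half-size transforms
    even = lc_ntt(a[0::2], g2, q)   # half-size transform of a[2j]
    odd = lc_ntt(a[1::2], g2, q)    # half-size transform of a[2j+1]
    out = [0] * n
    tw = gamma                      # gamma^(2i+1), starting at i = 0
    for i in range(n // 2):
        t = tw * odd[i] % q         # the only multiplication in the butterfly
        out[i] = (even[i] + t) % q
        out[i + n // 2] = (even[i] - t) % q
        tw = tw * g2 % q            # hardware would read a pre-computed table
    return out
```

In this sketch the twiddle update `tw * g2` stands in for a table lookup; with pre-computed twiddles, each butterfly costs one modular multiplication.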


2.1 Low-Complexity NTT

In classic FFT: $\omega = \omega_N^{jN/m}$

No additional timing cost; no additional hardware resource cost

Computational complexity: (N/2) log N + N → (N/2) log N
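The saving can be made concrete for the two NewHope ring sizes. The short sketch below only evaluates the two formulas from the slide; the function name `ntt_mult_counts` is illustrative.

```python
def ntt_mult_counts(n):
    """Return (classic, merged) modular multiplication counts for an n-point NTT.

    classic = (n/2) * log2(n) + n   (FFT plus pre-processing)
    merged  = (n/2) * log2(n)       (pre-processing folded into the FFT)
    """
    log_n = n.bit_length() - 1      # exact log2 for a power of two
    assert 1 << log_n == n, "n must be a power of two"
    fft = (n // 2) * log_n
    return fft + n, fft

for n in (512, 1024):
    classic, merged = ntt_mult_counts(n)
    saved = 100.0 * (classic - merged) / classic
    print(f"N={n}: {classic} -> {merged} multiplications ({saved:.1f}% fewer)")
```

For N = 1024 this gives 6144 versus 5120 multiplications, so the pre-processing accounts for one sixth of the classic count.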


2.2 Low-Complexity INTT

Cost of the post-processing is greater than that of the pre-processing

Number of modular multiplications of NTT and INTT; for the INTT: (N/2) log N (FFT) + 2N (post-processing)

Low-Complexity INTT

➢ [1] merges the scaling of $\gamma_{2N}^{-i}$ into the FFT.

➢ Further merge the scaling of $N^{-1}$ into the FFT

[1] T. Pöppelmann, et al., High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. LATINCRYPT 2015


2.2 Low-Complexity INTT

Derivation of the low-complexity INTT

➢ Inspired by the strategy of the Gentleman-Sande FFT

➢ Follow the divide-and-conquer method of FFT that divides in frequency domain (DIF)

➢ First, the post-processing and the FFT are written together as a summation of N items

➢ Second, the summation is split into two groups according to the size of the index of $\hat{a}$


2.2 Low-Complexity INTT

Derivation of the low-complexity INTT

➢ Third, the equation is grouped into two parts according to the parity of $i$.

➢ In this way, an N-point INTT can be resolved with two N/2-point INTTs: $a_{2i}$ and $a_{2i+1}$ correspond to the N/2-point INTTs of $b_i^{(0)}$ and $b_i^{(1)}$

[Figure: recursive decomposition of the N-point INTT into two N/2-point INTTs, four N/4-point INTTs, and so on down to 2-point INTTs]
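The three steps above can be written out as follows. This is a reconstruction under assumed notation, not the slides' own equations: the INTT is taken to include the post-processing by $N^{-1}\gamma_{2N}^{-i}$, with $\gamma_N = \gamma_{2N}^2$, $\omega_{N/2} = \omega_N^2$, and the identities $\omega_N^{N/2} = -1$ and $\omega_N = \gamma_{2N}^2$.

```latex
\begin{align*}
a_i &= N^{-1} \gamma_{2N}^{-i} \sum_{j=0}^{N-1} \hat{a}_j\, \omega_N^{-ij}, \qquad 0 \le i < N \\
    &= N^{-1} \gamma_{2N}^{-i} \sum_{j=0}^{N/2-1} \left( \hat{a}_j + (-1)^{i}\, \hat{a}_{j+N/2} \right) \omega_N^{-ij}
\end{align*}

% Grouping by the parity of i, for 0 <= i < N/2:
\begin{align*}
a_{2i}   &= \left(\tfrac{N}{2}\right)^{-1} \gamma_{N}^{-i} \sum_{j=0}^{N/2-1} b_j^{(0)}\, \omega_{N/2}^{-ij},
  & b_j^{(0)} &= \frac{\hat{a}_j + \hat{a}_{j+N/2}}{2} \\
a_{2i+1} &= \left(\tfrac{N}{2}\right)^{-1} \gamma_{N}^{-i} \sum_{j=0}^{N/2-1} b_j^{(1)}\, \omega_{N/2}^{-ij},
  & b_j^{(1)} &= \frac{\hat{a}_j - \hat{a}_{j+N/2}}{2}\, \gamma_{2N}^{-(2j+1)}
\end{align*}
```

Both right-hand sides are N/2-point INTTs of the same form, so the scaling by $N^{-1}$ is absorbed as one factor $1/2$ per recursion level and no separate post-processing pass remains.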


2.2 Low-Complexity INTT

Dataflow of an 8-point low-complexity INTT

Butterfly of low-complexity INTT
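A minimal recursive Python sketch of a DIF inverse transform with both scalings merged into the butterflies is given here. It is an illustration under assumptions, not the hardware dataflow of the talk: the butterfly is taken to compute `(u + t) / 2` and `(u - t) / 2 * gamma^-(2j+1)`, and the names `find_gamma`, `ntt_reference` and `lc_intt` are hypothetical.

```python
Q = 12289  # NewHope modulus

def find_gamma(n, q=Q):
    """Search for a primitive 2n-th root of unity mod q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:  # g^n = -1 means the order is exactly 2n
            return g
    raise ValueError("no primitive 2n-th root of unity found")

def ntt_reference(a, gamma, q=Q):
    """Forward transform used only for checking: pre-processing + direct sum."""
    n = len(a)
    omega = gamma * gamma % q
    scaled = [a[j] * pow(gamma, j, q) % q for j in range(n)]
    return [sum(scaled[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def lc_intt(a_hat, gamma_inv, q=Q):
    """Recursive DIF inverse transform; n^-1 and gamma^-i are folded in.

    gamma_inv is the inverse of a primitive 2n-th root of unity, n = len(a_hat).
    Input and output are both in natural order.
    """
    n = len(a_hat)
    if n == 1:
        return [a_hat[0] % q]
    half = n // 2
    inv2 = (q + 1) // 2                 # 2^-1 mod q for odd q
    g2 = gamma_inv * gamma_inv % q      # inverse root for the half-size problems
    b0, b1 = [0] * half, [0] * half
    tw = gamma_inv                      # gamma^-(2j+1), starting at j = 0
    for j in range(half):
        u, t = a_hat[j], a_hat[j + half]
        b0[j] = (u + t) * inv2 % q
        b1[j] = (u - t) * inv2 % q * tw % q
        tw = tw * g2 % q                # hardware would read a pre-computed table
    out = [0] * n
    out[0::2] = lc_intt(b0, g2, q)      # even-indexed coefficients
    out[1::2] = lc_intt(b1, g2, q)      # odd-indexed coefficients
    return out
```

The multiplication by `inv2` is written as a generic modular multiplication only for readability; a halving modulo an odd prime can be done with a shift and an addition.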


2.2 Low-Complexity INTT

In classic FFT: $\omega = \omega_N^{-jN/m}$

No additional timing cost; the butterfly unit is slightly modified

Butterfly terms: $u + t$ and $u - t$

Computational complexity: (N/2) log N + 2N → (N/2) log N
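One way the scaling by $N^{-1}$ can disappear from the multiplication count is to halve the butterfly sums in every stage, since $\log N$ halvings equal one multiplication by $N^{-1}$. The Python sketch below assumes that interpretation of the modified butterfly; `half_mod` is a hypothetical helper showing that a halving modulo an odd $q$ needs only a shift and a conditional addition.

```python
Q = 12289  # NewHope modulus (odd)

def half_mod(x, q=Q):
    """Return x / 2 mod q for 0 <= x < q and odd q.

    Even x: plain right shift. Odd x: (x + q) / 2 = (x >> 1) + (q + 1) / 2.
    """
    return (x >> 1) + (x & 1) * ((q + 1) >> 1)

def scale_by_halving(x, n, q=Q):
    """Apply log2(n) halvings, which equals multiplying by n^-1 mod q."""
    for _ in range(n.bit_length() - 1):
        x = half_mod(x, q)
    return x
```

For N = 1024, ten halvings spread over the ten butterfly stages replace the N multiplications by $N^{-1}$ at the output.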


3 The Hardware Architecture

The architecture of NTT/INTT

Multi-bank memory

➢ Address generator [1]:

➢ log N: even √, odd ╳

➢ The execution order of the last s-loop is rearranged as:

[1] W. Wang, et al., VLSI design of a large number multiplier for fully homomorphic encryption. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(9):1879–1887, Sept 2014.


3 The Hardware Architecture

Compact Butterfly Unit


3 The Hardware Architecture

Low-Complexity Modular Multiplication

➢ No additional multiplication

➢ Constant-time
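To illustrate how a reduction modulo 12289 can avoid extra multiplications and run in a fixed number of steps, here is a generic shift-and-add folding sketch in Python. It is not the authors' circuit. It relies only on the identity $2^{14} \equiv 2^{12} - 1 \pmod{12289}$, and the names `reduce_shift_add` and `mul_mod` are hypothetical.

```python
Q = 12289  # 2**14 = Q + 2**12 - 1, hence 2**14 ≡ 2**12 - 1 (mod Q)

def reduce_shift_add(z):
    """Reduce 0 <= z < 2**28 modulo Q using shifts, adds and subtracts only."""
    assert 0 <= z < 1 << 28
    for _ in range(8):               # fixed number of folds, independent of z
        hi, lo = z >> 14, z & 0x3FFF
        z = (hi << 12) - hi + lo     # hi * (2**12 - 1) + lo, congruent to z
    # Each fold shrinks the bound; after 8 folds z <= 20478 < 2 * Q,
    # so a single conditional subtraction (a mux in hardware) finishes the job.
    return z - Q if z >= Q else z

def mul_mod(a, b):
    """Modular product of a, b in [0, Q): one integer multiply, then folding."""
    return reduce_shift_add(a * b)
```

The Python conditional is not itself constant-time; the sketch only shows that the operation sequence does not depend on the data. The only multiplication is the product `a * b`.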


3 The Hardware Architecture

The architecture of NewHope-NIST

➢ Supports key generation, encryption and decryption

➢ Doubled bandwidth matching

➢ RAM (R0, R1): two data items per address


3 The Hardware Architecture

Timing hiding

➢ Resource conflict

➢ Data dependency

A RAM may be read and written by operations in the same line.


4 Implementation Results

Implementation platform

➢Xilinx Artix-7 FPGA

➢Vivado 2019.1.1

Implementation Results of NTT/INTT

[Figure: bar charts comparing Ours with [FS19], [KLC+7], [JGCS19], [FSM+19] and [BUC19b]. Panels: Time (us), ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us), ATP (BRAM x us)]


4 Implementation Results

Implementation Results of NewHope-NIST

[Figure: bar charts comparing Ours with [JGCS19-1], [JGCS19-2], [BUC19b] and [FSM+19]. Panels: Time (us) for KeyGen+Decrypt and Encrypt; LUTs and FFs; DSPs and BRAMs; ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us), ATP (BRAM x us)]


Conclusion

Low-complexity NTT/INTT

➢NTT: no pre-processing

➢INTT: no post-processing

A highly efficient architecture of NewHope-NIST

➢A clear advantage in both speed and ATP

Low-complexity NTT/INTT can benefit other NTT-inside algorithms


Thanks!