Compact Dilithium on Cortex M3 and Cortex M4

Denisa Greconici, Matthias Kannwischer, Daan Sprenkels

30 October 2020

Institute for Computing and Information Sciences – Digital Security, Radboud University Nijmegen



Table of contents

1. Introduction

2. Constant time multiplications on Cortex-M3

3. Optimizing performance

4. Optimizing memory

5. Results

6. Conclusion


Introduction


NIST Post-Quantum Standardization Competition

• 2016: NIST calls for proposals for PQC algorithms
    • Key Encapsulation Mechanisms (KEMs) and Digital Signatures
• 2017: 69 candidates qualified for round 1
• 2019: 26 candidates qualified for round 2
• July 2020: round 3 candidates announced
    • 7 finalists
        • KEMs (Classic McEliece, Kyber, NTRU, and Saber)
        • Signatures (Dilithium, Falcon, and Rainbow)
    • 8 alternative schemes
        • KEMs (BIKE, FrodoKEM, HQC, NTRU Prime, SIKE)
        • Signatures (GeMSS, Picnic, SPHINCS+)


Dilithium

• Signature scheme
• Part of CRYSTALS (with Kyber)
• One of the 3rd round finalists
• Fiat-Shamir with aborts
• Module-LWE and Module-SIS
• Small keys and signatures
• Operates in the polynomial ring $R_q = \mathbb{Z}_q[X]/(X^{256} + 1)$, with $q = 8380417$
  ⇒ Allows efficient polynomial multiplication with NTT
• 4 security levels (3 of them target NIST security levels 1-3)


The Number-Theoretic Transform (NTT)

• Fast Fourier Transform (FFT) in a finite field
• Let $g = g_0 + g_1 X + \dots + g_{n-1} X^{n-1}$ be a polynomial in $R_q$
• Representation of polynomial $g$:
    • By its coefficients: $g_0, g_1, \dots, g_{n-1}$
    • By evaluating $g$ at the powers of a primitive $n$-th root of unity $\omega$: $g(\omega^0), g(\omega^1), \dots, g(\omega^{n-1})$
• Formal definition of the NTT in Dilithium
    • $\hat{g} = \mathrm{NTT}(g) = \sum_{i=0}^{n-1} \hat{g}_i X^i$, with $\hat{g}_i = \sum_{j=0}^{n-1} \psi^j g_j \omega^{ij}$; and
    • $g = \mathrm{INTT}(\hat{g}) = \sum_{i=0}^{n-1} g_i X^i$, with $g_i = n^{-1} \psi^{-i} \sum_{j=0}^{n-1} \hat{g}_j \omega^{-ij}$.
• Polynomial multiplication in $R_q$:
  $a \cdot b = \mathrm{INTT}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$
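The two sums above can be checked numerically. The following is a deliberately naive O(n²) Python transcription of the definitions, not an optimized or constant-time implementation. It assumes (the slide does not say so) that ψ is a primitive 2n-th root of unity with ω = ψ², and it takes ψ = 1753, the 512-th root of unity given in the Dilithium specification. The function names `ntt`, `intt`, `schoolbook`, and `multiply` are made up for this sketch.

```python
import random

Q = 8380417          # Dilithium modulus q
N = 256              # ring dimension n
PSI = 1753           # assumed: primitive 2n-th root of unity mod q, omega = psi^2
assert pow(PSI, N, Q) == Q - 1   # psi^n = -1, so psi has order exactly 2n

# psi^k for k = 0 .. 2n-1; note psi^j * omega^(i*j) = psi^((2i+1)*j)
PSI_POW = [pow(PSI, k, Q) for k in range(2 * N)]

def ntt(g):
    """g_hat_i = sum_j psi^j * g_j * omega^(i*j)  (mod q)."""
    return [sum(g[j] * PSI_POW[(2 * i + 1) * j % (2 * N)] for j in range(N)) % Q
            for i in range(N)]

def intt(g_hat):
    """g_i = n^-1 * psi^-i * sum_j g_hat_j * omega^(-i*j)  (mod q)."""
    n_inv = pow(N, Q - 2, Q)                      # Fermat inverse, q is prime
    return [n_inv * PSI_POW[-i % (2 * N)]
            * sum(g_hat[j] * PSI_POW[-2 * i * j % (2 * N)] for j in range(N)) % Q
            for i in range(N)]

def schoolbook(a, b):
    """Reference product in Z_q[X]/(X^n + 1): X^n wraps around to -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] = (c[i + j] + a[i] * b[j]) % Q
            else:
                c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % Q
    return c

def multiply(a, b):
    """a * b = INTT(NTT(a) o NTT(b)), with o the coefficient-wise product."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])

rng = random.Random(1)
a = [rng.randrange(Q) for _ in range(N)]
b = [rng.randrange(Q) for _ in range(N)]
print(multiply(a, b) == schoolbook(a, b))
```

The factor ψ^j is what turns a cyclic transform into one for X^n + 1: the forward sum evaluates g at the odd powers ψ^(2i+1), which are exactly the roots of X^n + 1.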


Dilithium simplified


Target platforms

• Arm Cortex M4 (STM32F407-DISCOVERY)
    • NIST choice for PQC
    • 32-bit, ARMv7E-M
    • 1 MiB ROM, 196 KB RAM, 168 MHz
    • 32-bit multiplications in 1 cycle (UMULL, SMULL, UMLAL, SMLAL)
• Arm Cortex M3 (Atmel SAM3X8E)
    • Arduino Due
    • 32-bit, ARMv7-M
    • 512 KiB Flash, 96 KB RAM, 84 MHz
    • Variable-time 32-bit multiplications!


UMULL on M3

Based on the Master's thesis [dG15].


Constant time multiplications on Cortex-M3


Overcoming the non-constant time multiplications

• Variable-time 32-bit multiplications
    • But 16-bit multipliers are constant time: MUL, MLS take 1 cycle; MLA takes 2 cycles
• Our solution: use 16-bit multipliers ⇒ represent the 32-bit values in radix $2^{16}$
    • Let $a = 2^{16} a_1 + a_0$ and $b = 2^{16} b_1 + b_0$, with $0 \le a_0, b_0 < 2^{16}$ and $-2^{15} \le a_1, b_1 < 2^{15}$
    • Then $ab = 2^{32} a_1 b_1 + 2^{16} (a_0 b_1 + a_1 b_0) + a_0 b_0$, with $-2^{31} \le a_i b_j < 2^{32}$, so each partial product fits in 32 bits
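The radix-2^16 decomposition can be sanity-checked with a few lines of Python. This is only a model of the arithmetic: Python integers stand in for registers, and the helper names `split16` and `mul_radix16` are hypothetical. On the Cortex-M3 the four partial products would presumably come from MUL/MLA instructions. Note that the low product a_0 · b_0 is unsigned and can come close to 2^32, while the products involving a signed high half stay within the signed 32-bit range.

```python
def split16(x):
    """Split a signed 32-bit value into x = 2^16 * hi + lo (lo unsigned, hi signed)."""
    assert -2**31 <= x < 2**31
    lo = x & 0xFFFF            # 0 <= lo < 2^16
    hi = (x - lo) >> 16        # -2^15 <= hi < 2^15 (the division is exact)
    return hi, lo

def mul_radix16(a, b):
    """Full product assembled from four 16x16-bit partial products."""
    a1, a0 = split16(a)
    b1, b0 = split16(b)
    partial = [a1 * b1, a0 * b1, a1 * b0, a0 * b0]
    # every partial product fits in one 32-bit register
    assert all(-2**31 <= p < 2**32 for p in partial)
    return (partial[0] << 32) + ((partial[1] + partial[2]) << 16) + partial[3]

# a few spot checks, including the extremes of the signed 32-bit range
for a, b in [(8380416, 8380416), (-2**31, 2**31 - 1), (-1, 65535),
             (123456789, -987654321)]:
    assert mul_radix16(a, b) == a * b
```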


Schoolbook multiplication


Optimizing performance


(1) Applying the CRT

(2) {Unsigned => Signed} representation

(3) Merging layers


Applying the CRT

Based on [BCLv19].


• NTT has to work in $\mathbb{Z}_{q_i}[X]/(X^{256} + 1)$
  ⇒ choose the $q_i$ to be NTT primes
• $\prod_i q_i$ must be larger than the coefficients in $c$!
• For Dilithium, need to split into 4 polynomials mod $q_i$
• Unfortunately, this is slower than doing schoolbook
• But it might be useful for other platforms :)
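The size condition on the product of the q_i can be illustrated with a small Python sketch. Several things here are assumptions rather than details from the slides: the primes are taken to be the four largest primes below 2^16 that are congruent to 1 modulo 512 (so that each supports a 256-point negacyclic NTT), operands are held in centered form, and a single inner product of 256 terms stands in for one coefficient of c. The sketch only shows the CRT reconstruction step, not the per-prime NTTs, and all function names are hypothetical.

```python
import random

Q = 8380417   # Dilithium modulus
N = 256       # number of terms accumulated into one product coefficient

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def ntt_primes(count, below=1 << 16, order=2 * N):
    """Largest `count` primes p < below with p = 1 (mod order)."""
    found = []
    for k in range((below - 1) // order, 0, -1):
        p = k * order + 1
        if p < below and is_prime(p):
            found.append(p)
            if len(found) == count:
                break
    return found

def crt(residues, moduli):
    """Unique x in (-M/2, M/2] with x = r_i (mod q_i), where M = prod q_i."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, m - 2, m)   # Mi^-1 mod m via Fermat (m is prime)
    x %= M
    return x - M if x > M // 2 else x

rng = random.Random(7)
half = (Q - 1) // 2
a = [rng.randint(-half, half) for _ in range(N)]
b = [rng.randint(-half, half) for _ in range(N)]

primes = ntt_primes(4)
exact = sum(x * y for x, y in zip(a, b))            # value over the integers
residues = [sum((x % p) * (y % p) for x, y in zip(a, b)) % p for p in primes]
c = crt(residues, primes)                            # recovered from small moduli
print(c == exact, c % Q == exact % Q)
```

With fewer or smaller primes the product of the moduli drops below twice the largest possible coefficient, and the reconstruction is no longer unique.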


{Unsigned => Signed} representation

• Unsigned subtraction $a - b$ overflows if $a < b$
• All subtractions are $a - b \equiv (a + Nq) - b$ to mitigate this
    • Extra addition
    • Numbers grow faster ⇒ more reductions needed
• Signed representation is better! :)
    • No extra addition
    • Numbers grow less ⇒ fewer reductions
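A short Python model makes the difference concrete. Masking with 2^32 - 1 imitates a 32-bit unsigned register; the function names and the choice N = 2 for the offset are illustrative assumptions, not taken from the slides.

```python
Q = 8380417
MASK32 = 0xFFFFFFFF   # model of a 32-bit unsigned register

def sub_unsigned_naive(a, b):
    return (a - b) & MASK32          # wraps modulo 2^32 when a < b

def sub_unsigned_offset(a, b, n=2):
    # valid while b <= n*q: add a multiple of q first (the extra addition)
    return (a + n * Q - b) & MASK32

def sub_signed(a, b):
    return a - b                     # may go negative, still correct mod q

a, b = 3, 5
print(sub_unsigned_naive(a, b) % Q == (a - b) % Q)    # wrap-around changes the residue
print(sub_unsigned_offset(a, b) % Q == (a - b) % Q)   # offset keeps it
print(sub_signed(a, b) % Q == (a - b) % Q)            # signed needs no offset
print(sub_unsigned_offset(a, b), sub_signed(a, b))    # magnitudes of the two results
```

The wrapped value is wrong because 2^32 is not a multiple of q. The offset version is correct but carries an extra 2q in magnitude, which is the growth the slide refers to.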


Merging layers

• NTT (= FFT) recurses over a binary tree
• Depth first: many reloads of twiddle factors
• Breadth first: many loads/spills of coefficients
• Go for a hybrid approach, i.e., merging layers


Merging layers (visualisation)


Merging layers (impl)

• M4: Merge 2 layers
• M3 (constant-time): No merged layers
• M3 (leaktime): Merge 2 layers
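What merging two layers buys can be sketched in Python by counting coefficient loads. The code below is an illustrative model only: plain modular arithmetic replaces Montgomery reduction, the twiddle table (powers of ψ = 1753 in bit-reversed order) is an assumption modelled on the Dilithium reference code, and the load counter is a crude stand-in for memory traffic on the device. All names are hypothetical.

```python
import random

Q, N, PSI = 8380417, 256, 1753

def bitrev8(k):
    return int(format(k, "08b")[::-1], 2)

# assumed twiddle layout: ZETAS[k] = psi^bitrev8(k)
ZETAS = [pow(PSI, bitrev8(k), Q) for k in range(N)]

def butterfly(lo, hi, zeta):
    t = zeta * hi % Q
    return (lo + t) % Q, (lo - t) % Q

def ntt_layer_by_layer(a):
    """Breadth first: every layer loads and stores all coefficients."""
    a, loads = list(a), 0
    length, k = N // 2, 1
    while length >= 1:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                a[j], a[j + length] = butterfly(a[j], a[j + length], zeta)
                loads += 2
        length //= 2
    return a, loads

def ntt_two_layers_merged(a):
    """Hybrid: load 4 coefficients, apply two layers, then store them."""
    a, loads = list(a), 0
    length, k = N // 2, 1          # k = index of the outer layer's first twiddle
    while length >= 2:
        half = length // 2
        for block, start in enumerate(range(0, N, 2 * length)):
            z1 = ZETAS[k + block]                       # outer layer
            z2a = ZETAS[2 * (k + block)]                # inner layer, lower half
            z2b = ZETAS[2 * (k + block) + 1]            # inner layer, upper half
            for j in range(start, start + half):
                idx = (j, j + half, j + length, j + length + half)
                x0, x1, x2, x3 = (a[i] for i in idx)
                loads += 4
                x0, x2 = butterfly(x0, x2, z1)
                x1, x3 = butterfly(x1, x3, z1)
                x0, x1 = butterfly(x0, x1, z2a)
                x2, x3 = butterfly(x2, x3, z2b)
                a[idx[0]], a[idx[1]], a[idx[2]], a[idx[3]] = x0, x1, x2, x3
        k *= 4
        length //= 4
    return a, loads

rng = random.Random(3)
g = [rng.randrange(Q) for _ in range(N)]
out1, loads1 = ntt_layer_by_layer(g)
out2, loads2 = ntt_two_layers_merged(g)
print(out1 == out2, loads1, loads2)
```

Both functions compute the same transform; the merged version touches each coefficient once per pair of layers instead of once per layer. Merging more layers would need more coefficients and twiddles live at once, so register pressure limits how far this can go.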


Optimizing memory


Three strategies

(1) Storing A in flash (realistic setting)
    • Can read A from flash during signing
    • Needs extra flash space
(2) Storing A in SRAM (“vanilla” setting)
    • Generate A once during signing
    • Needs extra SRAM space
(3) Streaming A and y (how small can we go?)
    • No extra space needed
    • Likely to be very slow


Stack optimization


Results


How we measured

Measuring performance
• M4: Use the SysTick timer
• M3: Use the DWT cycle counter (CYCCNT)

Measuring stack usage
(1) Fill the stack with sentinel values
(2) Run the algorithm
(3) Count how many sentinel bytes were overwritten
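The three stack-measurement steps can be modelled in Python with a `bytearray` standing in for the stack region. This is a simulation of the idea, not the measurement code used on the boards: the sentinel value 0xA5, the region size, and the stand-in `algorithm` are all made up. It assumes a stack that grows downwards from the end of the region and reports the high-water mark, which is one reasonable reading of step (3).

```python
SENTINEL = 0xA5

def fill(stack):
    """Step 1: paint the whole stack region with the sentinel byte."""
    stack[:] = bytes([SENTINEL]) * len(stack)

def algorithm(stack):
    """Step 2 stand-in: pretend the routine used 100 bytes of stack."""
    stack[-100:] = bytes(100)

def stack_used(stack):
    """Step 3: the first non-sentinel byte from the bottom marks the deepest use."""
    for i, byte in enumerate(stack):
        if byte != SENTINEL:
            return len(stack) - i
    return 0

stack = bytearray(1024)
fill(stack)
algorithm(stack)
print(stack_used(stack))   # 100
```

Scanning for the deepest changed byte, rather than counting changed bytes, avoids undercounting when the algorithm leaves gaps or happens to write the sentinel value itself.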


Results: NTT performance

Dilithium                              NTT      NTT^{-1}   ◦
[GKOS18]    constant-time   M4      10 701      11 662     −
This work   constant-time   M4       8 540       8 923     1 955
This work   variable-time   M3      19 347      21 006     4 899
This work   constant-time   M3      33 025      36 609     8 479

• On Cortex M4 we have a 25% improvement
• (Leaktime) operations on M3 are 2.3× to 2.5× slower than on M4
• Constant-time NTT on M3 is 1.7× slower than leaktime
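The ratios quoted in the bullets can be recomputed from the table. The dictionary below simply copies the table entries; the key names and the `ratio` helper are illustrative.

```python
counts = {
    "GKOS18 M4 constant-time":    {"ntt": 10701, "intt": 11662},
    "this work M4 constant-time": {"ntt": 8540,  "intt": 8923,  "pointwise": 1955},
    "this work M3 variable-time": {"ntt": 19347, "intt": 21006, "pointwise": 4899},
    "this work M3 constant-time": {"ntt": 33025, "intt": 36609, "pointwise": 8479},
}

def ratio(slow, fast, op):
    """How many times larger the `slow` entry is than the `fast` one."""
    return counts[slow][op] / counts[fast][op]

old, m4 = "GKOS18 M4 constant-time", "this work M4 constant-time"
m3_vt, m3_ct = "this work M3 variable-time", "this work M3 constant-time"

print(f"[GKOS18] vs this work on M4, NTT: {ratio(old, m4, 'ntt'):.2f}x")
for op in ("ntt", "intt", "pointwise"):
    print(f"M3 leaktime vs M4, {op}: {ratio(m3_vt, m4, op):.2f}x")
print(f"M3 constant-time vs leaktime, NTT: {ratio(m3_ct, m3_vt, 'ntt'):.2f}x")
```

Read as a speedup factor, the M4 NTT figure comes out near 1.25, which matches the 25% quoted above.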


Results M4 strategy 1

Algorithm/strategy  Params      Work                Speed [kcc]  Stack [B]
KeyGen (1)          Dilithium2  This work                 2 267      7 916
                    Dilithium3  This work                 3 545      8 940
                    Dilithium4  This work                 5 086      9 964
Sign (1)            Dilithium2  [RGCB19, scen. 2]         3 640          –
                    Dilithium2  This work                 3 097     14 428
                    Dilithium3  [RGCB19, scen. 2]         5 495          –
                    Dilithium3  This work                 4 578     17 628
                    Dilithium4  [RGCB19, scen. 2]         4 733          –
                    Dilithium4  This work                 3 768     20 828
Verify              Dilithium2  This work                 1 259      9 004
                    Dilithium3  [GKOS18]                  2 342     54 800
                    Dilithium3  This work                 1 917     10 028
                    Dilithium4  This work                 2 720     11 052


Results M4 strategy 2

Algorithm/strategy  Params      Work               Speed [kcc]  Stack [B]
KeyGen (2 & 3)      Dilithium2  This work                1 315      7 916
                    Dilithium3  [GKOS18]                 2 320     50 488
                    Dilithium3  This work                2 013      8 940
                    Dilithium4  This work                2 837      9 964
Sign (2)            Dilithium2  [RGCB19, scen. 1]        4 632          –
                    Dilithium2  This work                3 987     38 300
                    Dilithium3  [GKOS18]                 8 348     86 568
                    Dilithium3  [RGCB19, scen. 1]        7 085          –
                    Dilithium3  This work                6 053     52 756
                    Dilithium4  [RGCB19, scen. 1]        7 061          –
                    Dilithium4  This work                6 001     69 276
Verify              Dilithium2  This work                1 259      9 004
                    Dilithium3  [GKOS18]                 2 342     54 800
                    Dilithium3  This work                1 917     10 028
                    Dilithium4  This work                2 720     11 052


Results M4 strategy 3

Algorithm/strategy  Params      Work               Speed [kcc]  Stack [B]
KeyGen (2 & 3)      Dilithium2  This work                1 315      7 916
                    Dilithium3  [GKOS18]                 2 320     50 488
                    Dilithium3  This work                2 013      8 940
                    Dilithium4  This work                2 837      9 964
Sign (3)            Dilithium2  This work               13 332      8 924
                    Dilithium3  This work               23 550      9 948
                    Dilithium4  This work               22 658     10 972
Verify              Dilithium2  This work                1 259      9 004
                    Dilithium3  [GKOS18]                 2 342     54 800
                    Dilithium3  This work                1 917     10 028
                    Dilithium4  This work                2 720     11 052


Results M3 strategy 1

Algorithm/strategy  Params      Speed [kcc]  Stack [B]
KeyGen (1)          Dilithium2        2 945     12 631
                    Dilithium3        4 503     15 703
                    Dilithium4        6 380     18 783
Sign (1)            Dilithium2        5 822     14 869 (a)
                    Dilithium3        8 730     18 083 (b)
                    Dilithium4        7 398     18 083 (c)
Verify              Dilithium2        1 541      8 944
                    Dilithium3        2 321      9 967
                    Dilithium4        3 260     10 999

(a) Uses additional 23 632 bytes of flash space.
(b) Uses additional 34 896 bytes of flash space.
(c) Uses additional 48 208 bytes of flash space.


Results M3 strategy 2

Algorithm/strategy  Params      Speed [kcc]  Stack [B]
KeyGen (2 & 3)      Dilithium2        1 699      7 983
                    Dilithium3        2 562      9 007
                    Dilithium4        3 587     10 031
Sign (2)            Dilithium2        7 115     39 503
                    Dilithium3       10 667     53 959
                    Dilithium4       10 031     70 463
Verify              Dilithium2        1 541      8 944
                    Dilithium3        2 321      9 967
                    Dilithium4        3 260     10 999


Results M3 strategy 3

Algorithm/strategy  Params      Speed [kcc]  Stack [B]
KeyGen (2 & 3)      Dilithium2        1 699      7 983
                    Dilithium3        2 562      9 007
                    Dilithium4        3 587     10 031
Sign (3)            Dilithium2       18 932      9 463
                    Dilithium3       33 229     10 495
                    Dilithium4       31 180     11 511
Verify              Dilithium2        1 541      8 944
                    Dilithium3        2 321      9 967
                    Dilithium4        3 260     10 999


Performance results

Cortex M4

• New speed records! \o/

• 13%, 27%, and 18% fewer cycles than [GKOS18] for KeyGen, Sign, and Verify (Dilithium3)

• 14% – 20% fewer cycles for signing than [RGCB19]

Cortex M3

• New speed records¹

• Signing: always needs 40, 54, 70 kB of memory (Dilithium2, 3, 4)

• Signing: 24, 35, 48 kB of that can be flash instead of SRAM

• Keygen and Verify are always pretty cheap

¹ We are the first implementation on M3 ;)
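The Cortex M4 percentages can be traced back to the M4 result tables. The working below assumes they are relative reductions in cycle count (in kcc) rather than speedup ratios. Read that way, the Dilithium3 rows give the [GKOS18] figures, and the smallest and largest signing reductions give the [RGCB19] range.

```latex
\begin{align*}
\text{KeyGen vs.\ [GKOS18]:}\quad & \frac{2\,320 - 2\,013}{2\,320} \approx 13\% \\[4pt]
\text{Sign vs.\ [GKOS18]:}\quad   & \frac{8\,348 - 6\,053}{8\,348} \approx 27\% \\[4pt]
\text{Verify vs.\ [GKOS18]:}\quad & \frac{2\,342 - 1\,917}{2\,342} \approx 18\% \\[4pt]
\text{Sign vs.\ [RGCB19], smallest (Dilithium2, scen.\ 1):}\quad
  & \frac{4\,632 - 3\,987}{4\,632} \approx 14\% \\[4pt]
\text{Sign vs.\ [RGCB19], largest (Dilithium4, scen.\ 2):}\quad
  & \frac{4\,733 - 3\,768}{4\,733} \approx 20\%
\end{align*}
```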


Memory results

Cortex M4

• Keygen and Verify are always pretty cheap

• Signing generally needs 40, 54, 70 kB of memory

• Strategy 1: 24, 35, 48 kB of that can be flash instead of SRAM

• Signing can also get down to around 10 kB (strategy 3)

• For a 3× – 4× slowdown, we save 39, 43, 58 kB


Conclusion


Links

Paper: https://dsprenkels.com/files/dilithium-m3.pdf

Code: https://github.com/dilithium-cortexm/dilithium-cortexm

Authors:

• Daan: https://dsprenkels.com

• Denisa: TBD

• Matthias: https://kannwischer.eu


Backup slides


References

Daniel J. Bernstein, Chitchanok Chuengsatiansup, Tanja Lange, and Christine van Vredendaal. NTRU Prime. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS16], 2019. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/round-2-submissions.

Wouter de Groot. A performance study of X25519 on Cortex-M3 and M4, 2015. https://pure.tue.nl/ws/portalfiles/portal/47038543.

Tim Güneysu, Markus Krausz, Tobias Oder, and Julian Speith. Evaluation of lattice-based signature schemes in embedded systems. In ICECS 2018, pages 385–388, 2018. https://www.seceng.ruhr-uni-bochum.de/media/seceng/veroeffentlichungen/2018/10/17/paper.pdf.

Vadim Lyubashevsky, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS-DILITHIUM. Submission to the NIST Post-Quantum Cryptography Standardization Project [NIS16], 2019. Available at https://csrc.nist.gov/projects/post-quantum-cryptography/round-2-submissions.

NIST Computer Security Division. Post-Quantum Cryptography Standardization, 2016. https://csrc.nist.gov/Projects/Post-Quantum-Cryptography.

Prasanna Ravi, Sourav Sen Gupta, Anupam Chattopadhyay, and Shivam Bhasin. Improving Speed of Dilithium’s Signing Procedure. In CARDIS 2019, volume 11833 of LNCS, pages 57–73. Springer, 2019. https://eprint.iacr.org/2019/420.pdf.