Solving Quadratic Equations with XL on Parallel Architectures. Cheng Chen-Mou (1), Chou Tung (2), Ni Ru-Ben (2), Yang Bo-Yin (2). (1) National Taiwan University, (2) Academia Sinica, Taipei, Taiwan. Leuven, Sept. 11, 2012

Page 2: Solving Quadratic Equations with XL on Parallel Architectures

Solving Quadratic Equations with XL on Parallel Architectures

Chen-Mou Cheng (1), Tung Chou (2), Ruben Niederhagen (2), Bo-Yin Yang (2)

(1) National Taiwan University, (2) Academia Sinica, Taipei, Taiwan

Leuven, Sept. 11, 2012

Page 3: Solving Quadratic Equations with XL on Parallel Architectures

The XL algorithm

Some cryptographic systems can be attacked by solving a system of multivariate quadratic equations, e.g.:

- AES:
  - 8000 quadratic equations with 1600 variables over $\mathbb{F}_2$ (Courtois and Pieprzyk, 2002)
  - 840 sparse quadratic equations and 1408 linear equations over 3968 variables of $\mathbb{F}_{256}$ (Murphy and Robshaw, 2002)
- multivariate cryptographic systems, e.g. the QUAD stream cipher (cryptanalysis by Yang, Chen, Bernstein, and Chen, 2007)

Page 4: Solving Quadratic Equations with XL on Parallel Architectures

The XL algorithm

- XL is an acronym for extended linearization:
  - extend a quadratic system by multiplying with appropriate monomials
  - linearize by treating each monomial as an independent variable
  - solve the linearized system
- special case of Gröbner basis algorithms
- first suggested by Lazard (1983)
- reinvented by Courtois, Klimov, Patarin, and Shamir (2000)
- alternative to Gröbner basis solvers like Faugère's F4 (1999, e.g., Magma) and F5 (2002) algorithms

Page 5: Solving Quadratic Equations with XL on Parallel Architectures

The XL algorithm

For $b \in \mathbb{N}^n$ denote by $x^b$ the monomial $x_1^{b_1} x_2^{b_2} \cdots x_n^{b_n}$ and by $|b| = b_1 + b_2 + \cdots + b_n$ the total degree of $x^b$.

given: finite field $K = \mathbb{F}_q$; system $\mathcal{A}$ of $m$ multivariate quadratic equations: $\ell_1 = \ell_2 = \cdots = \ell_m = 0$, $\ell_i \in K[x_1, x_2, \ldots, x_n]$
choose: operational degree $D \in \mathbb{N}$
extend: system $\mathcal{A}$ to the system $\mathcal{R}^{(D)} = \{x^b \ell_i = 0 : |b| \le D - 2,\ \ell_i \in \mathcal{A}\}$
linearize: consider each $x^d$, $|d| \le D$, a new variable to obtain a linear system $\mathcal{M}$
solve: linear system $\mathcal{M}$

minimum degree $D_0$ for reliable termination (Yang and Chen):

$$D_0 := \min\{D : ((1 - \lambda)^{m-n-1} (1 + \lambda)^m)[D] \le 0\}$$

Here $(\cdot)[D]$ denotes the coefficient of $\lambda^D$.

We use the Wiedemann algorithm instead of a Gauss solver. Thus, we do not compute a complete Gröbner basis but distinguished solutions!
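The degree bound can be evaluated mechanically. The following is a minimal sketch, assuming that $[D]$ means coefficient extraction and using the identity $(1-\lambda)^{m-n-1}(1+\lambda)^m = (1-\lambda^2)^m / (1-\lambda)^{n+1}$; the function names `xl_series` and `xl_min_degree` are made up for this illustration, and plain 64-bit integers limit it to toy sizes.

```cpp
#include <cassert>
#include <vector>

// Coefficients of (1 - t^2)^m / (1 - t)^(n + 1), which equals
// (1 - t)^(m - n - 1) * (1 + t)^m, truncated after degree max_deg.
// Assumption: 64-bit integers suffice (toy parameter sizes only).
std::vector<long long> xl_series(int n, int m, int max_deg) {
    assert(n >= 1 && m >= 1 && max_deg >= 0);
    std::vector<long long> c(max_deg + 1, 0);
    c[0] = 1;
    for (int k = 0; k < m; ++k)              // multiply by (1 - t^2), m times
        for (int d = max_deg; d >= 2; --d) c[d] -= c[d - 2];
    for (int k = 0; k <= n; ++k)             // divide by (1 - t), n + 1 times
        for (int d = 1; d <= max_deg; ++d) c[d] += c[d - 1];
    return c;
}

// Smallest D whose coefficient is <= 0, or -1 if there is none up to max_deg.
int xl_min_degree(int n, int m, int max_deg) {
    const std::vector<long long> c = xl_series(n, m, max_deg);
    for (int d = 0; d <= max_deg; ++d)
        if (c[d] <= 0) return d;
    return -1;
}
```

For example, with $n = 2$ and $m = 4$ the series starts $1 + 3\lambda + 2\lambda^2 - 2\lambda^3$, so `xl_min_degree(2, 4, 5)` returns 3.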

Page 10: Solving Quadratic Equations with XL on Parallel Architectures

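The Wiedemann algorithm – the basic idea

given: $A \in K^{N \times N}$
wanted: $v \in K^N$ such that $Av = 0$
solution: compute the minimal polynomial $f$ of $A$ of degree $d$: $f(A) = 0$

$$\sum_{i=0}^{d} f_i A^i = 0$$

$$\sum_{i=0}^{d} f_i A^i z = 0 \quad \text{(choose } z \in K^N \text{ randomly)}$$

$$\sum_{i=1}^{d} f_i A^i z + f_0 z = 0$$

$$\sum_{i=1}^{d} f_i A^i z = 0 \quad \text{(since } f_0 = 0 \text{)}$$

$$A \cdot \underbrace{\left( \sum_{i=1}^{d} f_i A^{i-1} z \right)}_{=v} = 0$$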
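The last step of the derivation can be checked on a toy example. This is a minimal sketch, assuming a prime field GF(p) and dense matrices purely for readability; `mat_vec` and `kernel_vector` are hypothetical helper names, not part of the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<long long>;
using Mat = std::vector<Vec>;

// y = A * v over GF(p); all entries are assumed to lie in [0, p).
Vec mat_vec(const Mat& A, const Vec& v, long long p) {
    Vec y(A.size(), 0);
    for (std::size_t r = 0; r < A.size(); ++r)
        for (std::size_t c = 0; c < v.size(); ++c)
            y[r] = (y[r] + A[r][c] * v[c]) % p;
    return y;
}

// v = sum_{i=1}^{d} f_i A^(i-1) z, where f = (f_0, ..., f_d) and f_0 = 0.
Vec kernel_vector(const Mat& A, const Vec& f, const Vec& z, long long p) {
    assert(!f.empty() && f[0] == 0);
    Vec v(z.size(), 0);
    Vec power = z;                           // holds A^(i-1) z
    for (std::size_t i = 1; i < f.size(); ++i) {
        for (std::size_t k = 0; k < v.size(); ++k)
            v[k] = (v[k] + f[i] * power[k]) % p;
        power = mat_vec(A, power, p);        // advance to A^i z
    }
    return v;
}
```

Over GF(7), the all-ones 2×2 matrix satisfies $A^2 = 2A$, so $f = (0, 5, 1)$. With $z = (1, 2)$ the sketch yields $v = (1, 6)$ and $Av = 0$. An unlucky $z$ such as $(1, 1)$ gives $v = 0$, in which case another random $z$ is tried.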

Page 12: Solving Quadratic Equations with XL on Parallel Architectures

The Berlekamp–Massey algorithm

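Given a linearly recurrent sequence

$$S = \{a_i\}_{i=0}^{\infty}, \quad a_i \in K,$$

compute an annihilating polynomial $f$ of degree $d$ such that

$$\sum_{i=0}^{d} f_i a_{j+i} = 0 \quad \text{for all } j \in \mathbb{N}.$$

The Berlekamp–Massey algorithm requires the first $2 \cdot d$ elements of $S$ as input and computes $f_i \in K$, $0 \le i \le d$.

For the Wiedemann algorithm, the sequence is $a_i = x A^i z$ with $x \in K^{1 \times N}$.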
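The scalar algorithm fits in a few lines. This is a minimal sketch, assuming a prime field GF(p) so that inverses can be computed with Fermat's little theorem; it is the textbook Berlekamp–Massey iteration with the result reversed to match the indexing $\sum_i f_i a_{j+i} = 0$ used above, and the names `pow_mod` and `berlekamp_massey` are chosen for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<long long>;

// b^e mod p by square-and-multiply.
long long pow_mod(long long b, long long e, long long p) {
    long long r = 1;
    for (b %= p; e > 0; e >>= 1, b = b * b % p)
        if (e & 1) r = r * b % p;
    return r;
}

// Returns f = (f_0, ..., f_d) with f_d = 1 and sum_i f_i * s[j + i] = 0 mod p.
// Assumptions: p is prime, entries of s lie in [0, p).
Vec berlekamp_massey(const Vec& s, long long p) {
    assert(p >= 2);
    const std::size_t n_terms = s.size();
    Vec C(n_terms + 1, 0), B(n_terms + 1, 0);   // current / previous connection polynomial
    C[0] = B[0] = 1;
    std::size_t L = 0, shift = 1;               // L = current recurrence length
    long long b = 1;                            // discrepancy at the last length change
    for (std::size_t n = 0; n < n_terms; ++n) {
        long long disc = 0;                     // discrepancy of C at position n
        for (std::size_t i = 0; i <= L; ++i)
            disc = (disc + C[i] * s[n - i]) % p;
        if (disc == 0) { ++shift; continue; }
        const Vec T = C;
        const long long coef = disc * pow_mod(b, p - 2, p) % p;
        for (std::size_t i = 0; i + shift <= n_terms; ++i)
            C[i + shift] = ((C[i + shift] - coef * B[i]) % p + p) % p;
        if (2 * L <= n) { L = n + 1 - L; B = T; b = disc; shift = 1; }
        else ++shift;
    }
    Vec f(C.begin(), C.begin() + L + 1);
    std::reverse(f.begin(), f.end());           // connection polynomial -> f_0..f_d
    return f;
}
```

Feeding the four terms 1, 1, 2, 3 (Fibonacci, so $d = 2$) over GF(7) gives $f = (6, 6, 1)$, i.e. $a_{j+2} - a_{j+1} - a_j = 0$.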

Page 14: Solving Quadratic Equations with XL on Parallel Architectures

The block Wiedemann algorithm

Due to Coppersmith (1994), three steps:

Input: $A \in K^{N \times N}$, parameters $m, n \in \mathbb{N}$, $\kappa \in \mathbb{N}$ of size $N/m + N/n + O(1)$.

BW1: Compute sequence $\{a_i\}_{i=0}^{\kappa}$ of matrices $a_i \in K^{n \times m}$ using random matrices $x \in K^{m \times N}$ and $z \in K^{N \times n}$:
$$a_i = (x A^i y)^T, \quad \text{for } y = Az.$$
Cost: $O(N^2 (w_A + m))$

BW2: Use block Berlekamp–Massey to compute polynomial $f$ with coefficients in $K^{n \times n}$.
Coppersmith's version: $O(N^2 \cdot n)$
Thomé's version: $O(N \log^2 N \cdot n)$

BW3: Evaluate the reverse of $f$:
$$W = \sum_{j=0}^{\deg(f)} A^j z (f_{\deg(f)-j})^T.$$
Cost: $O(N^2 (w_A + n))$

Page 18: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1

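Input: $A \in K^{N \times N}$, parameters $m, n \in \mathbb{N}$, $\kappa \in \mathbb{N}$ of size $N/m + N/n + O(1)$.

Compute sequence $\{a_i\}_{i=0}^{\kappa}$ of matrices $a_i \in K^{n \times m}$ using random matrices $x \in K^{m \times N}$ and $z \in K^{N \times n}$

$$a_i = (x A^i y)^T, \quad \text{for } y = Az$$

using $\{t_i\}_{i=0}^{\kappa}$, $t_i \in K^{N \times n}$,

$$t_i = \begin{cases} y = Az & \text{for } i = 0 \\ A t_{i-1} & \text{for } 0 < i \le \kappa, \end{cases}$$

$$a_i = (x t_i)^T.$$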

Page 19: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1

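INPUT:  macaulay_matrix<N, N> A;
        sparse_matrix<N, n> z;

// pseudocode; kappa = N/m + N/n + O(1)
matrix<N, n> t_new, t_old;
matrix<m, n> a[kappa + 1];           // one block per index 0..kappa
sparse_matrix<m, N, weight> x;

x.rand();
t_old = z;
for (unsigned i = 0; i <= kappa; i++) {
    t_new = A * t_old;               // t_i = A * t_{i-1}, with t_0 = A * z
    a[i] = x * t_new;                // x * t_i, the untransposed form of a_i
    swap(t_old, t_new);
}

RETURN a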
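The pseudocode can be turned into a small runnable sketch. It assumes dense matrices over a prime field GF(p) instead of the sparse Macaulay matrix, takes $x$ and $z$ as arguments rather than drawing them at random, and stores each $a_i$ transposed as in the definition $a_i = (x t_i)^T$. The helper names `mul`, `transpose`, and `bw1_sequence` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<long long>>;

// C = X * Y over GF(p); all entries are assumed to lie in [0, p).
Mat mul(const Mat& X, const Mat& Y, long long p) {
    assert(!X.empty() && !Y.empty() && X[0].size() == Y.size());
    Mat C(X.size(), std::vector<long long>(Y[0].size(), 0));
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t k = 0; k < Y.size(); ++k)
            for (std::size_t j = 0; j < Y[0].size(); ++j)
                C[i][j] = (C[i][j] + X[i][k] * Y[k][j]) % p;
    return C;
}

Mat transpose(const Mat& X) {
    assert(!X.empty());
    Mat T(X[0].size(), std::vector<long long>(X.size(), 0));
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t j = 0; j < X[i].size(); ++j) T[j][i] = X[i][j];
    return T;
}

// a_i = (x t_i)^T for i = 0..kappa, with t_0 = A z and t_i = A t_{i-1}.
std::vector<Mat> bw1_sequence(const Mat& A, const Mat& x, const Mat& z,
                              std::size_t kappa, long long p) {
    std::vector<Mat> a;
    a.reserve(kappa + 1);
    Mat t = mul(A, z, p);                        // t_0 = y = A z
    for (std::size_t i = 0; i <= kappa; ++i) {
        a.push_back(transpose(mul(x, t, p)));    // n x m block
        if (i < kappa) t = mul(A, t, p);         // t_{i+1} = A t_i
    }
    return a;
}
```

Only the two most recent blocks $t_{i-1}$ and $t_i$ are ever needed, which is why the pseudocode gets by with `t_old` and `t_new`.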

Page 20: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – multicore processor

[Figure sequence: the product $t_i = A \times t_{i-1}$, first on a single core, then split among 2 cores and among 4 cores. Final annotation: OpenMP.]
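One way to split the product $t_i = A \times t_{i-1}$ among cores is by rows of $A$. The sketch below assumes this row-wise partitioning (the slide figures are not recoverable here), a compressed sparse row layout, and a prime field GF(p); `CsrMatrix` and `spmm` are names invented for this illustration. Without OpenMP enabled at compile time the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse N x N matrix in compressed sparse row (CSR) form, entries in [0, p).
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;   // size rows + 1
    std::vector<std::size_t> col;       // column index of each nonzero
    std::vector<long long> val;         // value of each nonzero
};

// t_new = A * t_old over GF(p); both blocks are N x n, stored row-major.
std::vector<long long> spmm(const CsrMatrix& A,
                            const std::vector<long long>& t_old,
                            std::size_t n, long long p) {
    assert(A.row_ptr.size() == A.rows + 1 && t_old.size() == A.rows * n);
    std::vector<long long> t_new(A.rows * n, 0);
    // Each row of t_new is written by exactly one thread, so no locking is needed.
    #pragma omp parallel for schedule(static)
    for (long long r = 0; r < static_cast<long long>(A.rows); ++r) {
        const std::size_t row = static_cast<std::size_t>(r);
        for (std::size_t k = A.row_ptr[row]; k < A.row_ptr[row + 1]; ++k)
            for (std::size_t j = 0; j < n; ++j) {
                long long& out = t_new[row * n + j];
                out = (out + A.val[k] * t_old[A.col[k] * n + j]) % p;
            }
    }
    return t_new;
}
```

A static schedule is a reasonable first choice when rows have similar weight; rows with very uneven numbers of nonzeros may call for a different schedule.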

Page 25: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure sequence: the product $t_i = A \times t_{i-1}$, first on a single machine, then distributed over 2 and over 4 computing nodes. The final build is annotated with $a^{(i)} \in K^{n \times m}$ and the block width $n$.]

Page 30: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure: runtime [s] versus block width $m = n$ (32 to 1024) for BW1, BW2 Thomé, BW2 Coppersmith, and BW3.]

Page 31: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure: memory [GB] versus block width $m = n$ (32 to 1024) for BW2 Thomé and BW2 Coppersmith; a level of 36 GB is marked.]

Page 32: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure sequence: the product $t_i = A \times t_{i-1}$ on a single machine, then on 2 and on 4 computing nodes, with blocks of $t_i$ and $t_{i-1}$ held on the individual nodes. Build annotations: "MPI: ISend, IRecv, ..." and "InfiniBand Verbs".]

Page 47: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure: ratio (0 to 1) versus number of nodes (2 to 256) for the Verbs API and MPI.]

Page 48: Solving Quadratic Equations with XL on Parallel Architectures

Parallelization of BW1 – cluster system

[Figure: runtime versus number of cluster nodes (2 to 256), split into communication and computation, together with the data sent per node.]

(InfiniBand MT26428, 2 ports of 4×QDR, 32 Gbit/s)

Page 49: Solving Quadratic Equations with XL on Parallel Architectures

Runtime, $n = 16$, $m = 18$, $\mathbb{F}_{16}$

[Figure: runtime [min] versus number of cores for BW1, BW2 Tho., BW2 Cop., and BW3. Two groups of core counts are shown (8, 16, 32, 64 and 6, 12, 24, 48), labelled NUMA and Cluster.]

Page 50: Solving Quadratic Equations with XL on Parallel Architectures

Comparison to Magma F4

[Figure: time [hrs], logarithmic scale, versus $n$ with $m = 2n$ ($n$ = 18 to 25) for XL (64 cores), XL (scaled), and Magma F4.]

[Figure: memory [GB], logarithmic scale, versus $n$ with $m = 2n$ ($n$ = 18 to 25) for XL and Magma F4.]

Page 51: Solving Quadratic Equations with XL on Parallel Architectures

Conclusions

XL with block Wiedemann as system solver is an alternative to Gröbner basis solvers, because

- in about 80% of the cases it operates on the same degree,
- it scales well on multicore systems and moderate cluster sizes, and
- it has a relatively small memory demand.

Page 52: Solving Quadratic Equations with XL on Parallel Architectures

Thank you!