Solving Quadratic Equations with XL on Parallel Architectures

Chen-Mou Cheng¹, Tung Chou², Ruben Niederhagen², Bo-Yin Yang²

¹ National Taiwan University, ² Academia Sinica, Taipei, Taiwan

Leuven, Sept. 11, 2012
The XL algorithm

Some cryptographic systems can be attacked by solving a system of multivariate quadratic equations, e.g.:

- AES:
  - 8000 quadratic equations with 1600 variables over F_2 (Courtois and Pieprzyk, 2002)
  - 840 sparse quadratic equations and 1408 linear equations over 3968 variables of F_256 (Murphy and Robshaw, 2002)
- multivariate cryptographic systems, e.g. the QUAD stream cipher (cryptanalysis by Yang, Chen, Bernstein, and Chen, 2007)
The XL algorithm

- XL is an acronym for extended linearization:
  - extend the quadratic system by multiplying with appropriate monomials
  - linearize by treating each monomial as an independent variable
  - solve the linearized system
- special case of Gröbner basis algorithms
- first suggested by Lazard (1983)
- reinvented by Courtois, Klimov, Patarin, and Shamir (2000)
- alternative to Gröbner basis solvers like Faugère's F4 (1999; e.g., in Magma) and F5 (2002) algorithms
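The extend/linearize/solve steps can be illustrated on a toy system over GF(2). The system below (f1 = x0·x1 + x1 + 1, f2 = x0 + x1 + 1, common root (0, 1)) and the degree D = 3 are invented for this sketch and are not from the talk:

```python
# A minimal XL sketch over GF(2) with n = 2 variables (toy system, D = 3).
# A polynomial is a set of monomials; a monomial is a frozenset of variable
# indices (over GF(2), x_i^2 = x_i, so monomials are squarefree).
from itertools import combinations

ONE = frozenset()  # the constant monomial 1

def mul(p, q):
    """Multiply two GF(2) polynomials; addition is symmetric difference."""
    r = set()
    for a in p:
        for b in q:
            r ^= {a | b}
    return r

# Toy system: f1 = x0*x1 + x1 + 1, f2 = x0 + x1 + 1; common root (0, 1).
f1 = {frozenset({0, 1}), frozenset({1}), ONE}
f2 = {frozenset({0}), frozenset({1}), ONE}
system = [f1, f2]
n, D = 2, 3

# Extend: multiply every equation by all monomials of degree <= D - 2.
mons = [frozenset(c) for d in range(D - 1) for c in combinations(range(n), d)]
rows = [mul({m}, f) for m in mons for f in system]

# Linearize: every squarefree monomial of degree >= 1 is a new variable;
# the constant term moves to the right-hand side.
cols = [frozenset(c) for d in range(1, n + 1) for c in combinations(range(n), d)]
M = [[1 if c in r else 0 for c in cols] + [1 if ONE in r else 0] for r in rows]

# Solve: Gaussian elimination over GF(2) on the augmented matrix.
piv = {}
for col in range(len(cols)):
    for r in range(len(M)):
        if r not in piv.values() and M[r][col]:
            piv[col] = r
            for s in range(len(M)):
                if s != r and M[s][col]:
                    M[s] = [a ^ b for a, b in zip(M[s], M[r])]
            break

sol = {cols[c]: M[r][-1] for c, r in piv.items()}
print(sol[frozenset({0})], sol[frozenset({1})])  # x0 = 0, x1 = 1
```

At D = 3 the extension already forces x0 = 0 (e.g. x0·f1 reduces to x0 = 0), so the linearized system pins down the unique root.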
The XL algorithm

For b ∈ N^n, denote by x^b the monomial x_1^{b_1} x_2^{b_2} ⋯ x_n^{b_n} and by |b| = b_1 + b_2 + ⋯ + b_n the total degree of x^b.

given: a finite field K = F_q and a system A of m multivariate quadratic equations ℓ_1 = ℓ_2 = ⋯ = ℓ_m = 0, ℓ_i ∈ K[x_1, x_2, …, x_n]
choose: an operational degree D ∈ N
extend: the system A to the system R^(D) = { x^b · ℓ_i = 0 : |b| ≤ D − 2, ℓ_i ∈ A }
linearize: consider each monomial x^d with |d| ≤ D a new variable to obtain a linear system M
solve: the linear system M

Minimum degree D_0 for reliable termination (Yang and Chen):

  D_0 := min{ D : [(1 − λ)^(m−n−1) (1 + λ)^m]_[D] ≤ 0 },

where [D] denotes the coefficient of λ^D.

We use the Wiedemann algorithm instead of a Gauss solver. Thus, we do not compute a complete Gröbner basis but distinguished solutions!
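The D_0 bound can be evaluated directly by expanding the power series and scanning for the first non-positive coefficient. A sketch, where the parameters m = 20, n = 10 are illustrative only and the exponent m − n − 1 requires m > n:

```python
# Operational degree D0 (Yang-Chen): the smallest D for which the
# coefficient of lambda^D in (1 - lambda)^(m-n-1) * (1 + lambda)^m
# is non-positive. Plain coefficient lists, index = degree.

def poly_mul(p, q, deg):
    """Product of two coefficient lists, truncated at degree `deg`."""
    r = [0] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                r[i + j] += a * b
    return r

def poly_pow(base, e, deg):
    """e-th power of a coefficient list, truncated at degree `deg`."""
    r = [1]
    for _ in range(e):
        r = poly_mul(r, base, deg)
    return r

def d0(m, n, max_deg=100):
    """Minimum operational degree for m quadratic equations in n variables."""
    series = poly_mul(poly_pow([1, -1], m - n - 1, max_deg),
                      poly_pow([1, 1], m, max_deg), max_deg)
    return next(D for D, c in enumerate(series) if c <= 0)

print(d0(20, 10))  # D0 = 4: coefficient at D = 3 is 66, at D = 4 it is -129
```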
The Wiedemann algorithm – the basic idea

given: A ∈ K^(N×N)
wanted: v ∈ K^N such that Av = 0
solution: compute the minimal polynomial f of A of degree d, i.e., f(A) = 0:

  Σ_{i=0}^{d} f_i A^i = 0
  Σ_{i=0}^{d} f_i A^i z = 0        (choose z ∈ K^N randomly)
  Σ_{i=1}^{d} f_i A^i z + f_0 z = 0
  Σ_{i=1}^{d} f_i A^i z = 0        (since f_0 = 0)
  A · ( Σ_{i=1}^{d} f_i A^(i−1) z ) = 0,  so  v = Σ_{i=1}^{d} f_i A^(i−1) z.
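A tiny instance of this derivation over GF(101), with a toy matrix invented for the sketch: instead of computing the true minimal polynomial, it uses the characteristic polynomial, which by Cayley–Hamilton also annihilates A and, for singular A, likewise has constant term f_0 = ±det(A) = 0:

```python
# The slide's kernel-vector trick over GF(p): if f annihilates the singular
# matrix A and f_0 = 0, then v = sum_{i>=1} f_i A^{i-1} z satisfies A v = 0.
p = 101

def mat_vec(A, v):
    """Matrix-vector product over GF(p)."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in A]

# Singular over the integers (row 2 = row 0 + row 1), hence over GF(101).
A = [[1, 2, 3],
     [4, 5, 6],
     [5, 7, 9]]

# Characteristic polynomial of a 3x3 matrix:
# f(L) = L^3 - tr(A) L^2 + c1 L - det(A), c1 = sum of principal 2-minors.
tr = A[0][0] + A[1][1] + A[2][2]
c1 = (A[0][0]*A[1][1] - A[0][1]*A[1][0]
    + A[0][0]*A[2][2] - A[0][2]*A[2][0]
    + A[1][1]*A[2][2] - A[1][2]*A[2][1])
det = (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
     - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
     + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))
assert det % p == 0                      # f_0 = 0 since A is singular
f = [(-det) % p, c1 % p, (-tr) % p, 1]   # f_0, f_1, f_2, f_3

# v = f_1 z + f_2 A z + f_3 A^2 z  for a fixed "random" z.
z = [1, 0, 0]
Az = mat_vec(A, z)
AAz = mat_vec(A, Az)
v = [(f[1]*a + f[2]*b + f[3]*c) % p for a, b, c in zip(z, Az, AAz)]

print(v, mat_vec(A, v))  # A v = [0, 0, 0]
```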
The Berlekamp–Massey algorithm

Given a linearly recurrent sequence

  S = {a_i}_{i=0}^{∞},  a_i ∈ K,

compute an annihilating polynomial f of degree d such that

  Σ_{i=0}^{d} f_i a_{j+i} = 0  for all j ∈ N.

The Berlekamp–Massey algorithm requires the first 2·d elements of S as input and computes f_i ∈ K, 0 ≤ i ≤ d.

For Wiedemann, the input sequence is a_i = x A^i z with a random x ∈ K^(1×N).
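A sketch of Berlekamp–Massey over a prime field, run on the Fibonacci sequence mod 101 (an illustrative input, not from the talk). The standard algorithm returns a connection polynomial C with C_0 = 1; reversing it gives f in the convention above, Σ f_i a_{j+i} = 0:

```python
def berlekamp_massey(s, p):
    """Minimal LFSR over GF(p): returns (C, L) with C[0] = 1 such that
    sum(C[i] * s[j - i] for i in 0..L) == 0 (mod p) for L <= j < len(s)."""
    C, B = [1], [1]    # current and previous connection polynomial
    L, m, b = 0, 1, 1  # LFSR length, shift since last update, last discrepancy
    for j in range(len(s)):
        d = sum(C[i] * s[j - i] for i in range(L + 1)) % p
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d * pow(b, p - 2, p) % p   # d / b in GF(p)
        while len(C) < len(B) + m:
            C.append(0)
        for i in range(len(B)):
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= j:
            L, B, b, m = j + 1 - L, T, d, 1
        else:
            m += 1
    return C, L

p = 101
fib = [0, 1]
for _ in range(10):
    fib.append((fib[-1] + fib[-2]) % p)

C, L = berlekamp_massey(fib, p)  # 2*L = 4 input terms would already suffice
f = C[::-1]                      # slide convention: sum_i f_i * a_{j+i} = 0
print(L, f)  # L = 2, f = [100, 100, 1]: a_{j+2} - a_{j+1} - a_j = 0
```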
The block Wiedemann algorithm

Due to Coppersmith (1994), three steps.

Input: A ∈ K^(N×N), parameters m, n ∈ N, and κ ∈ N of size N/m + N/n + O(1).

BW1: Compute the sequence {a_i}_{i=0}^{κ} of matrices a_i ∈ K^(n×m) using random matrices x ∈ K^(m×N) and z ∈ K^(N×n):
  a_i = (x A^i y)^T,  for y = Az.        Cost: O(N²(w_A + m))
BW2: Use block Berlekamp–Massey to compute a polynomial f with coefficients in K^(n×n).
  Coppersmith's version: O(N² · n); Thomé's version: O(N log² N · n)
BW3: Evaluate the reverse of f:
  W = Σ_{j=0}^{deg(f)} A^j z (f_{deg(f)−j})^T.   Cost: O(N²(w_A + n))
Parallelization of BW1

Input: A ∈ K^(N×N), parameters m, n ∈ N, κ ∈ N of size N/m + N/n + O(1).

Compute the sequence {a_i}_{i=0}^{κ} of matrices a_i ∈ K^(n×m) using random matrices x ∈ K^(m×N) and z ∈ K^(N×n), a_i = (x A^i y)^T for y = Az, via the intermediate matrices {t_i}_{i=0}^{κ}, t_i ∈ K^(N×n):

  t_i = y = Az      for i = 0,
  t_i = A t_{i−1}   for 0 < i ≤ κ,
  a_i = (x t_i)^T.
Parallelization of BW1

INPUT: macaulay_matrix<N, N> A;
       sparse_matrix<N, n> z;
       matrix<N, n> t_new, t_old;
       matrix<m, n> a[N/m + N/n + O(1)];
       sparse_matrix<m, N, weight> x;

x.rand();
t_old = z;
for (unsigned i = 0; i <= N/m + N/n + O(1); i++) {
    t_new = A * t_old;
    a[i] = x * t_new;
    swap(t_old, t_new);
}
RETURN a;
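The pseudocode above can be exercised in a few lines of plain Python over GF(2) (the dimensions N, m, n below are illustrative, not the talk's parameters), checking that the iterated products reproduce a_i = (x A^i y)^T with y = Az, without ever forming A^i:

```python
# BW1 as in the pseudocode: one matrix product per step, t_i = A t_{i-1},
# yields the whole sequence a_i = (x t_i)^T. Dense lists stand in for the
# sparse Macaulay matrix; the field is GF(2).
import random
random.seed(1)
N, m, n = 8, 4, 2
p = 2

def mat_mul(A, B):
    """Matrix product over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(r) for r in zip(*M)]

A = [[random.randrange(p) for _ in range(N)] for _ in range(N)]
x = [[random.randrange(p) for _ in range(N)] for _ in range(m)]  # m x N
z = [[random.randrange(p) for _ in range(n)] for _ in range(N)]  # N x n

steps = N // m + N // n + 2              # the O(1) slack from the slide
t = mat_mul(A, z)                        # t_0 = y = A z
a = []
for i in range(steps):
    a.append(transpose(mat_mul(x, t)))   # a_i = (x t_i)^T, an n x m block
    t = mat_mul(A, t)                    # t_{i+1} = A t_i

# Sanity check against the direct definition a_i = (x A^{i+1} z)^T.
Apow = [row[:] for row in A]
ok = True
for i in range(steps):
    ok = ok and transpose(mat_mul(mat_mul(x, Apow), z)) == a[i]
    Apow = mat_mul(Apow, A)
print(ok)
```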
Parallelization of BW1 – multicore processor

[Figure: the product t_i = A · t_{i−1} computed in parallel by splitting the rows of A across 2 and then 4 cores; implemented with OpenMP.]
Parallelization of BW1 – cluster system

[Figure: t_i = A · t_{i−1} distributed over 2 and then 4 computing nodes; each node holds a block of rows of A and produces its share of a_i ∈ K^(n×m).]
[Figure: run time in seconds vs. block width m = n (32 to 1024) for BW1, BW2 (Thomé), BW2 (Coppersmith), and BW3.]
[Figure: memory consumption in GB vs. block width m = n (32 to 1024) for BW2 (Thomé) and BW2 (Coppersmith); 36 GB marked.]
Parallelization of BW1 – cluster system

[Figure: communication pattern of the distributed product t_i = A · t_{i−1} on 4 computing nodes: parts of t_{i−1} are exchanged between the nodes during the computation; implemented first with MPI (ISend, IRecv, …), then with InfiniBand Verbs.]
[Figure: ratio of the Verbs API and MPI implementations for 2 to 256 cluster nodes.]
[Figure: run time split into communication and computation, and data sent per node, for 2 to 256 cluster nodes (InfiniBand MT26428, 2 ports of 4×QDR, 32 Gbit/s).]
Runtime for n = 16, m = 18 over F_16

[Figure: run time in minutes vs. number of cores for BW1, BW2 (Thomé), BW2 (Coppersmith), and BW3, on a NUMA system (8 to 64 cores) and a cluster (6 to 48 cores).]
Comparison to Magma F4

[Figure: time in hours and memory in GB vs. n (18 to 25, m = 2n) for XL (64 cores), XL (scaled), and Magma F4.]
Conclusions

XL with block Wiedemann as system solver is an alternative to Gröbner basis solvers, because

- in about 80% of the cases it operates at the same degree,
- it scales well on multicore systems and moderate cluster sizes, and
- it has a relatively small memory demand.

Thank you!