
Chapter 4

Irreducible Polynomials over Finite Fields

§4.1 Construction of Finite Fields

As we will see, modular arithmetic aids in testing the irreducibility of polynomials and even in completely factoring polynomials in $\mathbb{Z}[x]$. If we expect that a polynomial $f(x)$ is irreducible, for example, it is not unreasonable to try to find a prime $p$ such that $f(x)$ is irreducible modulo $p$. If we can find such a prime $p$ and $p$ does not divide the leading coefficient of $f(x)$, then $f(x)$ is irreducible in $\mathbb{Z}[x]$ (see the example on page 4). There exist polynomials which are irreducible in $\mathbb{Z}[x]$ but are reducible modulo every prime (see Exercise (4.1)), but as it turns out, one can show that such polynomials are rare, and verifying that a polynomial $f(x)$ is irreducible by trying to find a prime $p$ for which $f(x)$ is irreducible modulo $p$ will almost always work rather quickly (see Chapter 5). This is already strong motivation for looking into the idea of using modular arithmetic, but in this chapter, we plan to explore other aspects of modular arithmetic as well.

We begin with a definition. Let $p$ be a prime, and let $f(x) \in \mathbb{Z}[x]$. Suppose further that $f(x) \not\equiv 0 \pmod{p}$. We say that $u(x) \equiv v(x) \pmod{p, f(x)}$, where $u(x)$ and $v(x)$ are in $\mathbb{Z}[x]$, if there exist $g(x)$ and $h(x)$ in $\mathbb{Z}[x]$ such that $u(x) = v(x) + f(x)g(x) + p\,h(x)$. (In other words, $u(x) \equiv v(x) \pmod{p, f(x)}$ if $u(x) - v(x)$ is in the ideal generated by $p$ and $f(x)$ in the ring $\mathbb{Z}[x]$.) One easily checks that if $u(x) \equiv v(x) \pmod{p, f(x)}$ and $v(x) \equiv w(x) \pmod{p, f(x)}$, then $u(x) \equiv w(x) \pmod{p, f(x)}$. Suppose that $u_1(x) \equiv v_1(x) \pmod{p, f(x)}$ and $u_2(x) \equiv v_2(x) \pmod{p, f(x)}$. Then

$$u_1(x) \pm u_2(x) \equiv v_1(x) \pm v_2(x) \pmod{p, f(x)}.$$


Also, using

$$u_1(x)u_2(x) - v_1(x)v_2(x) = u_1(x)\bigl(u_2(x) - v_2(x)\bigr) + v_2(x)\bigl(u_1(x) - v_1(x)\bigr),$$

we easily see that $u_1(x)u_2(x) \equiv v_1(x)v_2(x) \pmod{p, f(x)}$. We note that if $u(x) \equiv v(x) \pmod{p}$, then $u(x) \equiv v(x) \pmod{p, f(x)}$ (by taking $g(x) \equiv 0$), and if $u(x) \equiv v(x) \pmod{f(x)}$, then $u(x) \equiv v(x) \pmod{p, f(x)}$ (by taking $h(x) \equiv 0$). A further important and useful observation is that $u(x) \equiv 0 \pmod{p, f(x)}$ if and only if $f(x)$ is a factor of $u(x)$ modulo $p$.

Let $f(x)$ be monic. If $u(x) \equiv v(x) \pmod{p, f(x)}$ where $u(x)$ and $v(x)$ are in $\mathbb{Z}[x]$, then there are polynomials $g_0(x)$ and $h_0(x)$ in $\mathbb{Z}[x]$ such that $u(x) - v(x) = f(x)g_0(x) + p\,h_0(x)$. Recall (see Exercise (1.14) (b)) that when dividing a polynomial in $\mathbb{Z}[x]$ by a monic polynomial in $\mathbb{Z}[x]$, the quotient and remainder will be in $\mathbb{Z}[x]$. It follows that there are polynomials $q(x)$ and $r(x)$ in $\mathbb{Z}[x]$ with $r(x) \equiv 0$ or $\deg r < \deg f$ such that $h_0(x) = f(x)q(x) + r(x)$. Taking $g(x) = g_0(x) + p\,q(x)$ and $h(x) = r(x)$, we deduce that if $u(x) \equiv v(x) \pmod{p, f(x)}$, then there are polynomials $g(x)$ and $h(x)$ in $\mathbb{Z}[x]$ with $h(x) \equiv 0$ or $\deg h < \deg f$ such that $u(x) - v(x) = f(x)g(x) + p\,h(x)$. A simple argument shows further that such $g(x)$ and $h(x)$ are unique (given $u(x)$, $v(x)$, $f(x)$, and $p$).

We will also make use of the following convention. Let $p$ be a prime, and suppose $f(x) = \sum_{j=0}^{n} a_j x^j \in \mathbb{Z}[x]$ (with $f(x) \not\equiv 0 \pmod{p}$). Then we refer to the degree of $f(x)$ modulo $p$ as the largest nonnegative integer $k \le n$ for which $p$ does not divide $a_k$. Thus, for example, $2x^3 + 3x^2 + 4$ is a polynomial of degree 2 modulo 2. With the added condition that $a_n = 1$, we easily see that any $g(x) \in \mathbb{Z}[x]$ is congruent modulo $(p, f(x))$ to one of the $p^n$ polynomials of degree $\le n-1$ with coefficients from $\{0, 1, \dots, p-1\}$. Also, these $p^n$ polynomials are pairwise incongruent modulo $(p, f(x))$. In other words, we can view these $p^n$ polynomials as representatives of the $p^n$ distinct residue classes modulo $(p, f(x))$.

Consider now the possibility that $a_n \ne 1$, and let $k$ denote the degree of $f(x)$ modulo $p$. Exercise (4.6) implies that arithmetic modulo $(p, f(x))$ is the same as arithmetic modulo $(p, f_1(x))$ where $f_1(x) \equiv f(x) \pmod{p}$ and $\deg f_1(x) = k$. Exercise (4.5) further implies that arithmetic modulo $(p, f_1(x))$ is the same as arithmetic modulo $(p, f_2(x))$ where $f_2(x)$ is an appropriate monic polynomial with $\deg f_2(x) = k$. It follows that there are precisely $p^k$ distinct residue classes modulo $(p, f(x))$ with representatives given by the polynomials of degree $\le k-1$ with coefficients from $\{0, 1, \dots, p-1\}$.

Theorem 4.1.1. Let $p$ be a prime. If $f(x) \in \mathbb{Z}[x]$ is of degree $n$ modulo $p$ and $f(x)$ is irreducible modulo $p$, then

$$x^{p^n} \equiv x \pmod{p, f(x)}.$$

We clarify that in Theorem 4.1.1, as is usual, $x^{p^n} = x^{(p^n)}$.
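The congruence in Theorem 4.1.1 can be checked numerically in small cases. The following is a minimal sketch, not part of the original text: the helper names `mul_mod` and `x_power` are hypothetical, polynomials are stored as coefficient lists with the constant term first, and $f(x)$ is assumed monic of degree at least 2. It uses $f(x) = x^3 - x - 1$ with $p = 3$, which is irreducible modulo 3 since it is a cubic with no roots modulo 3.

```python
def mul_mod(u, v, f, p):
    """Multiply u and v, then reduce modulo (p, f). Assumes f is monic."""
    n = len(f) - 1
    prod = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            prod[i + j] = (prod[i + j] + a * b) % p
    # Eliminate terms of degree >= n by subtracting multiples of f.
    for i in range(len(prod) - 1, n - 1, -1):
        c = prod[i]
        if c:
            for j in range(n + 1):
                prod[i - n + j] = (prod[i - n + j] - c * f[j]) % p
    prod = prod[:n]
    return prod + [0] * (n - len(prod))


def x_power(e, f, p):
    """Return x^e modulo (p, f) by repeated squaring (assumes deg f >= 2)."""
    n = len(f) - 1
    result = [1] + [0] * (n - 1)      # the constant polynomial 1
    base = [0, 1] + [0] * (n - 2)     # the polynomial x
    while e:
        if e & 1:
            result = mul_mod(result, base, f, p)
        base = mul_mod(base, base, f, p)
        e >>= 1
    return result


p = 3
f = [-1, -1, 0, 1]                    # x^3 - x - 1, constant term first
irreducible_case = x_power(p ** 3, f, p)   # x^(3^3) modulo (3, f)
reducible_case = x_power(2 ** 2, [1, 0, 1], 2)  # x^4 modulo (2, x^2 + 1)
print(irreducible_case, reducible_case)
```

The second call uses $x^2 + 1 \equiv (x+1)^2 \pmod 2$, which is reducible, and there the congruence fails: $x^4$ reduces to $1$ rather than to $x$.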

Before we prove this theorem, we consider an example. We show that $f(x) = x^p - x - 1$ is irreducible modulo $p$ (and hence irreducible over $\mathbb{Z}$). Consider $u(x)$

Chapter 11

Computational Considerations

§11.1 Berlekamp’s Method and Hensel Lifting

We begin this chapter by discussing some important and classical methods for factoring polynomials in $\mathbb{Z}[x]$. From this book's point of view, we are mainly interested in knowing whether a given polynomial in $\mathbb{Z}[x]$ is irreducible over $\mathbb{Q}$. If it is reducible over $\mathbb{Q}$, factoring methods will allow us to find a non-trivial factorization of it in $\mathbb{Z}[x]$.

A key to factoring techniques for polynomials in $\mathbb{Z}[x]$ is to make use of a factoring algorithm in $\mathbb{F}_p[x]$, where $\mathbb{F}_p$ as before is the field of arithmetic modulo $p$. We describe an approach due to Berlekamp (1984). This algorithm determines the factorization of a polynomial $f(x)$ in $\mathbb{F}_p[x]$ where $p$ is a prime (or more generally over finite fields). For simplicity, we suppose $f(x)$ is monic and squarefree in $\mathbb{F}_p[x]$. Let $n = \deg f(x)$. For $w(x) \in \mathbb{Z}[x]$, define $w(x) \bmod\!\!\text{d}\ (p, f(x))$ as the unique $g(x) \in \mathbb{Z}[x]$ such that $\deg g \le n-1$, each coefficient of $g(x)$ is in the set $\{0, 1, \dots, p-1\}$, and $g(x) \equiv w(x) \pmod{p, f(x)}$. Observe that we can view $w(x) \bmod\!\!\text{d}\ (p, f(x))$ as also being in $\mathbb{F}_p[x]$.

Let $A$ be the matrix with $j$th column derived from the coefficients of

$$x^{(j-1)p} \bmod\!\!\text{d}\ (p, f(x)).$$

Specifically, write

$$x^{(j-1)p} \bmod\!\!\text{d}\ (p, f(x)) = \sum_{i=1}^{n} a_{ij} x^{i-1} \quad \text{for } 1 \le j \le n.$$


Then we set $A = (a_{ij})_{n \times n}$. Note that the first column consists of a one followed by $n-1$ zeroes. In particular, $\langle 1, 0, 0, \dots, 0 \rangle$ will be an eigenvector for $A$ associated with the eigenvalue 1. We are interested in determining the complete set of eigenvectors associated with the eigenvalue 1. In other words, we would like to know the null space of $B = A - I$ where $I$ represents the $n \times n$ identity matrix. It will be spanned by $k = n - \operatorname{rank}(B)$ linearly independent vectors which can be determined by performing row operations on $B$. Suppose $\vec{v} = \langle b_1, b_2, \dots, b_n \rangle$ is one of these vectors, and set $g(x) = \sum_{j=1}^{n} b_j x^{j-1}$. Observe that

$$g(x^p) \equiv g(x) \pmod{p, f(x)}.$$

Moreover, the $g(x)$ with this property are precisely the $g(x)$ with coefficients obtained from the components of vectors $\vec{v}$ in the null space of $B$.
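The construction of $A$ and of the null space of $B = A - I$ can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function names `berlekamp_matrix` and `null_space_mod_p` are hypothetical, polynomials are coefficient lists with the constant term first, $f(x)$ is assumed monic, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def berlekamp_matrix(f, p):
    """Return A, whose j-th column holds the coefficients of x^((j-1)p) mod (p, f)."""
    n = len(f) - 1
    cols = []
    cur = [1] + [0] * (n - 1)          # start with x^0
    for _ in range(n):
        cols.append(cur[:])
        for _ in range(p):             # multiply by x, p times, reducing by monic f
            top = cur[-1]
            cur = [0] + cur[:-1]
            cur = [(c - top * fc) % p for c, fc in zip(cur, f[:n])]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def null_space_mod_p(B, p):
    """Return a basis of {v : Bv = 0} over F_p via reduced row echelon form."""
    n = len(B)
    M = [[c % p for c in row] for row in B]
    pivots, r = [], 0
    for c in range(n):
        piv = next((i for i in range(r, n) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)
        M[r] = [(x * inv) % p for x in M[r]]
        for i in range(n):
            if i != r and M[i][c]:
                fac = M[i][c]
                M[i] = [(x - fac * y) % p for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [0] * n
        v[free] = 1
        for row, pc in zip(M, pivots):
            v[pc] = (-row[free]) % p
        basis.append(v)
    return basis


p = 2
f = [1, 1, 0, 1, 1, 0, 0, 1]           # x^7 + x^4 + x^3 + x + 1
A = berlekamp_matrix(f, p)
B = [[(A[i][j] - (i == j)) % p for j in range(7)] for i in range(7)]
basis = null_space_mod_p(B, p)
print(basis)
```

Each vector $\langle b_1, \dots, b_n \rangle$ in `basis` gives a polynomial $g(x) = \sum b_j x^{j-1}$ with $g(x^p) \equiv g(x) \pmod{p, f(x)}$.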

Our first result in this chapter connects the factorization of $f(x)$ in $\mathbb{F}_p[x]$ with the computations of greatest common divisors of $f(x)$ and polynomials $g(x) - s$ where $s \in \{0, 1, \dots, p-1\}$. These greatest common divisors must be computed in $\mathbb{F}_p[x]$, and we recall the discussion following Definition 1.3.1. For clarification in what follows, if $u(x)$ and $v(x)$ are in $\mathbb{Z}[x]$ or $\mathbb{F}_p[x]$, then we use the notation $\gcd_p\bigl(u(x), v(x)\bigr)$ to denote the greatest common divisor of $u(x)$ and $v(x)$ computed over the field $\mathbb{F}_p$.

Theorem 11.1.1. Let $f(x)$ be a monic polynomial in $\mathbb{Z}[x]$. Suppose $f(x)$ is squarefree in $\mathbb{F}_p[x]$. Let $g(x)$ be a polynomial with coefficients obtained from a vector in the null space of $B = A - I$ as described above. Then

$$f(x) \equiv \prod_{s=0}^{p-1} \gcd\nolimits_p\bigl(g(x) - s, f(x)\bigr) \pmod{p}.$$

Proof. Observe that

$$g(x)^p - g(x) \equiv \prod_{s=0}^{p-1} \bigl(g(x) - s\bigr) \pmod{p}.$$

Since $g(x)^p \equiv g(x^p) \pmod{p}$ and $g(x^p) \equiv g(x) \pmod{p, f(x)}$, we deduce that $f(x)$ divides the product on the right in $\mathbb{F}_p[x]$. Since the factors $g(x) - s$, for $s \in \{0, 1, \dots, p-1\}$, are pairwise relatively prime in $\mathbb{F}_p[x]$, we deduce that each monic irreducible factor of $f(x)$ divides exactly one of the expressions $\gcd_p\bigl(g(x) - s, f(x)\bigr)$ appearing on the right in the theorem. The result follows.

Observe that if $g(x)$ is not a constant, then $1 \le \deg(g(x) - s) < \deg f(x)$ for each $s$, so Theorem 11.1.1 implies we get a non-trivial factorization of $f(x)$ in $\mathbb{F}_p[x]$. On the other hand, $f(x)$ will not necessarily be completely factored. One can completely factor $f(x)$ by repeating the above procedure for each factor obtained from the theorem; but it is simpler to use (and not difficult to show) that if one takes the product of the greatest common divisors of each factor of $f(x)$ obtained above with $h(x) - s$ (with $0 \le s \le p-1$), where $h(x)$ is obtained from another of the $k$ vectors spanning the null space of $B$, then one will obtain a new non-trivial factor of $f(x)$ in $\mathbb{F}_p[x]$. Continuing to use all $k$ vectors will produce a complete factorization of $f(x)$ in $\mathbb{F}_p[x]$.

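As an example, we factor $x^7 + x^4 + x^3 + x + 1$ in $\mathbb{F}_2[x]$. The matrices $A$ and $B$ are

$$A = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0
\end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix}.$$

Performing elementary row operations in $\mathbb{F}_2$ on the matrix $B$, one can obtain the matrix

$$\begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$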

Consequently, the dimension of the null space is 2, and it is spanned by the two vectors $\langle 1, 0, 0, 0, 0, 0, 0 \rangle$ and $\langle 0, 0, 1, 1, 1, 0, 1 \rangle$. The first of these vectors corresponds to the polynomial $g(x) = 1$. Since this is a constant, Theorem 11.1.1 only leads to the trivial factorization for this choice of $g(x)$. The second eigenvector gives $g(x) = x^6 + x^4 + x^3 + x^2$. As $g(x)$, in this case, is not constant, we know that Theorem 11.1.1 must lead to a non-trivial factorization of $f(x)$. In fact, we get

$$\gcd\nolimits_2\bigl(f(x), g(x)\bigr) = x^3 + x^2 + 1 \quad \text{and} \quad \gcd\nolimits_2\bigl(f(x), g(x) - 1\bigr) = x^4 + x^3 + x^2 + x + 1,$$

so that we can deduce

$$f(x) \equiv (x^3 + x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{2}.$$
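The two greatest common divisors in this example can be reproduced with the Euclidean algorithm over $\mathbb{F}_p$. The sketch below is illustrative only: the helper names `trim`, `poly_rem`, and `poly_gcd` are hypothetical, polynomials are coefficient lists with the constant term first, and `pow(a, -1, p)` requires Python 3.8 or later.

```python
def trim(a):
    """Remove leading zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_rem(a, b, p):
    """Remainder of a divided by b in F_p[x]; b must be nonzero modulo p."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    inv = pow(b[-1], -1, p)
    while len(a) >= len(b):
        q = a[-1] * inv % p
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] = (a[shift + i] - q * c) % p
        trim(a)
    return a


def poly_gcd(a, b, p):
    """Monic greatest common divisor of a and b computed over F_p."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    while b:
        a, b = b, poly_rem(a, b, p)
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]


f = [1, 1, 0, 1, 1, 0, 0, 1]           # x^7 + x^4 + x^3 + x + 1
g = [0, 0, 1, 1, 1, 0, 1]              # x^6 + x^4 + x^3 + x^2
g_minus_1 = [g[0] - 1] + g[1:]
factor0 = poly_gcd(f, g, 2)            # gcd_2(f, g)
factor1 = poly_gcd(f, g_minus_1, 2)    # gcd_2(f, g - 1)
print(factor0, factor1)
```

The two results are the coefficient lists of $x^3 + x^2 + 1$ and $x^4 + x^3 + x^2 + x + 1$, matching the factorization of $f(x)$ modulo 2 given above.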

Next, we describe Hensel Lifting, which is a procedure for using the factorization of $f(x)$ in $\mathbb{F}_p[x]$ ($p$ a prime) to produce a factorization of $f(x)$ modulo $p^k$ for an arbitrary positive integer $k$. Suppose that $u(x)$ and $v(x)$ are relatively prime polynomials in $\mathbb{F}_p[x]$ for which

$$f(x) \equiv u(x)v(x) \pmod{p}.$$

We continue to view $f(x)$ as being monic, for simplicity, so we take $u(x)$ and $v(x)$ also to be monic. Then Hensel Lifting will produce, for any positive integer $k$, monic polynomials $u_k(x)$ and $v_k(x)$ in $\mathbb{Z}[x]$ satisfying

$$u_k(x) \equiv u(x) \pmod{p}, \qquad v_k(x) \equiv v(x) \pmod{p},$$

and

$$f(x) \equiv u_k(x)v_k(x) \pmod{p^k}.$$

When $k = 1$, it is clear how to choose $u_k(x)$ and $v_k(x)$. For $k \ge 1$, we determine values of $u_{k+1}(x)$ and $v_{k+1}(x)$ from the values of $u_k(x)$ and $v_k(x)$ as follows. We compute

$$w_k(x) \equiv \frac{1}{p^k}\bigl(f(x) - u_k(x)v_k(x)\bigr) \pmod{p}.$$

Observe that $\deg w_k(x) < \deg f(x)$ and

$$p^k w_k(x) \equiv f(x) - u_k(x)v_k(x) \pmod{p^{k+1}}.$$

Since $u(x)$ and $v(x)$ are relatively prime in $\mathbb{F}_p[x]$, we can find $a(x)$ and $b(x)$ in $\mathbb{F}_p[x]$ (depending on $k$) such that

$$a(x)u(x) + b(x)v(x) \equiv w_k(x) \pmod{p}.$$

By choosing $t(x) \in \mathbb{F}_p[x]$ appropriately and replacing $a(x)$ with $a(x) - t(x)v(x)$ and $b(x)$ with $b(x) + t(x)u(x)$, we see that we may suppose $\deg a(x) < \deg v(x)$. The above equation then implies further that we may take $\deg b(x) < \deg u(x)$. Setting

$$u_{k+1}(x) = u_k(x) + b(x)p^k \quad \text{and} \quad v_{k+1}(x) = v_k(x) + a(x)p^k,$$

we see that $u_{k+1}(x)$ and $v_{k+1}(x)$ are monic and

$$\begin{aligned}
u_{k+1}(x)v_{k+1}(x) &\equiv \bigl(u_k(x) + b(x)p^k\bigr)\bigl(v_k(x) + a(x)p^k\bigr) \\
&\equiv u_k(x)v_k(x) + p^k\bigl(a(x)u(x) + b(x)v(x)\bigr) \\
&\equiv u_k(x)v_k(x) + p^k w_k(x) \\
&\equiv u_k(x)v_k(x) + \bigl(f(x) - u_k(x)v_k(x)\bigr) \equiv f(x) \pmod{p^{k+1}}.
\end{aligned}$$

A complete factorization of $f(x)$ modulo $p^k$ can be obtained from a complete factorization of $f(x)$ modulo $p$ by modifying this idea. We do not elaborate on the best approach here, but note that such a factorization can be achieved easily as follows. If $f(x)$ is a product of $r$ monic irreducible polynomials $g_1(x), g_2(x), \dots, g_r(x)$ modulo $p$, then one can factor $f(x)$ modulo $p^k$ by taking $u(x) = g_1(x)$ and $v(x) = g_2(x)g_3(x) \cdots g_r(x)$ above. This will produce a factor $u_k(x)$ modulo $p^k$ that is congruent to $u(x)$ modulo $p$ and another factor $v_k(x)$ congruent to $v(x)$ modulo $p$. One can then replace the role of $f(x)$ with $v_k(x)$, which is $g_2(x)g_3(x) \cdots g_r(x)$ modulo $p$, and repeat the process, factoring $v_k(x)$ modulo $p^k$ as a polynomial which is congruent to $g_2(x)$ modulo $p$ and a polynomial congruent to $g_3(x)g_4(x) \cdots g_r(x)$ modulo $p$. Continuing in this manner, one gets a complete factorization of $f(x)$ into a product of monic irreducible polynomials modulo $p^k$.

By way of example, setting $f(x) = x^7 + x^4 + x^3 + x + 1$, $u(x) = x^3 + x^2 + 1$ and $v(x) = x^4 + x^3 + x^2 + x + 1$, we recall that, in our previous example, we showed $f(x) \equiv u(x)v(x) \pmod{2}$. Then

$$w_1(x) \equiv \frac{1}{2}\bigl(f(x) - u(x)v(x)\bigr) \equiv x^6 + x^5 + x^4 + x^3 + x^2 \pmod{2}.$$

Taking $a(x) = 0$ and $b(x) = x^2$, we see that

$$a(x)u(x) + b(x)v(x) \equiv w_1(x) \pmod{2}.$$

Hence, we take

$$u_2(x) = u(x) + 2x^2 = x^3 + 3x^2 + 1$$

and

$$v_2(x) = v(x) = x^4 + x^3 + x^2 + x + 1.$$

We deduce that

$$f(x) \equiv (x^3 + 3x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{4}.$$

Continuing in this manner, we obtain

$$\begin{aligned}
f(x) &\equiv (x^3 + 7x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{8}, \\
f(x) &\equiv (x^3 + 15x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{16}, \\
f(x) &\equiv (x^3 + 31x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{32}, \\
f(x) &\equiv (x^3 + 63x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{64}.
\end{aligned}$$

Perhaps the above is enough for the reader to guess how $f(x)$ factors in $\mathbb{Z}[x]$. We return to this at the end of the next section.
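The lifting steps of this example can be replayed mechanically. The following sketch is specific to this example and is not a general Hensel lifting routine: it assumes, and checks with an assertion at every step, that the choice $a(x) = 0$, $b(x) = x^2$ made above continues to solve $a(x)u(x) + b(x)v(x) \equiv w_k(x) \pmod 2$. The helper names `poly_mul` and `poly_sub` are hypothetical, and polynomials are coefficient lists with the constant term first.

```python
def poly_mul(a, b):
    """Product of two integer polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_sub(a, b):
    """Difference a - b of two integer polynomials."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]


p = 2
f = [1, 1, 0, 1, 1, 0, 0, 1]   # x^7 + x^4 + x^3 + x + 1
u = [1, 0, 1, 1]               # x^3 + x^2 + 1
v = [1, 1, 1, 1, 1]            # x^4 + x^3 + x^2 + x + 1
b = [0, 0, 1]                  # b(x) = x^2; a(x) = 0, so v_k stays equal to v

lifted = {1: u}
uk = u
for k in range(1, 6):
    diff = poly_sub(f, poly_mul(uk, v))
    assert all(c % p ** k == 0 for c in diff)      # f = u_k v_k modulo p^k
    wk = [(c // p ** k) % p for c in diff]          # w_k modulo p
    bv = [c % p for c in poly_mul(b, v)]
    # Check that a = 0, b = x^2 solves a*u + b*v = w_k modulo p.
    assert wk[:len(bv)] == bv and not any(wk[len(bv):])
    padded_b = b + [0] * (len(uk) - len(b))
    uk = [x + y * p ** k for x, y in zip(uk, padded_b)]
    lifted[k + 1] = uk

print(lifted)
```

Here `lifted[k]` holds the coefficients of $u_k(x)$, so the $x^2$ coefficient runs through $1, 3, 7, 15, 31, 63$ as in the displayed factorizations.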

§11.2 An Inequality of Landau and an Approach of Zassenhaus

Landau's inequality gives an upper bound on the "size" of the factors of a given polynomial in $\mathbb{Z}[x]$. For

$$f(x) = \sum_{j=0}^{n} a_j x^j = a_n \prod_{j=1}^{n} (x - \alpha_j), \tag{11.2.1}$$

we recall the notations

$$\|f\| = \Bigl(\sum_{j=0}^{n} a_j^2\Bigr)^{1/2} \quad \text{and} \quad M(f) = |a_n| \prod_{j=1}^{n} \max\{1, |\alpha_j|\},$$

the latter being the Mahler measure of the polynomial $f(x)$. We make use of the following two easily established properties of Mahler measure:

(i) If $g(x)$ and $h(x)$ are in $\mathbb{C}[x]$, then $M(gh) = M(g)M(h)$.

(ii) If $g(x)$ is in $\mathbb{Z}[x]$, then $M(g) \ge 1$.

For a fixed $f(x) \in \mathbb{Z}[x]$, we want an upper bound on $\|g\|$ where $g(x)$ is a factor of $f(x)$ in $\mathbb{Z}[x]$.

Theorem 11.2.1. If $f(x)$, $g(x)$, and $h(x)$ in $\mathbb{Z}[x]$ are such that $f(x) = g(x)h(x)$, then

$$\|g\| \le 2^{\deg g}\|f\|.$$

Proof. We begin by proving that for $f(x) \in \mathbb{R}[x]$,

$$M(f) \le \|f\| \le 2^{\deg f} M(f). \tag{11.2.2}$$

For $w(x) \in \mathbb{C}[x]$, we use the reciprocal polynomial $\tilde{w}(x) = x^{\deg w}\,w(1/x)$. The coefficient of $x^{\deg w}$ in the expanded product $w(x)\tilde{w}(x)$ is $\|w\|^2$. For $f(x)$ as in (11.2.1), we consider

$$w(x) = a_n \prod_{\substack{1 \le j \le n \\ |\alpha_j| > 1}} (x - \alpha_j) \prod_{\substack{1 \le j \le n \\ |\alpha_j| \le 1}} (\alpha_j x - 1).$$

Observe that

$$\tilde{w}(x) = a_n \prod_{\substack{1 \le j \le n \\ |\alpha_j| > 1}} (1 - \alpha_j x) \prod_{\substack{1 \le j \le n \\ |\alpha_j| \le 1}} (\alpha_j - x).$$

Hence,

$$w(x)\tilde{w}(x) = a_n^2 \prod_{j=1}^{n} (x - \alpha_j) \prod_{j=1}^{n} (1 - \alpha_j x) = f(x)\tilde{f}(x).$$

By comparing coefficients of $x^n$, we deduce that $\|w\| = \|f\|$. Also, the definition of $w(x)$ implies $|w(0)| = M(f)$. Thus, writing $w(x) = \sum_{j=0}^{n} c_j x^j$, we obtain

$$M(f) = |c_0| \le (c_0^2 + c_1^2 + \cdots + c_n^2)^{1/2} = \|w\| = \|f\|,$$

establishing the first inequality in (11.2.2). For the second inequality, observe that for any $k \in \{1, 2, \dots, n\}$, the product of any $k$ of the $\alpha_j$ has absolute value $\le M(f)/|a_n|$. It follows that $|a_{n-k}|/|a_n|$, which is the absolute value of the sum of the products of the roots taken $k$ at a time, is $\le \binom{n}{k} M(f)/|a_n|$. Hence,

$$|a_{n-k}| \le \binom{n}{k} M(f) = \binom{n}{n-k} M(f).$$

The second inequality in (11.2.2) now follows from

$$\|f\| = \Bigl(\sum_{j=0}^{n} a_j^2\Bigr)^{1/2} \le \sum_{j=0}^{n} |a_j| \le \sum_{j=0}^{n} \binom{n}{j} M(f) = 2^n M(f).$$

Now, we make use of (11.2.2) and properties (i) and (ii) of Mahler measure above to deduce

$$\|g\| \le 2^{\deg g} M(g) \le 2^{\deg g} M(g)M(h) = 2^{\deg g} M(gh) = 2^{\deg g} M(f) \le 2^{\deg g}\|f\|.$$

This establishes the theorem.

We explain a method for factoring a given $f(x) \in \mathbb{Z}[x]$ with the added assumptions that $f(x)$ is monic and squarefree. This approach has its origins in a paper by Zassenhaus (1969). Whether $f(x)$ is squarefree can be tested by computing $\gcd(f, f')$, which will give us a nontrivial factor of $f(x)$ if $f(x)$ is not squarefree. If $f(x)$ is not monic, then one needs to add a little more to the ideas below (but not much).

Let

$$B = 2^{\lfloor (\deg f)/2 \rfloor}\|f\|.$$

Then if $f(x)$ has a nontrivial factor $g(x)$ in $\mathbb{Z}[x]$, it has such a factor of degree $\le \lfloor (\deg f)/2 \rfloor$ so that by Theorem 11.2.1, we can use $B$ as a bound on $\|g\|$.

Next, we find a prime $p$ for which $f$ is squarefree modulo $p$. There are a variety of ways this can be done. There are only a finite number of primes for which $f$ is not squarefree modulo $p$. These primes divide the resultant $R(f, f')$. So one can compute $R(f, f')$ and avoid primes which divide $R(f, f')$. Alternatively, one can compute $\gcd_p\bigl(f(x), f'(x)\bigr)$ for successive primes $p$, or simply use Berlekamp's factoring algorithm, until a squarefree factorization occurs.

We choose a positive integer $r$ as small as possible such that $p^r > 2B$. Then we factor $f(x)$ modulo $p$ by Berlekamp's algorithm and use Hensel lifting to obtain the factorization of $f(x)$ modulo $p^r$. Given our conditions on $f(x)$, we can suppose all irreducible factors are monic and do so.

Next, we can determine if $f(x) = g(x)h(x)$ for some monic $g(x)$ and $h(x)$ in $\mathbb{Z}[x]$ with $\|g\| \le B$ as follows. We observe that the coefficients of $g(x)$ are in $[-B, B]$. We use a residue system modulo $p^r$ that includes this interval, namely $(-p^r/2, p^r/2]$, and consider each factorization of $f(x)$ modulo $p^r$ with coefficients in this residue system as a product of two monic polynomials $u(x)$ and $v(x)$. Since $f(x) = g(x)h(x)$, there must be some factorization where

$$g(x) \equiv u(x) \pmod{p^r} \quad \text{and} \quad h(x) \equiv v(x) \pmod{p^r}.$$

On the other hand, the coefficients of $g(x)$ and $u(x)$ are all in $(-p^r/2, p^r/2]$ so that the coefficients of $g(x) - u(x)$ are each divisible by $p^r$ and are each $< p^r$ in absolute value. This implies $g(x) = u(x)$. Thus, we can determine if a factor $g(x)$ exists as above by simply checking each monic factor of $f(x)$ modulo $p^r$ with coefficients in $(-p^r/2, p^r/2]$.

Recall we factored $f(x) = x^7 + x^4 + x^3 + x + 1$ modulo various powers of 2 in the previous section. Using the above approach, we can deduce a factorization of $f(x)$ in $\mathbb{Z}[x]$. In the notation above,

$$B = 2^3\|f\| = 8\sqrt{5} < 20.$$

Since $2^6 = 64 > 2B$, we can use the factorization of $f(x)$ that we obtained modulo 64, namely

$$f(x) \equiv (x^3 + 63x^2 + 1)(x^4 + x^3 + x^2 + x + 1) \pmod{64}.$$

We see that if $f(x)$ is reducible, then it must have two factors, one congruent to $x^3 + 63x^2 + 1$ modulo 64 and one congruent to $x^4 + x^3 + x^2 + x + 1$ modulo 64. The first of these is of particular interest to us as its degree is $< (\deg f)/2$. If $g(x) \in \mathbb{Z}[x]$ divides $f(x)$ and

$$g(x) \equiv x^3 + 63x^2 + 1 \pmod{64},$$

then the arguments above imply that $g(x)$ must equal the polynomial obtained by taking the coefficients on the right to be in the interval $(-32, 32]$. In other words, from

$$x^3 + 63x^2 + 1 \equiv x^3 - x^2 + 1 \pmod{64},$$

we deduce $g(x) = x^3 - x^2 + 1$. To clarify, this means that if $f(x)$ is reducible over $\mathbb{Z}$, then $g(x)$ will be a factor of $f(x)$. To establish that $f(x)$ in fact has $g(x)$ as a factor, we are left with checking if $g(x)$ divides $f(x)$. In fact, we have the (perhaps not unexpected) factorization

$$f(x) = (x^3 - x^2 + 1)(x^4 + x^3 + x^2 + x + 1).$$
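The recovery step in this example (choose $r$, move the coefficients of the lifted factor into $(-p^r/2, p^r/2]$, then test divisibility in $\mathbb{Z}[x]$) can be sketched as follows. The helper names `symmetric_residue` and `divide_monic` are hypothetical, polynomials are coefficient lists with the constant term first, and the lifted factor $x^3 + 63x^2 + 1$ is taken as given from the previous section rather than recomputed.

```python
import math


def symmetric_residue(c, m):
    """Representative of c modulo m in the interval (-m/2, m/2]."""
    r = c % m
    return r - m if r > m // 2 else r


def divide_monic(f, g):
    """Quotient and remainder of f divided by a monic g, both in Z[x]."""
    r = f[:]
    q = [0] * (len(f) - len(g) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = r[i + len(g) - 1]
        for j, c in enumerate(g):
            r[i + j] -= q[i] * c
    return q, r[:len(g) - 1]


f = [1, 1, 0, 1, 1, 0, 0, 1]                       # x^7 + x^4 + x^3 + x + 1
p = 2
deg_f = len(f) - 1
B = 2 ** (deg_f // 2) * math.sqrt(sum(c * c for c in f))   # 8 * sqrt(5)

r = 1
while p ** r <= 2 * B:                             # smallest r with p^r > 2B
    r += 1

u_lifted = [1, 0, 63, 1]                           # x^3 + 63x^2 + 1 modulo 64
g = [symmetric_residue(c, p ** r) for c in u_lifted]
quotient, remainder = divide_monic(f, g)
print(r, g, quotient, remainder)
```

A zero remainder confirms that the candidate $g(x)$ divides $f(x)$ in $\mathbb{Z}[x]$; a nonzero remainder would mean this particular lifted factor does not come from a factor of $f(x)$ over $\mathbb{Z}$.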


§11.3 Swinnerton-Dyer’s Example

The algorithm just described for factoring a polynomial $f(x) \in \mathbb{Z}[x]$ of degree $n$ can take time that is exponential in $n$ for some, albeit rare, $f(x)$. This has been illustrated by a nice example due to Swinnerton-Dyer. We formulate his example as follows. Let $a_1, a_2, \dots, a_m$ be arbitrary squarefree pairwise relatively prime integers $> 1$. Let $S_m$ be the set of $2^m$ different $m$-tuples $(\varepsilon_1, \dots, \varepsilon_m)$ where each $\varepsilon_j \in \{1, -1\}$. We justify that the polynomial

$$f(x) = \prod_{(\varepsilon_1, \dots, \varepsilon_m) \in S_m} \bigl(x - (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m})\bigr) \tag{11.3.1}$$

has the properties:

(i) The polynomial $f(x)$ is in $\mathbb{Z}[x]$.

(ii) It is irreducible over the rationals.

(iii) It factors as a product of linear and quadratic polynomials modulo every prime $p$.

Some of the arguments below (in particular, the argument for (ii)) make some use of Galois theory. We elaborate on the details but note in advance that some background is needed in this direction for the presentation given here.

One can deduce (i) by observing that the coefficients of $f(x)$ are symmetric polynomials with integer coefficients in the roots of

$$(x^2 - a_1)(x^2 - a_2) \cdots (x^2 - a_m).$$

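Alternatively, an easy induction argument on $m$ can be done as follows. For any squarefree positive integer $a_1$, we have

$$\bigl(x - \sqrt{a_1}\bigr)\bigl(x + \sqrt{a_1}\bigr) = x^2 - a_1.$$

Suppose $f(x)$, as above, is in $\mathbb{Z}[x]$ whenever $m = t$ where $t$ is a positive integer. Let $a_1, a_2, \dots, a_{t+1}$ be arbitrary squarefree pairwise relatively prime integers $> 1$. The induction hypothesis implies

$$f_t(x) = \prod_{(\varepsilon_1, \dots, \varepsilon_t) \in S_t} \bigl(x - (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_t\sqrt{a_t})\bigr) \in \mathbb{Z}[x].$$

By elementary symmetric functions associated with the two roots of the quadratic $x^2 - a_{t+1}$, we see that

$$f_{t+1}(x) = \prod_{(\varepsilon_1, \dots, \varepsilon_{t+1}) \in S_{t+1}} \bigl(x - (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_{t+1}\sqrt{a_{t+1}})\bigr) = f_t\bigl(x + \sqrt{a_{t+1}}\bigr)\,f_t\bigl(x - \sqrt{a_{t+1}}\bigr) \in \mathbb{Z}[x].$$

The use of elementary symmetric functions can be avoided by observing that $f_t\bigl(x + \sqrt{a_{t+1}}\bigr) = u(x) + v(x)\sqrt{a_{t+1}}$ for some $u(x)$ and $v(x)$ in $\mathbb{Z}[x]$ and, consequently, $f_t\bigl(x - \sqrt{a_{t+1}}\bigr) = u(x) - v(x)\sqrt{a_{t+1}}$. Hence, the product of $f_t\bigl(x + \sqrt{a_{t+1}}\bigr)$ and $f_t\bigl(x - \sqrt{a_{t+1}}\bigr)$ is $u(x)^2 - a_{t+1}v(x)^2 \in \mathbb{Z}[x]$. Thus, (i) holds.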
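The identity $f_{t+1}(x) = u(x)^2 - a_{t+1}v(x)^2$ gives a direct way to compute these polynomials with integer arithmetic only. The sketch below is one possible illustration: the helper names `add`, `mul`, `next_sd`, and `swinnerton_dyer` are hypothetical, polynomials are coefficient lists with the constant term first, and the recursion is started from $f_0(x) = x$ (an assumption made here for convenience, so that one step with $a_1$ gives $x^2 - a_1$).

```python
def add(a, b, sign=1):
    """Return a + sign*b for integer polynomials a and b."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + sign * (b[i] if i < len(b) else 0)
            for i in range(n)]


def mul(a, b):
    """Product of two integer polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def next_sd(f, a):
    """Return f(x + sqrt(a)) * f(x - sqrt(a)) as u^2 - a*v^2 in Z[x]."""
    U, V = [1], [0]          # (x + sqrt(a))^k written as U(x) + V(x) sqrt(a)
    u, v = [0], [0]          # running value of f(x + sqrt(a)) = u + v sqrt(a)
    for c in f:              # c is the coefficient of x^k in f
        u = add(u, [c * t for t in U])
        v = add(v, [c * t for t in V])
        # Multiply U + V sqrt(a) by x + sqrt(a).
        U, V = add([0] + U, [a * t for t in V]), add([0] + V, U)
    result = add(mul(u, u), [a * t for t in mul(v, v)], sign=-1)
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result


def swinnerton_dyer(a_list):
    """Polynomial (11.3.1) for the given list a_1, ..., a_m."""
    f = [0, 1]               # start from x
    for a in a_list:
        f = next_sd(f, a)
    return f


sd2 = swinnerton_dyer([2, 3])
sd3 = swinnerton_dyer([2, 3, 5])
print(sd2, sd3)
```

For $a_1 = 2$, $a_2 = 3$ this produces $x^4 - 10x^2 + 1$, the minimal polynomial of $\sqrt{2} + \sqrt{3}$.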

To establish (ii), it suffices to show that the minimal polynomial for

$$\alpha_m = \sqrt{a_1} + \sqrt{a_2} + \cdots + \sqrt{a_m},$$

that is, the monic irreducible polynomial in $\mathbb{Q}[x]$ that has $\alpha_m$ as a root, has all $2^m$ numbers of the form

$$\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m}, \quad \text{where } (\varepsilon_1, \dots, \varepsilon_m) \in S_m,$$

as roots and that these $2^m$ numbers are distinct. We begin, however, by looking at the algebraic number field

$$K_m = \mathbb{Q}\bigl(\sqrt{a_1}, \sqrt{a_2}, \dots, \sqrt{a_m}\bigr).$$

Observe that $K_m$ is the splitting field for the polynomial

$$(x^2 - a_1)(x^2 - a_2) \cdots (x^2 - a_m),$$

and therefore forms a Galois extension over $\mathbb{Q}$. We prove

($*$) The number field $K_m$ has degree $2^m$ over $\mathbb{Q}$, and the $2^m$ elements of the Galois group $\operatorname{Gal}(K_m/\mathbb{Q})$ of $K_m$ over $\mathbb{Q}$ are given by the mappings

$$\sigma\bigl(\sqrt{a_j}\bigr) = \varepsilon_j\sqrt{a_j}, \quad \text{for } 1 \le j \le m,$$

where $(\varepsilon_1, \dots, \varepsilon_m)$ varies over the $2^m$ elements of $S_m$.

We proceed by induction on $m$ to establish that ($*$) holds for each positive integer $m$ and for all choices of $a_1, a_2, \dots, a_m$ squarefree pairwise relatively prime integers $> 1$.

We begin with $m = 0$ and $m = 1$. For $m = 0$, we have $K_0 = \mathbb{Q}$, which has degree 1 over $\mathbb{Q}$ and implies $\operatorname{Gal}(K_0/\mathbb{Q})$ only consists of the identity element. Thus, ($*$) holds for $m = 0$. Suppose $m = 1$. Given $a_1$ is squarefree and $> 1$, the quadratic $x^2 - a_1$ is irreducible over $\mathbb{Q}$. Therefore, the number field $K_1 = \mathbb{Q}\bigl(\sqrt{a_1}\bigr)$ has degree 2 over $\mathbb{Q}$. We deduce $\operatorname{Gal}(K_1/\mathbb{Q})$ has exactly one non-identity element $\sigma$. As a root of $x^2 - a_1$ must be mapped to a root of $x^2 - a_1$ by $\sigma$, we see that $\sigma\bigl(\sqrt{a_1}\bigr) = \pm\sqrt{a_1}$. Since $\sigma$ is not the identity mapping, we deduce $\sigma\bigl(\sqrt{a_1}\bigr) = -\sqrt{a_1}$. This establishes what we want to start our induction, namely ($*$) for $m = 1$.

Suppose now that ($*$) holds for $m \le t$, where $t$ is a positive integer, and let $a_1, a_2, \dots, a_{t+1}$ be arbitrary squarefree pairwise relatively prime integers $> 1$. With the already established notation above, we have that the degree of $K_t$ over $\mathbb{Q}$ is $2^t$ and that there is a $\sigma \in \operatorname{Gal}(K_t/\mathbb{Q})$ that satisfies $\sigma\bigl(\sqrt{a_1}\bigr) = -\sqrt{a_1}$ and $\sigma\bigl(\sqrt{a_j}\bigr) = \sqrt{a_j}$ for $2 \le j \le t$. We argue next that $\sqrt{a_{t+1}} \notin K_t$. Assume otherwise. Let

$$S = \Bigl\{ \prod_{j=2}^{t} \sqrt{a_j}^{\,\eta_j} : \eta_j \in \{0, 1\} \text{ for each } j \Bigr\}.$$

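Thus, $S$ consists of $2^{t-1}$ elements. Every element of $K_t$ can be expressed uniquely as a linear combination of the elements of $S$ and the elements of $S$ times $\sqrt{a_1}$ with coefficients from $\mathbb{Q}$. In other words, each element of $K_t$ can be written as

$$\sum_{s \in S} b(s)\,s + \sum_{s \in S} c(s)\,s\sqrt{a_1},$$

for exactly one choice of $b(s)$ and $c(s)$ in $\mathbb{Q}$. Fix such $b(s)$ and $c(s)$ so that the above sum represents $\sqrt{a_{t+1}}$. We make use of the automorphism $\sigma$ defined above. Observe that

$$\sigma\Bigl(\sum_{s \in S} b(s)\,s + \sum_{s \in S} c(s)\,s\sqrt{a_1}\Bigr) = \sum_{s \in S} b(s)\,s - \sum_{s \in S} c(s)\,s\sqrt{a_1}.$$

Since

$$\sigma\bigl(\sqrt{a_{t+1}}\bigr)^2 = \sigma(a_{t+1}) = a_{t+1},$$

we deduce that $\sigma\bigl(\sqrt{a_{t+1}}\bigr)$ is either $\sqrt{a_{t+1}}$ or $-\sqrt{a_{t+1}}$. In the former case, we use that

$$2\sqrt{a_{t+1}} = \sqrt{a_{t+1}} + \sigma\bigl(\sqrt{a_{t+1}}\bigr) = 2\sum_{s \in S} b(s)\,s \in \mathbb{Q}\bigl(\sqrt{a_2}, \dots, \sqrt{a_t}\bigr).$$

In the latter case, we use that

$$2\sqrt{a_{t+1}} = \sqrt{a_{t+1}} - \sigma\bigl(\sqrt{a_{t+1}}\bigr) = 2\sqrt{a_1}\sum_{s \in S} c(s)\,s,$$

which implies $\sqrt{a_1 a_{t+1}} \in \mathbb{Q}\bigl(\sqrt{a_2}, \dots, \sqrt{a_t}\bigr)$.

On the other hand, by the induction hypothesis, $\mathbb{Q}\bigl(\sqrt{a_2}, \dots, \sqrt{a_t}\bigr)$ has degree $2^{t-1}$ over $\mathbb{Q}$ and both the fields

$$\mathbb{Q}\bigl(\sqrt{a_2}, \dots, \sqrt{a_t}, \sqrt{a_{t+1}}\bigr) \quad \text{and} \quad \mathbb{Q}\bigl(\sqrt{a_2}, \dots, \sqrt{a_t}, \sqrt{a_1 a_{t+1}}\bigr)$$

have degree $2^t$ over $\mathbb{Q}$. This is a contradiction. Thus, $\sqrt{a_{t+1}} \notin K_t$. In other words, $x^2 - a_{t+1}$ is irreducible over $K_t$. Note that $K_{t+1}$ is formed by adjoining a root of $x^2 - a_{t+1}$ to $K_t$. Since the degree of $K_t$ over $\mathbb{Q}$ is $2^t$, we deduce that the degree of $K_{t+1}$ over $\mathbb{Q}$ is $2^{t+1}$. Observe that if $\sigma$ is an automorphism of $K_{t+1}$ that fixes $\mathbb{Q}$, then its action on the elements of $K_{t+1}$ is determined by the values of $\sigma\bigl(\sqrt{a_j}\bigr)$ for $j \in \{1, 2, \dots, t+1\}$. Also, for each $j \in \{1, 2, \dots, t+1\}$, the element $\sqrt{a_j}$ in $K_{t+1}$ must be mapped by $\sigma$ to one of $\sqrt{a_j}$ or $-\sqrt{a_j}$. Since the degree of $K_{t+1}$ over $\mathbb{Q}$ is $2^{t+1}$, there are $2^{t+1}$ elements of the Galois group of $K_{t+1}$ over $\mathbb{Q}$, and we see that they are precisely the $2^{t+1}$ mappings given in ($*$) with $m = t+1$. Thus, ($*$) holds for $m = t+1$, and the induction argument is complete.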

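We return to establishing (ii). We show next that

$$\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m} \ne \varepsilon'_1\sqrt{a_1} + \cdots + \varepsilon'_m\sqrt{a_m}, \tag{11.3.2}$$

where $(\varepsilon_1, \dots, \varepsilon_m)$ and $(\varepsilon'_1, \dots, \varepsilon'_m)$ are distinct elements of $S_m$. Taking $j$ maximal such that $\varepsilon_j \ne \varepsilon'_j$, we see that if (11.3.2) does not hold, then

$$\sqrt{a_j} \in \mathbb{Q}\bigl(\sqrt{a_1}, \dots, \sqrt{a_{j-1}}\bigr).$$

The argument above showing that $\sqrt{a_{t+1}} \notin K_t$ implies a contradiction. Thus, the $2^m$ numbers

$$\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m},$$

with $(\varepsilon_1, \dots, \varepsilon_m) \in S_m$, are distinct. Observe that $\alpha_m \in K_m$. If $w(x)$ is the minimal polynomial for $\alpha_m$, then each $\sigma \in \operatorname{Gal}(K_m/\mathbb{Q})$ must map $\alpha_m$ to a root of $w(x)$. Fix $(\varepsilon_1, \dots, \varepsilon_m) \in S_m$. From ($*$), we know that there is a $\sigma \in \operatorname{Gal}(K_m/\mathbb{Q})$ satisfying

$$\sigma(\alpha_m) = \varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m}.$$

Therefore, each root of $f(x)$, given by (11.3.1), is a root of $w(x)$. We deduce $f(x) = w(x)$ and, hence, $f(x)$ is irreducible. This completes the proof of (ii).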

To see (iii), we again use induction on $m$. We give two such induction arguments, the first more direct for someone accustomed to working in extensions with factorizations modulo a prime and the second more self-contained given the contents of this book. The case $m = 1$ is clear, since in this case $f(x)$, as defined in (11.3.1), is a quadratic. Suppose (iii) holds for $m = t$ where $t$ is a positive integer. Fix a prime $p$. We make use of the notation in (11.3.1) with $m = t+1$ and write

$$f(x) = g_j\bigl(x + \sqrt{a_j}\bigr)\,g_j\bigl(x - \sqrt{a_j}\bigr), \tag{11.3.3}$$

where $j \in \{1, 2, \dots, t+1\}$ and $g_j(x)$ is defined by

$$g_j(x) = \prod_{(\varepsilon_1, \dots, \varepsilon_t) \in S_t} \bigl(x - (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_{j-1}\sqrt{a_{j-1}} + \varepsilon_j\sqrt{a_{j+1}} + \cdots + \varepsilon_t\sqrt{a_{t+1}})\bigr).$$

The induction hypothesis implies that each $g_j(x)$ factors modulo $p$ as a product of linear and quadratic polynomials. If some $a_j$ is a square modulo $p$, then there is some integer $b$ such that $b^2 \equiv a_j \pmod{p}$ and, hence, $f(x) \equiv g_j(x+b)\,g_j(x-b) \pmod{p}$. Since each of $g_j(x+b)$ and $g_j(x-b)$ factors as a product of linear and quadratic polynomials modulo $p$, we are through in this case. Now, suppose no $a_j$ is a square modulo $p$. Fix $(\varepsilon_1, \dots, \varepsilon_m) \in S_m$ and observe that when the product

$$\bigl(x + (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m})\bigr)\bigl(x - (\varepsilon_1\sqrt{a_1} + \cdots + \varepsilon_m\sqrt{a_m})\bigr)$$

is expanded, the result is an expression with each radicand the product of two of the $a_j$. Since no $a_j$ is a square modulo $p$, each such product of two of the $a_j$ will be a square modulo $p$ (since the product of two quadratic non-residues is a quadratic residue). This means that the above product can be expressed as a quadratic polynomial modulo $p$ with coefficients from $\{0, 1, \dots, p-1\}$. Pairing then the linear factors of $f(x)$ appropriately leads to the desired factorization modulo $p$.
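Property (iii) can be observed experimentally in the smallest interesting case $m = 2$ with $a_1 = 2$, $a_2 = 3$, where (11.3.1) is $x^4 - 10x^2 + 1$. The sketch below is a brute-force check and not an efficient method: the helper names `rem_monic` and `has_small_factor` are hypothetical, and it simply tries every monic linear and quadratic polynomial modulo $p$ as a divisor. The list of primes tested is an arbitrary choice.

```python
def rem_monic(f, g, p):
    """Remainder of f divided by a monic g, with coefficients reduced modulo p."""
    r = [c % p for c in f]
    d = len(g) - 1
    for i in range(len(r) - 1, d - 1, -1):
        c = r[i]
        if c:
            for j in range(d + 1):
                r[i - d + j] = (r[i - d + j] - c * g[j]) % p
    return r[:d]


def has_small_factor(f, p):
    """True if f has a monic factor of degree 1 or 2 modulo p."""
    for b in range(p):
        if not any(rem_monic(f, [b, 1], p)):          # x + b divides f
            return True
        for c in range(p):
            if not any(rem_monic(f, [c, b, 1], p)):   # x^2 + bx + c divides f
                return True
    return False


f = [1, 0, -10, 0, 1]                  # x^4 - 10x^2 + 1, constant term first
primes = [2, 3, 5, 7, 11, 13, 17, 19]
reducible_everywhere = all(has_small_factor(f, p) for p in primes)
print(reducible_everywhere)
```

For a quartic, having a factor of degree 1 or 2 modulo $p$ is the same as being reducible modulo $p$, so the check agrees with (iii) for each prime in the list. For contrast, `has_small_factor([1, 1, 0, 0, 1], 2)` is false, since $x^4 + x + 1$ is irreducible modulo 2.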

For the second argument, we make use of Exercise (4.2)(b) (also, see Exercise (8.4)). We prove by induction that $f(x)$, as defined in (11.3.1), is a product modulo each prime $p$ of linear and quadratic monic polynomials, with the latter of the form $x^2 + 2bx + c$ for some integers $b$ and $c$ (so that modulo 2 the middle coefficient is necessarily 0). We start the induction the same, noting the case $m = 1$ holds and supposing we know that such a factorization holds for $m = t$ where $t$ is a positive integer and for each prime $p$. We now fix a prime $p$ and make use of the notation in (11.3.1) and in (11.3.3) with $m = j = t+1$. The induction hypothesis implies there are monic polynomials $u_1(x), u_2(x), \dots, u_r(x)$ in $\mathbb{Z}[x]$, each of degree 1 or 2, and a polynomial $v(x) \in \mathbb{Z}[x]$ such that

$$g_{t+1}(x) = u_1(x)u_2(x) \cdots u_r(x) + p\,v(x).$$

Hence,

$$\begin{aligned}
f(x) &= g_{t+1}\bigl(x + \sqrt{a_{t+1}}\bigr)\,g_{t+1}\bigl(x - \sqrt{a_{t+1}}\bigr) \\
&= u_1\bigl(x + \sqrt{a_{t+1}}\bigr)u_1\bigl(x - \sqrt{a_{t+1}}\bigr) \cdots u_r\bigl(x + \sqrt{a_{t+1}}\bigr)u_r\bigl(x - \sqrt{a_{t+1}}\bigr) + p\,w(x),
\end{aligned}$$

where $w(x)$ is a polynomial with each coefficient a symmetric polynomial in $\sqrt{a_{t+1}}$ and $-\sqrt{a_{t+1}}$ with coefficients in $\mathbb{Z}$. We deduce then that in fact $w(x) \in \mathbb{Z}[x]$. By the induction hypothesis, we further may take each $u_j(x)$ of one of the forms $u_j(x) = x + b$ and $u_j(x) = x^2 + 2bx + c$ where $b$ and $c$ are integers. In the first case, we observe that

$$u_j\bigl(x + \sqrt{a_{t+1}}\bigr)\,u_j\bigl(x - \sqrt{a_{t+1}}\bigr) = (x + b)^2 - a_{t+1}.$$

Since this is a monic quadratic with even middle term, this is a factor of $f(x)$ of a form we want. In the case that $u_j(x) = x^2 + 2bx + c$, we set $d = c - b^2$ and write $u_j(x) = (x + b)^2 + d$. We deduce then that

$$\begin{aligned}
u_j\bigl(x + \sqrt{a_{t+1}}\bigr)\,u_j\bigl(x - \sqrt{a_{t+1}}\bigr) &= \bigl((x + b + \sqrt{a_{t+1}})^2 + d\bigr)\bigl((x + b - \sqrt{a_{t+1}})^2 + d\bigr) \\
&= \bigl((x + b)^2 - a_{t+1}\bigr)^2 + 2d\bigl((x + b)^2 + a_{t+1}\bigr) + d^2.
\end{aligned}$$

Observe that the above is of the form $h\bigl((x + b)^2\bigr)$ where $h(x)$ is a monic quadratic with an even coefficient for $x$ and a constant term equal to

$$a_{t+1}^2 + 2d\,a_{t+1} + d^2 = (a_{t+1} + d)^2.$$

Modulo 2, we see that $h(x^2)$ and, hence, $h\bigl((x + b)^2\bigr)$ factors as a product of two monic quadratics with the coefficient of $x$ equal to 0 for each quadratic. For odd primes $p$, Exercise (4.2)(b) implies $h(x^2)$ and, hence, $h\bigl((x + b)^2\bigr)$ is reducible modulo $p$. Note that $k$ is a root of $h(x^2)$ modulo $p$ if and only if $-k$ is a root of $h(x^2)$ modulo $p$. Furthermore, 0 is a root of $h(x^2)$ if and only if $x^2$ is a factor of $h(x^2)$. Thus, $h\bigl((x + b)^2\bigr)$ factors as a product of linear and quadratic monic polynomials modulo $p$. Since any integer is congruent to twice an integer modulo an odd prime $p$, we can take the coefficient of $x$ appearing in any quadratic to be even. The induction argument is therefore complete.

§11.4 The Lattice Base Reduction Algorithm

Lenstra, Lenstra, and Lovász (1982) showed that it is possible to factor a polynomial $f(x) = \sum_{j=0}^{n} a_j x^j \in \mathbb{Z}[x]$ in polynomial time. If $n$ is the degree of $f(x)$ (so $a_n \ne 0$) and $H$ is the height of $f(x)$, that is, the maximum of $|a_j|$ for $0 \le j \le n$, then the quantity $n(\log_2 H + \log_2 n + 2)$ can be viewed as an upper bound on the length of the input polynomial $f(x)$. A polynomial time algorithm for factoring $f(x)$ corresponds to an algorithm that runs in time that is polynomial in $n$ and $\log H$. The previous factoring algorithm we described is not polynomial, as was seen from the example of Swinnerton-Dyer. The main problem there (which is notably atypical) is that the polynomial $f(x)$ can factor into many small irreducible factors modulo every prime $p$, causing us to have to consider exponentially many possibilities for the mod $p$ reduction of any non-trivial factor of $f(x)$. The algorithm of Lenstra, Lenstra and Lovász, called the lattice base reduction algorithm or the LLL-algorithm, is an approach for getting around having to consider all such mod $p$ reductions and thereby provides a polynomial time algorithm for factoring $f(x)$ over the rationals.

To describe the lattice base reduction algorithm, we turn now to some background on lattices. Let $\mathbb{Q}^n$ denote the set of vectors $\langle a_1, a_2, \dots, a_n \rangle$ with $a_j \in \mathbb{Q}$. For

$$\vec{b} = \langle a_1, a_2, \dots, a_n \rangle \in \mathbb{Q}^n \quad \text{and} \quad \vec{b}' = \langle a'_1, a'_2, \dots, a'_n \rangle \in \mathbb{Q}^n,$$

we define the usual dot product $\vec{b} \cdot \vec{b}'$ by

$$\vec{b} \cdot \vec{b}' = a_1 a'_1 + a_2 a'_2 + \cdots + a_n a'_n.$$

Also, we set

$$\|\vec{b}\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}.$$

Further, we use $A^T$ to denote the transpose of a matrix $A$, so the rows and columns of $A$ are the same as the columns and rows of $A^T$, respectively.

Let $\vec{b}_1, \dots, \vec{b}_n \in \mathbb{Q}^n$, and let $A = \bigl(\vec{b}_1, \dots, \vec{b}_n\bigr)$ be the $n \times n$ matrix with column vectors $\vec{b}_1, \dots, \vec{b}_n$. The lattice $L$ generated by $\vec{b}_1, \dots, \vec{b}_n$ is

$$L = L(A) = \vec{b}_1\mathbb{Z} + \cdots + \vec{b}_n\mathbb{Z}.$$

269

We will be interested mainly in the case that $\vec{b}_1, \dots, \vec{b}_n$ are linearly independent; in this case, $\vec{b}_1, \dots, \vec{b}_n$ is called a basis for $L$. Observe that given $L$, the value of $|\det A|$ is the same regardless of the basis $\vec{b}_1, \dots, \vec{b}_n$ that is used to describe $L$. To see this, observe that if $\vec{b}_1', \dots, \vec{b}_n'$ is another basis for $L$, there are matrices $A$ and $B$ with integer entries (here $A$ and $B$ denote change-of-basis matrices, not the matrix of column vectors above) such that

\[ (\vec{b}_1, \dots, \vec{b}_n) AB = (\vec{b}_1', \dots, \vec{b}_n') B = (\vec{b}_1, \dots, \vec{b}_n). \]

Given that $\vec{b}_1, \dots, \vec{b}_n$ is a basis for $\mathbb{R}^n$, it follows that $AB$ is the identity matrix and $\det B = \pm 1$. The second equation above then implies

\[ |\det(\vec{b}_1', \dots, \vec{b}_n')| = |\det(\vec{b}_1, \dots, \vec{b}_n)|. \]

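We set $\det L$ to be this common value.

Next, we describe the Gram-Schmidt orthogonalization process. Define recursively

\[ \vec{b}_i^* = \vec{b}_i - \sum_{j=1}^{i-1} \mu_{ij} \vec{b}_j^*, \quad \text{for } 1 \le i \le n, \]

where

\[ \mu_{ij} = \mu_{i,j} = \frac{\vec{b}_i \cdot \vec{b}_j^*}{\vec{b}_j^* \cdot \vec{b}_j^*}, \quad \text{for } 1 \le j < i \le n. \]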

Then for each $i \in \{1, \dots, n\}$, the vectors $\vec{b}_1^*, \dots, \vec{b}_i^*$ span the same subspace of $\mathbb{R}^n$ as $\vec{b}_1, \dots, \vec{b}_i$. In other words,

\[ \{ a_1 \vec{b}_1^* + \cdots + a_i \vec{b}_i^* : a_j \in \mathbb{R} \text{ for } 1 \le j \le i \} = \{ a_1 \vec{b}_1 + \cdots + a_i \vec{b}_i : a_j \in \mathbb{R} \text{ for } 1 \le j \le i \}. \]

Furthermore, the vectors $\vec{b}_1^*, \dots, \vec{b}_n^*$ are linearly independent (hence, non-zero) and pairwise orthogonal (i.e., for distinct $i$ and $j$, we have $\vec{b}_i^* \cdot \vec{b}_j^* = 0$). We leave verification of these facts as exercises.
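The recursion above can be carried out exactly over $\mathbb{Q}$. The following is a minimal sketch, assuming the basis is given as a list of integer (or rational) vectors; the names `dot` and `gram_schmidt` are illustrative and not taken from the text.

```python
from fractions import Fraction


def dot(u, v):
    """Usual dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return (bstar, mu) for linearly independent rational vectors.

    bstar[i] is the vector b*_{i+1}; mu[i][j] is mu_{i+1,j+1} for j < i.
    Exact rational arithmetic is used, so no rounding occurs.
    """
    bstar, mu = [], []
    for b in basis:
        b = [Fraction(a) for a in b]
        # mu_{ij} = (b_i . b*_j) / (b*_j . b*_j) for the earlier b*_j
        row = [dot(b, s) / dot(s, s) for s in bstar]
        v = b[:]
        for m, s in zip(row, bstar):
            v = [vk - m * sk for vk, sk in zip(v, s)]
        bstar.append(v)
        mu.append(row)
    return bstar, mu


basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
bstar, mu = gram_schmidt(basis)
```

For this sample basis the resulting vectors are pairwise orthogonal, which is one of the facts left as an exercise above.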

We turn next to Hadamard's inequality. The value of $\det L$ can be viewed as the volume of the parallelepiped with edges parallel to and the same length as $\vec{b}_1, \dots, \vec{b}_n$. As indicated by the above remarks, this volume is independent of the basis. Geometrically, it is apparent that

\[ \det L \le \|\vec{b}_1\| \, \|\vec{b}_2\| \cdots \|\vec{b}_n\| \]

(where "apparent" is limited somewhat to the dimensions we can think in). This is Hadamard's inequality. One can also use the vectors $\vec{b}_j^*$ to provide a proof in any dimension as follows. One checks that

\[ \det(\vec{b}_1, \dots, \vec{b}_n) = \det\Bigl( \vec{b}_1^*, \ \vec{b}_2^* + \mu_{21} \vec{b}_1^*, \ \dots, \ \vec{b}_n^* + \sum_{j=1}^{n-1} \mu_{nj} \vec{b}_j^* \Bigr) = \det(\vec{b}_1^*, \dots, \vec{b}_n^*). \]

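Since $\vec{b}_1, \dots, \vec{b}_n$ is a basis for $L$, we deduce that

\[ (\det L)^2 = \det\bigl( (\vec{b}_1^*, \dots, \vec{b}_n^*)^T (\vec{b}_1^*, \dots, \vec{b}_n^*) \bigr) = \det \begin{pmatrix} \|\vec{b}_1^*\|^2 & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & 0 & \|\vec{b}_n^*\|^2 \end{pmatrix} = \Bigl( \prod_{i=1}^{n} \|\vec{b}_i^*\| \Bigr)^2. \]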

Thus, $\det L = \prod_{i=1}^{n} \|\vec{b}_i^*\|$. So it suffices to show $\|\vec{b}_i^*\| \le \|\vec{b}_i\|$. The orthogonality of the $\vec{b}_i^*$'s implies

\[ \|\vec{b}_i\|^2 = \Bigl\| \vec{b}_i^* + \sum_{j=1}^{i-1} \mu_{ij} \vec{b}_j^* \Bigr\|^2 = \|\vec{b}_i^*\|^2 + \sum_{j=1}^{i-1} \mu_{ij}^2 \|\vec{b}_j^*\|^2. \]

The sum on the right above is clearly non-negative so that $\|\vec{b}_i^*\| \le \|\vec{b}_i\|$ follows.

The Hadamard inequality provides an upper bound on the value of $\det L$. Hermite proved that there is a constant $c_n$ (depending only on $n$) such that for some basis $\vec{b}_1, \dots, \vec{b}_n$ of $L$, we have

\[ \|\vec{b}_1\| \, \|\vec{b}_2\| \cdots \|\vec{b}_n\| \le c_n \det L. \]

It is known that $c_n \le n^n$. To clarify a point, Minkowski has shown that there exist $n$ linearly independent vectors $\vec{b}_1', \dots, \vec{b}_n'$ in $L$ such that

\[ \|\vec{b}_1'\| \, \|\vec{b}_2'\| \cdots \|\vec{b}_n'\| \le n^{n/2} \det L, \]

but $\vec{b}_1', \dots, \vec{b}_n'$ is not necessarily a basis for $L$. Further, we note that the problem of finding a basis $\vec{b}_1, \dots, \vec{b}_n$ of $L$ for which $\|\vec{b}_1\| \cdots \|\vec{b}_n\|$ is minimal is known to be NP-hard.

The problem of finding a nonzero vector $\vec{b} \in L$ with $\|\vec{b}\|$ minimal is not known to be NP-complete, but it may well be. In any case, no one knows a polynomial time algorithm for this problem. We note that Lagarias (1985) has, however, proved that the problem of finding a nonzero vector $\vec{b} \in L$ which minimizes the maximal absolute value of a component is NP-hard. Observe that Hermite's result mentioned above implies that there is a constant $c_n'$, depending only on $n$, such that some nonzero $\vec{b} \in L$ satisfies $\|\vec{b}\| \le c_n' \sqrt[n]{\det L}$. It is possible for a lattice $L$ to contain a vector that is much shorter than this, but it is known that the best constant $c_n'$ for all lattices $L$ satisfies

\[ \sqrt{n/(2e\pi)} \le c_n' \le \sqrt{n/(e\pi)}. \]

The vectors $\vec{b}_j^*$ obtained from the Gram-Schmidt orthogonalization process can be used to obtain a lower bound for the shortest vector in a lattice $L$. More precisely, we have

\[ \vec{b} \in L, \ \vec{b} \ne 0 \implies \|\vec{b}\| \ge \min\{ \|\vec{b}_1^*\|, \|\vec{b}_2^*\|, \dots, \|\vec{b}_n^*\| \}. \tag{11.4.1} \]

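To see this, express $\vec{b}$ in the form

\[ \vec{b} = u_1 \vec{b}_1 + \cdots + u_k \vec{b}_k, \quad \text{where each } u_j \in \mathbb{Z} \text{ and } u_k \ne 0. \]

Observe that the definition of the $\vec{b}_j^*$ then implies that

\[ \vec{b} = v_1 \vec{b}_1^* + \cdots + v_k \vec{b}_k^*, \quad \text{for some } v_j \in \mathbb{Q} \text{ with } v_k = u_k. \]

In particular, $v_k$ is a non-zero integer. We deduce that

\[ \|\vec{b}\|^2 = \bigl( v_1 \vec{b}_1^* + \cdots + v_k \vec{b}_k^* \bigr) \cdot \bigl( v_1 \vec{b}_1^* + \cdots + v_k \vec{b}_k^* \bigr) = v_1^2 \|\vec{b}_1^*\|^2 + \cdots + v_k^2 \|\vec{b}_k^*\|^2 \ge \|\vec{b}_k^*\|^2, \]

from which (11.4.1) follows.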
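As a small numerical illustration of (11.4.1), the sketch below compares the squared lengths of the Gram-Schmidt vectors with the squared lengths of lattice vectors found by brute force. The sample basis, the search range $-3, \dots, 3$ for the integer coefficients, and the function names are assumptions made for this example only; the search is a sanity check on a finite set of lattice vectors, not a proof.

```python
from fractions import Fraction
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_norms(basis):
    """Squared lengths ||b*_1||^2, ..., ||b*_n||^2, computed exactly."""
    bstar = []
    for b in basis:
        v = [Fraction(a) for a in b]
        for s in bstar:
            m = Fraction(dot(b, s)) / dot(s, s)   # mu_{ij}
            v = [vk - m * sk for vk, sk in zip(v, s)]
        bstar.append(v)
    return [dot(s, s) for s in bstar]


basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
lower = min(gram_schmidt_norms(basis))   # min ||b*_i||^2

shortest = None
for coeffs in product(range(-3, 4), repeat=len(basis)):
    if not any(coeffs):
        continue                          # skip the zero vector
    vec = [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(3)]
    n2 = dot(vec, vec)
    assert n2 >= lower                    # inequality (11.4.1), squared
    if shortest is None or n2 < shortest:
        shortest = n2
```

Here the bound from (11.4.1) is $4/3$ while the shortest squared length found is $2$, so the bound holds but is not sharp for this basis.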

§11.5 Reduced Bases and Factoring with LLL

We will want the following important definition.

Definition 11.5.1. Let $\vec{b}_1, \dots, \vec{b}_n$ be a basis for a lattice $L$ and $\vec{b}_1^*, \dots, \vec{b}_n^*$ the corresponding basis for $\mathbb{R}^n$ obtained from the Gram-Schmidt orthogonalization process, with $\mu_{ij}$ as defined before. Then $\vec{b}_1, \dots, \vec{b}_n$ is said to be reduced if both of the following hold:

(i) $|\mu_{ij}| \le \frac{1}{2}$ for $1 \le j < i \le n$

(ii) $\|\vec{b}_i^* + \mu_{i,i-1} \vec{b}_{i-1}^*\|^2 \ge \frac{3}{4} \|\vec{b}_{i-1}^*\|^2$ for $1 < i \le n$.

The main work of Lenstra, Lenstra, and Lovász (1982) establishes an algorithm that runs in polynomial time which constructs a reduced basis of $L$ from an arbitrary basis $\vec{b}_1, \dots, \vec{b}_n$ of $L$. Our main goal below is to explain how such a reduced basis can be used to factor a polynomial $f(x)$ in polynomial time. We will need to describe the related lattice and an initial basis for it. We begin, however, with some properties of reduced bases.
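Conditions (i) and (ii) of Definition 11.5.1 are straightforward to test for a given basis. The following sketch only checks whether a basis is reduced; it does not implement the LLL reduction algorithm itself. The function name `is_reduced` is an illustrative choice, and condition (ii) is evaluated using the orthogonality of $\vec{b}_i^*$ and $\vec{b}_{i-1}^*$.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_reduced(basis):
    """Check conditions (i) and (ii) of Definition 11.5.1 exactly."""
    bstar, mu = [], []
    for b in basis:
        row = [Fraction(dot(b, s)) / dot(s, s) for s in bstar]
        v = [Fraction(a) for a in b]
        for m, s in zip(row, bstar):
            v = [vk - m * sk for vk, sk in zip(v, s)]
        bstar.append(v)
        mu.append(row)
    n = len(basis)
    # (i): |mu_ij| <= 1/2 for j < i
    cond_i = all(abs(mu[i][j]) <= Fraction(1, 2)
                 for i in range(n) for j in range(i))
    # (ii): ||b*_i + mu_{i,i-1} b*_{i-1}||^2 >= (3/4) ||b*_{i-1}||^2,
    # where the left side equals ||b*_i||^2 + mu_{i,i-1}^2 ||b*_{i-1}||^2
    cond_ii = all(
        dot(bstar[i], bstar[i])
        + mu[i][i - 1] ** 2 * dot(bstar[i - 1], bstar[i - 1])
        >= Fraction(3, 4) * dot(bstar[i - 1], bstar[i - 1])
        for i in range(1, n)
    )
    return cond_i and cond_ii
```

For instance, the basis $\langle 1,0\rangle, \langle 3,1\rangle$ fails (i) since $\mu_{21} = 3$, while $\langle 2,0\rangle, \langle 0,1\rangle$ satisfies (i) but fails (ii).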

Let $\vec{b}_1, \dots, \vec{b}_n$ be a reduced basis for a lattice $L$ and $\vec{b}_1^*, \dots, \vec{b}_n^*$ the corresponding basis for $\mathbb{R}^n$ obtained from the Gram-Schmidt orthogonalization process with $\mu_{ij}$ as before. The argument for (11.4.1) can be modified to show that

\[ \vec{b} \in L, \ \vec{b} \ne 0 \implies \|\vec{b}_1\| \le 2^{(n-1)/2} \|\vec{b}\|. \tag{11.5.1} \]

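In particular, the above inequality holds for the shortest nonzero vector $\vec{b} \in L$. To prove (11.5.1), observe that (i) and (ii) imply

\[ \|\vec{b}_i^*\|^2 + \tfrac{1}{4} \|\vec{b}_{i-1}^*\|^2 \ge \|\vec{b}_i^*\|^2 + \mu_{i,i-1}^2 \|\vec{b}_{i-1}^*\|^2 = \|\vec{b}_i^* + \mu_{i,i-1} \vec{b}_{i-1}^*\|^2 \ge \tfrac{3}{4} \|\vec{b}_{i-1}^*\|^2. \]

Hence, $\|\vec{b}_i^*\|^2 \ge (1/2) \|\vec{b}_{i-1}^*\|^2$. We deduce that

\[ \|\vec{b}_i^*\|^2 \ge \frac{1}{2^{i-j}} \|\vec{b}_j^*\|^2 \quad \text{for } 1 \le j < i \le n. \tag{11.5.2} \]

Defining $k$ as in the proof of (11.4.1) and following the argument there, we obtain $\|\vec{b}\|^2 \ge \|\vec{b}_k^*\|^2$. Hence,

\[ \|\vec{b}\|^2 \ge \|\vec{b}_k^*\|^2 \ge \frac{1}{2^{k-1}} \|\vec{b}_1^*\|^2 \ge \frac{1}{2^{n-1}} \|\vec{b}_1^*\|^2 = \frac{1}{2^{n-1}} \|\vec{b}_1\|^2, \]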

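where the last equation makes use of $\vec{b}_1^* = \vec{b}_1$. Thus, (11.5.1) follows.

Recall that

\[ \|\vec{b}_i\|^2 = \|\vec{b}_i^*\|^2 + \sum_{j=1}^{i-1} \mu_{ij}^2 \|\vec{b}_j^*\|^2. \]

From (i) and (11.5.2), we obtain

\[ \|\vec{b}_i\|^2 \le \|\vec{b}_i^*\|^2 + \frac{1}{4} \sum_{j=1}^{i-1} \|\vec{b}_j^*\|^2 \le \|\vec{b}_i^*\|^2 + \frac{1}{4} \sum_{j=1}^{i-1} 2^{i-j} \|\vec{b}_i^*\|^2 = \Bigl( 1 + \frac{1}{4} (2^i - 2) \Bigr) \|\vec{b}_i^*\|^2 \le 2^{i-1} \|\vec{b}_i^*\|^2. \]

Using (11.5.2) again, we deduce

\[ \|\vec{b}_j\|^2 \le 2^{j-1} \|\vec{b}_j^*\|^2 \le 2^{i-1} \|\vec{b}_i^*\|^2 \quad \text{for } 1 \le j \le i \le n. \tag{11.5.3} \]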

We show now the following improvement of (11.5.1). Let $\vec{x}_1, \vec{x}_2, \dots, \vec{x}_t$ be $t$ linearly independent vectors in $L$. Then

\[ \|\vec{b}_j\| \le 2^{(n-1)/2} \max\{ \|\vec{x}_1\|, \|\vec{x}_2\|, \dots, \|\vec{x}_t\| \} \quad \text{for } 1 \le j \le t. \tag{11.5.4} \]

For each $1 \le j \le t$, define a positive integer $m(j)$ and integers $u_{ji}$ by

\[ \vec{x}_j = \sum_{i=1}^{m(j)} u_{ji} \vec{b}_i, \quad u_{j m(j)} \ne 0. \]

By reordering the $\vec{x}_j$, we may suppose further that $m(1) \le m(2) \le \cdots \le m(t)$. The linear independence of the $\vec{x}_j$ implies that $m(j) \ge j$ for $1 \le j \le t$. The proof of (11.4.1) implies here that

\[ \|\vec{x}_j\| \ge \|\vec{b}_{m(j)}^*\| \quad \text{for } 1 \le j \le t. \]

From (11.5.3), we deduce

\[ \|\vec{b}_j\|^2 \le 2^{m(j)-1} \|\vec{b}_{m(j)}^*\|^2 \le 2^{m(j)-1} \|\vec{x}_j\|^2 \le 2^{n-1} \|\vec{x}_j\|^2 \quad \text{for } 1 \le j \le t. \]

The inequality in (11.5.4) now follows.

Recall that $\det L = \prod_{i=1}^{n} \|\vec{b}_i^*\|$. We obtain from (11.5.3) that

\[ \prod_{i=1}^{n} \|\vec{b}_i\|^2 \le \prod_{i=1}^{n} 2^{i-1} \|\vec{b}_i^*\|^2 \le 2^{n(n-1)/2} \prod_{i=1}^{n} \|\vec{b}_i^*\|^2 = 2^{n(n-1)/2} (\det L)^2. \]

Thus, from Hadamard's inequality, we obtain

\[ 2^{-n(n-1)/4} \|\vec{b}_1\| \, \|\vec{b}_2\| \cdots \|\vec{b}_n\| \le \det L \le \|\vec{b}_1'\| \, \|\vec{b}_2'\| \cdots \|\vec{b}_n'\| \]

for any basis $\vec{b}_1', \dots, \vec{b}_n'$ of $L$. Recall that finding a basis $\vec{b}_1', \dots, \vec{b}_n'$ for which the product on the right is minimal is NP-hard. The above implies that a reduced basis is close to being such a basis. We also note that Hermite's inequality mentioned earlier is a consequence of the above inequality.

Suppose now that we want to factor a non-zero polynomial $f(x) \in \mathbb{Z}[x]$. Let $p$ be a prime, and consider a monic irreducible factor $h(x)$ of $f(x)$ modulo $p^k$ (obtained say through Berlekamp's algorithm and Hensel lifting). Now, let $h_0(x)$ denote an irreducible factor of $f(x)$ in $\mathbb{Z}[x]$ such that $h_0(x)$ is divisible by $h(x)$ modulo $p^k$. Note that $h_0(x)$ being irreducible in $\mathbb{Z}[x]$ implies that the content of $h_0(x)$ (the greatest common divisor of its coefficients) is 1. Our goal here is to show how one can determine $h_0(x)$ without worrying about other factors of $f(x)$ modulo $p^k$ (to avoid the difficulty suggested by Swinnerton-Dyer's example).

We describe a lattice for this approach. Let $\ell = \deg h$. We need only consider the case that $\ell < n$. Fix an integer $m \in \{\ell, \ell+1, \dots, n-1\}$. We will successively consider such $m$ beginning with $\ell$ and working our way up until we find $h_0(x)$. In the end, $m$ will correspond to the degree of $h_0(x)$; and if no such $h_0(x)$ is found for $\ell \le m \le n-1$, then we can deduce that $f(x)$ is irreducible. We associate with each polynomial

\[ w(x) = a_m x^m + \cdots + a_1 x + a_0 \in \mathbb{Z}[x], \]

a vector $\vec{b} = \langle a_0, a_1, \dots, a_m \rangle \in \mathbb{Z}^{m+1}$. Observe that $\|\vec{b}\| = \|w(x)\|$. Let $L$ be the lattice in $\mathbb{Z}^{m+1}$ spanned by the vectors associated with

\[ w_j(x) = \begin{cases} p^k x^{j-1} & \text{for } 1 \le j \le \ell \\ h(x) x^{j-\ell-1} & \text{for } \ell+1 \le j \le m+1. \end{cases} \]

It is not difficult to see that these vectors form a basis. Furthermore, the polynomials associated with the vectors in $L$ correspond precisely to the polynomials in $\mathbb{Z}[x]$ of degree $\le m$ that are divisible by $h(x)$ modulo $p^k$. In particular, if $m \ge \deg h_0$, the vector, say $\vec{b}_0$, associated with $h_0(x)$ is in $L$. Observe that if $k$ is large enough and $\deg h_0 > \ell$, the coefficients of $h(x)$ are “presumably” large and the value of $\|\vec{b}_0\|$ is “seemingly” small. We will show that in fact if $k$ is large enough and $m = \deg h_0$ and $\vec{b}_1, \dots, \vec{b}_{m+1}$ is a reduced basis for $L$, then $\vec{b}_1^{\#} = \vec{b}_0$, where $\vec{b}_1^{\#}$ corresponds to the vector obtained by dividing the components of $\vec{b}_1$ by the greatest common divisor of these components (i.e., the polynomial associated with $\vec{b}_1^{\#}$ is the polynomial associated with $\vec{b}_1$ with its content removed).
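The initial basis $w_1(x), \dots, w_{m+1}(x)$ can be written down directly. The sketch below assumes that $h(x)$ is supplied as a list of integer coefficients, lowest degree first, with leading coefficient 1; the function name `factoring_lattice_basis` and this input convention are choices made for the example.

```python
def factoring_lattice_basis(h, p, k, m):
    """Vectors in Z^(m+1) associated with w_1(x), ..., w_{m+1}(x).

    h is the coefficient list [h_0, ..., h_l] of a monic polynomial
    (lowest degree first); a polynomial a_0 + a_1 x + ... + a_m x^m
    corresponds to the vector [a_0, a_1, ..., a_m].
    """
    l = len(h) - 1
    if h[-1] != 1 or l > m:
        raise ValueError("h must be monic of degree at most m")
    q = p ** k
    basis = []
    for j in range(1, l + 1):            # w_j(x) = p^k x^(j-1)
        vec = [0] * (m + 1)
        vec[j - 1] = q
        basis.append(vec)
    for j in range(l + 1, m + 2):        # w_j(x) = h(x) x^(j-l-1)
        shift = j - l - 1
        vec = [0] * (m + 1)
        for i, c in enumerate(h):
            vec[shift + i] = c
        basis.append(vec)
    return basis


# Example: h(x) = x + 3, p = 5, k = 2, m = 2
example = factoring_lattice_basis([3, 1], 5, 2, 2)
```

Written as rows, the resulting matrix is lower triangular with diagonal entries $p^k$ ($\ell$ times) and $1$ ($m+1-\ell$ times), which makes it visible that the vectors are linearly independent.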

The lattice $L$ seemingly has little to do with $f(x)$ as its definition only depends on $h(x)$. Fix $h_0(x)$ as above. We show that if $k$ is large enough, then $h_0(x)$ is the only irreducible polynomial in $\mathbb{Z}[x]$ which is associated with a short vector in $L$. For this purpose, suppose $g_0(x)$ is an irreducible polynomial in $\mathbb{Z}[x]$ divisible by $h(x)$ modulo $p^k$ but different from $h_0(x)$ and that $R$ is the resultant of $h_0(x)$ and $g_0(x)$. Note that since $h_0(x)$ and $g_0(x)$ are irreducible in $\mathbb{Z}[x]$, we have $R \ne 0$. The definition of the resultant implies that if $R$ is large, then $\|g_0(x)\|$ must be large (since we are viewing $h_0(x)$ as fixed). So suppose $R$ is not large. There are polynomials $u(x)$ and $v(x)$ in $\mathbb{Z}[x]$ such that

\[ h_0(x) u(x) + g_0(x) v(x) = R. \]

We wish to take advantage now of the fact that the left-hand side above is divisible by $h(x)$ modulo $p^k$, but at the same time we want to keep in mind that unique factorization does not exist modulo $p^k$. Since $h(x)$ is monic of degree $\ell \ge 1$, the left-hand side is of the form $h(x) w(x)$ modulo $p^k$ where we can now easily deduce that every coefficient of $w(x)$ is divisible by $p^k$. This implies $p^k \mid R$. Hence, given $k$ is large, we deduce $R$ is large, giving us the desired conclusion that $\|g_0(x)\|$ is large.

The above argument does more. If $m = \deg h_0(x)$ and $\vec{b} \in L$, then viewing $g_0(x) \in L$ as the polynomial associated with $\vec{b}$, we deduce from the above that either $\|g_0(x)\|$ is large or $R = 0$. In the latter case, since $h_0(x)$ is irreducible and $\deg g_0 \le m = \deg h_0$, we obtain that $\vec{b}^{\#} = \vec{b}_0$.

How large is large? We take $\vec{b} = \vec{b}_1$ above (i.e., $g_0(x)$ is the polynomial associated with the first vector in a reduced basis). Recall that Theorem 11.2.1 gives

\[ \|h_0(x)\| \le 2^m \|f(x)\|. \]

On the other hand, by considering the vectors associated with $g_0(x)$ (that is, $\vec{b}_1$) and $h_0(x)$ in $L \subseteq \mathbb{Z}^{m+1}$, we deduce from (11.5.1) that

\[ \|g_0(x)\| \le 2^{m/2} \|h_0(x)\|. \]

Thus,

\[ \|g_0(x)\| \le 2^{3m/2} \|f(x)\|. \]

We want this bound on $\|g_0(x)\|$ to assure that $R = 0$ so that $\vec{b}^{\#} = \vec{b}_0$.

To see how large $p^k$ needs to be, we recall the Sylvester form of the resultant given by (2.2.1). We are interested in the resultant $R$ of $g_0(x)$ and $h_0(x)$ where we may suppose that $\deg g_0 \le m$ and $\deg h_0 \le m$. From Hadamard's inequality and Theorem 11.2.1, we deduce

\[ |R| \le \|g_0(x)\|^m \|h_0(x)\|^m \le \|g_0(x)\|^m \bigl( 2^m \|f(x)\| \bigr)^m = 2^{m^2} \|g_0(x)\|^m \|f(x)\|^m. \]

Our upper bound on $\|g_0(x)\|$ now implies

\[ |R| \le 2^{5m^2/2} \|f(x)\|^{2m}. \]

Hence, we see that if

\[ p^k > 2^{5m^2/2} \|f(x)\|^{2m}, \]

then the vector $\vec{b}_1$ in a reduced basis for the lattice $L$, where $m = \deg h_0(x)$, is such that the polynomial corresponding to $\vec{b}_1^{\#}$ is $h_0(x)$.
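To get a feel for the size of $k$ demanded by the last inequality, one can compute the least admissible exponent with integer arithmetic only. The sketch below squares both sides, so that $\|f(x)\|^2$ (an integer) is used in place of $\|f(x)\|$; the function name and the coefficient-list input are assumptions for this example.

```python
def min_lifting_exponent(f, p, m):
    """Least k >= 1 with p^k > 2^(5 m^2 / 2) * ||f||^(2m).

    f is a list of integer coefficients. Both sides are squared so
    that only integers appear: p^(2k) > 2^(5 m^2) * (||f||^2)^(2m).
    """
    norm_sq = sum(a * a for a in f)                  # ||f||^2
    bound_sq = 2 ** (5 * m * m) * norm_sq ** (2 * m)
    k, power_sq = 1, p * p
    while power_sq <= bound_sq:
        k += 1
        power_sq *= p * p
    return k


# Example: f(x) = x^4 + 1 with p = 3 and m = 2
k_needed = min_lifting_exponent([1, 0, 0, 0, 1], 3, 2)
```

In this example the squared bound is $2^{20} \cdot 2^{4} = 2^{24}$, and the first power $9^k$ exceeding it is $9^8$.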

§11.6 Sparse Polynomial Computations

In the previous sections of this chapter, we explored algorithms for factoring polynomials with an interest in them as irreducibility tests for polynomials in $\mathbb{Z}[x]$. We began with an approach due to Zassenhaus (1969). An example due to Swinnerton-Dyer shows that this algorithm can take time that is exponential in the degree of the input polynomial $f(x)$. In general, this algorithm is nevertheless quite practical as examples of $f(x)$ for which the running time of the algorithm is exponential in $\deg f$ are rare. We then explored an algorithm due to Lenstra, Lenstra, and Lovász (1982) which can be shown to have running time that is polynomial in $\deg f$ as well as the logarithm of the height of $f(x)$ (the maximum of the absolute values of the coefficients of $f(x)$). In this section, we describe an algorithm given by Filaseta, Granville, and Schinzel (2008) for determining whether a non-reciprocal polynomial $f(x)$ is irreducible. Take particular note that this algorithm requires the input polynomial $f(x)$ to be non-reciprocal. This algorithm is significant when a polynomial is sparse. The algorithm has running time that is almost linear in $\log \deg f$, but its dependence on the number of terms of $f(x)$ and the height of $f(x)$ is not so good. In particular, the dependence on the number of terms of $f(x)$ is worse than exponential. On the other hand, if one considers all polynomials in $\mathbb{Z}[x]$ with a fixed bound on the number of terms and a fixed bound on the height of $f(x)$, then the running time of the algorithm will just depend on $\deg f$ and will be close to linear in $\log \deg f$. More precisely, we have the following result of Filaseta, Granville, and Schinzel (2008).

Theorem 11.6.1. Let $f(x) = \sum_{j=0}^{r} a_j x^{d_j} \in \mathbb{Z}[x]$ with each $a_j \ne 0$ and with

\[ 0 = d_0 < d_1 < \cdots < d_{r-1} < d_r = n. \]

Suppose $r \ge 1$ and $n \ge 16$. Let $H = H(f)$ denote the height of $f(x)$, so $H = \max_{0 \le j \le r} \{|a_j|\}$. Then there is a constant $c_1 = c_1(r, H)$ such that an algorithm exists for determining whether a given non-reciprocal polynomial $f(x) \in \mathbb{Z}[x]$ as above is irreducible and that runs in time $O\bigl( c_1 \log n \, (\log\log n)^2 \log\log\log n \bigr)$.

The algorithm for Theorem 11.6.1 also provides some information on the factorization of $f(x)$ in the case that $f(x)$ is reducible (with the same running time). Specifically, we have the following:

(i) If $f(x)$ has a cyclotomic factor, then the algorithm will detect this and output an $m \in \mathbb{Z}^+$ such that the cyclotomic polynomial $\Phi_m(x)$ divides $f(x)$.

(ii) If $f(x)$ does not have a cyclotomic factor but has a non-constant reciprocal factor, then the algorithm will produce such a factor. In fact, the algorithm will produce a reciprocal factor of $f(x)$ of maximal degree.

(iii) Otherwise, if $f(x)$ is reducible, then the algorithm outputs a complete factorization of $f(x)$ as a product of irreducible polynomials over $\mathbb{Q}$.

These can in fact be viewed as basic parts of the algorithm. First, we will check if $f(x)$ has a cyclotomic factor by making use of the algorithm for Theorem 6.9.1. If it does, the algorithm will produce $m$ as in (i) and stop. If it does not, then the algorithm will check if $f(x)$ has a non-cyclotomic non-constant reciprocal factor. If it does, then the algorithm will produce such a factor as in (ii) and stop. If it does not, then the algorithm will output a complete factorization of $f(x)$ as indicated in (iii).

For (ii), we will make use of another algorithm for computing the greatest common divisor of two polynomials in $\mathbb{Z}[x]$, that is, for computing $\gcd_{\mathbb{Z}}(f(x), g(x))$ as defined in Section 10.2. Here, we will have the added condition that at least one of the two polynomials is not divisible by a cyclotomic polynomial, which in general can be checked by making use of the algorithm in Theorem 6.9.1. This result, also due to Filaseta, Granville, and Schinzel (2008), is as follows.

Theorem 11.6.2. There is an algorithm which takes as input two polynomials $f(x)$ and $g(x)$ in $\mathbb{Z}[x]$, each of degree $\le n$ and height $\le H$ and having $\le r+1$ nonzero terms, with at least one of $f(x)$ and $g(x)$ free of cyclotomic factors, and outputs the value of $\gcd_{\mathbb{Z}}(f(x), g(x))$ and runs in time $O(c_2 \log n)$ for some constant $c_2 = c_2(r, H)$.

In discussing the above theorems in this section, we describe the algorithms which lead to the proofs but do not address details of the running times. The reader should consult the work of Filaseta, Granville, and Schinzel (2008) for further details on the running time estimates.

The proof of Theorem 11.6.2 relies heavily on a result of Bombieri and Zannier described in an appendix by the latter in Schinzel (2000b). Alternatively, one can use the later work of Bombieri, Masser, and Zannier (2007). As a consequence of either of these, we have the following.

Theorem 11.6.3. Let

\[ F(x_1, \dots, x_k), \ G(x_1, \dots, x_k) \in \mathbb{Q}[x_1, \dots, x_k] \]

be two coprime polynomials. There exists an effectively computable number $B(F, G)$ with the following property. If $\vec{u} = \langle u_1, \dots, u_k \rangle \in \mathbb{Z}^k$, $\xi \ne 0$ is algebraic and

\[ F(\xi^{u_1}, \dots, \xi^{u_k}) = G(\xi^{u_1}, \dots, \xi^{u_k}) = 0, \]

then either $\xi$ is a root of unity or there exists a nonzero vector $\vec{v} \in \mathbb{Z}^k$ having components bounded in absolute value by $B(F, G)$ and orthogonal to $\vec{u}$.

Our proof of Theorem 11.6.2 has similarities to an application of Theorem 11.6.3 by Schinzel (1999c, 2000b). In particular, we make use of the following lemma which is Corollary 6 in Appendix E of Schinzel (2000b). A proof is given there.

Lemma 11.6.4. Let $\ell$ be a positive integer and $\vec{v} \in \mathbb{Z}^{\ell}$ with $\vec{v}$ nonzero. The lattice of vectors $\vec{u} \in \mathbb{Z}^{\ell}$ orthogonal to $\vec{v}$ has a basis $\vec{v}_1', \vec{v}_2', \dots, \vec{v}_{\ell-1}'$ such that the maximum absolute value of a component of any vector $\vec{v}_j'$ is bounded by $\ell/2$ times the maximum absolute value of a component of $\vec{v}$.

We make use of the notation $O_{r,H}(\alpha(n))$ to denote a function which has absolute value bounded by $C \alpha(n)$ for some constant $C > 0$, depending only on $r$ and $H$, and for $n$ sufficiently large. Another result of use to us is the following.

Lemma 11.6.5. There is an algorithm with the following property. Given an $r \times t$ integral matrix $M = (m_{ij})$ of rank $t \le r$ and $\max\{|m_{ij}|\} = O_{r,H}(1)$ and given an integral vector $\vec{d} = \langle d_1, \dots, d_r \rangle$ with $\max\{|d_j|\} = O_{r,H}(n)$, the algorithm determines whether there is an integral vector $\vec{v} = \langle v_1, \dots, v_t \rangle$ for which

\[ \begin{pmatrix} d_1 \\ \vdots \\ d_r \end{pmatrix} = M \begin{pmatrix} v_1 \\ \vdots \\ v_t \end{pmatrix} \]

holds, and if such a $\vec{v}$ exists, the algorithm outputs the solution vector $\vec{v}$. Furthermore, $\max\{|v_j|\} = O_{r,H}(n)$ and the algorithm runs in time $O_{r,H}(\log n)$.

There are a variety of ways we can determine if $\vec{d} = M \vec{v}$ has a solution and determine the solution, if there is one, within the required time $O_{r,H}(\log n)$. We use Gaussian elimination to explain an approach. Again, we refer the reader to Filaseta, Granville, and Schinzel (2008) for details on the running time. Performing elementary row operations on $M$ and multiplying by entries from the matrix as one proceeds, so as to use only integer arithmetic, allows us to rewrite $M$ in the form of an $r \times t$ matrix $M' = (m_{ij}')$ with each $m_{ij}' \in \mathbb{Z}$ and the first $t$ rows of $M'$ forming a $t \times t$ diagonal matrix with nonzero integers along the diagonal. We perform the analogous row operations and integer multiplications on the vector $\vec{d} = \langle d_1, d_2, \dots, d_r \rangle$ to solve $\vec{d} = M \vec{v}$ for $\vec{v}$. We are thus left with an equation of the form $\vec{d}' = M' \vec{v}$ where the entries of $M'$ are integers that are $O_{r,H}(1)$ and the components of $\vec{d}' = \langle d_1', d_2', \dots, d_r' \rangle$ are integers that are $O_{r,H}(n)$. For each $j \in \{1, 2, \dots, t\}$, we check if $d_j' \equiv 0 \pmod{m_{jj}'}$. If for some $j \in \{1, 2, \dots, t\}$ we have $d_j' \not\equiv 0 \pmod{m_{jj}'}$, then a solution to the original equation $\vec{d} = M \vec{v}$, if it exists, must be such that $v_j \notin \mathbb{Z}$. In this case, an integral vector $\vec{v}$ does not exist. Now, suppose instead that $d_j' \equiv 0 \pmod{m_{jj}'}$ for every $j \in \{1, 2, \dots, t\}$. Then we divide $d_j'$ by $m_{jj}'$ to determine the vector $\vec{v}$. This vector may or may not be a solution to the equation $\vec{d} = M \vec{v}$. We check whether it is by a direct computation. If it is not a solution to the equation $\vec{d} = M \vec{v}$, then there are no solutions to the equation. Otherwise, $\vec{v}$ is an integral vector satisfying $\vec{d} = M \vec{v}$, and we output the vector $\vec{v}$. This finishes our explanation for Lemma 11.6.5.
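The procedure just described can be sketched in a few lines. Note that the sketch below swaps in exact rational Gauss-Jordan elimination (via `Fraction`) for the integer-only elimination of the text, which is simpler to write but makes no attempt to match the running time in Lemma 11.6.5. The function name `solve_integral` is an assumption for this example.

```python
from fractions import Fraction


def solve_integral(M, d):
    """Return an integer vector v with M v = d, or None if none exists.

    M is an r x t integer matrix (list of rows) of rank t <= r and
    d is an integer vector of length r.
    """
    r, t = len(M), len(M[0])
    # Augmented matrix [M | d] over the rationals
    A = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, d)]
    for col in range(t):
        pivot = next((i for i in range(col, r) if A[i][col] != 0), None)
        if pivot is None:
            raise ValueError("M does not have rank t")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [x / piv for x in A[col]]
        for i in range(r):
            if i != col and A[i][col] != 0:
                factor = A[i][col]
                A[i] = [x - factor * y for x, y in zip(A[i], A[col])]
    v = [A[j][t] for j in range(t)]
    if any(x.denominator != 1 for x in v):
        return None                       # some v_j is not an integer
    if any(A[i][t] != 0 for i in range(t, r)):
        return None                       # remaining rows are inconsistent
    return [int(x) for x in v]
```

The second check plays the role of the direct verification at the end of the explanation: the last $r - t$ rows must reduce to $0 = 0$ for the candidate $\vec{v}$ to solve the full system.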

We also make use of the following notation. For a polynomial

\[ F\bigl( x_1, \dots, x_r, x_1^{-1}, \dots, x_r^{-1} \bigr), \]

in the variables $x_1, \dots, x_r$ and their reciprocals $x_1^{-1}, \dots, x_r^{-1}$, we define

\[ J F = x_1^{u_1} \cdots x_r^{u_r} F\bigl( x_1, \dots, x_r, x_1^{-1}, \dots, x_r^{-1} \bigr), \]

where each $u_j$ is an integer chosen as small as possible so that $J F$ is a polynomial in $x_1, \dots, x_r$. In the way of examples, if

\[ F = x^2 + 4x^{-1}y + y^3 \quad\text{and}\quad G = 2xyw - x^2 z^{-3} w - 12w, \]

then

\[ J F = x^3 + 4y + xy^3 \quad\text{and}\quad J G = 2xyz^3 - x^2 - 12z^3. \]

In particular, note that although $w$ is a variable in $G$, the polynomial $J G$ does not involve $w$. We call a multi-variable polynomial $F(x_1, \dots, x_r) \in \mathbb{Q}[x_1, \dots, x_r]$ reciprocal if

\[ J F\bigl( x_1^{-1}, \dots, x_r^{-1} \bigr) = \pm F(x_1, \dots, x_r). \]

For example, $x_1 x_2 - x_1 - x_2 + 1$ and $x_1 x_2 - x_3 x_4$ are reciprocal. Note that this is consistent with our definition of a reciprocal polynomial $f(x) \in \mathbb{Z}[x]$.
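The operator $J$ and the reciprocal test can be tried out on the examples above. In the sketch below, a polynomial in the variables and their reciprocals is stored as a dictionary mapping exponent tuples (possibly with negative entries) to nonzero coefficients; this representation and the names `J` and `is_reciprocal` are assumptions made for the example.

```python
def J(poly):
    """Multiply by the monomial that shifts each variable's least
    exponent to 0, which is the minimal choice of u_1, ..., u_r."""
    nvars = len(next(iter(poly)))
    shift = [-min(e[j] for e in poly) for j in range(nvars)]
    return {tuple(ej + sj for ej, sj in zip(e, shift)): c
            for e, c in poly.items()}


def is_reciprocal(poly):
    """Test whether J F(1/x_1, ..., 1/x_r) equals F or -F."""
    inverted = J({tuple(-ej for ej in e): c for e, c in poly.items()})
    negated = {e: -c for e, c in poly.items()}
    return inverted == poly or inverted == negated


# F = x^2 + 4 x^{-1} y + y^3 in the variables (x, y)
F = {(2, 0): 1, (-1, 1): 4, (0, 3): 1}
# G = 2xyw - x^2 z^{-3} w - 12w in the variables (x, y, z, w)
G = {(1, 1, 0, 1): 2, (2, 0, -3, 1): -1, (0, 0, 0, 1): -12}
JF, JG = J(F), J(G)
```

Every exponent of $w$ in `JG` is 0, matching the remark that $J G$ does not involve $w$.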

For our proof of Theorem 11.6.2, we can suppose that $f(x)$ does not have a cyclotomic factor and do so. We consider only the case that $f(0) g(0) \ne 0$ as computing $\gcd_{\mathbb{Z}}(f(x), g(x))$ can easily be reduced to this case by initially removing an appropriate power of $x$ from each of $f(x)$ and $g(x)$. This would need to be followed up by possibly multiplying by a power of $x$ after our gcd computation.

We furthermore only consider the case that the content of $f(x)$, that is, the greatest common divisor of its coefficients, and the content of $g(x)$ are 1. Otherwise, we simply divide by the contents before proceeding and then multiply the final result by the greatest common divisor of the two contents.

We express our two polynomials in the form

\[ f(x) = \sum_{j=0}^{k} a_j x^{d_j} \quad\text{and}\quad g(x) = \sum_{j=0}^{k} b_j x^{d_j}, \]

where above we have possibly extended the lists of exponents and coefficients describing $f(x)$ and $g(x)$ so that the exponent lists are identical and the coefficient lists are allowed to include coefficients which are 0. We do this in such a way that $|a_j| + |b_j| \ne 0$ for each $j \in \{0, 1, \dots, k\}$. Also, we take $0 = d_0 < d_1 < \cdots < d_{k-1} < d_k$. Thus, $d_0 = 0$, $a_0 b_0 \ne 0$ and $k \le 2r$.
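Extending the two exponent lists to a common list is a small bookkeeping step. A minimal sketch, assuming each sparse polynomial is stored as a dictionary from exponents to nonzero coefficients (a representation chosen for this example, as is the name `common_exponent_form`):

```python
def common_exponent_form(f, g):
    """Return (d, a, b): the merged increasing exponent list
    d_0 < d_1 < ... < d_k and the coefficient lists of f and g
    padded with zeros so that both are indexed by d."""
    d = sorted(set(f) | set(g))
    a = [f.get(e, 0) for e in d]
    b = [g.get(e, 0) for e in d]
    return d, a, b


# f(x) = x^100 + 3x^7 - 2 and g(x) = 5x^100 - x^30 + 4
f = {100: 1, 7: 3, 0: -2}
g = {100: 5, 30: -1, 0: 4}
d, a, b = common_exponent_form(f, g)
```

Since only exponents occurring in $f(x)$ or $g(x)$ are listed, the condition $|a_j| + |b_j| \ne 0$ holds automatically, and the work depends on the number of terms rather than on the degree.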

Let $w(x)$ denote $\gcd_{\mathbb{Z}}(f(x), g(x))$. We will apply Theorem 11.6.3 to construct two finite sequences of polynomials in several variables $F_u$ and $G_u$ with integer coefficients and a corresponding finite sequence of vectors $\vec{d}^{(u)}$ that will enable us to determine a polynomial in $\mathbb{Z}[x]$ that has the common zeros, to the correct multiplicity, of $f(x)$ and $g(x)$. This then will allow us to compute $w(x)$.

Let $\xi$ be a zero of $w(x)$, if it exists. Observe that $\xi \ne 0$, and since $\xi$ is a zero of $f(x)$ which has no cyclotomic factors, we have $\xi$ is not a root of unity. Since $\xi$ is a common zero of $f(x)$ and $g(x)$, we have

\[ \sum_{j=0}^{k} a_j \xi^{d_j} = \sum_{j=0}^{k} b_j \xi^{d_j} = 0. \]

We recursively construct $F_u$, $G_u$ and $\vec{d}^{(u)}$, for $0 \le u \le s$, where $s$ is to be determined, beginning with

\[ \begin{aligned} F_0 &= F_0(x_1, \dots, x_k) = a_0 + \sum_{j=1}^{k} a_j x_j, \\ G_0 &= G_0(x_1, \dots, x_k) = b_0 + \sum_{j=1}^{k} b_j x_j, \end{aligned} \tag{11.6.1} \]

and $\vec{d}^{(0)} = \langle d_1, d_2, \dots, d_k \rangle$. As $u$ increases, the number of variables defining $F_u$ and $G_u$ will decrease. The value of $s$ then will be $\le k$. Observe that

\[ F_0(x^{d_1}, \dots, x^{d_k}) = f(x) \quad\text{and}\quad G_0(x^{d_1}, \dots, x^{d_k}) = g(x). \]

We deduce that $F_0$ and $G_0$, being linear, are coprime in $\mathbb{Q}[x_1, \dots, x_k]$ and that

\[ F_0(\xi^{d_1}, \dots, \xi^{d_k}) = G_0(\xi^{d_1}, \dots, \xi^{d_k}) = 0. \tag{11.6.2} \]

Now, suppose for some $u \ge 0$ that nonzero polynomials $F_u$ and $G_u$ in $\mathbb{Z}[x_1, \dots, x_{k_u}]$ and a vector $\vec{d}^{(u)} = \langle d_1^{(u)}, \dots, d_{k_u}^{(u)} \rangle \in \mathbb{Z}^{k_u}$ have been determined, where $k_u < k_{u-1} < \cdots < k_0 = k$. Furthermore, suppose that $F_u$ and $G_u$ are coprime in $\mathbb{Q}[x_1, \dots, x_{k_u}]$ and that we have at least one zero $\xi$ of $w(x)$ such that

\[ F_u\bigl( \xi^{d_1^{(u)}}, \dots, \xi^{d_{k_u}^{(u)}} \bigr) = G_u\bigl( \xi^{d_1^{(u)}}, \dots, \xi^{d_{k_u}^{(u)}} \bigr) = 0. \tag{11.6.3} \]

In particular, $\xi \ne 0$ and $\xi$ is not a root of unity. Note that the $d_j^{(u)}$ may be negative. We will require

\[ J F_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr) \mid f(x) \quad\text{and}\quad J G_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr) \mid g(x). \tag{11.6.4} \]

Observe that $J F_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr)$ and $f(x)$ are in $\mathbb{Z}[x]$. We take (11.6.4) to mean that there is a polynomial $h(x) \in \mathbb{Z}[x]$ such that

\[ f(x) = h(x) \cdot J F_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr) \]

with an analogous equation holding for $g(x)$ and $J G_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr)$. In particular, we want $J F_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr)$ and $J G_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr)$ to be nonzero.

Note that these conditions which are being imposed on $F_u$ and $G_u$ are satisfied for $u = 0$ provided $w(x)$ is not constant. For $0 \le u < s$, we describe next how to recursively construct $F_{u+1}$ and $G_{u+1}$ having analogous properties.

There is a computable bound $B(F_u, G_u)$ as described in Theorem 11.6.3. We deduce that there is a nonzero vector $\vec{v} = \langle v_1, v_2, \dots, v_{k_u} \rangle \in \mathbb{Z}^{k_u}$ such that each $|v_i| \le B(F_u, G_u)$ and $\vec{v}$ is orthogonal to $\vec{d}^{(u)}$. From Lemma 11.6.4, there is a $k_u \times (k_u - 1)$ matrix $M$ with each entry of $M$ having absolute value $\le k_u B(F_u, G_u)/2$ and such that $\vec{d}^{(u)} = M \vec{v}^{(u)}$ for some $\vec{v}^{(u)} \in \mathbb{Z}^{k_u - 1}$, where we view the vectors as column vectors. We define integers $m_{ij}$ (written also $m_{i,j}$) and $v_j^{(u)}$, depending on $u$, by the conditions

\[ M = \begin{pmatrix} m_{11} & \cdots & m_{1,k_u-1} \\ \vdots & \ddots & \vdots \\ m_{k_u 1} & \cdots & m_{k_u,k_u-1} \end{pmatrix} \quad\text{and}\quad \vec{v}^{(u)} = \langle v_1^{(u)}, \dots, v_{k_u-1}^{(u)} \rangle. \]

The relations

\[ x_i = y_1^{m_{i1}} \cdots y_{k_u-1}^{m_{i,k_u-1}}, \quad \text{for } 1 \le i \le k_u, \]

transform the polynomials $F_u(x_1, \dots, x_{k_u})$ and $G_u(x_1, \dots, x_{k_u})$ into polynomials in some, possibly all, of the variables $y_1, \dots, y_{k_u-1}$. These new polynomials we call $\mathcal{F}_u$ and $\mathcal{G}_u$, respectively. More precisely, we define

\[ \mathcal{F}_u(y_1, \dots, y_{k_u-1}) = J F_u\bigl( y_1^{m_{11}} \cdots y_{k_u-1}^{m_{1,k_u-1}}, \ \dots, \ y_1^{m_{k_u 1}} \cdots y_{k_u-1}^{m_{k_u,k_u-1}} \bigr) \tag{11.6.5} \]

and

\[ \mathcal{G}_u(y_1, \dots, y_{k_u-1}) = J G_u\bigl( y_1^{m_{11}} \cdots y_{k_u-1}^{m_{1,k_u-1}}, \ \dots, \ y_1^{m_{k_u 1}} \cdots y_{k_u-1}^{m_{k_u,k_u-1}} \bigr). \tag{11.6.6} \]

The polynomials $\mathcal{F}_u$ and $\mathcal{G}_u$ will depend on the matrix $M$ so that there may be many choices for $\mathcal{F}_u$ and $\mathcal{G}_u$ for each $F_u$ and $G_u$. We need only consider one such $\mathcal{F}_u$ and $\mathcal{G}_u$ and do so. Note that this still may require considering various $M$ until we find one for which $\vec{d}^{(u)} = M \vec{v}^{(u)}$ is satisfied for some $\vec{v}^{(u)} \in \mathbb{Z}^{k_u-1}$.

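The equation $\vec{d}^{(u)} = M \vec{v}^{(u)}$ implies that for some integers $e_f(u)$ and $e_g(u)$ we have

\[ \mathcal{F}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) = x^{e_f(u)} F_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr) \tag{11.6.7} \]

and

\[ \mathcal{G}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) = x^{e_g(u)} G_u\bigl( x^{d_1^{(u)}}, \dots, x^{d_{k_u}^{(u)}} \bigr). \tag{11.6.8} \]

In particular, $\mathcal{F}_u$ and $\mathcal{G}_u$ are nonzero. Also,

\[ J \mathcal{F}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) \mid f(x) \quad\text{and}\quad J \mathcal{G}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) \mid g(x). \tag{11.6.9} \]

Furthermore, with $\xi$ as in (11.6.3), we have

\[ \mathcal{F}_u\bigl( \xi^{v_1^{(u)}}, \dots, \xi^{v_{k_u-1}^{(u)}} \bigr) = \mathcal{G}_u\bigl( \xi^{v_1^{(u)}}, \dots, \xi^{v_{k_u-1}^{(u)}} \bigr) = 0. \]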

The idea is to suppress the variables, if they exist, which do not occur in $\mathcal{F}_u$ and $\mathcal{G}_u$ and the corresponding components of $\vec{v}^{(u)}$ to obtain the polynomials $F_{u+1}$ and $G_{u+1}$ and the vector $\vec{d}^{(u+1)}$ for our recursive construction. However, there is one other matter to consider. The polynomials $\mathcal{F}_u$ and $\mathcal{G}_u$ may not be coprime, and we require $F_{u+1}$ and $G_{u+1}$ to be coprime. Hence, we adjust this idea slightly.

Let

\[ D_u = D_u(y_1, \dots, y_{k_u-1}) = \gcd\nolimits_{\mathbb{Z}}(\mathcal{F}_u, \mathcal{G}_u) \in \mathbb{Z}[y_1, \dots, y_{k_u-1}]. \tag{11.6.10} \]

Recall that $f(0) g(0) \ne 0$. Hence, (11.6.7), (11.6.8) and (11.6.9) imply that $J D_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr)$ divides $\gcd_{\mathbb{Z}}(f, g)$ in $\mathbb{Z}[x]$. We define

\[ F_{u+1} = \frac{\mathcal{F}_u(y_1, \dots, y_{k_u-1})}{D_u(y_1, \dots, y_{k_u-1})} \quad\text{and}\quad G_{u+1} = \frac{\mathcal{G}_u(y_1, \dots, y_{k_u-1})}{D_u(y_1, \dots, y_{k_u-1})}, \tag{11.6.11} \]

and set $k_{u+1} \le k_u - 1$ to be the total number of variables $y_1, \dots, y_{k_u-1}$ appearing in $F_{u+1}$ and $G_{u+1}$. Note that $F_{u+1}$ and $G_{u+1}$ are coprime and that (11.6.4) holds with $u$ replaced by $u+1$ and the appropriate change of variables.

We describe next how the recursive construction will end. Suppose we have just constructed $F_u$, $G_u$ and $\vec{d}^{(u)}$ and proceed as above to the next step of constructing $F_{u+1}$, $G_{u+1}$ and $\vec{d}^{(u+1)}$. At this point, $D_{u-1}$ will have been defined but not $D_u$. We want to find $M$ and a $\vec{v}^{(u)}$ such that $\vec{d}^{(u)} = M \vec{v}^{(u)}$ where $M$ is a $k_u \times (k_u - 1)$ matrix with entries bounded in absolute value by $k_u B(F_u, G_u)/2$. So we compute $B(F_u, G_u)$ and the bound $k_u B(F_u, G_u)/2$ on the absolute values of the entries of $M$. We consider such $M$ and apply the algorithm of Lemma 11.6.5 to see if there is an integral vector $\vec{v}^{(u)}$ for which $\vec{d}^{(u)} = M \vec{v}^{(u)}$. Once such an $M$ and $\vec{v}^{(u)}$ are found, we can proceed with the construction of $F_{u+1}$ and $G_{u+1}$ given above. On the other hand, it is possible that no such $M$ and $\vec{v}^{(u)}$ will be found. Given Theorem 11.6.3, this will be the case only if the supposition that (11.6.3) holds for some zero $\xi$ of $w(x)$ is incorrect. In particular, (11.6.3) holds for no zero $\xi$ of $w(x)$ if $F_u$ and $G_u$ are coprime polynomials in $< 2$ variables (i.e., $k_u \le 1$), but it is also possible that (11.6.3) holds for no zero $\xi$ of $w(x)$ for some $u$ with $F_u$ and $G_u$ polynomials in $\ge 2$ variables (i.e., $k_u \ge 2$). Given that $M$ is a $k_u \times (k_u - 1)$ matrix, we consider it to be vacuously true that no $M$ and $\vec{v}^{(u)}$ exist satisfying $\vec{d}^{(u)} = M \vec{v}^{(u)}$ in the case that $k_u \le 1$. If no such $M$ and $\vec{v}^{(u)}$ exist, we consider the recursive construction of the polynomials $F_u$ and $G_u$ complete and set $s = u$. We will want the values of $D_u$ for every $0 \le u \le s-1$, so we save these as we proceed.

The motivation discussed above can be summarized into a procedure for describing the algorithm associated with Theorem 11.6.2. Beginning with $F_0$ and $G_0$ as in (11.6.1) and $\vec{d}^{(0)} = \langle d_1, \dots, d_k \rangle$, we construct the multi-variable polynomials $F_u$ and $G_u$ and vectors $\vec{d}^{(u)} = \langle d_1^{(u)}, \dots, d_{k_u}^{(u)} \rangle \in \mathbb{Z}^{k_u}$ recursively. Given $F_u$, $G_u$ and $\vec{d}^{(u)}$, we compute $B(F_u, G_u)$ and search for a $k_u \times (k_u - 1)$ matrix $M$ with integer entries having absolute value $\le k_u B(F_u, G_u)/2$ for which $\vec{d}^{(u)} = M \vec{v}^{(u)}$ is solvable with $\vec{v}^{(u)} = \langle v_1^{(u)}, \dots, v_{k_u-1}^{(u)} \rangle \in \mathbb{Z}^{k_u-1}$. We check for solvability and determine the solution $\vec{v}^{(u)}$ if it exists by using the algorithm in Lemma 11.6.5. If no such $M$ and $\vec{v}^{(u)}$ exist, then we set $s = u$ and stop our construction. Otherwise, once such an $M = (m_{ij})$ and $\vec{v}^{(u)}$ are determined, we define $F_{u+1}$ and $G_{u+1}$ using (11.6.5), (11.6.6), (11.6.10) and (11.6.11). After using (11.6.11) to construct $F_{u+1}$ and $G_{u+1}$, we determine the variables $y_1, \dots, y_{k_u-1}$ which occur in $F_{u+1}$ and $G_{u+1}$ and define $\vec{d}^{(u+1)}$ as the vector with corresponding components from $v_1^{(u)}, \dots, v_{k_u-1}^{(u)}$; in other words, if $y_j$ is the $i$th variable occurring in $F_{u+1}$ and $G_{u+1}$, then $v_j^{(u)}$ is the $i$th component

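of $\vec{d}^{(u+1)}$.

One important aspect of the construction is that when we divide by $D_u$ to construct $F_{u+1}$ and $G_{u+1}$, we obtain not simply that $J D_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr)$ divides $\gcd_{\mathbb{Z}}(f, g)$ in $\mathbb{Z}[x]$ but also

\[ \prod_{j=0}^{u} J D_j\bigl( x^{v_1^{(j)}}, \dots, x^{v_{k_j-1}^{(j)}} \bigr) \text{ divides } \gcd\nolimits_{\mathbb{Z}}(f, g) \text{ in } \mathbb{Z}[x]. \tag{11.6.12} \]

This can be seen inductively by observing that

\[ J \mathcal{F}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) = \frac{f(x)}{\prod_{j=0}^{u-1} J D_j\bigl( x^{v_1^{(j)}}, \dots, x^{v_{k_j-1}^{(j)}} \bigr)} \tag{11.6.13} \]

and

\[ J \mathcal{G}_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr) = \frac{g(x)}{\prod_{j=0}^{u-1} J D_j\bigl( x^{v_1^{(j)}}, \dots, x^{v_{k_j-1}^{(j)}} \bigr)}. \tag{11.6.14} \]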

The algorithm for Theorem 11.6.2 ends by making use of the identity

\[ \gcd\nolimits_{\mathbb{Z}}\bigl( f(x), g(x) \bigr) = \prod_{u=0}^{s-1} J D_u\bigl( x^{v_1^{(u)}}, \dots, x^{v_{k_u-1}^{(u)}} \bigr). \tag{11.6.15} \]

We justify (11.6.15). Recall that we have denoted the left side by $w(x)$. Observe that (11.6.12) implies that the expression on the right of (11.6.15) divides $w(x)$. By the definition of $s$, when we arrive at $u = s$ in our recursive construction, (11.6.3) fails to hold for every zero $\xi$ of $w(x)$. Therefore, taking $u = s-1$ in (11.6.11), (11.6.13) and (11.6.14) implies that the right-hand side of (11.6.15) vanishes at all the zeros of $w(x)$ and to the same multiplicity. As noted earlier, we are considering the case that the contents of $f(x)$ and $g(x)$ are 1. We deduce that (11.6.15) holds.

We are now left with considering the case that $f(x)$ does not have any non-constant reciprocal factor. For this part, we make use of a weakened form of a result of Schinzel (1970a). Before stating this result, we give some background.

Similar to the approach just used for computing the greatest common divisor of two sparse polynomials, we set

\[ F(x_1, \ldots, x_r) = a_r x_r + \cdots + a_1 x_1 + a_0 \in \mathbb{Z}[x_1, \ldots, x_r], \]

with the plan of connecting the factorization of $f(x) = F(x^{d_1}, x^{d_2}, \ldots, x^{d_r})$ with the factorization of a multi-variable polynomial of the form

\[ J\,F\bigl(y_1^{m_{11}} \cdots y_t^{m_{1t}}, \ldots, y_1^{m_{r1}} \cdots y_t^{m_{rt}}\bigr), \]

where the number of variables $t$ is $\le r$ and $m_{ij} \in \mathbb{Z}$ for $1 \le i \le r$ and $1 \le j \le t$. The above multi-variable polynomial can be expressed as

\[ y_1^{u_1} \cdots y_t^{u_t}\, F\bigl(y_1^{m_{11}} \cdots y_t^{m_{1t}}, \ldots, y_1^{m_{r1}} \cdots y_t^{m_{rt}}\bigr), \]

where $u_j = -\min\{m_{1j}, m_{2j}, \ldots, m_{rj}, 0\}$ for $1 \le j \le t$.
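As a small illustration of the definition of $u_j$, the following Python sketch computes $u_1, \ldots, u_t$ and the exponent vectors of the monomials of $J\,F$ directly from the rows of $M$. The function name `j_exponents` and the sample matrix are illustrative assumptions and do not come from the text.

```python
def j_exponents(M):
    """Given the r x t integer matrix M = (m_ij), return (u, shifted), where
    u[j] = -min(m_1j, ..., m_rj, 0) and shifted lists the exponent vectors of
    the monomials of y_1^{u_1}...y_t^{u_t} F(y^{m_1}, ..., y^{m_r}).
    The last entry of shifted equals u and belongs to the constant term a_0."""
    t = len(M[0])
    # The extra 0 accounts for the constant term a_0 of F.
    u = [-min([row[j] for row in M] + [0]) for j in range(t)]
    rows = [list(row) for row in M] + [[0] * t]
    shifted = [[m + uj for m, uj in zip(row, u)] for row in rows]
    return u, shifted

# Hypothetical example with r = 3 and t = 2.
M = [[1, 1], [0, -1], [1, 0]]
u, shifted = j_exponents(M)
print(u)        # [0, 1]
print(shifted)  # [[1, 2], [0, 0], [1, 1], [0, 1]]
```

For this sample $M$, the output says that $J\,F = a_1 y_1 y_2^2 + a_2 + a_3 y_1 y_2 + a_0 y_2$, which is $y_2 \cdot F(y_1 y_2, y_2^{-1}, y_1)$.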

To make the connection with the factorization of $f(x)$, we want the matrix $M = (m_{ij})$ to be such that

\[ \begin{pmatrix} d_1 \\ \vdots \\ d_r \end{pmatrix} = M \begin{pmatrix} v_1 \\ \vdots \\ v_t \end{pmatrix} \tag{11.6.16} \]

for some integers $v_1, v_2, \ldots, v_t$. In this way, the substitution $y_j = x^{v_j}$ for $1 \le j \le t$ takes any factorization

\[ y_1^{u_1} \cdots y_t^{u_t}\, F\bigl(y_1^{m_{11}} \cdots y_t^{m_{1t}}, \ldots, y_1^{m_{r1}} \cdots y_t^{m_{rt}}\bigr) = F_1(y_1, \ldots, y_t) \cdots F_s(y_1, \ldots, y_t) \tag{11.6.17} \]

in $\mathbb{Z}[y_1, \ldots, y_t]$ into the form

\[ x^{u_1 v_1 + \cdots + u_t v_t}\, F(x^{d_1}, x^{d_2}, \ldots, x^{d_r}) = F_1(x^{v_1}, \ldots, x^{v_t}) \cdots F_s(x^{v_1}, \ldots, x^{v_t}). \tag{11.6.18} \]

We restrict our attention to factorizations in (11.6.17) where the $F_i(y_1, \ldots, y_t)$ are non-constant. We will be interested in the case that s is maximal; in other words, we will want the right-hand side of (11.6.17) to be a complete factorization of the left-hand side of (11.6.17) into irreducibles over Q. For achieving the results in this paper, we want some algorithm for obtaining such a complete factorization of multi-variable polynomials; among the various sources for this, we note the work by Lenstra (1987) provides such an algorithm. For the moment, though, we need not take s maximal.

Since $f(x) = F(x^{d_1}, x^{d_2}, \ldots, x^{d_r})$, the above describes a factorization of $f(x)$, except that we need to take some caution as some $v_j$ may be negative so the expressions $F_i(x^{v_1}, \ldots, x^{v_t})$ may not be polynomials in $x$. For $1 \le i \le s$, define $w_i$ as the integer satisfying

\[ J\,F_i(x^{v_1}, \ldots, x^{v_t}) = x^{w_i} F_i(x^{v_1}, \ldots, x^{v_t}). \tag{11.6.19} \]

We obtain from (11.6.18) that

\[ x^{u_1 v_1 + \cdots + u_t v_t + w_1 + \cdots + w_s} f(x) = \prod_{i=1}^{s} x^{w_i} F_i(x^{v_1}, \ldots, x^{v_t}). \]

The definition of $w_i$ implies that this product is over polynomials in $\mathbb{Z}[x]$ that are not divisible by $x$. The conditions $a_0 \ne 0$ and $d_0 = 0$ imposed on $f(x)$ in the introduction imply that $f(x)$ is not divisible by $x$. Hence, the exponent of $x$ appearing on the left must be 0, and we obtain the factorization

\[ f(x) = \prod_{i=1}^{s} x^{w_i} F_i(x^{v_1}, \ldots, x^{v_t}) = \prod_{i=1}^{s} J\,F_i(x^{v_1}, \ldots, x^{v_t}). \tag{11.6.20} \]
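The bookkeeping in (11.6.16) through (11.6.20) can be checked on a toy example. The sketch below is only an illustration under assumed data: we take $F = x_3 + x_2 + x_1 + 1$, the matrix $M$ with rows $(1,1)$, $(0,-1)$, $(1,0)$, and $\vec{v} = \langle 5, -3 \rangle$, so that $\vec{d} = \langle 2, 3, 5 \rangle$ and $f(x) = x^5 + x^3 + x^2 + 1$. This $f(x)$ has cyclotomic factors, so it lies outside the case treated here; it serves only to show how a negative $v_j$ forces a nonzero $w_i$. Multi-variable polynomials are stored as dictionaries from exponent tuples to coefficients, and the helper names are hypothetical.

```python
def substitute(poly, v):
    """Apply y_j = x^{v_j} to poly = {exponent tuple: coefficient}.
    The result is a Laurent polynomial {exponent of x: coefficient}."""
    out = {}
    for exps, c in poly.items():
        e = sum(a * b for a, b in zip(exps, v))
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def apply_J(laurent):
    """Return (w, g) where g = x^w * laurent is a polynomial not divisible by x."""
    w = -min(laurent)
    return w, {e + w: c for e, c in laurent.items()}

def multiply(p, q):
    """Product of two univariate polynomials given as {exponent: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

# Assumed toy data (not from the text).
u, v = (0, 1), (5, -3)
f = {5: 1, 3: 1, 2: 1, 0: 1}          # x^5 + x^3 + x^2 + 1
# (11.6.17): y2 * F(y1*y2, 1/y2, y1) = (y1*y2 + 1) * (y2 + 1)
F1 = {(1, 1): 1, (0, 0): 1}
F2 = {(0, 1): 1, (0, 0): 1}

(w1, g1), (w2, g2) = (apply_J(substitute(Fi, v)) for Fi in (F1, F2))
print(w1, w2)                                        # 0 3
print(sum(a * b for a, b in zip(u, v)) + w1 + w2)    # 0, the exponent of x on the left
print(multiply(g1, g2) == f)                         # True, as in (11.6.20)
```

Here $F_2(x^{5}, x^{-3}) = x^{-3} + 1$ is not a polynomial, and $w_2 = 3$ repairs this, giving the factor $x^3 + 1$.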

The factorization given in (11.6.20) is crucial to our algorithm. As we are interested in the case that $f(x)$ has no non-constant reciprocal factor, we restrict our attention to this case. From (11.6.20), we see that the polynomials $x^{w_i} F_i(x^{v_1}, \ldots, x^{v_t})$ cannot have a non-constant reciprocal factor. There are, however, still two possibilities that we need to consider for each $i \in \{1, 2, \ldots, s\}$:

(i′) $F_i(y_1, \ldots, y_t)$ is reciprocal.

(ii′) $J\,F_i(x^{v_1}, \ldots, x^{v_t}) \in \mathbb{Z}$.

Although we will not need to know a connection between (i′) and (ii′), we show here that if (i′) holds for some $i$, then (ii′) does as well. We consider then the possibility that

\[ J\,F_i\bigl(y_1^{-1}, \ldots, y_t^{-1}\bigr) = \pm F_i(y_1, \ldots, y_t). \tag{11.6.21} \]

In other words, suppose that

\[ y_1^{e_1} \cdots y_t^{e_t}\, F_i\bigl(y_1^{-1}, \ldots, y_t^{-1}\bigr) = \pm F_i(y_1, \ldots, y_t), \tag{11.6.22} \]

where $e_j = e_j(i)$ is the degree of $F_i(y_1, \ldots, y_t)$ as a polynomial in $y_j$. Substituting $y_j = x^{v_j}$ into (11.6.22), we obtain

\[ x^{w_i + e_1 v_1 + \cdots + e_t v_t}\, F_i\bigl(x^{-v_1}, \ldots, x^{-v_t}\bigr) = \pm x^{w_i} F_i\bigl(x^{v_1}, \ldots, x^{v_t}\bigr). \tag{11.6.23} \]

By the definition of $w_i$, the polynomial on the right does not vanish at 0. Assume (ii′) does not hold. Let α be a zero of this polynomial. Then substituting x = 1/α into (11.6.23) shows that 1/α is also a zero. On the other hand, we have already demonstrated in (11.6.20) that the right-hand side of (11.6.23) is a factor of f(x). This contradicts that f(x) has no non-constant reciprocal factor. Hence, (ii′) holds.
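Condition (i′), in the form (11.6.22), is easy to test mechanically. The following Python sketch is one way to do so, assuming the polynomial is stored as a dictionary from exponent tuples to coefficients and is not divisible by any variable $y_j$ (so that applying $J$ amounts to multiplying by $y_1^{e_1} \cdots y_t^{e_t}$). The function name and the sample polynomials are hypothetical.

```python
def is_reciprocal(poly):
    """Test (11.6.22) for poly = {exponent tuple: coefficient}:
    y^e * poly(1/y_1, ..., 1/y_t) == +poly or -poly, with e_j the degree in y_j.
    Assumes poly is nonzero and not divisible by any variable."""
    t = len(next(iter(poly)))
    e = [max(exps[j] for exps in poly) for j in range(t)]
    # Replacing y_j by 1/y_j and multiplying by y^e sends exponent a_j to e_j - a_j.
    flipped = {tuple(ej - a for ej, a in zip(e, exps)): c
               for exps, c in poly.items()}
    negated = {exps: -c for exps, c in poly.items()}
    return flipped == poly or flipped == negated

print(is_reciprocal({(1, 1): 1, (0, 0): 1}))    # y1*y2 + 1 -> True
print(is_reciprocal({(1, 0): 1, (0, 1): -1}))   # y1 - y2   -> True (sign -1)
print(is_reciprocal({(1, 0): 1, (0, 0): 2}))    # y1 + 2    -> False
```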

As noted earlier, we make use of a special case of a result due to Schinzel (1970a). In particular, the more general result implies that the above idea can in fact always be used to factor $f(x)$ if $f(x)$ has two nonreciprocal irreducible factors. In other words, there exist a matrix $M$ and $v_j$ satisfying (11.6.16) and a factorization of the form (11.6.17) that leads to a non-trivial factorization of $f(x)$, if it exists, through the substitution $y_j = x^{v_j}$. We are interested in the case that $f(x)$ has no non-constant reciprocal factor. In this case, we can obtain a complete factorization of $f(x)$ into irreducibles.

Theorem 11.6.6. Fix

\[ F = F(x_1, \ldots, x_r) = a_r x_r + \cdots + a_1 x_1 + a_0, \]

where the $a_j$ are nonzero integers. There exists a finite computable set of matrices $S$ with integer entries, depending only on $F$, with the following property: Suppose the vector $\vec{d} = \langle d_1, d_2, \ldots, d_r \rangle$ is in $\mathbb{Z}^r$ with $d_r > d_{r-1} > \cdots > d_1 > 0$ and such that $f(x) = F(x^{d_1}, x^{d_2}, \ldots, x^{d_r})$ has no non-constant reciprocal factor. Then there is an $r \times t$ matrix $M = (m_{ij}) \in S$ of rank $t \le r$ and a vector $\vec{v} = \langle v_1, v_2, \ldots, v_t \rangle$ in $\mathbb{Z}^t$ such that (11.6.16) holds and the factorization given by (11.6.17) in $\mathbb{Z}[y_1, \ldots, y_t]$ of a polynomial in $t$ variables $y_1, y_2, \ldots, y_t$ as a product of $s$ irreducible polynomials over $\mathbb{Q}$ implies the factorization of $f(x)$ given by (11.6.20) as a product of polynomials in $\mathbb{Z}[x]$ each of which is either irreducible over $\mathbb{Q}$ or a constant.

We are ready now to apply the above to finish describing the algorithm for establishing Theorem 11.6.1, that is, to describe how (iii) can be done. As before, the reader can look at the work of Filaseta, Granville, and Schinzel (2008) to obtain details for estimating the running time. As suggested by the statement of Theorem 11.6.6, we take the coefficients $a_j$ of $f(x)$ and consider the multi-variable polynomial $F = F(x_1, \ldots, x_r)$. We compute the set $S$. Since $f(x) = F(x^{d_1}, \ldots, x^{d_r})$ has no non-constant reciprocal factors, there is a matrix $M = (m_{ij}) \in S$ of rank $t \le r$ and a vector $\vec{v}$ in $\mathbb{Z}^t$ as in Theorem 11.6.6.

We go through each of the $O_{r,H}(1)$ matrices $M$ in $S$ and solve for the vectors $\vec{v} = \langle v_1, v_2, \ldots, v_t \rangle$ in $\mathbb{Z}^t$ satisfying $\vec{d} = M\vec{v}$, where $t$ is the number of columns in $M$ and we interpret $\vec{d}$ and $\vec{v}$ as column vectors. From the definition of $S$, we have that the rank of $M$ is $t$ and $t \le r$. Hence, there can be at most one such vector $\vec{v}$ for each $M \in S$. However, for each $\vec{d}$, there may be many $M \in S$ and $\vec{v}$ for which $\vec{d} = M\vec{v}$, and we will consider all of them.
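The step of solving $\vec{d} = M\vec{v}$ for an integer vector can be sketched as follows. This is a minimal illustration, not the procedure analyzed in the cited work: it runs Gauss-Jordan elimination over $\mathbb{Q}$ with exact fractions, assumes $M$ has rank $t$ (so a solution is unique if it exists), and returns `None` when the system is inconsistent or the solution is not integral. The function name is hypothetical.

```python
from fractions import Fraction

def solve_integer_vector(M, d):
    """Return the unique integer vector v with M v = d as a tuple, or None if
    no such integer vector exists. Assumes the r x t matrix M has rank t."""
    r, t = len(M), len(M[0])
    # Augmented matrix [M | d] over the rationals.
    A = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, d)]
    row = 0
    for col in range(t):
        pivot = next((i for i in range(row, r) if A[i][col] != 0), None)
        if pivot is None:
            raise ValueError("M does not have full column rank")
        A[row], A[pivot] = A[pivot], A[row]
        A[row] = [x / A[row][col] for x in A[row]]
        for i in range(r):
            if i != row and A[i][col] != 0:
                factor = A[i][col]
                A[i] = [x - factor * y for x, y in zip(A[i], A[row])]
        row += 1
    # Rows t, ..., r-1 now have zero coefficients; they must have zero right side.
    if any(A[i][t] != 0 for i in range(t, r)):
        return None
    v = [A[i][t] for i in range(t)]
    if any(x.denominator != 1 for x in v):
        return None
    return tuple(int(x) for x in v)

# Hypothetical example: rows (1,1), (0,-1), (1,0) and d = <2, 3, 5>.
print(solve_integer_vector([[1, 1], [0, -1], [1, 0]], [2, 3, 5]))  # (5, -3)
print(solve_integer_vector([[1, 1], [0, -1], [1, 0]], [2, 3, 6]))  # None
```

In the algorithm, a matrix $M \in S$ for which this returns `None` contributes no factorization and is simply skipped.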

The algorithm for Lemma 11.6.5 is performed for each of the $O_{r,H}(1)$ matrices $M$ in $S$. This leads to $O_{r,H}(1)$ factorizations of the form given in (11.6.17) into irreducibles, each having a potentially different value for $s$. For each of these, we compute the values of $F_i\bigl(x^{v_1}, \ldots, x^{v_t}\bigr)$ and determine $w_i$ as in (11.6.19). We produce then $O_{r,H}(1)$ factorizations of $f(x)$ as in (11.6.20). As we obtain these factorizations, we keep track of the number of non-constant polynomials $x^{w_i} F_i\bigl(x^{v_1}, \ldots, x^{v_t}\bigr)$ appearing in (11.6.20). We choose a factorization for which this number is maximal. Recalling that (11.6.20) follows from $\vec{d} = M\vec{v}$ and (11.6.17), we deduce from Theorem 11.6.6 that the factorization of $f(x)$ we have chosen provides a factorization of $f(x)$ with each $x^{w_i} F_i\bigl(x^{v_1}, \ldots, x^{v_t}\bigr)$ either irreducible or constant. For a factorization of $f(x)$ into irreducibles over $\mathbb{Q}$, we multiply together the constants appearing on the right of (11.6.20) and one of the irreducible polynomials $J\,F_i\bigl(x^{v_1}, \ldots, x^{v_t}\bigr)$. This completes the proof of Theorem 11.6.1.

§11.7 Exercises

(11.1) Use Berlekamp’s algorithm to factor $f(x) = x^6 + x^3 + x^2 + x + 1$ modulo 2. You should obtain two polynomials $u(x)$ and $v(x)$ of degrees $< 6$ such that $f(x) \equiv u(x)v(x) \pmod{2}$.

(11.2) Use the previous problem and Hensel lifting to factor $f(x) = x^6 + x^3 + x^2 + x + 1$ modulo 32. To help, let $u(x)$ and $v(x)$ be as in the previous problem with $\deg u > \deg v$. Then you can take advantage of the following:

\begin{align*}
u(x) + x^3 v(x) &\equiv x^5 + 1 \pmod{2} \\
(x + 1)u(x) + (x^2 + 1)v(x) &\equiv x^5 + x^4 \pmod{2} \\
x\,u(x) + (x^2 + x)v(x) &\equiv x^5 \pmod{2}.
\end{align*}

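(11.3) In each part, apply the Gram-Schmidt orthogonalization process to the given subset $S$ of $V$ to obtain an orthogonal basis for $V$.

(a) $S = \{\langle 1, 1, 1\rangle, \langle 1, 0, 1\rangle, \langle 2, 1, 1\rangle\}$, $V = \mathbb{Q}^3$

(b) $S = \{\langle 1, 0, 0, 1\rangle, \langle 1, 0, 2, 2\rangle, \langle 1, 1, 0, 1\rangle, \langle 2, 1, 1, 0\rangle\}$, $V = \mathbb{Q}^4$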
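For checking hand computations in exercises of this kind, a short Python sketch of the Gram-Schmidt process (without normalization) in exact rational arithmetic may be useful. It assumes the input vectors are linearly independent; the function names and the sample vectors, which are deliberately different from those in the exercise, are illustrative only.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return the orthogonal vectors b_1*, ..., b_n* obtained from the given
    linearly independent vectors, using exact fractions throughout."""
    basis = []
    for b in vectors:
        b = [Fraction(x) for x in b]
        b_star = b[:]
        for a in basis:
            mu = dot(b, a) / dot(a, a)      # projection coefficient onto a
            b_star = [x - mu * y for x, y in zip(b_star, a)]
        basis.append(b_star)
    return basis

# Sample input (not from the exercises).
for vec in gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)]):
    print([str(x) for x in vec])
```

A quick sanity check on any output is that the dot product of each pair of distinct vectors returned is exactly 0.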

(11.4) Let $\beta_1 = \langle 1, 0, -5\rangle$, $\beta_2 = \langle 11, 0, -3\rangle$ and $\beta_3 = \langle 41, 2, -23\rangle$.

(a) Verify that $\{\beta_1, \beta_2, \beta_3\}$ form a basis for $\mathbb{R}^3$. In other words, show that if $a_1$, $a_2$ and $a_3$ are real numbers, then there exist real numbers $b_1$, $b_2$ and $b_3$ such that

\[ b_1\beta_1 + b_2\beta_2 + b_3\beta_3 = \langle a_1, a_2, a_3\rangle. \]

(b) Show that $\{\beta_1, \beta_2, \beta_3\}$ is not an orthogonal basis for $\mathbb{R}^3$.

(c) Calculate the orthogonal basis $\{\alpha_1, \alpha_2, \alpha_3\}$ for $\mathbb{R}^3$ obtained by the Gram-Schmidt orthogonalization process.

(d) Write $\langle 1, 2, 3\rangle \in \mathbb{R}^3$ as a linear combination of $\alpha_1$, $\alpha_2$ and $\alpha_3$ by considering projections of vectors rather than solving by linear equations.

(11.5) Let $\vec{b}_1, \ldots, \vec{b}_n$ be in $\mathbb{Q}^n$, and suppose they are linearly independent over $\mathbb{R}$. Let $\vec{b}_1^*, \ldots, \vec{b}_n^*$ be the vectors obtained from $\vec{b}_1, \ldots, \vec{b}_n$ by the Gram-Schmidt orthogonalization process.

(a) For each $i \in \{1, \ldots, n\}$, show that the vectors $\vec{b}_1^*, \ldots, \vec{b}_i^*$ span the same subspace of $\mathbb{R}^n$ as $\vec{b}_1, \ldots, \vec{b}_i$.

(b) Show that $\vec{b}_1^*, \ldots, \vec{b}_n^*$ are linearly independent over $\mathbb{R}$.

(c) Show that $\vec{b}_1^*, \ldots, \vec{b}_n^*$ are pairwise orthogonal (i.e., for distinct $i$ and $j$, we have $\vec{b}_i^* \cdot \vec{b}_j^* = 0$).

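(11.6) (a) Let $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{C}$ with $|\alpha_1|, |\alpha_2|, \ldots, |\alpha_m|$ each $\ge 1$ and $|\alpha_{m+1}|, |\alpha_{m+2}|, \ldots, |\alpha_n|$ each $< 1$. Set $M = \alpha_1 \alpha_2 \cdots \alpha_m$. For $k \in \{1, 2, \ldots, n\}$, let

\[ \sigma_k = \sum_{1 \le j_1 < j_2 < \cdots < j_k \le n} \alpha_{j_1} \alpha_{j_2} \cdots \alpha_{j_k} \]

be the $k$th elementary symmetric function on $\alpha_1, \alpha_2, \ldots, \alpha_n$. Using induction on $m$, show that

\[ \sigma_k \le \binom{n-1}{k-1} M + \binom{n-1}{k}. \]

(b) Let $f(x) = \sum_{j=0}^{n} a_j x^j$ and $g(x) = \sum_{j=0}^{m} b_j x^j$ be in $\mathbb{Z}[x]$. Suppose $f(x) = g(x)h(x)$ for some $h(x) \in \mathbb{Z}[x]$. Prove that

\[ |b_k| \le \binom{m-1}{k-1} \|f\| + \binom{m-1}{k} |a_n|, \quad \text{for } 0 \le k \le m. \]

(c) Prove that

\[ |b_k| \le \binom{m}{k} \|f\|, \quad \text{for } 0 \le k \le m. \]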

(11.7) Let $f(x) \in \mathbb{Z}[x]$ with $\deg f = n$. Suppose $f(x) = g(x)h(x)$ for some $g(x), h(x) \in \mathbb{Z}[x]$.

(a) Show that

\[ \sum_{j=0}^{n} \binom{n}{j}^2 = \binom{2n}{n}. \]

(b) Using (a) and the previous problem, justify that

\[ \|g\| \le \binom{2n}{n}^{1/2} \|f\|. \]

(11.8) Let $p$ be a prime, and let $k$ be a positive integer. Observe that

\[ x^{kp} - 1 = \Phi_p(x) w(x) \]

for some polynomial $w(x) \in \mathbb{Z}[x]$.

(a) Explain why $w(x)$ has at least $k + 1$ non-zero terms.

(b) Give a brief explanation to justify the equation

\[ -x^{2kp} + 2x^{kp} - 1 = \Phi_p(x)\,\tilde{\Phi}_p(x)\,w(x)\,\tilde{w}(x). \]

(c) Using (a) and (b), prove that there is a constant $C$ such that for every integer $N$, there is a polynomial $f(x) \in \mathbb{Z}[x]$ satisfying both of the following.

(i) $\|f\| \le C$.

(ii) There is a factorization of $f(x)$ as $g(x)h(x)$ where $g(x)$ and $h(x)$ are in $\mathbb{Z}[x]$ and both $\|g\| \ge N$ and $\|h\| \ge N$.

(11.9) Let $s_1, s_2, \ldots, s_n$ be distinct, non-zero, squarefree positive integers, and let $p_1, \ldots, p_r$ be all the primes dividing $s_1 \cdots s_n$. Recall that, in Section 11.3, we have shown that the number field $\mathbb{Q}(\sqrt{p_1}, \ldots, \sqrt{p_r})$ has degree $2^r$ over $\mathbb{Q}$. Using this fact, explain why the numbers $\sqrt{s_1}, \ldots, \sqrt{s_n}$ are linearly independent over $\mathbb{Q}$.

(11.10) (a) In the example of Swinnerton-Dyer, we took the numbers $a_j$ to be squarefree pairwise relatively prime integers $> 1$. Discuss what conditions are really needed on the $a_j$’s. In particular, consider the possibility that some of the $a_j$’s are negative.

(b) We have shown that $x^4 + 1$ is reducible modulo every prime. Show that this follows from part (a), possibly redoing part (a) so that it does.

(11.11) (a) Let $a$ and $b$ be relatively prime positive integers. Prove that

\[ \gcd\bigl(x^{ab} - 1, (x^a - 1)(x^b - 1)\bigr) = \frac{(x^a - 1)(x^b - 1)}{x - 1}. \]

(b) Explain how (a) implies that Theorem 11.6.2 is not true if we omit the condition that at least one of $f(x)$ and $g(x)$ is free of cyclotomic factors.
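Before attempting a proof of the identity in (11.11)(a), one can test it numerically for small $a$ and $b$. The following Python sketch does this with a plain Euclidean algorithm over $\mathbb{Q}$; polynomials are coefficient lists ordered from the constant term upward, and all function names are hypothetical. It is a sanity check under these assumptions, not a proof.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients in place and return the list."""
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_rem(p, q):
    """Remainder of p on division by the nonzero polynomial q, over Q."""
    p = trim([Fraction(c) for c in p])
    while len(p) >= len(q):
        c, k = p[-1] / q[-1], len(p) - len(q)
        for i, b in enumerate(q):
            p[k + i] -= c * b
        trim(p)                      # the leading term has been cancelled
    return p

def poly_gcd(p, q):
    """Monic greatest common divisor over Q via the Euclidean algorithm."""
    p, q = trim([Fraction(c) for c in p]), trim([Fraction(c) for c in q])
    while q:
        p, q = q, poly_rem(p, q)
    return [c / p[-1] for c in p]

def x_pow_minus_1(n):
    return [-1] + [0] * (n - 1) + [1]

def identity_holds(a, b):
    """Check that gcd(x^(ab) - 1, (x^a - 1)(x^b - 1)) * (x - 1) equals
    (x^a - 1)(x^b - 1)."""
    prod = poly_mul(x_pow_minus_1(a), x_pow_minus_1(b))
    g = poly_gcd(x_pow_minus_1(a * b), prod)
    return poly_mul(g, [-1, 1]) == prod

print(identity_holds(2, 3))  # True
print(identity_holds(2, 4))  # False: 2 and 4 are not relatively prime
```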
