Applied Combinatorics – Math 6409 S. E. Payne Student Version - Fall 2003



Contents

0.1 Notation
0.2 Introduction

1 Basic Counting Techniques
  1.1 Sets and Functions: The Twelvefold Way
  1.2 Composition of Positive Integers
  1.3 Multisets
  1.4 Multinomial Coefficients
  1.5 Permutations
  1.6 Partitions of Integers
  1.7 Set Partitions
  1.8 Table Entries in the Twelvefold Way
  1.9 Recapitulation
  1.10 Cayley's Theorem: The Number of Labeled Trees
  1.11 The Matrix-Tree Theorem
  1.12 Number Theoretic Functions
  1.13 Inclusion – Exclusion
  1.14 Rook Polynomials
  1.15 Permutations With Forbidden Positions
  1.16 Recurrence Relations: Menage Numbers Again
  1.17 Solutions and/or Hints to Selected Exercises

2 Systems of Representatives and Matroids
  2.1 The Theorem of Philip Hall
  2.2 An Algorithm for SDR's
  2.3 Theorems of König and G. Birkhoff
  2.4 The Theorem of Marshall Hall, Jr.
  2.5 Matroids and the Greedy Algorithm
  2.6 Solutions and/or Hints to Selected Exercises

3 Polya Theory
  3.1 Group Actions
  3.2 Applications
  3.3 The Cycle Index: Polya's Theorem
  3.4 Sylow Theory Via Group Actions
  3.5 Patterns and Weights
  3.6 The Symmetric Group
  3.7 Counting Graphs
  3.8 Solutions and/or Hints to Selected Exercises

4 Formal Power Series as Generating Functions
  4.1 Using Power Series to Count Objects
  4.2 A famous example: Stirling numbers of the 2nd kind
  4.3 Ordinary Generating Functions
  4.4 Formal Power Series
  4.5 Composition of Power Series
  4.6 The Formal Derivative and Integral
  4.7 Log, Exp and Binomial Power Series
  4.8 Exponential Generating Functions
  4.9 Famous Example: Bernoulli Numbers
  4.10 Famous Example: Fibonacci Numbers
  4.11 Roots of a Power Series
  4.12 Laurent Series and Lagrange Inversion
  4.13 EGF: A Second Look
  4.14 Dirichlet Series - The Formal Theory
  4.15 Rational Generating Functions
  4.16 More Practice with Generating Functions
  4.17 The Transfer Matrix Method
  4.18 A Famous NONLINEAR Recurrence
  4.19 MacMahon's Master Theorem
    4.19.1 Preliminary Results on Determinants
    4.19.3 Permutation Digraphs
    4.19.4 A Class of General Digraphs
    4.19.5 MacMahon's Master Theorem for Permutations
    4.19.8 Dixon's Identity as an Application of the Master Theorem
  4.20 Solutions and/or Hints to Selected Exercises
  4.21 Addendum on Exercise 4.19.9
    4.21.1 Symmetric Polynomials
    4.21.7 A Special Determinant
    4.21.9 Application of the Master Theorem to the Matrix B
    4.21.10 Sums of Cubes of Binomial Coefficients

5 Mobius Inversion on Posets
  5.1 Introduction
  5.2 POSETS
  5.3 Vector Spaces and Algebras
  5.4 The Incidence Algebra I(P, K)
  5.5 Optional Section on ζ
  5.6 The Action of I(P, K) and Mobius Inversion
  5.7 Evaluating µ: the Product Theorem
  5.8 More Applications of Mobius Inversion
  5.9 Lattices and Gaussian Coefficients
  5.10 Posets with Finite Order Ideals
  5.11 Solutions and/or Hints to Selected Exercises


0.1 Notation

Throughout these notes the following notation will be used.

C = The set of complex numbers

N = The set of nonnegative integers

P = The set of positive integers

Q = The set of rational numbers

R = The set of real numbers

Z = The set of integers

N = {a_1, . . . , a_n} = typical set with n elements

[n] = {1, 2, . . . , n}; [0] = ∅

[i, j] = {i, i + 1, . . . , j}, if i ≤ j

⌊x⌋ = The floor of x (i.e., the largest integer not larger than x)

⌈x⌉ = The ceiling of x (i.e., the smallest integer not smaller than x)

P([n]) = {A : A ⊆ [n]}

P(S) = {A : A ⊆ S} (for any set S)

|A| = The number of elements of A (also denoted #A)

$\binom{N}{k}$ = {A : A ⊆ N and |A| = k} = set of k-subsets of N

$\binom{n}{k}$ = $\#\binom{N}{k}$ = number of k-subsets of N (0 ≤ k ≤ n)

$\left(\!\binom{S}{k}\!\right)$ = set of all k-multisets on S

$\left(\!\binom{n}{k}\!\right)$ = number of k-multisets of an n-set

$\binom{n}{a_1,\dots,a_m}$ = number of ways of putting each element of an n-set into one of m categories C_1, . . . , C_m, with a_i objects in C_i, $\sum a_i = n$

$(j)_q = 1 + q + q^2 + \cdots + q^{j-1}$

$(n)!_q = (1)_q (2)_q \cdots (n)_q$ (n-qtorial)

$\left[{n \atop k}\right]_q = \frac{(n)!_q}{(k)!_q\,(n-k)!_q}$ = Gaussian q-binomial coefficient

S_n = The symmetric group on [n]

$\left[{n \atop k}\right]$ = #{π ∈ S_n : π has k cycles} = c(n, k) = signless Stirling number of the first kind

$s(n,k) = (-1)^{n-k}\left[{n \atop k}\right]$ (Stirling number of the first kind)

${n \brace k}$ = S(n, k) = number of partitions of an n-set into k nonempty subsets (blocks) = Stirling number of the second kind

B(n) = total number of partitions of an n-set = Bell number

$n^{\underline{k}} = (n)_k = n(n-1)\cdots(n-k+1)$ (n to the k falling)

$n^{\overline{k}} = n(n+1)\cdots(n+k-1)$ (n to the k rising)


0.2 Introduction

The course at CU-Denver for which these notes were assembled, Math 6409 (Applied Combinatorics), deals more or less entirely with enumerative combinatorics. Other courses deal with combinatorial structures such as Latin squares, designs of many types, finite geometries, etc. This is a one-semester course, but as it has been taught different ways in different semesters, the notes have grown to contain more than we are now able to cover in one semester. On the other hand, these notes contain considerably less material than the standard textbooks listed below. It is always difficult to decide what to leave out, and the choices clearly are a reflection of the likes and dislikes of the author. We have tried to include some truly traditional material and some truly nontrivial material, albeit with a treatment that makes it accessible to the student.

Since the greater part of this course is, ultimately, devoted to developing ever more sophisticated methods of counting, we begin with a brief discussion of what it means to count something. As a first example, for n ∈ N, put f(n) = |P([n])|. Then no one will argue that the formula f(n) = 2^n is anything but nice. As a second example, let d(n) be the number of derangements of (1, . . . , n). Then (as we show at least twice later on)

$$d(n) = n!\sum_{i=0}^{n}\frac{(-1)^i}{i!}.$$

This is not so nice an answer as the first one, but there are very clear proofs. Also, d(n) is the nearest integer to n!/e. This is a convenient answer, but it lacks combinatorial significance. Finally, let f(n) be the number of n × n matrices of 0's and 1's such that each row and column has three 1's. It has been shown that

$$f(n) = 6^{-n}\sum \frac{(-1)^\beta (n!)^2 (\beta+3\gamma)!\, 2^\alpha 3^\beta}{\alpha!\,\beta!\,(\gamma!)^2\, 6^\gamma},$$

where the sum is over all α, β, γ ∈ N for which α + β + γ = n. As far as we know, this formula is not good for much of anything, but it is a very specific answer that can be evaluated by computer for relatively small n.
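Both descriptions of d(n) are easy to verify by machine for small n. The following Python sketch (illustrative only; the function names are ours, not part of the notes) brute-forces derangements and compares them with the summation formula and with the nearest-integer description:

```python
from itertools import permutations
from math import e, factorial

def d_formula(n):
    # d(n) = n! * sum_{i=0}^{n} (-1)^i / i!, computed exactly in integers
    return sum((-1) ** i * (factorial(n) // factorial(i)) for i in range(n + 1))

def d_brute(n):
    # count the permutations of [n] having no fixed point
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    assert d_formula(n) == d_brute(n)               # the summation formula
    assert d_formula(n) == round(factorial(n) / e)  # nearest integer to n!/e
```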

As a different kind of example, suppose we want the Fibonacci numbers F_0, F_1, F_2, . . . , and what we know about them is that they satisfy the recurrence relation

$$F_{n+1} = F_n + F_{n-1} \quad (n \ge 1;\ F_0 = F_1 = 1).$$

The sequence begins with 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . . There are exact, not very complicated formulas for F_n, as we shall see later. But just to


introduce the idea of a generating function, here is how a "generatingfunctionologist" might answer the question: The nth Fibonacci number F_n is the coefficient of x^n in the expansion of the function 1/(1 − x − x^2) as a power series about the origin. (See the book generatingfunctionology by H. S. Wilf.) Later we shall investigate this problem a great deal more.
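The coefficient extraction can be carried out mechanically: the coefficients of 1/(1 − x − x^2) are forced, one at a time, by formal power-series inversion. A small Python sketch of this computation (the function name is ours):

```python
def series_inverse_coeffs(denom, n):
    """First n+1 coefficients of the formal power series 1/denom(x),
    where denom is a coefficient list with denom[0] = 1."""
    c = [1]
    for k in range(1, n + 1):
        # the coefficient of x^k in denom(x) * c(x) must be 0
        s = sum(denom[j] * c[k - j]
                for j in range(1, min(k, len(denom) - 1) + 1))
        c.append(-s)
    return c

# 1/(1 - x - x^2): the denominator has coefficient list [1, -1, -1]
F = series_inverse_coeffs([1, -1, -1], 10)
assert F == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]   # F_0 .. F_10
```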

We shall derive a variety of techniques for counting, some purely combinatorial, some involving algebra in a moderately sophisticated way. But many of the famous problems of combinatorial theory were "solved" before the sophisticated theory was developed. And often the simplest ways to count some type of object offer the greatest insight into the solution. So before we develop the elaborate structure theory that mechanizes some of the counting problems, we give some specific examples that involve only rather elementary ideas. The first several sections are short, with attention to specific problems rather than to erecting large theories. Then later we develop more elaborate theories that begin to show some of the sophistication of modern combinatorics.

Many common topics from combinatorics have received very little treatment in these notes. Other topics have received no mention at all! We propose that the reader consult at least the following well known textbooks for additional material.

P. J. Cameron, Combinatorics, Cambridge University Press, 1994.

I. P. Goulden and D. M. Jackson, Combinatorial Enumeration, Wiley-Interscience, 1983.

R. Graham, D. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley Pub. Co., 1991.

R. P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth and Brooks/Cole, 1986.

J. H. van Lint and R. M. Wilson, A Course in Combinatorics, Cambridge University Press, 1992.

H. S. Wilf, generatingfunctionology, Academic Press, 1990.

In addition there is the monumental Handbook of Combinatorics, edited by R. L. Graham, M. Grötschel and L. Lovász, published in 1995 by the MIT Press in the USA and by North-Holland outside the USA. This is a two volume set with over 2000 pages of articles contributed by many experts. This is a very sophisticated compendium of combinatorial mathematics that


is difficult reading but gives many wonderful insights into the more advanced aspects of the subject.


Chapter 1

Basic Counting Techniques

1.1 Sets and Functions: The Twelvefold Way

Let N, X be finite sets with #N = n, #X = x. Put X^N = {f : N → X}. We want to compute #(X^N) subject to three types of restrictions on f and four types of restrictions on when two functions are considered the same.

Restrictions on f :

(i) f is arbitrary

(ii) f is injective

(iii) f is surjective

Consider N to be a set of balls, X to be a set of boxes, and f : N → X a way to put balls into boxes. The balls and the boxes may be labeled or unlabeled. We illustrate the various possibilities with the following examples.

N = {1, 2, 3}, X = {a, b, c, d}.

f = (1 ↦ a, 2 ↦ a, 3 ↦ b);  g = (1 ↦ a, 2 ↦ b, 3 ↦ a);  h = (1 ↦ b, 2 ↦ b, 3 ↦ d);  i = (1 ↦ c, 2 ↦ b, 3 ↦ b).

Case 1. Both the balls and the boxes are labeled (or distinguishable). Then f, g, h, i are pairwise inequivalent:

f: balls 1, 2 in box a; ball 3 in box b.
g: balls 1, 3 in box a; ball 2 in box b.
h: balls 1, 2 in box b; ball 3 in box d.
i: ball 1 in box c; balls 2, 3 in box b.

Case 2. Balls unlabeled; boxes labeled. Here f ∼ g (two balls in box a, one ball in box b), while h (two balls in box b, one in box d) and i (two balls in box b, one in box c) remain inequivalent.

Case 3. Balls labeled; boxes unlabeled. Here f ∼ h (blocks {1, 2} and {3}), while g (blocks {1, 3} and {2}) and i (blocks {2, 3} and {1}) remain inequivalent.

Case 4. Both balls and boxes unlabeled. Here f ∼ g ∼ h ∼ i (two balls in one box, one ball in another).

For the four different possibilities arising according as N and X are each labeled or unlabeled, there are different definitions describing when two functions from N to X are equivalent.

Definition: Two functions f, g : N → X are equivalent

1. with N unlabeled provided there is a bijection π : N → N such that f(π(a)) = g(a) for all a ∈ N. (In words: provided some relabeling of the elements of N turns f into g.)

2. with X unlabeled provided there is a bijection σ : X → X such that σ(f(a)) = g(a) for all a ∈ N. (In words: provided some relabeling of the elements of X turns f into g.)


3. with both N and X unlabeled provided there are bijections π : N → N and σ : X → X with σ(f(π(a))) = g(a) for all a ∈ N, i.e., g = σ ∘ f ∘ π, so that the following square commutes:

    N --π--> N
    |g       |f
    v        v
    X <--σ-- X

Obs. 1. These three notions of equivalence determine equivalence relations on X^N. So the number of "different" functions with respect to one of these equivalences is the number of different equivalence classes.

Obs. 2. If f and g are equivalent in any of the above ways, then f is injective (resp., surjective) iff g is injective (resp., surjective). So we say the notions of injectivity and surjectivity are compatible with the equivalence relation. By the "number of inequivalent injective functions f : N → X" we mean the number of equivalence classes all of whose elements are injective. Similarly for surjectivity.

As we develop notations, methods and results, the reader should fill in the blanks in the following table with the number of functions of the indicated type. Of course, sometimes the formula will be elegant and simple. Other times it may be rather implicit, e.g., the coefficient of some term of a given power series.


The 12–Fold Way

Value of #{f : N → X} if |N| = n and |X| = x:

    N           X           f unrestricted   f injective   f surjective
    Labeled     Labeled           1               2              3
    Unlabeled   Labeled           4               5              6
    Labeled     Unlabeled         7               8              9
    Unlabeled   Unlabeled        10              11             12

(The numbers 1–12 label the twelve blank entries to be filled in.)

Let N = {a_1, . . . , a_n}, X = {0, 1}. Let P(N) be the set of all subsets of N. For A ⊆ N, define f_A : N → X by

f_A(a_i) = 1 if a_i ∈ A, and f_A(a_i) = 0 if a_i ∉ A.

Then F : P(N) → X^N : A ↦ f_A is a bijection, so

$$|P(N)| = \#(X^N) = 2^n. \qquad (1.1)$$

Exercise: 1.1.1 Generalize the result of Eq. 1.1 to provide an answer for the first blank in the twelvefold way.
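Before guessing the general formula, the count can be explored by brute force. A throwaway Python check (names ours):

```python
from itertools import product

def count_functions(n, x):
    # enumerate all functions f: [n] -> [x] directly
    return sum(1 for f in product(range(x), repeat=n))

# the bijection A |-> f_A matches subsets of [3] with functions [3] -> {0,1}
assert count_functions(3, 2) == 8
for n in range(5):
    for x in range(1, 5):
        assert count_functions(n, x) == x ** n
```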

We define $\binom{N}{k}$ to be the set of all k-subsets of N, and put $\binom{n}{k} = \#\binom{N}{k}$. Let N(n, k) be the number of ways to choose a k-subset T of N and then linearly order the elements of T. Clearly $N(n,k) = \binom{n}{k}k!$. On the other hand, we could choose any element of N to be the first element of T in n ways, then choose the second element in n − 1 ways, . . . , and finally choose the kth element in n − k + 1 ways. So $N(n,k) = \binom{n}{k}k! = n(n-1)\cdots(n-k+1) := n^{\underline{k}}$, where this last expression is read as "n to the k falling." (Note: Similarly, $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$ is read as "n to the k rising.") This proves the following.

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n^{\underline{k}}}{k!} \qquad (1.2)$$

$\left(= \frac{(n)_k}{k!}\ \text{according to some authors}\right)$.

Exercise: 1.1.2 Prove: $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$.

Exercise: 1.1.3 Binomial Expansion. If xy = yx and n ∈ N, show that

$$(x+y)^n = \sum_{i=0}^{n}\binom{n}{i}x^i y^{n-i}. \qquad (1.3)$$

Note: $\binom{n}{k} := \frac{n^{\underline{k}}}{k!}$ makes sense for k ∈ N and n ∈ C. What if n ∈ Z, k ∈ Z and k < 0 or k > n? The best thing to do is to define the value of the binomial coefficient to be zero in these cases. Then we may write the following

$$(1+x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k = \sum_{k}\binom{n}{k}x^{n-k}, \qquad (1.4)$$

where in the second summation the index may be allowed to run over all integers, since the coefficient on x^{n−k} is nonzero for at most a finite number of values of k.

Put x = −1 in Eq. 1.4 to obtain

$$\sum_{k}(-1)^k\binom{n}{k} = 0. \qquad (1.5)$$


Put x = 1 in Eq. 1.4 to obtain

$$\sum_{k}\binom{n}{k} = 2^n. \qquad (1.6)$$

Differentiate Eq. 1.4 with respect to x (giving $n(1+x)^{n-1} = \sum_k k\binom{n}{k}x^{k-1}$) and put x = 1 to obtain

$$\sum_{k}k\binom{n}{k} = n\cdot 2^{n-1}. \qquad (1.7)$$
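Identities (1.5)–(1.7) are easy to confirm numerically; a quick Python verification (illustrative only):

```python
from math import comb

for n in range(1, 12):
    assert sum((-1) ** k * comb(n, k) for k in range(n + 1)) == 0         # (1.5)
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n                # (1.6)
    assert sum(k * comb(n, k) for k in range(n + 1)) == n * 2 ** (n - 1)  # (1.7)
```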

Exercise: 1.1.4 Prove that $\sum_{j=m}^{r}\binom{j}{m} = \binom{r+1}{m+1}$.

(Hint: A straightforward induction argument works easily. For a more amusing approach try the following: $\sum_{i=0}^{n-1}(1+y)^i = \frac{(1+y)^n - 1}{(1+y)-1} = \sum_{j=1}^{n}\binom{n}{j}y^{j-1}$. Now compute the coefficient of y^m.)

Exercise: 1.1.5 Prove that for all a, b, n ∈ N the following holds:

$$\sum_{i}\binom{a}{i}\binom{b}{n-i} = \binom{a+b}{n}.$$

Exercise: 1.1.6 Let r, s, k be any nonnegative integers. Then the following identity holds:

$$\sum_{j=0}^{k}\binom{r+j}{r}\binom{s+k-j}{s} = \binom{r+s+k+1}{r+s+1}.$$

Exercise: 1.1.7 Evaluate the following two sums:

a. $\sum_{i=1}^{n}\binom{n}{i}\, i\, 3^i$

b. $\sum_{i=2}^{n}\binom{n}{i}\, i(i-1)\, m^i$


Exercise: 1.1.8 Show that if 0 ≤ m < n, then

$$\sum_{k=m+1}^{n}(-1)^k\binom{n}{k}\binom{k-1}{m} = (-1)^{m+1}.$$

(Hint: Fix m and induct on n.)

Exercise: 1.1.9 Show that if 0 ≤ m < n, then

$$\sum_{k=0}^{m}(-1)^k\binom{n}{k} = (-1)^m\binom{n-1}{m}.$$

(Hint: Fix n > 0 and use "finite" induction on m.)

Exercise: 1.1.10 Show that:

(a) $\sum_{k=0}^{\infty}\binom{k+n}{k}\frac{1}{2^k} = 2^{n+1}$.

(b) $\sum_{k=0}^{n}\binom{k+n}{k}\frac{1}{2^k} = 2^{n}$.

Exercise: 1.1.11 If n is a positive integer, show that

$$\sum_{k=1}^{n}\frac{(-1)^{k+1}}{k}\binom{n}{k} = \sum_{k=1}^{n}\frac{1}{k}.$$

1.2 Composition of Positive Integers

A composition of n ∈ P is an ordered set σ = (a_1, . . . , a_k) of positive integers for which n = a_1 + · · · + a_k. In this case σ has k parts, i.e., σ is a k-composition of n. Given a k-composition σ = (a_1, . . . , a_k), define a (k − 1)-subset θ(σ) of [n − 1] by

θ(σ) = {a_1, a_1 + a_2, . . . , a_1 + a_2 + · · · + a_{k−1}}.

Then θ is a bijection between the set of k-compositions of n and the (k − 1)-subsets of [n − 1]. This proves the following:

There are exactly $\binom{n-1}{k-1}$ k-compositions of n.  (1.8)


Moreover, the total number of compositions of n is

$$\sum_{k=1}^{n}\binom{n-1}{k-1} = \sum_{k=0}^{n-1}\binom{n-1}{k} = 2^{n-1}. \qquad (1.9)$$

The bijection θ is often represented schematically by drawing n dots in a row and drawing k − 1 vertical bars in the n − 1 spaces separating the dots - at most one bar to a space. For example,

· | · · | · · · | · | · · ↔ 1 + 2 + 3 + 1 + 2 = 9.

There is a closely related problem. Let N(n, k) be the number of solutions (also called weak compositions) (x_1, . . . , x_k) in nonnegative integers such that x_1 + x_2 + · · · + x_k = n. Put y_i = x_i + 1 to see that N(n, k) is the number of solutions in positive integers y_1, . . . , y_k to y_1 + y_2 + · · · + y_k = n + k, i.e., the number of k-compositions of n + k. Hence

$$N(n,k) = \binom{n+k-1}{k-1} = \binom{n+k-1}{n}. \qquad (1.10)$$

Exercise: 1.2.1 Find the number of solutions (x_1, . . . , x_k) in nonnegative integers to $\sum x_i \le n$. Here k can vary over the numbers in the range 0 ≤ k ≤ r. (Hint: $\sum_{j=0}^{n}N(j,k) = \binom{n+k}{k}$, and $\sum_{j=m}^{r}\binom{j}{m} = \binom{r+1}{m+1}$.)

Exercise: 1.2.2 Find the number of solutions (x_1, . . . , x_k) in integers to $\sum x_i = n$ with x_i ≥ a_i for preassigned integers a_i, 1 ≤ i ≤ k. (Hint: Try y_i = x_i + 1 − a_i.) (Ans.: If k = 4 and m = a_1 + a_2 + a_3 + a_4, the answer is $\frac{(n+3-m)(n+2-m)(n+1-m)}{6}$.)

Exercise: 1.2.3 a) Show that $\sum_{k=1}^{n}\binom{m+k-1}{k} = \sum_{k=1}^{m}\binom{n+k-1}{k}$. b) Derive a closed form formula for the number of weak compositions of n into at most m parts.


Exercise: 1.2.4 Let S be a set of n elements. Count the ordered pairs (A, B) of subsets of S such that ∅ ⊆ A ⊆ B ⊆ S. Let c(j, k) denote the number of such ordered pairs for which |A| = j and |B| = k. Show that:

$$(1+y+xy)^n = \sum_{0\le j\le k\le n} c(j,k)\, x^j y^k.$$

What does this give if x = y = 1?

Exercise: 1.2.5 Show that

$$f_k(x) = \binom{x}{k} = \frac{x^{\underline{k}}}{k!}$$

is a polynomial in x with rational coefficients (not all of which are integers) and such that for each integer m (positive, negative or zero) f_k(m) is also an integer.

1.3 Multisets

A finite multiset M on a set S is a function ν : S → N such that $\sum_{x\in S}\nu(x) < \infty$. If $\sum_{x\in S}\nu(x) = k$, M is called a k-multiset. Sometimes we write k = #M. If S = {x_1, . . . , x_n} and ν(x_i) = a_i, write M = {x_1^{a_1}, . . . , x_n^{a_n}}. Then let $\left(\!\binom{S}{k}\!\right)$ denote the set of all k-multisets on S, and put $\left(\!\binom{n}{k}\!\right) = \#\left(\!\binom{S}{k}\!\right)$.

If M′ = ν′ : S → N is a second multiset on S, we say M′ is a submultiset of M provided ν′(x) ≤ ν(x) for all x ∈ S.

Note: The number of submultisets of M is $\prod_{x\in S}(\nu(x)+1)$. And each element of $\left(\!\binom{S}{k}\!\right)$ corresponds to a weak n-composition of k: a_1 + a_2 + · · · + a_n = k. This proves the following:

$$\left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{n-1} = \binom{n+k-1}{k}. \qquad (1.11)$$
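Formula (1.11) is easy to confirm by enumerating multisets directly; a brief Python check (illustrative only):

```python
from itertools import combinations_with_replacement
from math import comb

def multiset_count(n, k):
    # number of k-multisets on an n-set, by direct enumeration
    return sum(1 for m in combinations_with_replacement(range(n), k))

for n in range(1, 6):
    for k in range(6):
        assert multiset_count(n, k) == comb(n + k - 1, k)   # (1.11)
```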


In the present context we give a glimpse of the generating functions that will be studied in greater detail later.

$$(1+x_1+x_1^2+\cdots)(1+x_2+x_2^2+\cdots)\cdots(1+x_n+x_n^2+\cdots) = \sum_{\nu:S\to N}\ \prod_{x_i\in S} x_i^{\nu(x_i)}.$$

Put all x_i's equal to x:

$$(1+x+x^2+\cdots)^n = \sum_{\nu:S\to N} x^{\sum\nu(x_i)} = \sum_{M \text{ on } S} x^{\#M} = \sum_{k\ge 0}\left(\!\binom{n}{k}\!\right)x^k.$$

Hence we have proved that

$$(1-x)^{-n} = \sum_{k\ge 0}\left(\!\binom{n}{k}\!\right)x^k. \qquad (1.12)$$

From this it follows that $(-1)^k\binom{-n}{k} = \left(\!\binom{n}{k}\!\right)$, a fact which is also easy to check directly. Replacing n with n + 1 gives the following version, which is worth memorizing:

$$(1-x)^{-(n+1)} = \sum_{k\ge 0}\binom{n+k}{n}x^k. \qquad (1.13)$$

Exercise: 1.3.1 $\binom{-n}{k} = (-1)^k\binom{n+k-1}{k}$.

1.4 Multinomial Coefficients

Let $\binom{n}{a_1,\dots,a_m}$ be the number of ways of putting each element of an n-set into one of m labeled categories C_1, . . . , C_m, so that C_i gets a_i elements. This is also the number of ways of distributing n labeled balls into m labeled boxes so that box B_i gets a_i balls.

Consider n linearly ordered blanks 1, 2, . . . , n, each to be assigned one of m letters B_1, . . . , B_m so that B_i is used a_i times. It is easy to see that the number of "words" of length n from an alphabet {B_1, . . . , B_m} with m letters where the ith letter B_i is used a_i times is $\binom{n}{a_1,\dots,a_m}$. Using this fact it is easy to see that

$$\binom{n}{a_1,\dots,a_m} = \binom{n}{a_1}\binom{n-a_1}{a_2}\binom{n-a_1-a_2}{a_3}\cdots\binom{n-a_1-a_2-\cdots-a_{m-1}}{a_m} = \frac{n!}{a_1!\,a_2!\cdots a_m!}.$$

Theorem 1.4.1 The coefficient of $x_1^{a_1}x_2^{a_2}\cdots x_m^{a_m}$ in $(x_1+\cdots+x_m)^n$ is $\binom{n}{a_1,\dots,a_m}$.

The multinomial coefficient is defined to be zero whenever it is not the case that a_1, . . . , a_m are nonnegative integers whose sum is n.
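The two descriptions of the multinomial coefficient (the factorial quotient and the count of words) can be checked against each other by machine. A small Python sketch (names ours):

```python
from itertools import product
from math import factorial

def multinomial(n, parts):
    # n! / (a_1! a_2! ... a_m!)
    assert sum(parts) == n
    out = factorial(n)
    for a in parts:
        out //= factorial(a)
    return out

def count_words(parts):
    """Number of words using letter i exactly parts[i] times,
    counted by brute force over all words of length n on m letters."""
    n, m = sum(parts), len(parts)
    return sum(1 for w in product(range(m), repeat=n)
               if all(w.count(i) == parts[i] for i in range(m)))

assert multinomial(4, (2, 1, 1)) == 12
assert count_words((2, 1, 1)) == 12
```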

Exercise: 1.4.2 Prove that

$$\binom{n}{a_1,\dots,a_m} = \binom{n-1}{a_1-1,a_2,\dots,a_m} + \binom{n-1}{a_1,a_2-1,\dots,a_m} + \cdots + \binom{n-1}{a_1,a_2,\dots,a_m-1}.$$

1.5 Permutations

There are several ways to approach the study of permutations. One of the most basic is as a group of bijections. Let A be any nonempty set (finite or infinite). Put S(A) = {π : A → A : π is a bijection}.

Notation: If π : a_1 ↦ a_2 we write π(a_1) = a_2 (unless noted otherwise). If π, σ ∈ S(A), define the composition π ∘ σ by (π ∘ σ)(a) = π(σ(a)).

Theorem 1.5.1 (S(A), ∘) is a group.


For simplicity in notation we take A = [n] = {1, . . . , n} for n ∈ P, and we write S_n = S([n]). One way to represent π ∈ S_n is as a two-rowed array

$$\pi = \begin{pmatrix} 1 & 2 & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(n) \end{pmatrix}.$$

From this representation it is easy to write π either as a linearly ordered sequence π = π(1), π(2), . . . , π(n) or as a product of disjoint cycles:

$$\pi = (\pi(1)\,\pi^2(1)\cdots\pi^i(1))(\pi(j)\,\pi^2(j)\cdots)\cdots(\cdots).$$

Example: $\pi = \begin{pmatrix} 1&2&3&4&5&6&7&8&9 \\ 2&6&1&9&8&3&7&5&4 \end{pmatrix}$. Then π = 261983754 as a linearly ordered sequence, and π = (1263)(49)(58)(7) as a product of disjoint cycles. Recall that disjoint cycles commute, and (135) = (351) = (513) ≠ (153), etc.

We now introduce the so-called standard representation of a permutation π ∈ S_n. Write π as a product of disjoint cycles in such a way that

(a) each cycle is written with its largest element first, and

(b) the cycles are ordered (left to right) in increasing order of their largest elements.

Example: $\pi = \begin{pmatrix} 1&2&3&4&5&6&7 \\ 4&2&7&1&3&6&5 \end{pmatrix} = (14)(2)(375)(6) = (2)(41)(6)(753)$, where the last expression is the standard representation of π. Given a permutation π, let π̂ be the word (or permutation written as a linearly ordered sequence) obtained by writing π in standard form and erasing the parentheses. So for the example above, π̂ = 2416753. We can recover π from π̂ by inserting a left parenthesis preceding each left-to-right maximum, i.e., before each a_i such that a_i > a_j for every j < i in π̂ = a_1a_2 · · · a_n, and then putting right parentheses where they have to be. It follows that π ↦ π̂ is a bijection from S_n to itself.
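The standard representation and its inverse are concrete enough to program, and doing so is a good way to internalize the bijection. A Python sketch (function names ours) of both directions, checked on the example above:

```python
def to_word(perm):
    """Standard representation of perm (a dict i -> perm[i] on 1..n):
    each cycle written largest element first, cycles ordered by
    increasing largest element, parentheses erased."""
    n = len(perm)
    cycles, seen = [], set()
    for start in range(1, n + 1):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        m = cyc.index(max(cyc))
        cycles.append(cyc[m:] + cyc[:m])   # rotate the largest to the front
    cycles.sort(key=lambda c: c[0])
    return [x for c in cycles for x in c]

def from_word(word):
    """Inverse map: cut the word before each left-to-right maximum."""
    perm, cyc, best = {}, [], 0
    for x in word + [None]:
        if x is None or x > best:
            for i, y in enumerate(cyc):    # close the previous cycle
                perm[y] = cyc[(i + 1) % len(cyc)]
            cyc, best = [], best if x is None else x
        if x is not None:
            cyc.append(x)
    return perm

pi = {1: 4, 2: 2, 3: 7, 4: 1, 5: 3, 6: 6, 7: 5}   # the example in the text
assert to_word(pi) == [2, 4, 1, 6, 7, 5, 3]       # the word 2416753
assert from_word([2, 4, 1, 6, 7, 5, 3]) == pi
```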

Theorem 1.5.2 The map π ↦ π̂ : S_n → S_n is a bijection, and π has k cycles if and only if π̂ has k left-to-right maxima.

Define $\left[{n \atop k}\right]$ to be the number of permutations in S_n that have exactly k cycles. (Many authors use c(n, k) to denote this number.) So as a corollary of Theorem 1.5.2 we have


Corollary 1.5.3 The number of permutations in S_n with exactly k left-to-right maxima is $\left[{n \atop k}\right]$.

Put $s(n,k) := (-1)^{n-k}c(n,k) = (-1)^{n-k}\left[{n \atop k}\right]$. Then s(n, k) is called the Stirling number of the first kind, and $\left[{n \atop k}\right]$ is the signless Stirling number of the first kind.

Lemma 1.5.4 $\left[{n \atop k}\right] = (n-1)\left[{n-1 \atop k}\right] + \left[{n-1 \atop k-1}\right]$ for n, k > 0. And $\left[{0 \atop 0}\right] = 1$, but otherwise if n ≤ 0 or k ≤ 0 put $\left[{n \atop k}\right] = 0$.

Proof: Let π ∈ S_{n−1} be written as a product of k disjoint cycles. We can insert the symbol n after any of the numbers 1, . . . , n − 1 (in its cycle). This can be done in n − 1 ways, yielding the disjoint cycle decomposition of a permutation π′ ∈ S_n with k cycles for which n appears in a cycle of length greater than or equal to 2. So there are $(n-1)\left[{n-1 \atop k}\right]$ permutations π′ ∈ S_n with k cycles for which π′(n) ≠ n. On the other hand, we can choose a permutation π ∈ S_{n−1} with k − 1 cycles and extend it to a permutation π′ ∈ S_n with k cycles satisfying π′(n) = n. This gives each π′ ∈ S_n with k cycles exactly once, proving the desired result.

Theorem 1.5.5 For n ∈ N, $\sum_{k=0}^{n}\left[{n \atop k}\right]x^k = x^{\overline{n}} = x(x+1)\cdots(x+n-1)$.

Proof: Put $F_n(x) := x^{\overline{n}} = x(x+1)\cdots(x+n-1) = \sum_{k=0}^{n}b(n,k)x^k$. If n = 0, F_n(x) is a "void product", which by convention is 1. So we put b(0, 0) = 1, and b(n, k) = 0 if n < 0 or k < 0. Then $F_n(x) = (x+n-1)F_{n-1}(x)$ implies that

$$\sum_{k=0}^{n}b(n,k)x^k = x\sum_{k=0}^{n-1}b(n-1,k)x^k + (n-1)\sum_{k=0}^{n-1}b(n-1,k)x^k$$
$$= \sum_{k=1}^{n}b(n-1,k-1)x^k + (n-1)\sum_{k=0}^{n-1}b(n-1,k)x^k$$
$$= \sum_{k=0}^{n}\left[\,b(n-1,k-1) + (n-1)b(n-1,k)\,\right]x^k.$$


This implies that b(n, k) = (n − 1)b(n − 1, k) + b(n − 1, k − 1). Hence the b(n, k) satisfy the same recurrence and initial conditions as the $\left[{n \atop k}\right]$, implying that they are the same, viz., $b(n,k) = \left[{n \atop k}\right]$.

Corollary 1.5.6 $x^{\underline{n}} = \sum_{k=0}^{n}(-1)^{n-k}\left[{n \atop k}\right]x^k$.

Proof: Put x = −y in Theorem 1.5.5 and simplify. (Use $x^{\underline{n}} = (-1)^n(-x)^{\overline{n}}$.)

Cycle Type: If π ∈ S_n, then c_i = c_i(π) is the number of cycles of length i in π, 1 ≤ i ≤ n. Note: $n = \sum_{i=1}^{n} i\,c_i$. Then π has type (c_1, . . . , c_n), and the total number of cycles of π is $c(\pi) = \sum_{i=1}^{n}c_i(\pi)$.

Theorem 1.5.7 The number of π ∈ S_n with type (c_1, . . . , c_n) is

$$\frac{n!}{1^{c_1}c_1!\;2^{c_2}c_2!\cdots n^{c_n}c_n!}.$$

Proof: Let π = a_1 · · · a_n be any word, i.e., permutation in S_n. Suppose (c_1, . . . , c_n) is an admissible cycle type, i.e., c_i ≥ 0 for all i and $n = \sum_i i\,c_i$. Insert parentheses in π so that the first c_1 cycles have length 1, the next c_2 cycles have length 2, . . . , etc. This defines a map Φ : S([n]) → S_c([n]), where S_c([n]) = {σ ∈ S([n]) : σ has type (c_1, . . . , c_n)}. Clearly Φ is onto S_c([n]). We claim that if σ ∈ S_c([n]), then the number of π mapped to σ is $1^{c_1}c_1!\;2^{c_2}c_2!\cdots n^{c_n}c_n!$. This follows because in writing σ as a product of disjoint cycles, we can order the cycles of length i (among themselves) in c_i! ways, and then choose the first elements of all these cycles in $i^{c_i}$ ways. These choices for different i are all independent. So Φ : S([n]) → S_c([n]) is a many-to-one map onto S_c([n]) mapping the same number of π to each σ. Since #S([n]) = n!, we obtain the desired result.
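Theorem 1.5.7 can be confirmed by exhaustively tallying cycle types over a small symmetric group. A Python sketch (names ours):

```python
from collections import Counter
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """(c_1, ..., c_n) for a permutation of {0, ..., n-1} given as a
    tuple: c_i counts the cycles of length i."""
    n = len(perm)
    c, seen = [0] * n, set()
    for s in range(n):
        if s in seen:
            continue
        length, x = 0, s
        while x not in seen:
            seen.add(x)
            length += 1
            x = perm[x]
        c[length - 1] += 1
    return tuple(c)

def type_count_formula(c):
    # n! / (1^{c_1} c_1! 2^{c_2} c_2! ... n^{c_n} c_n!)
    n = sum((i + 1) * ci for i, ci in enumerate(c))
    denom = 1
    for i, ci in enumerate(c):
        denom *= (i + 1) ** ci * factorial(ci)
    return factorial(n) // denom

# exhaustive tally over S_5 agrees with the formula for every type
tally = Counter(cycle_type(p) for p in permutations(range(5)))
assert all(count == type_count_formula(c) for c, count in tally.items())
```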

1.6 Partitions of Integers

A partition of n ∈ N is a sequence λ = (λ_1, . . . , λ_k) ∈ N^k such that

a) $\sum\lambda_i = n$, and

b) λ_1 ≥ · · · ≥ λ_k ≥ 0.

Two partitions of n are identical if they differ only in the number of terminal 0's. For example, (3, 3, 2, 1) ≡ (3, 3, 2, 1, 0, 0). The nonzero λ_i are the parts of the partition λ. If λ = (λ_1, . . . , λ_k) with λ_1 ≥ · · · ≥ λ_k > 0, we say that λ has k parts. If λ has α_i parts equal to i, we may write λ = ⟨1^{α_1}, 2^{α_2}, . . .⟩, where terms with α_i = 0 may be omitted, and the superscript α_i = 1 may be omitted.

Notation: "λ ⊢ n" means λ is a partition of n. As an example we have

(4, 4, 2, 2, 2, 1) = ⟨1^1, 2^3, 3^0, 4^2⟩ = ⟨1, 2^3, 4^2⟩ ⊢ 15.

Put p(n) equal to the total number of partitions of n, and p_k(n) equal to the number of partitions of n with k parts.

Conventions: p(0) = p_0(0) = 1; p_n(n) = 1; p_{n−1}(n) = 1 if n > 1; p_1(n) = 1 for n ≥ 1; p_2(n) = ⌊n/2⌋.

Exercise: 1.6.1 $p_k(n) = p_{k-1}(n-1) + p_k(n-k)$.

Exercise: 1.6.2 Show $p_k(n) = \sum_{s=1}^{k}p_s(n-k)$.
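The recurrence of Exercise 1.6.1 gives an efficient way to tabulate p_k(n), and the table can be sanity-checked against the conventions above. A Python sketch (names ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_k(n, k):
    """Number of partitions of n with exactly k parts, computed via the
    recurrence p_k(n) = p_{k-1}(n-1) + p_k(n-k) of Exercise 1.6.1."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return p_k(n - 1, k - 1) + p_k(n - k, k)

def p(n):
    return sum(p_k(n, k) for k in range(n + 1))

assert [p(n) for n in range(1, 11)] == [1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
assert all(p_k(n, 1) == 1 for n in range(1, 20))
assert all(p_k(n, 2) == n // 2 for n in range(2, 20))   # p_2(n) = floor(n/2)
```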

A great deal of time and effort has been spent studying the partitions of n, and much is known about them. However, most of the results concerning the numbers p_k(n) have been obtained via the use of generating functions. Hence after we have studied formal power series it would be reasonable to return to the topic of partitions. Unfortunately we probably will not have time to do this, so this topic would be a great one for a term project.

1.7 Set Partitions

A partition of a finite set N is a collection π = {B_1, . . . , B_k} of subsets of N such that:


(a) B_i ≠ ∅ for all i;

(b) B_i ∩ B_j = ∅ if i ≠ j;

(c) B_1 ∪ B_2 ∪ · · · ∪ B_k = N.

We call B_i a block of π and say that π has k = |π| = #π blocks. Put ${n \brace k}$ = S(n, k) = the number of partitions of an n-set into k blocks.

S(n, k) is called a Stirling number of the second kind. We immediately have the following list of Stirling numbers.

${0 \brace 0} = 1$; ${n \brace k} = 0$ if k > n ≥ 1; ${n \brace 0} = 0$ if n > 0; ${n \brace 1} = 1$; ${n \brace 2} = 2^{n-1}-1$; ${n \brace n} = 1$; ${n \brace n-1} = \binom{n}{2}$.

Theorem 1.7.1 ${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$.

Proof: To obtain a partition of [n] into k blocks, we can either(i) partition [n− 1] into k blocks and place n into any of these blocks in

k

n− 1k

ways, or

(ii) put n into a block by itself and partition [n− 1] into k − 1 blocks inn− 1k − 1

ways.
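Theorem 1.7.1 translates directly into a recursive computation of S(n, k), which also confirms the small values listed above; a minimal sketch (the name `stirling2` is ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k) via the recurrence of Theorem 1.7.1."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # n either joins one of the k blocks of a partition of [n-1],
    # or forms a singleton block by itself.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
```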

Bell Number. Let B(n) be the total number of partitions of an n-set. Hence

B(n) = ∑_{k=1}^{n} S(n, k) = ∑_{k=0}^{n} S(n, k),

for all n ≥ 1.
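For small n the identity B(n) = ∑_k S(n, k) can be confirmed by brute force, identifying a function f : [n] → [n] with its kernel partition {f^{−1}(b)}; a sketch (names ours):

```python
from itertools import product

def bell_brute(n):
    """Count set partitions of [n] realized as kernels of maps [n] -> [n]."""
    kernels = set()
    for f in product(range(n), repeat=n):
        kernels.add(frozenset(frozenset(i for i in range(n) if f[i] == b)
                              for b in set(f)))
    return len(kernels)

def stirling2(n, k):
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
```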

Theorem 1.7.2 x^n = ∑_k S(n, k) x^{\underline{k}}, n ∈ N, where x^{\underline{k}} = x(x − 1) · · · (x − k + 1) denotes the falling factorial.

Proof: Check n = 0 and n = 1. Then note that x · x^{\underline{k}} = x^{\underline{k+1}} + k x^{\underline{k}}, because x^{\underline{k+1}} = x^{\underline{k}}(x − k) = x · x^{\underline{k}} − k · x^{\underline{k}}. Now let our induction hypothesis be that x^{n−1} = ∑_k S(n − 1, k) x^{\underline{k}} for some n ≥ 2. Then

x^n = x · x^{n−1} = x ∑_k S(n − 1, k) x^{\underline{k}} = ∑_k S(n − 1, k) x^{\underline{k+1}} + ∑_k S(n − 1, k) k x^{\underline{k}}

= ∑_k S(n − 1, k − 1) x^{\underline{k}} + ∑_k k S(n − 1, k) x^{\underline{k}} = ∑_k ( k S(n − 1, k) + S(n − 1, k − 1) ) x^{\underline{k}} = ∑_k S(n, k) x^{\underline{k}}.

Corollary 1.7.3 x^n = ∑_k (−1)^{n−k} S(n, k) x^{\overline{k}}, where x^{\overline{k}} = x(x + 1) · · · (x + k − 1) is the rising factorial.
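Theorem 1.7.2 and Corollary 1.7.3 are polynomial identities, so for each fixed n they can be confirmed by evaluating both sides at more than n integer points; a numerical sketch (helper names ours):

```python
def falling(x, k):
    """x(x-1)...(x-k+1), the falling factorial."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

def rising(x, k):
    """x(x+1)...(x+k-1), the rising factorial."""
    out = 1
    for i in range(k):
        out *= x + i
    return out

def stirling2(n, k):
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
```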

1.8 Table Entries in the Twelvefold Way

The reader should supply whatever is still needed for a complete proof for each of the following.

Entry #1. #(X^N) = x^n.

Entry #2. #{f ∈ X^N : f is one-to-one} = x^{\underline{n}} = x(x − 1) · · · (x − n + 1).

Entry #3. #{f ∈ X^N : f is onto} = x! · S(n, x), as it is the number of ways of partitioning the balls (say [n]) into x blocks and then linearly ordering the blocks. This uniquely determines an f of the type being counted.

Entry #4. A function from unlabeled N to labeled X is a placing of unlabeled balls in labeled boxes: the only important thing is how many balls are there to be in each box. Each choice corresponds to an n-multiset of an x-set, and the number of these is C(n + x − 1, n) = C(n + x − 1, x − 1).

Entry #5. Here N is unlabeled and X is labeled and f is one-to-one. Each function corresponds to putting 0 or 1 ball in each box so that n balls are used, so the desired number of functions is the binomial coefficient C(x, n).

Entry #6. Here N is unlabeled, X is labeled, and f is onto. So each f corresponds to an n-multiset on X with each box chosen at least once, i.e., to an (n − x)-multiset of the x boxes. The number of such functions is C(n − 1, x − 1) = C(n − 1, n − x).

Entry #7. With N labeled and X unlabeled, a function f : N → X is determined by the sets {f^{−1}(b) : b ∈ X}. Hence f corresponds to a partition of N into at most x parts. The number of such partitions is ∑_{k=1}^{x} S(n, k).

Entry #8. Here N is labeled, X is unlabeled and f is one-to-one. Suchan f amounts to putting just one of the n balls into each of n unlabeledboxes. This is possible iff n ≤ x, and in that case there is just 1 way.

Entry #9. With N labeled, X unlabeled, and f onto, such an f corresponds to a partition of N into x parts. Hence the number of such functions is S(n, x).

Entry #10. With N unlabeled, X unlabeled and f arbitrary, f is determined by the number of elements in each block of ker(f), i.e., f is essentially just a partition of the integer n with at most x parts. Hence the number of such f is p_1(n) + · · · + p_x(n).

Entry #11. Here N and X are both unlabeled, and f is one-to-one. Son unlabeled balls distributed into x unlabeled boxes is possible in just oneway if n ≤ x and not at all otherwise.

Entry #12. Here both N and X are unlabeled and f is onto. Clearly fcorresponds to a partition of n into x parts, so there are px(n) such functions.

Several of the entries of the Twelvefold Way are quite satisfactory, butothers need considerable further development before they are really useful.
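The first four entries can be checked against brute-force enumeration of all functions f : [n] → [x] for small parameters; a sketch (the helpers and the function `twelvefold_checks` are ours):

```python
from itertools import product
from math import comb, factorial

def stirling2(n, k):
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def falling(x, n):
    out = 1
    for i in range(n):
        out *= x - i
    return out

def twelvefold_checks(n, x):
    """Brute-force counts for entries 1-4 against their closed forms."""
    fs = list(product(range(x), repeat=n))              # all f: [n] -> [x]
    entry1 = len(fs)                                     # arbitrary f
    entry2 = sum(1 for f in fs if len(set(f)) == n)      # one-to-one
    entry3 = sum(1 for f in fs if set(f) == set(range(x)))  # onto
    entry4 = len({tuple(sorted(f)) for f in fs})         # N unlabeled: multisets
    return (entry1 == x ** n,
            entry2 == falling(x, n),
            entry3 == factorial(x) * stirling2(n, x),
            entry4 == comb(n + x - 1, n))
```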


1.9 Recapitulation

We have already established the following.

1. C(n, k) = C(n − 1, k) + C(n − 1, k − 1)   ("n choose k")

2. c(n, k) = (n − 1) c(n − 1, k) + c(n − 1, k − 1)   ("n cycle k")

3. S(n, k) = k S(n − 1, k) + S(n − 1, k − 1)   ("n subset k")

4. x^{\overline{n}} = ∑_k c(n, k) x^k;   x^{\overline{n}} = (−1)^n (−x)^{\underline{n}}

5. x^{\underline{n}} = ∑_k (−1)^{n−k} c(n, k) x^k;   x^{\underline{n}} = (−1)^n (−x)^{\overline{n}}

6. x^n = ∑_k (−1)^{n−k} S(n, k) x^{\overline{k}}

7. x^n = ∑_k S(n, k) x^{\underline{k}}

(Here c(n, k) denotes the "cycle" Stirling numbers of the first kind, S(n, k) the "subset" Stirling numbers of the second kind, and x^{\overline{n}}, x^{\underline{n}} the rising and falling factorials.)

It appears that 4. and 6. (resp., 5. and 7.) are some kind of inverses of each other. Later we shall make this a little more formal as we study the incidence algebra of a finite POSET.

Also in this section we want to recap certain results on composition ofintegers, etc.

1. P(r; r_1, r_2, . . . , r_n) = r!/(r_1! r_2! · · · r_n!)

= the number of ways to split up r people into n labeled committees with r_i people in committee C_i

= the number of words of length r with r_i letters of type i, 1 ≤ i ≤ n, where r_1 + r_2 + · · · + r_n = r

= the coefficient on x_1^{r_1} x_2^{r_2} · · · x_n^{r_n} in (x_1 + x_2 + · · · + x_n)^r, where r_1 + r_2 + · · · + r_n = r

= the number of ways of putting r distinct balls into n labeled boxes with r_i balls in the i-th box.

2. C(n, r) = n!/(r!(n − r)!)

= the number of ways of selecting a committee of size r from a set of n people

= the coefficient of x^r y^{n−r} in (x + y)^n.

3. C(r + n − 1, r) = (r + n − 1)!/(r!(n − 1)!)

= the number of ways of ordering r hot dogs of n different types (selection with repetition)

= the number of ways of putting r identical balls into n labeled boxes (distribution of identical objects)

= the number of ordered n-tuples (x_1, . . . , x_n) of nonnegative integers such that x_1 + x_2 + · · · + x_n = r.

4. Suppose that a_1, . . . , a_n are given integers, not necessarily nonnegative. Then

C(r − ∑a_i + n − 1, r − ∑a_i) = |{(y_1, . . . , y_n) : ∑_{i=1}^{n} y_i = r and y_i ≥ a_i for 1 ≤ i ≤ n}|.

For a proof, just put x_i = y_i − a_i and note that y_i ≥ a_i iff x_i ≥ 0.

NOTE: If ∑a_i > r, then the number of solutions is 0.
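Item 4 is again easy to confirm by machine; in the sketch below (names ours), `count_solutions` enumerates the shifted variables x_i = y_i − a_i directly:

```python
from itertools import product
from math import comb

def count_solutions(r, a):
    """Number of integer tuples (y_1,...,y_n) with sum r and y_i >= a_i."""
    n, hi = len(a), r - sum(a)
    # Substitute x_i = y_i - a_i >= 0; the x_i then sum to hi.
    return sum(1 for x in product(range(hi + 1), repeat=n)
               if sum(x) == hi) if hi >= 0 else 0
```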

1.10 Cayley's Theorem: The Number of Labeled Trees

One of the most famous results in combinatorics is Cayley's Theorem, which says that the number of labeled trees on n vertices is n^{n−2}. We give several proofs that illustrate different types of combinatorial arguments. But first we recall the basic facts about trees.

Let G be a finite graph G = (V, E) on n = |V| vertices and having b = |E| edges. Then G is called a tree provided G is connected and has no cycles. It follows that for any two vertices x, y ∈ V, there is a unique path in G from x to y. Moreover, if x and y are two vertices at maximum distance in G, then x and y each have degree 1. Hence any tree with at least two vertices has at least two vertices of degree 1. Such a vertex will be called a hanging vertex. An easy induction argument shows that if G is a tree on n vertices, then it has n − 1 edges. As a kind of converse, if G is an acyclic graph on n vertices with n − 1 edges, it must be a tree. Clearly we need only verify that G is connected. Each connected component of G is a tree by definition. But if G has k connected components T_1, . . . , T_k, where T_i has n_i vertices and n_i − 1 edges, and where n_1 + · · · + n_k = n, then G has ∑_{i=1}^{k} (n_i − 1) = n − k edges. So n − k = n − 1 implies G is connected. Similarly, if G is connected with b = n − 1, then G has a spanning tree T with n − 1 edges. Hence G = T must be acyclic and hence a tree. It follows that if G is a graph on n vertices and b edges, then G is a tree if and only if at least two (and hence all three) of the following hold:

(a) G is connected;

(b) G is acyclic;

(c) b = n− 1.

A labeled tree on [n] is just a spanning tree of the complete graph K_n on [n]. Hence we may state Cayley's theorem as follows.

Theorem 1.10.1 The number of spanning trees of Kn is nn−2.

Proof #1. The first proof is due to H. Prüfer (1918). It uses an algorithm that uniquely characterizes the tree.

Let T be a tree with V = [n], so the vertex set already has a natural order. Let T_1 := T. For i = 1, 2, . . . , n − 2, let b_i denote the vertex of degree 1 with the smallest label in T_i, let a_i be the vertex adjacent to b_i, and let T_{i+1} be the tree obtained by deleting the vertex b_i and the edge {a_i, b_i} from T_i. The "code" assigned to the tree T is [a_1, a_2, . . . , a_{n−2}].


[Figure: A Tree on 10 Points. The tree T has vertices 1, . . . , 10 and edges {2,3}, {2,4}, {1,2}, {1,5}, {6,7}, {1,7}, {1,10}, {8,10}, {9,10}.]

As an example, consider the tree T = T_1 on 10 points in the figure. The vertex of degree 1 with smallest index is 3. It is joined to vertex 2. We define a_1 = 2, b_1 = 3, then delete vertex 3 and edge {3, 2}, to obtain a tree T_2 with one edge and one vertex less. This procedure is repeated eight times yielding the sequences

[a_1, a_2, . . . , a_8] = [2, 2, 1, 1, 7, 1, 10, 10],  [b_1, b_2, . . . , b_8] = [3, 4, 2, 5, 6, 7, 1, 8],

and terminating with the edge {9, 10}. The code for the tree is the sequence [a_1, a_2, . . . , a_8] = [2, 2, 1, 1, 7, 1, 10, 10].

To reverse the procedure, start with any code [a_1, a_2, . . . , a_{n−2}], e.g. [2, 2, 1, 1, 7, 1, 10, 10]. Write a_{n−1} := n. For i = 1, 2, . . . , n − 1, let b_i be the vertex with smallest index which is not in

{a_i, a_{i+1}, . . . , a_{n−1}} ∪ {b_1, b_2, . . . , b_{i−1}}.

Then {{a_i, b_i} : i = 1, . . . , n − 1} will be the edge set of a spanning tree.

Exercise: 1.10.2 With the sequence b_i defined from the code as indicated in the proof above, show that {{b_i, a_i} : i = 1, . . . , n − 1} will be the edge set of a tree on [n]. Fill in the details of why the mapping associating a code to a tree, and the mapping associating a tree to a code, are inverses.
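Both directions of Prüfer's correspondence are short to implement; a sketch (function names ours), with edges given as 2-element frozensets:

```python
from itertools import product

def prufer_code(edges, n):
    """Prufer code of a labeled tree on [n] given by its edge set."""
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    code = []
    for _ in range(n - 2):
        b = min(v for v in adj if len(adj[v]) == 1)   # smallest leaf b_i
        a = adj[b].pop()                              # its neighbor a_i
        adj[a].discard(b)
        del adj[b]
        code.append(a)
    return code

def prufer_tree(code):
    """Inverse map: edge set of the tree on [len(code)+2] with this code."""
    n = len(code) + 2
    seq = list(code) + [n]          # a_1, ..., a_{n-2}, a_{n-1} := n
    edges, used = [], []
    for i, a in enumerate(seq):
        b = min(v for v in range(1, n + 1)
                if v not in seq[i:] and v not in used)
        edges.append(frozenset((a, b)))
        used.append(b)
    return set(edges)
```

Decoding every code in [n]^{n−2} then yields each labeled tree exactly once.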

Proof #2. This proof starts by showing that the number N(d_1, . . . , d_n) of labeled trees on vertices v_1, . . . , v_n in which v_i has degree d_i + 1, 1 ≤ i ≤ n, is the multinomial coefficient P(n − 2; d_1, . . . , d_n) = (n − 2)!/(d_1! · · · d_n!). As an inductive hypothesis we assume that this result holds for trees with fewer than n vertices and leave to the reader the task of checking that the result holds for n = 3. Since the degree of each vertex is at least 1, we know that the d's are all nonnegative integers. The sum of the degrees of the vertices counts the n − 1 edges twice, so we have 2(n − 1) = ∑_{i=1}^{n} (d_i + 1) = (∑ d_i) + n, whence ∑ d_i = n − 2. Hence at least P(n − 2; d_1, . . . , d_n) is in proper form. We also know that any tree has at least two vertices with degree 1. We need to show that if (d_1, . . . , d_n) is a sequence of nonnegative integers with ∑ d_i = n − 2, then (d_1 + 1, . . . , d_n + 1) really is the degree sequence of P(n − 2; d_1, . . . , d_n) labeled trees. Clearly if ∑ d_i = n − 2 then at least two of the d_i's equal zero. The following argument would work with any particular d_j = 0, but for notational ease we suppose that d_n = 0. If there is a labeled tree with degree sequence (d_1 + 1, . . . , d_{n−1} + 1, 1), then the vertex v_n is adjacent to a unique vertex v_j with degree at least 2. So the tree obtained by removing v_n and the edge {v_j, v_n} has degree sequence (d_1 + 1, . . . , d_j, . . . , d_{n−1} + 1). It follows that

N(d_1, . . . , d_{n−1}, 0) = N(d_1 − 1, d_2, . . . , d_{n−1}) + N(d_1, d_2 − 1, . . . , d_{n−1}) + · · · + N(d_1, d_2, . . . , d_{n−1} − 1).

By the induction hypothesis this is the sum of the multinomial coefficients

P(n − 3; d_1 − 1, d_2, . . . , d_{n−1}) + P(n − 3; d_1, d_2 − 1, . . . , d_{n−1}) + · · · + P(n − 3; d_1, d_2, . . . , d_{n−1} − 1) = P(n − 2; d_1, d_2, . . . , d_{n−1}, 0).

Cayley's Theorem now follows. For the number T(n) of labeled trees on n vertices is the sum of all the terms N(d_1, . . . , d_n) with d_i ≥ 0 and ∑_{i=1}^{n} d_i = n − 2, which is the sum of all terms P(n − 2; d_1, d_2, . . . , d_n) with d_i ≥ 0 and ∑_{i=1}^{n} d_i = n − 2. Now in the multinomial expansion of (a_1 + a_2 + · · · + a_n)^{n−2} set a_1 = · · · = a_n = 1 to obtain the desired result T(n) = (1 + 1 + · · · + 1)^{n−2} = n^{n−2}.
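For n = 5 the count N(d_1, . . . , d_n) can be confirmed by enumerating all spanning trees of K_5 directly (a brute-force sketch; names ours):

```python
from itertools import combinations
from math import factorial

def all_labeled_trees(n):
    """All spanning trees of K_n, as (n-1)-tuples of edges, via union-find."""
    verts = range(1, n + 1)
    trees = []
    for es in combinations(combinations(verts, 2), n - 1):
        parent = {v: v for v in verts}

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        acyclic = True
        for a, b in es:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False
                break
            parent[ra] = rb
        if acyclic:          # n-1 edges and no cycle => spanning tree
            trees.append(es)
    return trees

def degree(tree, v):
    return sum(v in e for e in tree)
```

For the degree sequence (3, 2, 1, 1, 1), i.e. (d_1, . . . , d_5) = (2, 1, 0, 0, 0), the formula predicts 3!/2! = 3 trees.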

Proof #3. This proof establishes a bijection between the set of labeled trees on n vertices and the set of mappings from the set {2, 3, . . . , n − 1} to the set [n] = {1, 2, . . . , n}. Clearly the number of such mappings is n^{n−2}. Suppose f is such a mapping. Construct a functional digraph D on the vertices 1 through n by defining {(i, f(i)) : i = 2, . . . , n − 1} to be the arcs. Clearly 1 and n have zero outdegree in D, but each of them could have positive indegree. In either case, the (weakly) connected component containing 1 (respectively, n) may be viewed as an "in-tree" rooted at 1 (respectively, n). Any other component consists of an oriented circuit, to each point of which an in-tree is attached with that point as root. Some of these in-trees may consist only of the root. Suppose there are k oriented circuits. The i-th oriented circuit has smallest element r_i, and the circuits are to be ordered among themselves so that r_1 < r_2 < · · · < r_k. In the i-th circuit, let l_i be the vertex to which the arc from r_i points, i.e., f(r_i) = l_i. We may now construct a tree T from D by deleting the arcs (r_i, l_i) (to create a forest of trees) and then adjoining the arcs (1, l_1), (r_1, l_2), . . . , (r_{k−1}, l_k), (r_k, n).

For the reverse process, suppose the labeled tree T is given. Put r_0 := 1, and define r_i to be the smallest vertex on the (unique) path from r_{i−1} to n. Now delete the edges {r_{i−1}, l_i}, i = 1, . . . , k − 1, and {r_k, n}, to create k + 2 components. View the vertex 1 as the root of a directed in-tree. Similarly, view each vertex along the path from l_i to r_i as the root of an in-tree. Now adjoin the directed arcs (r_i, l_i). We may now view this directed graph as the functional digraph of a unique function from {2, 3, . . . , n − 1} to [n]. Moreover, it should be clear that this correspondence between functions from {2, 3, . . . , n − 1} to [n] and labeled trees on [n] is a bijection.

Proof #4. In this proof, due to Joyal (1981), we describe a many-to-one function F from the set of n^n functions from [n] to [n] to the set of labeled trees on [n], such that the preimage of each labeled tree contains n^2 functions.

First, recall that a permutation π of the elements of a set S may be viewed simultaneously as a linear arrangement of the elements of S and as a product of disjoint oriented cycles of the objects of S. In the present context we want S to be a set of disjoint rooted trees on [n] that use precisely all the elements of [n]. But there are many such sets S, and the general result we need is that the number of linear arrangements of disjoint rooted trees on [n] that use precisely all the elements of [n] is the same as the number of collections of disjoint oriented cycles of disjoint rooted trees on [n] that use precisely all the elements of [n].

To each function f we may associate its functional digraph, which has an arc from i to f(i) for each i in [n]. Every (weakly) connected component of a functional digraph (i.e., connected component of the underlying undirected graph) can be represented by an oriented cycle of rooted trees, so that the cycles corresponding to different components are disjoint and all the components use the elements of [n], each exactly once. Clearly there are n^n functions from [n] to [n], each corresponding uniquely to a functional digraph which is represented by a collection of disjoint oriented cycles of rooted trees on [n] that together use each element of [n] exactly once. Each such collection corresponds uniquely to a linear arrangement of a collection of rooted trees (using each element of [n] exactly once). Hence n^n is the number of linear arrangements of rooted trees on [n] (by which we always mean that the rooted trees in a given linear arrangement use each element of [n] exactly once).

We claim now that n^n = n^2 t_n, where t_n is the number of (labeled) trees on [n].

It is clear that n^2 t_n is the number of triples (x, y, T), where x, y ∈ [n] and T is a tree on [n]. Given such a triple, we obtain a linear arrangement of rooted trees by removing all arcs on the unique path from x to y, taking the nodes on this path to be the roots of the trees that remain, and ordering these trees by the order of their roots in the original path from x to y. In this way each labeled tree corresponds to n^2 linear arrangements of rooted trees on [n].

1.11 The Matrix-Tree Theorem

The "matrix-tree" theorem expresses the number of spanning trees in a graph as the determinant of an appropriate matrix, from which we obtain one more proof of Cayley's theorem counting labeled trees. The main ingredient in the proof is the following theorem known as the Cauchy-Binet Theorem. It is more commonly stated and applied with the diagonal matrix Δ below taken to be the identity matrix. However, the generality given here actually simplifies the proof.

Theorem 1.11.1 Let A and B be, respectively, r × m and m × r matrices, with r ≤ m. Let Δ be the m × m diagonal matrix with entry e_i in the (i, i)-position. For an r-subset S of [m], let A_S and B_S denote, respectively, the r × r submatrices of A and B consisting of the columns of A, or the rows of B, indexed by the elements of S. Then

det(AΔB) = ∑_S det(A_S) det(B_S) ∏_{i∈S} e_i,

where the sum is over all r-subsets S of [m].

Proof: We prove the theorem assuming that e_1, . . . , e_m are independent (commuting) indeterminates over F. Of course it will then hold for all values of e_1, . . . , e_m in F.

Recall that if C = (c_{ij}) is any r × r matrix over F, then

det(C) = ∑_{σ∈S_r} sgn(σ) c_{1σ(1)} c_{2σ(2)} · · · c_{rσ(r)}.

Given that A = (a_{ij}) and B = (b_{ij}), the (i, j)-entry of AΔB is ∑_{k=1}^{m} a_{ik} e_k b_{kj}, and this is a linear form in the indeterminates e_1, . . . , e_m. Hence det(AΔB) is a homogeneous polynomial of degree r in e_1, . . . , e_m. Suppose that det(AΔB) has a monomial e_1^{t_1} e_2^{t_2} · · · in which the number of indeterminates e_i with t_i > 0 is less than r. Substitute 0 for the indeterminates e_i that do not appear in e_1^{t_1} e_2^{t_2} · · ·, i.e., that have t_i = 0. This will not affect the monomial e_1^{t_1} e_2^{t_2} · · · or its coefficient in det(AΔB). But after this substitution Δ has rank less than r, so AΔB has rank less than r, implying that det(AΔB) must be the zero polynomial. Hence we see that the coefficient of a monomial in the polynomial det(AΔB) is zero unless that monomial is the product of r distinct indeterminates e_i, i.e., unless it is of the form ∏_{i∈S} e_i for some r-subset S of [m].

The coefficient of a monomial ∏_{i∈S} e_i in det(AΔB) is found by setting e_i = 1 for i ∈ S, and e_i = 0 for i ∉ S. When this substitution is made in Δ, AΔB evaluates to A_S B_S. So the coefficient of ∏_{i∈S} e_i in det(AΔB) is det(A_S) det(B_S).
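A numerical spot-check of Theorem 1.11.1 with exact integer arithmetic (a sketch; the matrix data and names are ours):

```python
from itertools import combinations
from math import prod

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def cauchy_binet_check(A, B, e):
    """Compare det(A.Delta.B) with the subset expansion of Theorem 1.11.1."""
    r, m = len(A), len(A[0])
    Delta = [[e[i] if i == j else 0 for j in range(m)] for i in range(m)]
    lhs = det(matmul(matmul(A, Delta), B))
    rhs = sum(det([[A[i][s] for s in S] for i in range(r)]) *
              det([[B[s][j] for j in range(r)] for s in S]) *
              prod(e[s] for s in S)
              for S in combinations(range(m), r))
    return lhs == rhs
```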

Exercise: 1.11.2 Let M be an n × n matrix all of whose line sums are zero. Then one of the eigenvalues of M is λ_1 = 0. Let λ_2, . . . , λ_n be the other eigenvalues of M. Show that all principal (n − 1) × (n − 1) submatrices have the same determinant and that this value is (1/n) λ_2 λ_3 · · · λ_n.

An incidence matrix N of a directed graph H is a matrix whose rows are indexed by the vertices V of H, whose columns are indexed by the edges E of H, and whose entries are defined by:

N(x, e) = 0 if x is not incident with e, or e is a loop; N(x, e) = 1 if x is the head of e; N(x, e) = −1 if x is the tail of e.

Lemma 1.11.3 If H has k components, then rank(N) = |V | − k.

Proof: N has v = |V| rows. The rank of N is v − n, where n is the dimension of the left null space of N, i.e., the dimension of the space of row vectors g for which gN = 0. But if e is any edge, directed from x to y, then the e-entry of gN is g(y) − g(x), so gN = 0 if and only if g(x) = g(y) for every edge. Hence gN = 0 iff g is constant on each component of H, which says that n is the number k of components of H.

Lemma 1.11.4 Let A be a square matrix that has at most two nonzero entries in each column, at most one 1 in each column, at most one −1 in each column, and whose entries are all either 0, 1 or −1. Then det(A) is 0, 1 or −1.

Proof: This follows by induction on the number of rows. If every column has both a 1 and a −1, then the sum of all the rows is zero, so the matrix is singular and det(A) = 0. Otherwise, expand the determinant by a column with at most one nonzero entry, to find that it is equal to 0 or ±1 times the determinant of a smaller matrix with the same property.

Corollary 1.11.5 Every square submatrix of an incidence matrix of a directed graph has determinant 0 or ±1. (Such a matrix is called totally unimodular.)

Theorem 1.11.6 (The Matrix-Tree Theorem) The number of spanning trees in a connected graph G on n vertices and without loops is the determinant of any (n − 1) × (n − 1) principal submatrix of the matrix D − A, where A is the adjacency matrix of G and D is the diagonal matrix whose diagonal contains the degrees of the corresponding vertices of G.

Proof: First let H be a connected digraph with n vertices and with incidence matrix N. H must have at least n − 1 edges, because it is connected and must have a spanning tree, so we may let S be a set of n − 1 edges. Using the notation of the Cauchy-Binet Theorem, consider the n × (n − 1) submatrix N_S of N whose columns are indexed by the elements of S. By Lemma 1.11.3, N_S has rank n − 1 iff the spanning subgraph of H with S as edge set is connected, i.e., iff S is the edge set of a spanning tree in H. Let N′ be obtained by dropping any single row of the incidence matrix N. Since the sum of all rows of N (or of N_S) is zero, the rank of N′_S is the same as the rank of N_S. Hence we have the following:


det(N′_S) = ±1 if S is the edge set of a spanning tree in H, and 0 otherwise.   (1.14)

Now let G be a connected loopless graph on n vertices. Let H be any digraph obtained by orienting G, and let N be an incidence matrix of H. Then we claim N N^T = D − A. For,

(N N^T)_{xy} = ∑_{e∈E(G)} N(x, e) N(y, e) = deg(x) if x = y, and −t if x ≠ y and x and y are joined by t edges in G.

An (n − 1) × (n − 1) principal submatrix of D − A is of the form N′N′^T, where N′ is obtained from N by dropping any one row. By Cauchy-Binet,

det(N′N′^T) = ∑_S det(N′_S) det((N′_S)^T) = ∑_S (det(N′_S))^2,

where the sum is over all (n − 1)-subsets S of the edge set. By Eq. 1.14 this is the number of spanning trees of G.

Exercise: 1.11.7 (Cayley’s Theorem Again) In the Matrix-Tree Theorem,take G to be the complete graph Kn. Here the matrix D−A is nI−J , whereI is the identity matrix of order n, and J is the n by n matrix of all 1’s.Now calculate the determinant of any n− 1 by n− 1 principal submatrix ofthis matrix to obtain another proof that Kn has nn−2 spanning trees.

Exercise: 1.11.8 In the statement of the Matrix-Tree Theorem it is not necessary to use principal subdeterminants. If the (n − 1) × (n − 1) submatrix M is obtained by deleting the i-th row and j-th column from D − A, then the number of spanning trees is (−1)^{i+j} det(M). This follows from the more general lemma: If A is an (n − 1) × n matrix whose row sums are all equal to 0 and if A_j is obtained by deleting the j-th column of A, 1 ≤ j ≤ n, then det(A_j) = −det(A_{j+1}).
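The Matrix-Tree determinant is easy to evaluate exactly for small graphs, which also settles the computation of Exercise 1.11.7 for small n; a sketch (names ours):

```python
def det(M):
    """Integer determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def spanning_tree_count(adj):
    """det of the principal minor of D - A got by deleting row/column 0."""
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    return det([row[1:] for row in lap[1:]])
```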

1.12 Number Theoretic Functions

An arithmetic function (sometimes called a number theoretic function) is a function whose domain is the set P of positive integers and whose range is a subset of the complex numbers C. Hence C^P is just the set of all arithmetic functions. If f is an arithmetic function not the zero function, f is said to be multiplicative provided f(mn) = f(m)f(n) whenever (m, n) = 1, and to be totally multiplicative provided f(mn) = f(m)f(n) for all m, n ∈ P. The following examples will be of special interest to us here.

Example 1.12.1 I(1) = 1 and I(n) = 0 if n > 1.

Example 1.12.2 U(n) = 1 for all n ∈ P.

Example 1.12.3 E(n) = n for all n ∈ P.

Example 1.12.4 The omega function: ω(n) is the number of distinct primes dividing n.

Example 1.12.5 The mu function: µ(n) = (−1)^{ω(n)} if n is square-free, and µ(n) = 0 otherwise.

Example 1.12.6 Euler’s phi-function: φ(n) is the number of integers k, 1 ≤k ≤ n, with (k, n) = 1.

The following additional examples often arise in practice.

Example 1.12.7 The Omega function: Ω(n) is the number of primes dividing n counting multiplicity. So ω(n) = Ω(n) iff n is square-free.

Example 1.12.8 The tau function: τ(n) is the number of positive divisors of n.

Example 1.12.9 The sigma function: σ(n) is the sum of the positive divisors of n.

Example 1.12.10 A generalization of the sigma function: σ_k(n) is the sum of the k-th powers of the positive divisors of n.

Dirichlet (convolution) Product of Arithmetic Functions.

Def. If f and g are arithmetic functions, define the Dirichlet product f ∗ g by:

(f ∗ g)(n) = ∑_{d|n} f(d) g(n/d) = ∑_{d_1 d_2 = n} f(d_1) g(d_2).


Obs. 1.12.11 f ∗ g = g ∗ f .

Obs. 1.12.12 If f, g, h are arithmetic functions, (f ∗ g) ∗ h = f ∗ (g ∗ h), and [(f ∗ g) ∗ h](n) = ∑_{d_1 d_2 d_3 = n} f(d_1) g(d_2) h(d_3).

Obs. 1.12.13 I ∗ f = f ∗ I = f for all f . And I is the unique multiplicativeidentity.

Obs. 1.12.14 An arithmetic function f has a (necessarily unique) multiplicative inverse f^{−1} iff f(1) ≠ 0.

Proof: If f ∗ f^{−1} = I, then f(1)f^{−1}(1) = (f ∗ f^{−1})(1) = I(1) = 1, so f(1) ≠ 0. Conversely, if f(1) ≠ 0, then f^{−1}(1) = (f(1))^{−1}. Use induction on n: for n > 1, if f^{−1}(1), f^{−1}(2), . . . , f^{−1}(n − 1) are known, then f^{−1}(n) may be obtained by solving 0 = I(n) = (f ∗ f^{−1})(n) = ∑_{d|n} f(d) f^{−1}(n/d) for f^{−1}(n), since the d = 1 term is f(1) f^{−1}(n) and f(1) ≠ 0.

The following theorem has essentially been proved.

Theorem 1.12.15 The set of all arithmetic functions f with f(1) ≠ 0 forms a group under Dirichlet multiplication.

Theorem 1.12.16 The set of all multiplicative functions is a subgroup.

Proof: f(1) ≠ 0 ≠ g(1) implies (f ∗ g)(1) ≠ 0. Associativity holds by Obs. 1.12.12. The identity I is clearly multiplicative. So suppose f, g are multiplicative. Let (m, n) = 1. Then

(f ∗ g)(mn) = ∑_{d|mn} f(d) g(mn/d) = ∑_{d_1|m} ∑_{d_2|n} f(d_1 d_2) g(mn/(d_1 d_2))

= ∑_{d_1|m} ∑_{d_2|n} f(d_1) f(d_2) g(m/d_1) g(n/d_2)

= ( ∑_{d_1|m} f(d_1) g(m/d_1) ) · ( ∑_{d_2|n} f(d_2) g(n/d_2) ) = (f ∗ g)(m) (f ∗ g)(n).

Finally, we need to show that if f is multiplicative, in which case f^{−1} exists, then also f^{−1} is multiplicative. Define g as follows. Put g(1) = 1, and for every prime p and every j > 0 put g(p^j) = f^{−1}(p^j). Then extend g multiplicatively for all n ∈ P. Since f and g are both multiplicative, so is f ∗ g. Then for any prime power p^k,

(f ∗ g)(p^k) = ∑_{d_1 d_2 = p^k} f(d_1) g(d_2) = ∑_{d_1 d_2 = p^k} f(d_1) f^{−1}(d_2) = (f ∗ f^{−1})(p^k) = I(p^k).

So f ∗ g and I coincide on prime powers and are multiplicative. Hence f ∗ g = I, implying g = f^{−1}, i.e., f^{−1} is multiplicative.

Clearly µ is multiplicative, and ∑_{d|n} µ(d) = 1 if n = 1. For n = p^e, ∑_{d|n} µ(d) = ∑_{j=0}^{e} µ(p^j) = 1 + (−1) + 0 + · · · + 0 = 0. Hence µ ∗ U = I and we have proved the following:


Obs. 1.12.17 µ^{−1} = U; U^{−1} = µ.

Theorem 1.12.18 Möbius Inversion: F = U ∗ f iff f = µ ∗ F.

This follows from µ^{−1} = U and associativity. In its more usual form it appears as:

F(n) = ∑_{d|n} f(d) ∀n ∈ P   iff   f(n) = ∑_{d|n} µ(d) F(n/d) ∀n ∈ P.

NOTE: Here we sometimes say F is the sum function of f. When F and f are related this way it is interesting to note that F is multiplicative if and only if f is multiplicative. For if f is multiplicative, then F = U ∗ f is multiplicative. Conversely, if F = U ∗ f is multiplicative, then µ ∗ F = f is also multiplicative.
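Möbius inversion is easy to check numerically; a sketch of µ, the Dirichlet product, and the inversion (names ours):

```python
def mobius(n):
    """mu(n) = (-1)^omega(n) if n is square-free, else 0."""
    m, sign = n, 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # p^2 divides n
            sign = -sign
        p += 1
    return -sign if m > 1 else sign

def dirichlet(f, g):
    """Dirichlet convolution (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

U = lambda n: 1                       # U(n) = 1
I = lambda n: 1 if n == 1 else 0      # convolution identity
```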

Exercise: 1.12.19

1. τ = U ∗ U is multiplicative, and τ(n) = ∏_{p^α || n} (α + 1).

2. φ = µ ∗ E is multiplicative. (First show E = φ ∗ U.)

3. σ = U ∗ E is multiplicative and σ(n) = ∏_{p^α || n} (p^{α+1} − 1)/(p − 1).

4. φ ∗ τ = σ.

5. σ ∗ φ = E ∗ E.

6. E^{−1}(n) = n µ(n).

Sometimes it is useful to have even more structure on C^P. For f, g ∈ C^P, define the sum of f and g as follows:

(f + g)(n) = f(n) + g(n).

Then a large part of the following theorem has already been proved and the remainder is left as an exercise.

Theorem 1.12.20 With the above definitions of addition and convolution product, (C^P, +, ∗) is a commutative ring with unity I, and f ∈ C^P is a unit iff f(1) ≠ 0.


Exercise: 1.12.21 For g ∈ C^P, define ĝ ∈ C^P by ĝ(n) = n g(n). Show that g ↦ ĝ is a ring automorphism of (C^P, +, ∗). In particular, \widehat{g^{−1}} = (ĝ)^{−1}.

Exercise: 1.12.22 In how many ways can a "necklace" with n beads be formed out of beads labeled L, R, 1, 2, . . . , m so there is at least one L, and the L's and R's alternate (so that the number of L's is the same as the number of R's)?

1.13 Inclusion – Exclusion

Let E be a set of N objects. Let {a_1, a_2, . . . , a_m} be a set of m properties that these objects may or may not have. In general these properties are not mutually exclusive. Let A_i be the set of objects in E that have property a_i. In fact, it could even happen that A_i and A_j are the same set even when i and j are different. Let N(a_i) be the number of objects that have the property a_i. Let N(a′_i) be the number of objects that do not have property a_i. Then N(a_i a′_j) denotes the number of objects that have property a_i but do not have property a_j. It is easy to see how to generalize this notation and to establish identities such as the following:

N = N(a_i) + N(a′_i);   N = N(a_i a_j) + N(a_i a′_j) + N(a′_i a_j) + N(a′_i a′_j).   (1.15)

We now introduce some additional notation.

s_0 = N;

s_1 = N(a_1) + N(a_2) + · · · + N(a_m) = ∑_i N(a_i);

s_2 = N(a_1 a_2) + N(a_1 a_3) + · · · + N(a_{m−1} a_m) = ∑_{i<j} N(a_i a_j);

s_3 = N(a_1 a_2 a_3) + · · · + N(a_{m−2} a_{m−1} a_m) = ∑_{1≤i<j<k≤m} N(a_i a_j a_k);

...

s_m = N(a_1 a_2 · · · a_m).

Also,

e_0 = N(a′_1 a′_2 · · · a′_m);

e_1 = N(a_1 a′_2 a′_3 · · · a′_m) + N(a′_1 a_2 a′_3 · · · a′_m) + · · · + N(a′_1 a′_2 · · · a′_{m−1} a_m);

...

e_m = N(a_1 a_2 · · · a_m).

In other words, ei is the number of objects that have exactly i properties.

Theorem 1.13.1 For 0 ≤ r ≤ m, we have

e_r = ∑_{j=0}^{m−r} (−1)^j C(r + j, j) s_{r+j}.

Proof: Clearly, if an object has fewer than r of the properties, then it contributes zero to both sides of the equation. Suppose an object has exactly r of the properties. Then it contributes exactly 1 to the left hand side; on the right hand side it contributes 1 to the term with j = 0 and 0 to all the other terms. Now suppose an object has exactly r + k properties with 0 < k ≤ m − r, so that it contributes 0 to the left hand side. On the right hand side it contributes exactly C(r + k, r + j) to s_{r+j}, so the total count it contributes to the right hand side is

∑_{j=0}^{m−r} (−1)^j C(r + j, j) C(r + k, r + j).

Notice that

C(r + j, j) C(r + k, r + j) = [(r + j)!/(r! j!)] · [(r + k)!/((r + j)! (k − j)!)] = [(r + k)!/(r! k!)] · [k!/(j! (k − j)!)] = C(r + k, r) C(k, j).

Thus the total count on the right hand side is

C(r + k, r) ∑_{j=0}^{k} (−1)^j C(k, j) = 0.

This concludes the proof.

Put S(x) = ∑_{i=0}^{m} s_i x^i and E(x) = ∑_{i=0}^{m} e_i x^i. Using Theorem 1.13.1 we see that

E(x) = ∑_{r=0}^{m} e_r x^r = ∑_{r=0}^{m} ( ∑_{j=0}^{m−r} (−1)^j C(r + j, j) s_{r+j} ) x^r

= ∑_{0≤r≤m, 0≤j≤m−r} (−1)^j C(r + j, j) x^r s_{r+j} = ∑_{k=0}^{m} s_k ( ∑_{r=0}^{k} (−1)^{k−r} C(k, k − r) x^r )

= ∑_{k=0}^{m} s_k (x − 1)^k = S(x − 1).

This proves the following:

Theorem 1.13.2 E(x) = S(x− 1).

Of course, it now follows that E(x + 1) = S(x), from which we easily deduce the following results:

s_j = ∑_{k=j}^{m} C(k, j) e_k.   (1.16)

E(0) = S(−1) = ∑_{i=0}^{m} (−1)^i s_i.   (1.17)

(1/2)[E(1) + E(−1)] = ∑_i e_{2i} = (1/2)[ s_0 + ∑_{j=0}^{m} (−2)^j s_j ].   (1.18)

(1/2)[E(1) − E(−1)] = ∑_i e_{2i+1} = (1/2)[ s_0 − ∑_{j=0}^{m} (−2)^j s_j ].   (1.19)

Equation 1.17 is the traditional inclusion-exclusion principle. Equation 1.18 gives the number of objects having an even number of properties, and Equation 1.19 gives the number of objects having an odd number of properties. We can also easily find a formula for the number of objects having at least t of the properties.
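As a concrete check of Theorem 1.13.1, take the objects 1, . . . , 60 with the three properties "divisible by 2", "divisible by 3", "divisible by 5"; the counts e_r computed directly agree with the alternating sums of the s_j (a sketch; names ours):

```python
from itertools import combinations
from math import comb, lcm

objs = range(1, 61)
props = [2, 3, 5]          # property a_i: "divisible by props[i]"
m = len(props)

def s(j):
    """s_j: sum of N(a_{i_1} ... a_{i_j}) over all j-subsets of properties."""
    return sum(sum(1 for x in objs if x % lcm(*T) == 0)
               for T in combinations(props, j))

def e(r):
    """e_r: number of objects with exactly r of the properties."""
    return sum(1 for x in objs if sum(x % p == 0 for p in props) == r)
```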


Exercise: 1.13.3 For k ≥ 0, t ≥ 0, show that ∑_{j=0}^{k} (−1)^j C(t + k, j) = (−1)^k C(t + k − 1, k).

Theorem 1.13.4 The number of objects having at least t of the m properties is given by

∑_{r≥t} e_r = ∑_{j=0}^{m−t} (−1)^j C(t + j − 1, t − 1) s_{t+j}.

Proof: The proof of this result amounts to collecting terms appropriately in the left hand sum and observing that the coefficient on s_{t+j} is ∑_{i=0}^{j} (−1)^i C(t + j, i), which by Ex. 1.13.3 is equal to (−1)^j C(t + j − 1, t − 1). In detail, and using Theorem 1.13.1,

m∑r=t

er =m∑r=t

m−r∑j=0

(−1)j(r + j

j

)sr+j.

Here $t$ and $m$ are fixed with $0 \le t \le m$, and $r$ and $j$ are dummy variables with $t \le r \le m$ and $0 \le j \le m-r$. We introduce new dummy variables $k$ and $i$ by the invertible substitution $r = t+i$, $j = k-i$. The constraints on $k$ and $i$ are $0 \le i \le k \le m-t$. So continuing, we have

$$\sum_{r=t}^{m} e_r = \sum_{0 \le i \le k \le m-t} (-1)^{k-i} \binom{t+k}{k-i} s_{t+k} = \sum_{k=0}^{m-t}\left[\sum_{i=0}^{k} (-1)^{k-i}\binom{t+k}{k-i}\right] s_{t+k}$$
$$= \sum_{k=0}^{m-t}\left[\sum_{j=0}^{k} (-1)^j \binom{t+k}{j}\right] s_{t+k} = \sum_{k=0}^{m-t} (-1)^k \binom{t+k-1}{k} s_{t+k}.$$

If we replace each property $a_i$ with its negation $a_i'$, and use the notation $\bar{s}_i$, $\bar{S}(x)$, $\bar{e}_i$, etc., for the corresponding analogues of $s_i$, $e_i$, etc., we can see how to write $\bar{S}(x)$ in terms of the $s_i$.

Theorem 1.13.5
$$\bar{S}(x) = \sum_{r=0}^{m}\left(\sum_k (-1)^k \binom{m-k}{m-r} s_k\right) x^r.$$

Proof: It is clear that $\bar{e}_i = e_{m-i}$, i.e., $\bar{E}(x)$ is the reverse $x^m E\!\left(\frac{1}{x}\right)$ of $E(x)$. Recall that $S(x) = E(x+1)$. Then


$$\bar{S}(x) = \bar{E}(x+1) = (x+1)^m E\!\left(\frac{1}{x+1}\right) = (x+1)^m E\!\left(1 + \frac{-x}{x+1}\right)$$
$$= (x+1)^m S\!\left(\frac{-x}{x+1}\right) = \sum_{k=0}^{m} (-x)^k (x+1)^{m-k} s_k.$$
The coefficient of $x^r$ in this expression is
$$[x^r]\left(\sum_k (-x)^k (x+1)^{m-k} s_k\right) = \sum_k [x^{r-k}]\left((-1)^k (x+1)^{m-k} s_k\right) = \sum_k (-1)^k \binom{m-k}{r-k} s_k,$$

from which the theorem follows.

Application to derangements: Let $D_n$ be the number of permutations $\sigma = b_1 \cdots b_n$ of $1, 2, \ldots, n$ for which $b_i \ne i$ for all $i$, i.e., $D_n$ is the number of derangements of $n$ things. Here the $N$ objects are the $n!$ permutations. The property $a_i$ is defined by: the permutation $\sigma$ has property $a_i$ provided $b_i = i$. Then $N(a_{i_1} \cdots a_{i_r}) = (n-r)!$, and $s_j = \binom{n}{j}(n-j)! = \frac{n!}{j!}$. So
$$e_0 = \sum_{p=0}^{n} (-1)^p \frac{n!}{p!} = n! \sum_{p=0}^{n} \frac{(-1)^p}{p!}.$$

Problem des Rencontres: Let Dn,r be the number of permutationsσ = b1 · · · bn with exactly r fixed elements, i.e., bj = j for r values of j.

Choose r fixed symbols in

(nr

)ways, and multiply by the number Dn−r of

derangements on the remaining n− r elements.

Dn,r = n!r!(n−r)!

[(n− r)!( 1

0!− 1

1!+ 1

2!+ · · · ± 1

(n−r)!)]. Or,

Dn,r =∑n−rp=0(−1)p

(r + pr

)n!

(r+p)!= n!

r!

∑n−rp=0(−1)p 1

p!.
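This too can be verified directly (a sketch; `rencontres` is our name for $D_{n,r}$):

```python
from itertools import permutations
from math import comb, factorial

def rencontres(n, r):
    """D_{n,r} = C(n,r) * D_{n-r}: permutations of n things with exactly r fixed points."""
    d = sum((-1)**p * factorial(n - r) // factorial(p) for p in range(n - r + 1))
    return comb(n, r) * d

for n in range(1, 7):
    counts = [sum(1 for s in permutations(range(n))
                  if sum(s[i] == i for i in range(n)) == r)
              for r in range(n + 1)]
    assert counts == [rencontres(n, r) for r in range(n + 1)]
print([rencontres(5, r) for r in range(6)])  # [44, 45, 20, 10, 0, 1]
```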

Application to Euler's Phi function: Let $n = p_1^{a_1} \cdots p_r^{a_r}$ be the prime power factorization of the positive integer $n$. Apply the Inclusion-Exclusion Principle with $E = [n] = \{1, \ldots, n\}$, and let $a_i$ be the property (of a positive integer) that it is divisible by $p_i$, $1 \le i \le r$. This yields
$$\phi(n) = n - \sum_{i=1}^{r} \frac{n}{p_i} + \sum_{1 \le i < j \le r} \frac{n}{p_i p_j} - \cdots = n \prod_{i=1}^{r}\left(1 - \frac{1}{p_i}\right).$$


1.14 Rook Polynomials

Let $C$ be an $n \times m$ matrix, $1 \le n \le m$, each of whose entries is a 0 or a 1. A line of $C$ is a row or column of $C$. An independent $k$-set of $C$ is a set of $k$ 1's of $C$ with no two on the same line. Given the matrix $C$, for $0 \le k \le n$ we let $r_k(C) = r_k$ be the number of independent $k$-sets of $C$. A diagonal of $C$ is a set of $n$ entries of $C$ with no two on the same line, i.e., a set of $n$ entries of $C$ with one in each row and no two in the same column. Let $E$ be the set of diagonals of $C$ and let $N = |E| = m(m-1)\cdots(m-n+1)$ be the number of diagonals of $C$. Let $a_j$ be the property (that a diagonal may or may not have) that the entry in row $j$ of the diagonal is a 1.

If we select an independent $j$-set of $C$ (in $r_j$ ways) and then in the remaining $n-j$ rows select $n-j$ entries in $(m-j)(m-j-1)\cdots(m-n+1)$ ways, we see that using the notation of the inclusion-exclusion principle we have

$$s_j = r_j \cdot (m-j)(m-j-1)\cdots(m-n+1) = \frac{(m-j)!}{(m-n)!} \cdot r_j. \qquad (1.20)$$

The number of diagonals of $C$ with exactly $j$ 1's is clearly the same as the number of diagonals of $C$ with exactly $n-j$ 0's. If $J$ is the $n \times m$ (0,1)-matrix each of whose entries is a 1, then $\bar{C} = J - C$ is the complement of $C$. Let $\bar{E}(x) = \sum_{i=0}^{n} \bar{e}_i x^i$, where $\bar{e}_i$ is the number of diagonals of $C$ having exactly $i$ 1's of $\bar{C}$ (i.e., exactly $i$ 0's of $C$). Clearly $\bar{e}_i = e_{n-i}$, so $\bar{E}(x) = x^n E\!\left(\frac{1}{x}\right)$ is the reverse polynomial of $E(x)$. Then

$$\sum_k \frac{(m-k)!}{(m-n)!}\, \bar{r}_k x^k = \sum_k \bar{s}_k x^k = \bar{S}(x) = \bar{E}(x+1) = (x+1)^n E\!\left(\frac{1}{x+1}\right)$$
$$= (x+1)^n E\!\left(1 + \frac{-x}{x+1}\right) = (x+1)^n S\!\left(\frac{-x}{x+1}\right) = \frac{(x+1)^n}{(m-n)!} \sum_i r_i (m-i)! \left(\frac{-x}{x+1}\right)^i$$
$$= \sum_i r_i \cdot \frac{(m-i)!}{(m-n)!} (-x)^i (x+1)^{n-i}.$$

The coefficient of $x^k$ in the first term of this sequence of equal expressions is clearly $\frac{(m-k)!}{(m-n)!}\,\bar{r}_k$. The coefficient of $x^k$ in the last term is

$$[x^k]\left(\sum_i r_i \frac{(m-i)!}{(m-n)!} (-x)^i (x+1)^{n-i}\right) = \sum_i [x^{k-i}]\left(r_i \frac{(m-i)!}{(m-n)!} (-1)^i (x+1)^{n-i}\right)$$
$$= \sum_i (-1)^i \frac{(m-i)!}{(m-n)!} \binom{n-i}{k-i}\, r_i.$$

Hence we have established the following theorem.

Theorem 1.14.1
$$\bar{r}_k = \sum_i (-1)^i \frac{(m-i)!}{(m-k)!} \binom{n-i}{n-k}\, r_i.$$

There is a special case that is often used.

Corollary 1.14.2 Suppose $m = n$. Then
$$\bar{r}_n = \sum_i (-1)^i (n-i)! \cdot r_i.$$

For a given $n \times m$ (0,1)-matrix $C$ we continue to let $r_k$ denote the number of independent $k$-sets of $C$, and let
$$R_C(x) = R(x) = \sum_{k=0}^{n} r_k x^k$$
be the ordinary generating function of the sequence $(r_0, r_1, r_2, \ldots)$. Then $R(x)$ is the rook polynomial of the given matrix.

If $C$ is the direct sum of two (0,1)-matrices $C_1$ and $C_2$, i.e., no line of $C$ contains 1's of both $C_1$ and $C_2$, it is easy to see that the independent sets of $C_1$ are completely independent of the independent sets of $C_2$. It follows that $r_k(C) = \sum_{j=0}^{k} r_j(C_1)\, r_{k-j}(C_2)$, and hence


$$R_C(x) = R_{C_1}(x)\, R_{C_2}(x). \qquad (1.21)$$

It is also easy to see that if some one line of $C$ contains all the 1's of $C$, then $R_C(x) = 1 + ax$, where $a$ is the number of 1's of $C$.

Suppose that in a given matrix $C$ an entry $1_{ij}$ (in row $i$ and column $j$) is selected and marked as a special entry. Let $C'$ denote the matrix obtained by deleting row $i$ and column $j$ of the matrix $C$, and let $C''$ denote the matrix obtained by replacing the entry $1_{ij}$ of $C$ with a $0$. Then the independent $k$-sets of $C$ are naturally divided into two classes: those that have a 1 in row $i$ and column $j$, and those that do not. The number of independent $k$-sets of the first type is $r_{k-1}(C')$ and the number of the second type is $r_k(C'')$. Hence we have the relation
$$r_k(C) = r_{k-1}(C') + r_k(C'').$$

It is now easy to see that we have
$$R_C(x) = x R_{C'}(x) + R_{C''}(x). \qquad (1.22)$$

Equation 1.22 is called the expansion formula. The rook polynomial of a (0,1)-matrix of arbitrary size and shape may be found by repeated applications of the expansion formula. To facilitate giving an example of this, let
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1' \end{pmatrix}$$
denote the rook polynomial of the displayed matrix, where the $1'$ indicates the entry about which the expansion formula is about to be applied. Then by the expansion formula we have (since we write a matrix to mean its rook polynomial)

$$R_C(x) = x\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1' \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1' & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
$$= x\left[x\begin{pmatrix} 1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1' & 0 \end{pmatrix}\right] + x\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
$$= x^2(1+2x) + x\left[x\begin{pmatrix} 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\right] + x(1+x)^2 + (1+2x)^2$$
$$= x^2 + 2x^3 + x^2(1+x) + x(1+2x) + x + 2x^2 + x^3 + 1 + 4x + 4x^2$$
$$= 4x^3 + 10x^2 + 6x + 1.$$
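The expansion formula lends itself to a short recursive program. In the sketch below (helper names are ours) a board is the set of its cells holding a 1, and the returned list holds the coefficients $r_0, r_1, \ldots$, reproducing the computation above:

```python
def rook_poly(cells):
    """Rook polynomial [r_0, r_1, ...] of a board given as a set of (row, col)
    cells equal to 1, by the expansion formula R_C(x) = x R_{C'}(x) + R_{C''}(x)."""
    cells = set(cells)
    if not cells:
        return [1]
    i, j = next(iter(cells))                      # the special entry 1_ij
    poly2 = rook_poly(cells - {(i, j)})           # C'': replace 1_ij by 0
    poly1 = rook_poly({(r, c) for (r, c) in cells if r != i and c != j})  # C'
    out = [0] * max(len(poly1) + 1, len(poly2))
    for k, v in enumerate(poly2):
        out[k] += v
    for k, v in enumerate(poly1):
        out[k + 1] += v                           # the x * R_{C'}(x) term
    return out

def cells_of(M):
    return {(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v}

C = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
print(rook_poly(cells_of(C)))  # [1, 6, 10, 4], i.e. 1 + 6x + 10x^2 + 4x^3
```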

Exercise: Compute the rook polynomials of the following matrices:

a. $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.

b. $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$. (Answer: $1 + 6x + 7x^2 + x^3$.)

c. $\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix}$.

1.15 Permutations with Forbidden Positions

Consider the distribution of four distinct objects, labeled a, b, c, and d, into four distinct positions, labeled 1, 2, 3, and 4, with no two objects occupying the same position. A distribution can be represented in the form of a matrix as illustrated below, where the rows correspond to the objects and the columns correspond to the positions. A 1 in a cell indicates that the object in the row containing the cell occupies the position in the column containing the cell. Thus, in the distribution shown, a is placed in the second position, b in the fourth position, c in the first position, and d in the third position.

$$\begin{matrix} a \\ b \\ c \\ d \end{matrix}\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$


Since an object cannot be placed in more than one position and a position cannot hold more than one object, in the matrix representation of an acceptable distribution there will never be more than one 1 in a row or column. Hence an acceptable distribution is equivalent to an independent 4-set of 1's. We can extend this notion to the case where there are forbidden positions for each of the objects. For example, for the derangements of four objects, the forbidden positions are just those along the main diagonal of $I = I_4$. Also, it is easy to see that $r_k(I) = \binom{4}{k}$, so that $R_I(x) = (1+x)^4$. Hence the problem of enumerating the number of derangements of four objects is equivalent to the problem of finding the value of $\bar{r}_4$ for the complementary matrix
$$\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$
By Theorem 1.14.1,
$$\bar{r}_4 = \sum_{i=0}^{4} (-1)^i \frac{(4-i)!}{(4-4)!}\binom{4-i}{4-4} r_i = \sum_{i=0}^{4} (-1)^i (4-i)! \binom{4}{i} = 4! \sum_{i=0}^{4} \frac{(-1)^i}{i!}.$$
Of course, this agrees with the usual formula.

Nontaking Rooks: A chess piece called a rook can capture any opponent's piece in the same row or column of the given rook (provided there are no intervening pieces). Instead of using a normal $8 \times 8$ chessboard, suppose we "play chess" on the "board" consisting solely of those positions of an $n \times m$ (0,1)-matrix where the 1's appear. Counting the number of ways to place $k$ mutually nontaking rooks on this board of entries equal to 1 is equivalent to our earlier problem of counting the number of independent $k$-sets of 1's in the matrix. Consider the example represented by the following matrix:
$$B = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{pmatrix}$$

Thus $r_k(B)$ counts the number of ways $k$ nontaking rooks can be placed on those entries of $B$ equal to 1. The $5 \times 5$ matrix $B$ could be considered to have arisen from a job assignment problem. The rows correspond to workers, the columns to jobs, and the $(i,j)$ entry is a 1 provided worker $i$ is suitable for job $j$. We wish to determine the number of ways in which each worker can be assigned to one job, no more than one worker per job, so that a worker only gets a job to which he or she is suited. It is easy to see that this is equivalent to the problem of computing $r_5(B)$. Since there are several more 1's than 0's, it might be easier to deal with the complementary matrix
$$B' = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

Easy Exercise: Show that if the matrix $C'$ is obtained from the matrix $C$ by deleting rows or columns with no entries equal to 1, then $r_k(C') = r_k(C)$.

Let $B''$ be the matrix obtained by deleting column 1 and row 4 from $B'$. So
$$B'' = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

By Equation 1.21, we see that $R_{B''}(x) = R_{C_1}(x) \cdot R_{C_2}(x)$, where
$$C_1 = C_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
We easily compute $R_{C_1}(x) = 1 + 3x + x^2$, so
$$R_{B'}(x) = R_{B''}(x) = (1 + 3x + x^2)^2 = 1 + 6x + 11x^2 + 6x^3 + x^4.$$

Now using (the dual of) Theorem 1.14.1 we find
$$r_5(B) = \sum_i (-1)^i \frac{(5-i)!}{(5-5)!}\binom{5-i}{5-5}\, \bar{r}_i = \sum_i (-1)^i (5-i)!\, \bar{r}_i$$
$$= 5! \times 1 - 4! \times 6 + 3! \times 11 - 2! \times 6 + 1! \times 1 - 0! \times 0 = 31.$$
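Since $r_5(B)$ counts diagonals of $B$ consisting entirely of 1's, it is the permanent of $B$, and the value 31 can be confirmed by brute force over all $5! = 120$ permutations:

```python
from itertools import permutations

B = [[1, 1, 0, 0, 1],
     [1, 1, 1, 0, 1],
     [1, 0, 1, 1, 0],
     [1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0]]

# r_5(B) = permanent of B: count permutations passing only through 1's.
count = sum(all(B[i][p[i]] for i in range(5)) for p in permutations(range(5)))
print(count)  # 31, agreeing with the rook-polynomial computation
```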

The next example is a $5 \times 7$ matrix $B$ that arises from a problem of storing computer programs. The $(i,j)$ position is a 1 provided storage location $j$ has sufficient storage capacity for program $i$. We wish to assign each program to a storage location with sufficient storage capacity, at most one program per location. The number of ways this can be done is again given by $r_5(B)$.

$$B = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$

Exercise: Compute $r_5(B)$.

Probleme des Menages: A menage is a permutation of $1, 2, \ldots, n$ in which $i$ does not appear in position $i$ or $i+1 \pmod{n}$. Let $P_n$ be the permutation matrix with a 1 in position $(i, i+1 \pmod{n})$, $1 \le i \le n$. So $P_n$ represents the cycle $(1, 2, 3, \ldots, n)$. Then let $M_n = I_n + P_n$, and let $M_n(x)$ be the rook polynomial of $M_n$. If $J_n$ is the $n \times n$ matrix of 1's, then $e_n(J_n - I_n - P_n)$, the number of diagonals of $J_n - I_n - P_n$ consisting entirely of 1's, is the number $U_n$ of menages.

Let $M_n^*$ be obtained from $M_n$ by changing the 1 in position $(n,1)$ to a 0, and let $M_n^0$ be obtained from $M_n$ by changing both 1's of column 1 to 0's (i.e., the 1's in positions $(1,1)$ and $(n,1)$ become 0). It should be clear after a little thought (using the expansion formula and the fact that a matrix and its transpose have the same rook polynomial) that
$$M_n(x) = M_n^*(x) + x M_{n-1}^*(x).$$

Since $M_n^0$ has only zeros in its first column, that column may be deleted without changing the rook polynomial; thus $M_n^0(x)$ is also the rook polynomial of the $n \times (n-1)$ staircase matrix
$$M_n^0 = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ & 1 & 1 & \\ & & \ddots & \\ & & 1 & 1 \\ & & & 1 \end{pmatrix}$$
with 1's in positions $(1,1), (2,1), (2,2), (3,2), (3,3), \ldots, (n-1,n-2), (n-1,n-1), (n,n-1)$. Here $M_n^0$ is $n \times (n-1)$. Now select the 1 in position $(n, n-1)$ to use the expansion theorem for rook polynomials. Deleting the row and column containing $1_{(n,n-1)}$, we get $M_{n-1}^0$. Also, replacing $1_{(n,n-1)}$ with a zero and then removing the bottom row of 0's, we get $(M_{n-1}^*)^T$. Since a matrix and its transpose have the same rook polynomial, we have


$$M_n^0(x) = x \cdot M_{n-1}^0(x) + M_{n-1}^*(x). \qquad (1.23)$$

$M_n^*$ has one 1 in its bottom row, in position $(n,n)$. Expand about this 1. Deleting the row and column of this 1 gives $M_{n-1}^*$. Changing this 1 to a 0 and deleting the bottom row of zeros gives the transpose of $M_n^0$. Hence
$$M_n^*(x) = x \cdot M_{n-1}^*(x) + M_n^0(x) = (x+1) M_{n-1}^*(x) + x \cdot M_{n-1}^0(x). \qquad (1.24)$$

Rewrite the last two equations as a matrix equation:
$$\begin{pmatrix} M_n^0(x) \\ M_n^*(x) \end{pmatrix} = \begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix} \begin{pmatrix} M_{n-1}^0(x) \\ M_{n-1}^*(x) \end{pmatrix}. \qquad (1.25)$$

$$M_1^* = (1) \longrightarrow M_1^*(x) = 1 + x.$$
$$M_1^0 = (0) \longrightarrow M_1^0(x) = 1.$$
$$M_2^* = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \longrightarrow M_2^*(x) = 1 + 3x + x^2.$$
$$M_2^0 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \longrightarrow M_2^0(x) = 1 + 2x.$$

Induction Hypothesis (for $n \ge 1$):
$$M_n^0(x) = \sum_{k=0}^{n-1} \binom{2(n-1)-k+1}{k} x^k; \qquad M_n^*(x) = \sum_{k=0}^{n} \binom{2n-k}{k} x^k.$$

Applying recurrence (1.25) to the hypothesized forms,
$$\begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix}\begin{pmatrix} M_n^0(x) \\ M_n^*(x) \end{pmatrix} = \begin{pmatrix} \sum_{k=0}^{n-1}\binom{2(n-1)-k+1}{k} x^{k+1} + \sum_{k=0}^{n}\binom{2n-k}{k} x^k \\[2ex] \sum_{k=0}^{n-1}\binom{2(n-1)-k+1}{k} x^{k+1} + \sum_{k=0}^{n}\binom{2n-k}{k}\left(x^{k+1} + x^k\right) \end{pmatrix}$$
$$= \begin{pmatrix} \sum_{k=1}^{n}\binom{2n-k}{k-1} x^k + \sum_{k=0}^{n}\binom{2n-k}{k} x^k \\[2ex] \sum_{k=1}^{n}\binom{2n-k}{k-1} x^k + \sum_{k=1}^{n+1}\binom{2n-k+1}{k-1} x^k + \sum_{k=0}^{n}\binom{2n-k}{k} x^k \end{pmatrix},$$
and two applications of Pascal's rule show these entries are $\sum_{k=0}^{n}\binom{2n-k+1}{k} x^k = M_{n+1}^0(x)$ and $\sum_{k=0}^{n+1}\binom{2(n+1)-k}{k} x^k = M_{n+1}^*(x)$, completing the induction.

At this point we can easily compute that
$$M_n(x) = M_n^*(x) + x M_{n-1}^*(x) = \sum_{j=0}^{n} \binom{2n-j}{j}\frac{2n}{2n-j}\, x^j$$
$$\Longrightarrow \quad r_j(I+P) = \binom{2n-j}{j}\frac{2n}{2n-j} \quad\Longrightarrow\quad s_j(I+P) = (n-j)!\binom{2n-j}{j}\frac{2n}{2n-j}$$
$$\Longrightarrow \quad e_n(J-I-P) = e_0(I+P) = \sum_{j=0}^{n} (-1)^j (n-j)!\binom{2n-j}{j}\frac{2n}{2n-j} = U_n.$$
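The resulting formula for $U_n$ checks out against a direct count of menages (a sketch; the $v_j$-type factor is always an exact integer division):

```python
from itertools import permutations
from math import comb, factorial

def menage_formula(n):
    """U_n = sum_{j=0}^{n} (-1)^j (n-j)! C(2n-j, j) 2n/(2n-j)."""
    total = 0
    for j in range(n + 1):
        v = 2 * n * comb(2*n - j, j) // (2*n - j)   # always an exact division
        total += (-1)**j * factorial(n - j) * v
    return total

def menage_brute(n):
    """Permutations p with p(i) != i and p(i) != i+1 (mod n)."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i and p[i] != (i + 1) % n for i in range(n)))

for n in range(3, 8):
    assert menage_formula(n) == menage_brute(n)
print([menage_formula(n) for n in range(3, 8)])  # [1, 2, 13, 80, 579]
```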

1.16 Recurrence Relations: Menage Numbers Again

The original Probleme des Menages was probably that formulated by Lucas. This asks for the number of ways of seating $n$ married couples at a circular table with men and women in alternate positions and such that no wife sits next to her husband. The wives may be seated first, and this may be done in $2 \cdot n!$ ways. Then each husband is excluded from the two seats beside his wife, but the number of ways of seating the husbands is independent of the seating arrangement of the wives. Thus if $M_n$ denotes the number of seating arrangements for this version of the probleme des menages, it is clear that
$$M_n = 2 \cdot n!\, U_n.$$

Consequently we may concentrate our attention on the menage numbers $U_n$. The formula we derived using rook polynomials will now be obtained using recursion techniques.

Lemma 1.16.1 Let $f(n,k)$ denote the number of ways of selecting $k$ objects, no two consecutive, from $n$ objects arranged in a row. Then
$$f(n,k) = \binom{n-k+1}{k}. \qquad (1.26)$$


Proof: Clearly
$$f(n,1) = \binom{n}{1} = n,$$
and for $n > 1$,
$$f(n,n) = \binom{1}{n} = 0.$$

Now let $1 < k < n$. Split the selections into those that include the first object and those that do not. Those that include the first object cannot include the second object and are enumerated by $f(n-2, k-1)$. Those that do not include the first object are enumerated by $f(n-1, k)$. Hence we have the recurrence

$$f(n,k) = f(n-1,k) + f(n-2,k-1). \qquad (1.27)$$

We may now prove Eq. 1.26 by strong induction on $n$. Our induction hypothesis includes the assertions that
$$f(n-1,k) = \binom{n-k}{k}; \qquad f(n-2,k-1) = \binom{n-k}{k-1}.$$
These together with Eq. 1.27 clearly imply Eq. 1.26.
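Formula 1.26 is easy to confirm by brute force (a sketch; the objects are the positions $0, \ldots, n-1$, and "no two consecutive" means every gap is at least 2):

```python
from itertools import combinations
from math import comb

def f_brute(n, k):
    """Count k-subsets of n objects in a row with no two consecutive."""
    return sum(1 for s in combinations(range(n), k)
               if all(b - a >= 2 for a, b in zip(s, s[1:])))

for n in range(1, 12):
    for k in range(0, n + 1):
        assert f_brute(n, k) == comb(n - k + 1, k)
print(f_brute(8, 3), comb(6, 3))  # 20 20
```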

Lemma 1.16.2 Let $g(n,k)$ denote the number of ways of selecting $k$ objects, no two consecutive, from $n$ objects arranged in a circle. Then
$$g(n,k) = \frac{n}{n-k}\binom{n-k}{k} \qquad (n > k).$$

Proof: As before, split the selections into those that include the first object and those that do not. The selections that include the first object cannot include the second object or the last object and are enumerated by $f(n-3, k-1)$. The selections that do not include the first object are enumerated by $f(n-1, k)$. Hence
$$g(n,k) = f(n-1,k) + f(n-3,k-1),$$


and Lemma 1.16.2 is an easy consequence of Lemma 1.16.1.

Now return again to the consideration of the permutations of $1, 2, \ldots, n$.

Let $a_i$ be the property that a permutation has $i$ in position $i$, $1 \le i \le n$, and let $b_i$ be the property that a permutation has $i$ in position $i+1$, $1 \le i \le n-1$, with $b_n$ the property that the permutation has $n$ in position 1. Now let the $2n$ properties be listed in a row:
$$a_1, b_1, a_2, b_2, \ldots, a_n, b_n.$$

Select $k$ of these properties and ask for the number of permutations that satisfy each of the $k$ properties. The answer is 0 if the $k$ properties are not compatible. If they are compatible, then $k$ images under the permutation are fixed and there are $(n-k)!$ ways to complete the permutation. Let $v_k$ denote the number of ways of selecting $k$ compatible properties from the $2n$ properties. Then by the classical inclusion-exclusion principle,
$$U_n = \sum_{i=0}^{n} (-1)^i v_i (n-i)!. \qquad (1.28)$$

It remains to evaluate $v_k$. But we see that if the $2n$ properties are arranged in a circle, then only the consecutive ones are not compatible. Hence by Lemma 1.16.2,
$$v_k = \frac{2n}{2n-k}\binom{2n-k}{k}, \qquad (1.29)$$

and this completes the proof.


Chapter 2

Systems of Representatives and Matroids

2.1 The Theorem of Philip Hall

The material of this chapter does not belong to enumerative combinatorics, but it is of such fundamental importance in the general field of combinatorics that we feel impelled to include it.

Let $S$ and $I$ be arbitrary sets. For each $i \in I$ let $A_i \subseteq S$. If $a_i \in A_i$ for all $i \in I$, we say $\{a_i : i \in I\}$ is a system of representatives for $\mathcal{A} = (A_i : i \in I)$. If in addition $a_i \ne a_j$ whenever $i \ne j$, even though $A_i$ may equal $A_j$, then $\{a_i : i \in I\}$ is a system of distinct representatives (SDR) for $\mathcal{A}$. Our first problem is: under what conditions does some family $\mathcal{A}$ of subsets of a set $S$ have an SDR?

For a finite collection of sets a reasonable answer was given by Philip Hall in 1935. It is obvious that if $\mathcal{A} = (A_i : i \in I)$ has an SDR, then the union of each $k$ of the members of $\mathcal{A}$ must have at least $k$ elements. Hall's observation was that this obvious necessary condition is also sufficient. We state the condition formally as follows:

Condition (H): Let $I = [n] = \{1, 2, \ldots, n\}$, and let $S$ be any (nonempty) set. For each $i \in I$, let $S_i \subseteq S$. Then $\mathcal{A} = (S_1, \ldots, S_n)$ satisfies Condition (H) provided for each $K \subseteq I$, $|\cup_{k \in K} S_k| \ge |K|$.

Theorem 2.1.1 The family $\mathcal{A} = (S_1, \ldots, S_n)$ of finitely many (not necessarily distinct) sets has an SDR if and only if it satisfies Condition (H).



Proof: As Condition (H) is clearly necessary, we now show that it is also sufficient. $B_{r,s}$ denotes a block of $r$ subsets $(S_{i_1}, \ldots, S_{i_r})$ belonging to $\mathcal{A}$, where $s = |\cup\{S_j : S_j \in B_{r,s}\}|$. So Condition (H) says: $s \ge r$ for each block $B_{r,s}$. If $s = r$, $B_{r,s}$ is called a critical block. (By convention, the empty block $B_{0,0}$ is critical.)

If $B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r)$ and $B_{t,v} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_t)$, write $B_{r,s} \cap B_{t,v} = (A_1, \ldots, A_u)$ and $B_{r,s} \cup B_{t,v} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_t)$. Here the notation implies that $A_1, \ldots, A_u$ are precisely the subsets in both blocks. Then write $B_{r,s} \cap B_{t,v} = B_{u,w}$, where $w = |\cup\{A_i : 1 \le i \le u\}|$, and $B_{r,s} \cup B_{t,v} = B_{y,z}$, where $y = r + t - u$ and $z = |\cup\{S_i : S_i \in B_{r,s} \cup B_{t,v}\}|$.

The proof will be by induction on the number $n$ of sets in the family $\mathcal{A}$, but first we need two lemmas.

Lemma 2.1.2 If $\mathcal{A}$ satisfies Condition (H), then the union and intersection of critical blocks are themselves critical blocks.

Proof of Lemma 2.1.2. Let $B_{r,r}$ and $B_{t,t}$ be given critical blocks. Say $B_{r,r} \cap B_{t,t} = B_{u,v}$ and $B_{r,r} \cup B_{t,t} = B_{y,z}$. The $z$ elements of the union will be the $r+t$ elements of $B_{r,r}$ and $B_{t,t}$ reduced by the number of elements in both blocks, and this latter number includes at least the $v$ elements in the intersection: $z \le r + t - v$. Also $v \ge u$ and $z \ge y$ by Condition (H). Note: $y + u = r + t$. Hence $r + t - v \ge z \ge y = r + t - u \ge r + t - v$, implying that equality holds throughout. Hence $u = v$ and $y = z$, as desired for the proof of Lemma 2.1.2.

Lemma 2.1.3 If $B_{k,k}$ is any critical block of $\mathcal{A}$, the deletion of the elements of $B_{k,k}$ from all sets in $\mathcal{A}$ not belonging to $B_{k,k}$ produces a new family $\mathcal{A}'$ in which Condition (H) is still valid.

Proof of Lemma 2.1.3. Let $B_{r,s}$ be an arbitrary block, and let $(B_{r,s})' = B'_{r,s'}$ be the block after the deletion. We must show that $s' \ge r$. Let $B_{r,s} \cap B_{k,k} = B_{u,v}$ and $B_{r,s} \cup B_{k,k} = B_{y,z}$. Say
$$B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r), \qquad B_{k,k} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_k).$$
So $B_{u,v} = (A_1, \ldots, A_u)$ and $B_{y,z} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_k)$. The deleted block $(B_{r,s})' = B'_{r,s'}$ is $(A_1, \ldots, A_u, C'_{u+1}, \ldots, C'_r)$. But $C_{u+1}, \ldots, C_r$, as blocks of the union $B_{y,z}$, contain at least $z - k$ elements not in $B_{k,k}$. Thus $s' \ge v + (z - k) \ge u + y - k = u + (r + k - u) - k = r$. Hence $s' \ge r$, as desired for the proof of Lemma 2.1.3.

As indicated above, for the proof of the main theorem we now use induction on $n$. For $n = 1$ the theorem is obviously true.

Induction Hypothesis: Suppose the theorem holds (Condition (H) implies that there is an SDR) for any family of $m$ sets, $1 \le m < n$.

We need to show the theorem holds for a family of $n$ sets. So let $1 < n$, assume the induction hypothesis, and let $\mathcal{A} = (S_1, \ldots, S_n)$ be a given collection of subsets of $S$ satisfying Condition (H).

First Case: There is some critical block $B_{k,k}$ with $1 \le k < n$. Delete the elements in the members of $B_{k,k}$ from the remaining subsets, to obtain a new family $\mathcal{A}' = B_{k,k} \cup B'_{n-k,v}$, where $B_{k,k}$ and $B'_{n-k,v}$ have no common elements in their members. By Lemma 2.1.3, Condition (H) holds in $\mathcal{A}'$, and hence holds separately in $B_{k,k}$ and in $B'_{n-k,v}$ viewed as families of sets. By the induction hypothesis, $B_{k,k}$ and $B'_{n-k,v}$ have (disjoint) SDR's whose union is an SDR for $\mathcal{A}$.

Remaining Case: There is no critical block for $\mathcal{A}$ except possibly the entire system. Select any $S_j$ of $\mathcal{A}$ and then select any element of $S_j$ as its representative. Delete this element from all remaining sets to obtain a family $\mathcal{A}'$. Hence a block $B_{r,s}$ with $r < n$ becomes a block $B'_{r,s'}$ with $s' \in \{s, s-1\}$. By hypothesis $B_{r,s}$ was not critical, so $s \ge r+1$ and $s' \ge r$. So Condition (H) holds for the family $\mathcal{A}' \setminus \{S_j\}$, which by induction has an SDR. Add to this SDR the element selected as a representative for $S_j$ to obtain an SDR for $\mathcal{A}$.

In the text by van Lint and Wilson, Theorem 5.3 gives a lower bound on the number of SDR's for a family of sets that depends only on the sizes of the sets. It is as follows.

Theorem 5.3 of van Lint and Wilson: Let $\mathcal{A} = (S_0, S_1, \ldots, S_{n-1})$ be a family of $n$ sets that does have an SDR. Put $m_i = |S_i|$ and suppose that $m_0 \le m_1 \le \cdots \le m_{n-1}$. Then the number of SDR's for $\mathcal{A}$ is greater than or equal to
$$F_n(m_0, m_1, \ldots, m_{n-1}) := \prod_{i=0}^{n-1} (m_i - i)^*,$$
where $(a)^* := \max\{1, a\}$.

They leave as an exercise the problem of showing that this is the best possible lower bound depending only on the sizes of the sets.

Exercise: 2.1.4 Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $\{1, \ldots, n\}$. Suppose that the incidence matrix of the family is invertible. Show that the family has an SDR.

Exercise: 2.1.5 Prove the following generalization of Hall's Theorem: Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $X$ that satisfies the following property: there is an integer $r$ with $0 \le r < n$ for which the union of each subfamily of $k$ subsets of $\mathcal{A}$, for all $k$ with $0 \le k \le n$, has at least $k - r$ elements. Then there is a subfamily of size $n - r$ which has an SDR. (Hint: start by adding $r$ "dummy" elements that belong to all the sets.)

Exercise: 2.1.6 Let $G$ be a (finite, undirected, simple) graph with vertex set $V$. Let $\mathcal{C} = \{C_x : x \in V\}$ be a family of sets indexed by the vertices of $G$. For $X \subseteq V$, let $C_X = \cup_{x \in X} C_x$. A set $X \subseteq V$ is $\mathcal{C}$-colorable if one can assign to each vertex $x \in X$ a "color" $c_x \in C_x$ so that $c_x \ne c_y$ whenever $x$ and $y$ are adjacent in $G$. Prove that if $|C_X| \ge |X|$ whenever $X$ induces a connected subgraph of $G$, then $V$ is $\mathcal{C}$-colorable. (In the current literature of graph theory, the sets assigned to the vertices are called lists, and the desired proper coloring of $G$ chosen from the lists is a list coloring of $G$. When $G$ is a complete graph, this exercise gives precisely Hall's Theorem on SDR's. A current research topic in graph theory is the investigation of modifications of this condition that suffice for the existence of list colorings.)

Exercise: 2.1.7 With the same notation as in the previous exercise, prove that if every proper subset of $V$ is $\mathcal{C}$-colorable and $|C_V| \ge |V|$, then $V$ is $\mathcal{C}$-colorable.


We now interpret the SDR problem as one on matchings in bipartite graphs. Let $G = (X, Y, E)$ be a bipartite graph. For each $S \subseteq X$, let $N(S)$ denote the set of elements of $Y$ connected to at least one element of $S$ by an edge, and put $\delta(S) = |S| - |N(S)|$. Put $\delta(G) = \max\{\delta(S) : S \subseteq X\}$. Since $\delta(\emptyset) = 0$, clearly $\delta(G) \ge 0$. Then Hall's theorem states that $G$ has an $X$-saturating matching if and only if $\delta(G) = 0$.

Theorem 2.1.8 $G$ has a matching of size $t$ (or larger) if and only if $t \le |X| - \delta(S)$ for all $S \subseteq X$.

Proof: First note that Hall's theorem says that $G$ has a matching of size $t = |X|$ if and only if $\delta(S) \le 0$ for all $S \subseteq X$, iff $|X| \le |X| - \delta(S)$ for all $S \subseteq X$. So our theorem is true in case $t = |X|$. Now suppose that $t < |X|$. Form a new graph $G' = (X, Y \cup Z, E')$ by adding new vertices $Z = \{z_1, \ldots, z_{|X|-t}\}$ to $Y$, and join each $z_i$ to each element of $X$ by an edge of $G'$.

If $G$ has a matching of size $t$, then $G'$ has a matching of size $|X|$, implying that for all $S \subseteq X$,
$$|S| \le |N'(S)| = |N(S)| + |X| - t,$$
implying
$$|N(S)| \ge |S| - |X| + t = t - (|X| - |S|) = t - |X \setminus S|.$$
This is also equivalent to $t \le |X| - (|S| - |N(S)|) = |X| - \delta(S)$.

Conversely, suppose $|N(S)| \ge t - |X \setminus S| = t - (|X| - |S|)$. Then $|N'(S)| = |N(S)| + |X| - t \ge (t - |X| + |S|) + |X| - t = |S|$. By Hall's theorem, $G'$ has an $X$-saturating matching $M$. At most $|X| - t$ edges of $M$ join $X$ to $Z$, so at least $t$ edges of $M$ are from $X$ to $Y$.

Note that $t \le |X| - \delta(S)$ for all $S \subseteq X$ iff $t \le \min_{S \subseteq X}(|X| - \delta(S)) = |X| - \max_{S \subseteq X}\delta(S) = |X| - \delta(G)$.

Corollary 2.1.9 The largest matching of $G$ has size $|X| - \delta(G) = m(G)$, i.e., $m(G) + \delta(G) = |X|$.


2.2 An Algorithm for SDR’s

Suppose sets $S_1, \ldots, S_n$ are given and we have picked an SDR $A_r = \{a_1, \ldots, a_r\}$ of $S_1, \ldots, S_r$ in any way. Here is how to find an SDR for $S_1, \ldots, S_r, S_{r+1}$, or to determine that $S_1, \ldots, S_r, S_{r+1}$ does not satisfy Hall's Condition (H).

Construct ordered sets $T_1, T_2, \ldots$, as follows. Put $T_1 = S_{r+1} = \{b_1, \ldots, b_t\}$. If some $b_i$ is not yet used in $A_r$, let $a_{r+1} = b_i$. Otherwise, assume that all the elements $b_1, \ldots, b_t$ are already in $A_r$, and form $T_2$ as follows. First, let $S(b_1)$ denote the $S_j$ for which $b_1 = a_j$. Then
$$T_2 = \{b_1, b_2, \ldots, b_t;\; b_{t+1}, \ldots, b_s\},$$
where $b_{t+1}, \ldots, b_s$ are the elements of $S(b_1)$ not already in $T_1$. If some one of $b_{t+1}, \ldots, b_s$ is not in $A_r$, use it to represent $S(b_1)$ and use $b_1$ to represent $S_{r+1}$. Leave the other $a_i$'s as before.

In general, each list $T_j$ looks like $T_j = \{b_1, b_2, \ldots, b_k, b_{k+1}, \ldots, b_m\}$. If all members of $T_j$ are in $A_r$, construct
$$T_{j+1} = \{b_1, \ldots, b_k, b_{k+1}, b_{k+2}, \ldots, b_m,\; \text{(any members of } S(b_{k+1}) \text{ not already listed)}\}.$$
If some $b_{m+s}$, $s > 0$, is not in $A_r$, let $b_{m+s}$ represent $S(b_{k+1})$. Then if $b_{k+1} \in S(b_j) \setminus S(b_{j-1})$, let $b_{k+1}$ represent $S(b_j)$; and if $b_j \in S(b_i) \setminus S(b_{i-1})$, let $b_j$ represent $S(b_i)$; then $b_i \in S(b_u) \setminus S(b_{u-1})$, so let $b_i$ represent $S(b_u)$. Eventually, working down subscripts, some $b_p \in S(b_j)$ with $b_j \in T_1$. Let $b_j$ represent $S_{r+1}$, and let $b_p$ represent $S(b_j)$. (Each $b_j$ is in some $S(b_i)$ with $i < j$.)

Exercise: 2.2.1 You have seven employees $P_1, \ldots, P_7$ and seven jobs $J_1, \ldots, J_7$. You ask each employee to select two jobs. They select jobs as given below. You start by assigning to $P_1$ the job $J_2$ and to $P_2$ the job $J_6$. Illustrate our algorithm for producing systems of distinct representatives (when they exist) to complete this job assignment (if it is possible).

P1 selects jobs numbered 1 and 2.
P2 selects jobs numbered 5 and 6.
P3 selects jobs numbered 2 and 3.
P4 selects jobs numbered 6 and 7.
P5 selects jobs numbered 3 and 4.
P6 selects jobs numbered 7 and 6.
P7 selects jobs numbered 4 and 2.

2.3 Theorems of König and G. Birkhoff

Theorem 2.3.1 If the entries of a rectangular matrix are zeros and ones, the minimum number of lines (i.e., rows and columns) that contain all the ones is equal to the maximum number of ones that can be chosen with no two on a line.

Proof: Let $A = (a_{ij})$ be an $n \times t$ matrix of 0's and 1's. Let $m$ be the minimum number of lines containing all the 1's, and $M$ the maximum number of 1's no two on a line. Then trivially $m \ge M$, since no line can pass through two of the 1's counted by $M$. We need to show $M \ge m$.

Suppose a minimum covering by $m$ lines consists of $r$ rows and $s$ columns, where $r + s = m$. We may reorder rows and columns so these become the first $r$ rows and first $s$ columns. Without loss of generality assume $r \ge 1$. For $i = 1, \ldots, r$, put $S_i = \{j : a_{ij} = 1 \text{ and } j > s\}$. So $S_i$ indicates which columns beyond the first $s$ have a 1 in row $i$.

Claim: $\mathcal{A} = (S_1, \ldots, S_r)$ satisfies Condition (H). For suppose some $k$ of these sets together contain at most $k-1$ elements. Then these $k$ rows could be replaced by the appropriate $k-1$ (or fewer) columns, and all the 1's would still be covered by this choice of rows and columns. By the minimality of $m$ this is not possible! Hence $\mathcal{A}$ has an SDR corresponding to 1's in the first $r$ rows, no two on the same line and none in the first $s$ columns. By a dual argument (if $s \ge 1$), we may choose $s$ 1's, no two on a line, none in the first $r$ rows and all in the first $s$ columns. These $r + s = m$ 1's have no two on a line, so $m \le M$. If $s = 0$, i.e., $r = m$, just use the $r$ 1's to see $r = m \le M$.
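For small (0,1)-matrices both quantities in the theorem can be computed by exhaustive search (a hypothetical sketch, exponential and for illustration only):

```python
from itertools import combinations

def max_independent(M):
    """Largest number of 1's with no two on a line."""
    ones = [(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v]
    for k in range(len(ones), 0, -1):
        for S in combinations(ones, k):
            if len({i for i, _ in S}) == k and len({j for _, j in S}) == k:
                return k
    return 0

def min_cover(M):
    """Fewest lines (rows and columns) containing all the 1's."""
    ones = [(i, j) for i, row in enumerate(M) for j, v in enumerate(row) if v]
    lines = [('r', i) for i in range(len(M))] + [('c', j) for j in range(len(M[0]))]
    for k in range(len(lines) + 1):
        for S in combinations(lines, k):
            rows = {x for kind, x in S if kind == 'r'}
            cols = {x for kind, x in S if kind == 'c'}
            if all(i in rows or j in cols for i, j in ones):
                return k
    return 0

M = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
print(max_independent(M), min_cover(M))  # 3 3
```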

Theorem 2.3.2 (Systems of Common Representatives) If a set $S$ is partitioned into a finite number $n$ of subsets in two ways, $S = A_1 + \cdots + A_n = B_1 + \cdots + B_n$, and if no $k$ of the $A$'s are contained in fewer than $k$ of the $B$'s, for each $k = 1, \ldots, n$, then there will be elements $x_1, \ldots, x_n$ that are simultaneously representatives of the $A$'s and the $B$'s (maybe after reordering the $B$'s).


Proof: For each $A_i$, put $S_i = \{j : A_i \cap B_j \ne \emptyset\}$. The hypothesis of the theorem is just Condition (H) for the system $\mathcal{A} = (S_1, \ldots, S_n)$. Let $j_1, \ldots, j_n$ be an SDR for $\mathcal{A}$, and choose $x_i \in A_i \cap B_{j_i}$. Then $x_1, \ldots, x_n$ is simultaneously an SDR for both the $A$'s and the $B$'s.

Corollary 2.3.3 If $B$ is a finite group with (not necessarily distinct) subgroups $H$ and $K$, with $|H| = |K|$, then there is a set of elements of $B$ that are simultaneously representatives for right cosets of $H$ and left (or right!) cosets of $K$.

Exercise: 2.3.4 Sixteen (male-female) couples and a caller attend a square dance. At the door each dancer selects a name-tag of one of the colors red, blue, green, white. There are four tags of each color for males, and the same for females. As the tags are selected, each dancer fails to notice what color her/his partner selects. The caller is then given the job of constructing four squares with four (original!) couples each in such a way that in each square no two dancers of the same sex have tags of the same color. Show that this is possible no matter how the dancers select their name tags.

Corollary 2.3.5 (Theorem of G. Birkhoff) Let A = (aij) be an n × n matrix where the aij are nonnegative real numbers such that each row and column has the same sum. Then A is a sum of nonnegative multiples of permutation matrices.

Proof: A permutation matrix P is a square matrix of 0's and 1's with a single 1 in each row and column. We are to prove that if

∑_{i=1}^n aij = t = ∑_{j=1}^n aij,  aij ≥ 0,

then A = ∑ ui Pi with ui ≥ 0 and each Pi a permutation matrix.

The proof is by induction on the number w of nonzero entries aij. If A ≠ 0, then w ≥ n. If w = n, then clearly (?) A = tP for some permutation matrix P. So suppose w > n, and that the theorem has been established for all such matrices with fewer than w nonzero entries. For each i = 1, . . . , n, let Si be the set of j's for which aij > 0.

Claim: A = (S1, . . . , Sn) satisfies Condition (H). For suppose some k of the sets Si1 , . . . , Sik contain together at most k − 1 indices j. Then rows i1, . . . , ik have positive entries in at most k − 1 columns. But adding these entries by rows we get tk, and adding by columns we get at most (k − 1)t,


an impossibility. Hence A has an SDR j1, . . . , jn. This means that each of a1j1 , a2j2 , . . . , anjn is positive. Put P1 = (cij), where

cij = 1 if j = ji, and cij = 0 otherwise.

Put u1 = min{aiji : 1 ≤ i ≤ n}. Then A1 = A − u1P1 is a matrix of nonnegative numbers in which each row and column sum is t − u1. By the choice of u1, A1 has fewer nonzero entries than does A. Hence by the induction hypothesis there are permutation matrices P2, . . . , Ps and nonnegative numbers u2, . . . , us for which A1 = ∑_{j=2}^s ujPj. So A = ∑_{i=1}^s uiPi, as desired.
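The inductive proof above is effectively an algorithm: find an SDR (a perfect matching on the positive entries), subtract the smallest matched entry times that permutation matrix, and repeat. A minimal Python sketch (function names are ours, not the text's), using Kuhn's augmenting-path matching:

```python
def birkhoff(A, tol=1e-9):
    """Decompose a square matrix with all entries >= 0 and equal row and
    column sums into nonnegative multiples of permutation matrices,
    following the inductive proof of Cor. 2.3.5.  Returns a list of
    (u, perm) with perm[i] the column of the 1 in row i of P."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    decomposition = []

    def perfect_matching():
        # Augmenting-path matching on the bipartite graph of positive entries.
        row_of_col = [-1] * n

        def augment(i, seen):
            for j in range(n):
                if A[i][j] > tol and j not in seen:
                    seen.add(j)
                    if row_of_col[j] < 0 or augment(row_of_col[j], seen):
                        row_of_col[j] = i
                        return True
            return False

        for i in range(n):
            augment(i, set())
        perm = [-1] * n
        for j in range(n):
            if row_of_col[j] >= 0:
                perm[row_of_col[j]] = j
        return perm

    while any(A[i][j] > tol for i in range(n) for j in range(n)):
        perm = perfect_matching()                  # the SDR from the claim
        u = min(A[i][perm[i]] for i in range(n))   # smallest matched entry
        for i in range(n):
            A[i][perm[i]] -= u                     # A := A - u*P
        decomposition.append((u, perm))
    return decomposition
```

By the claim in the proof, the perfect matching always exists when the input has equal row and column sums, so the loop terminates with the residual matrix zero.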

Exercise: 2.3.6 Let n be a positive integer, and let aij (1 ≤ i, j ≤ n − 1) be real numbers in the interval [(n−2)/(n−1)², 1/(n−1)] which are independent over the rationals and such that their rational span does not contain 1. Let A be the matrix of order n whose (i, j) entry equals

aij,  if 1 ≤ i, j ≤ n − 1;
1 − ∑_{k=1}^{n−1} aik,  if i ≠ n and j = n;
1 − ∑_{k=1}^{n−1} akj,  if j ≠ n and i = n;
2 − n + ∑_{k=1}^{n−1} ∑_{l=1}^{n−1} akl,  if i = j = n.

Show that A is a doubly stochastic matrix of order n. Show that A cannot be expressed as a nonnegative linear combination of n² − 2n + 1 permutation matrices.

Theorem 2.3.7 Let A = (aij) be a doubly stochastic matrix of order n with f(A) fully indecomposable components and #(A) nonzero entries. Then A is a nonnegative linear combination of #(A) − 2n + f(A) + 1 permutation matrices.

Proof: The proof is by induction on #(A). Since A is doubly stochastic, #(A) ≥ n. If #(A) = n, then A is a permutation matrix, and since #(A) − 2n + f(A) + 1 = n − 2n + n + 1 = 1, the theorem holds in this case. Now assume that #(A) > n. Let k and l be integers such that akl is a smallest positive entry of A. Since A is doubly stochastic there exists a permutation matrix P = (pij) such that pkl = 1 and aij > 0 whenever pij = 1. (By the proof of the claim in the proof of Cor. 2.3.5.) Since #(A) > n,


akl ≠ 1. Let B = (1/(1 − akl))(A − aklP). Then B is a doubly stochastic matrix with #(B) < #(A). By induction B is a nonnegative linear combination of #(B) − 2n + f(B) + 1 permutation matrices. Hence A is a nonnegative linear combination of #(B) − 2n + f(B) + 2 permutation matrices. If f(B) = f(A), then since #(A) > #(B), A is a nonnegative linear combination of #(A) − 2n + f(A) + 1 permutation matrices, and we are done. Now suppose that f(B) > f(A). Let S be the set of pairs (i, j) such that pij = 1 and aij = akl. By permuting rows and columns we may assume that B is a direct sum B1 ⊕ B2 ⊕ · · · ⊕ Bk of fully indecomposable doubly stochastic matrices. Let Aij denote the submatrix of A with rows those of Bi and columns those of Bj. If Aii is not a fully indecomposable component of A, then there exists a j, j ≠ i, such that Aij ≠ 0. It follows that |S| ≥ f(B) − f(A). Since #(B) + |S| = #(A) and f(A) ≤ f(B) − |S|, we have

#(A) − 2n + f(A) + 1 < #(B) − 2n + f(B) + 2.

Therefore A is a nonnegative linear combination of #(A) − 2n + f(A) + 1 permutation matrices, and the proof follows by induction.

In the proof of the following theorem we need the following elementary inequality, which can be proved by induction on r.

Exercise: 2.3.8 Let k1, . . . , kr, r ≥ 1, be r positive integers. Then

∑_{i=1}^r ki² + r ≤ (∑_{i=1}^r ki)² + 1.

Theorem 2.3.9 Let A be a doubly stochastic matrix of order n. Then A is a nonnegative linear combination of n² − 2n + 2 permutation matrices.

Proof: Let A have r = f(A) fully indecomposable components A1, . . . , Ar, with Ai of size ki × ki. Then #(A) + f(A) ≤ ∑_{i=1}^r ki² + r ≤ (∑_{i=1}^r ki)² + 1 = n² + 1. Then by the previous theorem, A is a nonnegative linear combination of #(A) − 2n + 1 + f(A) ≤ (n² + 1) − 2n + 1 = n² − 2n + 2 permutation matrices.

2.4 The Theorem of Marshall Hall, Jr.

Many of the ideas of “finite” combinatorics have generalizations to situations in which some of the sets involved are infinite. We just touch on this subject.


Given a family A of sets, if the number of sets in the family is infinite, there are several ways the theorem of P. Hall can be generalized. One of the first (and to our mind one of the most useful) was given by Marshall Hall, Jr. (no relation to P. Hall), and is as follows.

Theorem 2.4.1 Suppose that for each i in some index set I there is a finite subset Ai of a set S. The system A = (Ai)i∈I has an SDR if and only if the following Condition (H′) holds: for each finite subset I′ of I the system A′ = (Ai)i∈I′ satisfies Condition (H).

Proof: We establish a partial order on deletions, writing D1 ⊆ D2 for deletions D1 and D2 iff each element deleted by D1 is also deleted by D2. Of course, we are interested only in deletions which preserve Condition (H′). If all deletions in an ascending chain D1 ⊆ D2 ⊆ · · · ⊆ Di ⊆ · · · preserve Condition (H), let D be the deletion which consists of deleting an element b from a set A iff there is some i for which b is deleted from A by Di. We assert that deletion D also preserves Condition (H).

In any block Br,s of A (r, s < ∞), at most a finite number of deletions in the chain can affect Br,s. If no deletion of the chain affects Br,s, then of course D does not affect Br,s, and Condition (H) still holds for Br,s. Otherwise, let Dn be the last deletion that affects Br,s. So under Dn (and hence also under D) (Br,s)′ = B′r,s′ still satisfies Condition (H) by hypothesis, i.e., s′ ≥ r. But Br,s is arbitrary, so D preserves Condition (H) on A. By Zorn's Lemma, there will be a maximal deletion D preserving Condition (H). We show that under such a maximal deletion D preserving Condition (H), each deleted set S′i has only a single element. Clearly these elements would form an SDR for the original A.

Suppose there is an a1 not belonging to a critical block. Delete a1 from every set Ai containing a1. Under this deletion a block Br,s is replaced by a block B′r,s′ with s′ ≥ s − 1 ≥ r, so Condition (H) is preserved. Hence after a maximal deletion each element left is in some critical block. And if Bk,k is a critical block, we may delete elements of Bk,k from all sets not in Bk,k and still preserve Condition (H) by Lemma 2.1.3 (since it needs to apply only to finitely many sets at a time). By Theorem 2.1.1 each critical block Bk,k (being finite) possesses an SDR when Condition (H) holds. Hence we may perform an additional deletion leaving Bk,k as a collection of singleton sets and with Condition (H) still holding for the entire remaining sets. It is now clear that after a maximal deletion D preserving Condition (H), each element


is in a critical block, and each critical block consists of singleton sets. Hence after a maximal deletion D preserving Condition (H), each set consists of a single element, and these elements form an SDR for A.

The following theorem, sometimes called the Cantor–Schroeder–Bernstein Theorem, will be used with the theorem of M. Hall, Jr. to show that any two bases of a vector space V over a field F must have the same cardinality.

Theorem 2.4.2 Let X, Y be sets, and let θ : X → Y and ψ : Y → X be injective mappings. Then there exists a bijection φ : X → Y .

Proof: The elements of X will be referred to as males, those of Y as females. For x ∈ X, if θ(x) = y, we say y is the daughter of x and x is the father of y. Analogously, if ψ(y) = x, we say x is the son of y and y is the mother of x. A male with no mother is said to be an “adam.” A female with no father is said to be an “eve.” Ancestors and descendants are defined in the natural way, except that each x or y is both an ancestor of itself and a descendant of itself. If z ∈ X ∪ Y has an ancestor that is an adam (resp., eve) we say that z has an adam (resp., eve). Partition X and Y into the following disjoint sets:

X1 = {x ∈ X : x has no eve};

X2 = {x ∈ X : x has an eve};

Y1 = {y ∈ Y : y has no eve};

Y2 = {y ∈ Y : y has an eve}.

Now a little thought shows that θ : X1 → Y1 is a bijection, and ψ⁻¹ : X2 → Y2 is a bijection. So

φ = θ|X1 ∪ ψ⁻¹|X2

is a bijection from X to Y .
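The partition in the proof can be computed by tracing ancestry. Here is a small Python sketch (function names are ours), assuming every ancestral chain terminates, with the inverse maps supplied as functions that return None where no parent exists:

```python
def csb_bijection(theta, theta_inv, psi_inv, x):
    """Evaluate the bijection phi of the proof at x in X: trace x's
    ancestry (mother, grandfather, ...); if the chain ends at an eve
    (a female with no father), then x is in X2 and phi(x) is its mother
    psi_inv(x); otherwise phi(x) is its daughter theta(x).
    Assumes the ancestral chain of x terminates."""
    z, in_X = x, True
    while True:
        parent = psi_inv(z) if in_X else theta_inv(z)
        if parent is None:
            has_eve = not in_X   # chain ended in Y, at an eve
            break
        z, in_X = parent, not in_X
    return psi_inv(x) if has_eve else theta(x)

# Toy example: X = Y = {0, 1, 2, ...}, theta(x) = x + 1, psi(y) = y + 1.
theta = lambda x: x + 1
theta_inv = lambda y: y - 1 if y > 0 else None   # 0 in Y is an eve
psi_inv = lambda x: x - 1 if x > 0 else None     # 0 in X is an adam
```

In this toy example the even numbers have an adam and the odds have an eve, so phi swaps each even x with x + 1: the resulting map is the bijection 0 ↔ 1, 2 ↔ 3, and so on.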

Corollary 2.4.3 If V is a vector space over the field F and if B1 and B2 are two bases for V , then |B1| = |B2|.


Proof: Let B1 = {xi : i ∈ I} and B2 = {yj : j ∈ J}. For each i ∈ I, let Γi = {j ∈ J : yj occurs with nonzero coefficient in the unique linear expression for xi in terms of the yj's}. Then the union of any k (≥ 1) Γi's, say Γi1 , . . . , Γik , each of which of course is finite, must contain at least k distinct elements. For otherwise xi1 , . . . , xik would belong to a space of dimension less than k, and hence be linearly dependent. Thus the family (Γi : i ∈ I) of sets must have an SDR. This means there is a function θ : I → J which is an injection. Similarly, there is an injection ψ : J → I. So by the preceding theorem there is a bijection J ↔ I, i.e., |B1| = |B2|.

2.5 Matroids and the Greedy Algorithm

A matroid on a set X is a collection I of subsets of X (called independent subsets of X) satisfying the following:

• Subset Rule: Each subset of an independent set (including the emptyset) is independent.

• Expansion Rule: If I, J ∈ I with |I| < |J |, then there is some x ∈ J for which I ∪ {x} ∈ I.

An independent set of maximal size in I is called a basis. An additive cost function f is a function f : P(X) → R with f(∅) = 0 and such that f(S) = ∑_{x∈S} f(x) for each S ⊆ X.

Theorem 2.5.1 Let I be a matroid of independent sets on X, and let f :P(X) → R be an additive cost function. Then the greedy algorithm (givenbelow) selects a basis of minimum cost.

Proof: The Greedy Algorithm is as follows:

1. Let I = ∅.

2. From set X pick an element x with f(x) minimum.

3. If I ∪ {x} is independent, replace I with I ∪ {x}.

4. Delete x from X.

Page 71: Applied Combinatorics – Math 6409

80CHAPTER 2. SYSTEMS OF REPRESENTATIVES AND MATROIDS

Repeat Steps 2 through 4 until X is empty.
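Steps 1 through 4 can be sketched in a few lines of Python, with the matroid supplied as an independence oracle (the names `cost` and `is_independent` are ours, not the text's):

```python
def greedy_min_basis(X, cost, is_independent):
    """The Greedy Algorithm above: cost maps each element of X to its
    cost, and is_independent(S) answers whether the set S is in I."""
    I = set()
    # Steps 2-4: scan the elements in order of increasing cost, keeping
    # x exactly when independence is preserved.
    for x in sorted(X, key=cost):
        if is_independent(I | {x}):
            I.add(x)
    return I
```

For instance, on the uniform matroid whose independent sets are the subsets of size at most 2, the algorithm returns the two cheapest elements.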

The Expansion Rule implies that all maximal independent sets have the same size, and the Greedy Algorithm will add to I until it is a basis. Suppose the Greedy Algorithm selects a basis B = (b1, b2, . . . , bk) and that A = (a1, . . . , ak) is some other basis, both ordered so that if i < j then f(bi) ≤ f(bj) and f(ai) ≤ f(aj). By Step 2 of the Greedy Algorithm, f(b1) ≤ f(a1). If f(bi) ≤ f(ai) for all i, then f(B) ≤ f(A). So suppose there is some j such that f(bj) > f(aj), but f(bi) ≤ f(ai) for i = 1, 2, . . . , j − 1. Then {a1, . . . , aj} and {b1, b2, . . . , bj−1} are both independent. By the Expansion Property, for some ai with i ≤ j, {b1, . . . , bj−1, ai} is independent. Since f(ai) ≤ f(aj) < f(bj), ai would have been selected by the Greedy Algorithm to be bj (almost true; at least bj would not have been chosen!). Hence f(bj) ≤ f(aj) and B must be a minimum-cost basis.

We now consider three contexts in which the Greedy Algorithm has turned out to be very useful.

Let A = (S1, . . . , Sm) be a family of (not necessarily distinct) subsets of a set X. Suppose each element of X has a weight (or cost) assigned to it. For x ∈ X let f(x) be the cost assigned to x, and for S ⊆ X let the cost of S be f(S) = ∑_{x∈S} f(x). We want to construct an SDR for some subfamily A′ = (Si1 , . . . , Sik) of A which is as large as possible, and such that the cost f(D) of the SDR D is the minimum possible among all maximum-sized SDR's.

We say that a subset I of X is an independent set (of representatives for A) if the elements of I may be matched with members of A so that they form an SDR for the sets with which they are matched. So we want to be able to find a cheapest independent set of maximal size.

The same problem can be expressed in terms of bipartite graphs. Given a bipartite graph G = (X, Y, E), we say that a subset S of X is independent (for matchings of X into Y ) if there is a matching M of G which matches all elements of S to elements of Y . If each element x of X is assigned a (nonnegative) cost f(x), for each subset S ⊆ X put F(S) = ∑_{x∈S} f(x).

Then the problem is to find a cheapest (or sometimes a most expensive) independent set in X, and it is usually desirable also to have a corresponding matching.

Given A = (S1, . . . , Sm), Si ⊆ X, construct a bipartite graph G = (X, A, E) with (x, Sj) ∈ E iff x ∈ Sj. Then a subset I of X is independent in the matching sense iff it is independent in the SDR sense.

Theorem 2.5.2 In both examples we have just given (matchings of bipartite graphs and SDR's) the independent sets form a matroid.

We do the case for matchings in bipartite graphs, and we note that if the empty set (as a subset of X in G = (X, Y, E)) is defined to be independent, then clearly the set I of independent subsets of X satisfies the Subset Rule. So we consider the Expansion Rule. However, before proceeding into the proof we need to be sure that our terminology is clear, and we need to prove a couple of preliminary lemmas.

A matching M of size m in a graph G is a set of m edges, no two of which have a vertex in common. A vertex is said to be matched (to another vertex) by M if it lies in an edge of M. We defined a bipartite graph G with parts X and Y to be a graph whose vertex set is the union of the two disjoint sets X and Y and whose edges all connect a vertex in X with a vertex in Y . A complete matching of X into Y is a matching of G with |X| edges. Our first lemma describes the interaction of two matchings.

Lemma 2.5.3 Let M1 and M2 be matchings of the graph G = (V, E). Let G′ = (V, E′) be the subgraph with E′ = (M1 ∪ M2) \ (M1 ∩ M2) = (M1 \ M2) ∪ (M2 \ M1). Then each connected component of G′ is one of the following three types:

(i) a single vertex;

(ii) a cycle with an even number of edges, whose edges are alternately in M1 and M2;

(iii) a chain whose edges are alternately in M1 and M2, and whose two end vertices are each matched by one of M1, M2 but not both.

Moreover, if |M1| < |M2|, there is a component of G′ of type (iii) with first and last edges in M2 and whose endpoints are not M1-matched.

Proof: If x is a vertex of G that is neither M1-matched nor M2-matched, then x is an isolated vertex of G′. Similarly, if some edge through x lies in both M1 and M2, then x is an isolated vertex of G′. Now suppose G1 is a connected component of G′ having n vertices and at least one edge. Since G1 is connected, it has at least n − 1 edges (since it has a spanning tree).


Each vertex of G1 has degree 1 or 2 (on at most one edge of M1 and at most one edge of M2). Hence

2(n − 1) ≤ 2(number of edges of G1) = ∑_{x∈V(G1)} deg(x) ≤ 2n.

So G1 has either n − 1 or n edges. If G1 has n − 1 edges, it must be a tree in which each vertex has degree at most 2, i.e., G1 is a chain whose edges are alternately in M1 and M2. If x is an endpoint of G1, it is easy to see that x is matched by only one of M1, M2. (If x ∈ e1 ∈ M1 and x ∈ e2 ∈ M2, then if e1 = e2 this edge is not in E′; if e1 ≠ e2, both edges e1, e2 are in G′ and x could not be an endpoint of G1.)

If G1 has n edges, it must be a cycle whose edges alternate in M1 and M2, forcing it to have an even number of edges.

Finally, if |M1| < |M2|, there must be some connected component G1 with more M2-edges than M1-edges. So G1 is of type (iii) with first and last edge in M2 and whose endpoints are not M1-matched.

If M is a matching for G, a path v0e1v1e2 · · · envn is an alternating path for M if whenever ei is in M, ei+1 is not, and whenever ei is not in M, ei+1 is in M. We now show how to use the kind of alternating path that arises in case (iii) of the previous Lemma to obtain a larger matching.

Lemma 2.5.4 Let M be a matching in a graph G and let P be an alternating path with edge set E′ beginning and ending at unmatched vertices. Let M′ = M ∩ E′. Then

(M \M ′) ∪ (E ′ \M ′) = (M \ E ′) ∪ (E ′ \M)

is a matching with one more edge than M has.

Proof: Every other edge of P is in M. However, P begins and ends with edges not in M, so there is a number k such that P has k edges in M and k + 1 edges not in M. The first and last vertices of P are unmatched, and all other vertices in P are matched by M′, so no edge in M \ M′ contains any vertex in P. Thus, the edges of M \ M′ have no vertices in common with the edges of E′ \ M′. Further, since P is a path and E′ \ M′ consists of every other edge of the path, the edges of E′ \ M′ have no vertices in common. Thus:

(M \M ′) ∪ (E ′ \M ′) = (M \ E ′) ∪ (E ′ \M)


is a matching and, by the sum principle, it has m − k + k + 1 = m + 1 edges, where m = |M|.

We are now ready for the proof of the theorem.

Proof: As mentioned above, the set I of independent subsets of X (of vertices of a bipartite graph G = (X, Y, E)) satisfies the Subset Rule. So we now consider the Expansion Rule.

Suppose M1 is a matching of S into Y , M2 is a matching of T into Y , where S ⊆ X, T ⊆ X and |S| < |T |. Let G′ be the graph on X ∪ Y with edge set E′ = (M1 ∪ M2) \ (M1 ∩ M2) = (M1 \ M2) ∪ (M2 \ M1). Here |M1| = |S| < |T | = |M2|, and clearly |E′ ∩ M1| < |E′ ∩ M2|.

At least one of the connected components of G′ has one more edge in M2 than it has in M1. So by Lemma 2.5.3 the graph G′ has a connected component that must be an M1-alternating path P whose first and last edges are in M2. Each vertex of this path that is touched by an M1 edge is also touched by an M2 edge. And the endpoints of this path are not M1-matched. Since the path has an odd number of edges, its two endpoints lie one in X, one in Y . Say x is the endpoint lying in X. Let E′′ be the edge set of the path P. Then M′ = (M1 \ E′′) ∪ (E′′ \ M1) is a matching with one more edge than M1, and M′ is a matching of S ∪ {x} into Y . Hence S ∪ {x} is independent, and we selected x from T.
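The Expansion Rule just proved can also be checked by brute force on a tiny bipartite graph; the following Python sketch (graph and helper names are ours) enumerates the matchable subsets of X and verifies the exchange property among them:

```python
from itertools import combinations, permutations

def matchable(S, Y, adj):
    """Can every vertex of S be matched into Y?  Brute force over all
    injections of S into Y; adj[x] is the neighbor set of x in Y.
    Fine for tiny examples only."""
    S = list(S)
    return any(all(ys[k] in adj[x] for k, x in enumerate(S))
               for ys in permutations(Y, len(S)))

# A small bipartite graph G = (X, Y, E), given by neighbor sets.
adj = {'a': {1}, 'b': {1, 2}, 'c': {2, 3}}
X, Y = 'abc', (1, 2, 3)
indep = [set(c) for k in range(len(X) + 1)
         for c in combinations(X, k) if matchable(c, Y, adj)]

# Expansion Rule: any smaller independent set grows by an element of a larger one.
for I in indep:
    for J in indep:
        if len(I) < len(J):
            assert any(matchable(I | {x}, Y, adj) for x in J - I)
```

Here the assertions pass, illustrating on one example that the matchable subsets of X do form the independent sets of a matroid.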

There is a converse due to Berge that is quite interesting, but we leave its proof as an exercise.

Theorem 2.5.5 Suppose G is a graph and M is a matching of G. Then Mis a matching of maximum size (among all matchings) if and only if there isno alternating path connecting two unmatched vertices.

Exercise: 2.5.6 Prove Theorem 2.5.5.

There is a third standard example of a matroid.

Theorem 2.5.7 The edgesets of forests of a graph G = (V, E) form the independent sets of a matroid on E.

Proof: If for some F ⊆ E it is true that (V, F) has no cycles, then (V, F′) has no cycles for any subset F′ of F. This says that the Subset Rule is satisfied.


Recall that a tree is a connected graph with k vertices and k − 1 edges. Thus a forest on n vertices with c connected components will consist of c trees and will thus have n − c edges. Suppose F′ and F are forests (contained in E) with r edges and s edges, respectively, with r < s. If no edge of F can be added to F′ to give an independent set, then adding any edge of F to F′ gives a cycle. In particular, each edge of F must connect two points in the same connected component of (V, F′). Thus each connected component of (V, F) is a subset of a connected component of (V, F′). Then (V, F) has no more edges than (V, F′), so r ≥ s, a contradiction. Hence the forests of G satisfy the Expansion Rule, implying that the collection of edgesets of forests of G is a collection of independent sets of a matroid on E.

Corollary 2.5.8 The Greedy Algorithm applied to cost-weighted edges of a connected graph produces a minimum-cost spanning tree. In fact, this is what is usually called Kruskal's algorithm.
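Kruskal's algorithm is the Greedy Algorithm specialized to the forest matroid; a compact Python sketch (using union-find for the cycle test, a standard choice not spelled out in the text):

```python
def kruskal(n, edges):
    """Minimum-cost spanning tree by the Greedy Algorithm on the forest
    matroid.  edges is a list of (cost, u, v) with vertices 0..n-1;
    assumes the graph is connected."""
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # Step 2: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                  # Step 3: adding (u, v) keeps a forest
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree
```

The `find` check replaces the independence oracle: an edge creates a cycle exactly when its endpoints already lie in the same connected component of the forest built so far.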


Chapter 3

Polya Theory

3.1 Group Actions

Let X be a nonempty set and SX the symmetric group on X, i.e., SX is the group of all permutations of the elements of X with the group operation being the composition of functions. Let G be a group. An action of G on X is a homomorphism µ : G → SX. In other words, µ is a function from G to SX satisfying

µ(g1) ◦ µ(g2) = µ(g1 · g2) (3.1)

for all g1, g2 ∈ G.

Often (µ(g))(x) is written as g(x) if only one action is being considered. The only difference between thinking of G as acting on X and thinking of G as a group of permutations of the elements of X is that for some g1, g2 ∈ G, g1 ≠ g2, it might be that µ(g1) and µ(g2) are actually the same permutation, i.e., g1(x) = g2(x) for all x ∈ X. Also, sometimes there are several different actions of G on X which may be considered in the same context.

Theorem 3.1.1 Let µ be an action of G on X and let e be the identity ofG. Then the following hold:

(i) µ(e) is the identity permutation on X.

(ii) µ(g−1) = [µ(g)]−1, for each g ∈ G.

(iii) More generally, for each n ∈ Z, µ(gn) = (µ(g))n.


Proof: These results are special cases of results usually proved for homomorphisms in general. If you don't remember them, you should work out the proofs in this special case.

Let G act on X. For x, y ∈ X, define x ∼ y iff there is some g ∈ G for which g(x) = y.

Theorem 3.1.2 The relation “∼” is an equivalence relation on X.

Proof: This is an easy exercise.

The “∼” equivalence classes are called G-orbits in X. The orbit containing x is denoted x^G or sometimes just [x] if there is no likelihood of confusion.

For g ∈ G, put Xg = {x ∈ X : g(x) = x}, so Xg is the set of elements of X fixed by g. For x ∈ X, put Gx = {g ∈ G : g(x) = x}. Gx is the stabilizer of x in G.

Theorem 3.1.3 For x ∈ X, Gx is a subgroup of G (written Gx ≤ G). If G is finite, then |G| = |[x]| · |Gx|.

Proof: It is an easy exercise to show that Gx is a subgroup of G. Having done that, define a function f from the set of left cosets of Gx in G to [x] by:

f(gGx) = g(x).

First we show that f is well-defined. If g1Gx = g2Gx, then g2⁻¹g1 ∈ Gx, so that (g2⁻¹ · g1)(x) = x, which implies g1(x) = g2(x). Hence f(g1Gx) = f(g2Gx). So f is well-defined. Now we claim f is a bijection. Suppose f(g1Gx) = f(g2Gx), so by definition g1(x) = g2(x) and (g2⁻¹ · g1)(x) = x. Hence g2⁻¹ · g1 ∈ Gx, implying g1Gx = g2Gx, so f is one-to-one. If y ∈ [x], then there is a g ∈ G with g(x) = y. So f(gGx) = g(x) = y, implying f is onto [x].

Hence f is a bijection from the set of left cosets of Gx in G to [x], i.e., |G|/|Gx| = |[x]| as claimed.
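The orbit-stabilizer identity |G| = |[x]| · |Gx| is easy to check by brute force. A Python sketch, taking G to be the dihedral group of order 8 acting on the four corners of a square (the index encoding of the permutations is our own):

```python
def orbit_and_stabilizer(G, x):
    """For a list G of permutations (tuples with g[i] the image of i),
    return the orbit [x] and the stabilizer G_x."""
    orbit = {g[x] for g in G}
    stabilizer = [g for g in G if g[x] == x]
    return orbit, stabilizer

# Dihedral group of order 8 acting on corners 0..3 in cyclic order:
G = ([tuple((i + k) % 4 for i in range(4)) for k in range(4)] +   # rotations
     [tuple((k - i) % 4 for i in range(4)) for k in range(4)])    # flips
orbit, stab = orbit_and_stabilizer(G, 0)
assert len(orbit) * len(stab) == len(G)   # |[x]| * |G_x| = |G| = 8
```

Here the orbit of corner 0 is all four corners and its stabilizer consists of the identity and the flip through that corner, so 4 · 2 = 8 as the theorem predicts.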

Theorem 3.1.4 For some x, y ∈ X and g ∈ G, suppose that g(x) = y. Then

(i) H ≤ Gx iff gHg⁻¹ ≤ Gy; in particular,

(ii) Gg(x) = gGxg⁻¹.


Proof: Easy exercise.

Theorem 3.1.5 (The Orbit-Counting Lemma; “Not Burnside's Lemma”) Let k be the number of G-orbits in X. Then

k = (1/|G|) ∑_{g∈G} |Xg|.

Proof: Put S = {(x, g) ∈ X × G : g(x) = x}. We determine |S| in two ways: |S| = ∑_{x∈X} |Gx| = ∑_{g∈G} |Xg|. Since x ∼ y iff [x] = [y], in which case |[x]| = |[y]|, it must be that |G|/|Gx| = |G|/|Gy| whenever x ∼ y. So ∑_{y∈[x]} |Gy| = ∑_{y∈[x]} |Gx| = |[x]| · |Gx| = |G|. Hence ∑_{x∈X} |Gx| = k · |G| = ∑_{g∈G} |Xg|. And hence k = (∑_{g∈G} |Xg|)/|G|.

The following situation often arises. There is some given action ν of G on some set X. F is the set of all functions from X into some set Y . Then there is a natural action µ of G on F defined by: for each g ∈ G and each f ∈ F = Y^X,

µ(g)(f) = f ◦ ν(g⁻¹).

Theorem 3.1.6 µ : G→ SF is an action of G on F .

Proof: First note that µ(g1 · g2)(f) = f ◦ ν((g1 · g2)⁻¹) = f ◦ ν(g2⁻¹ · g1⁻¹) = f ◦ [ν(g2⁻¹) ◦ ν(g1⁻¹)] = [f ◦ ν(g2⁻¹)] ◦ ν(g1⁻¹) = µ(g1)(f ◦ ν(g2⁻¹)) = µ(g1)(µ(g2)(f)) = (µ(g1) ◦ µ(g2))(f). So µ is an action of G on F provided each µ(g) is actually a permutation of the elements of F.

So suppose µ(g)(f1) = µ(g)(f2), i.e., f1 ◦ ν(g⁻¹) = f2 ◦ ν(g⁻¹). But since ν(g⁻¹) is a permutation of the elements of X, it must be that f1 = f2, so µ(g) is one-to-one on F. For each g ∈ G and f : X → Y , f ◦ ν(g) ∈ Y^X and µ(g)(f ◦ ν(g)) = (f ◦ ν(g)) ◦ ν(g⁻¹) = f, implying that µ(g) is onto.

To use Not Burnside's Lemma to count G-orbits in F, we need to compute |Fµ(g)| for each g ∈ G.

Theorem 3.1.7 For g ∈ G, let c be the number of cycles of ν(g) as a permutation on X. Then |Fµ(g)| = |Y|^c.


Proof: For f : X → Y , g ∈ G, we want to know when f ◦ ν(g⁻¹) = f, i.e., when (f ◦ ν(g⁻¹))(x) = f(x). This holds iff f(ν(g⁻¹)(x)) = f(x). So f must have the same value at x, ν(g⁻¹)(x), ν(g⁻²)(x), . . ., etc. This just says that f is constant on the orbits of ν(g) in X. So if c is the number of cycles of ν(g) as a permutation on X, then |Y|^c is the number of functions f : X → Y which are constant on the orbits of ν(g).

Applying Not Burnside's Lemma to the action µ of G on F = Y^X, we have:

Theorem 3.1.8 The number of G-orbits in F is

(1/|G|) ∑_{g∈G} |Fµ(g)| = (1/|G|) ∑_{g∈G} |Y|^{c(g)},

where c(g) is the number of cycles of ν(g) as a permutation on X.

3.2 Applications

Example 3.2.1 Let G be the group of symmetries of the square

1 2
4 3

written as permutations of [4] = {1, 2, 3, 4}. The convention here is that if a symmetry σ of the square moves a corner labeled i to the corner previously labeled j, then σ(i) = j. We want to paint the corners with W and R (white and red) and then determine how many essentially different paintings there are.

Here X = {1, 2, 3, 4}, Y = {W, R}. A painting f is just a function f : X → Y . Two paintings f1, f2 are the same if there is a g ∈ G with f1 ◦ g⁻¹ = f2. So the number of distinct paintings is the number of G-orbits in F = {f : X → Y}, which is

(1/|G|) ∑_{g∈G} |Y|^{c(g)}.

G = {e, (1234), (13)(24), (1432), (24), (12)(34), (13), (14)(23)}.


So the number of distinct paintings is

(1/8)(2⁴ + 2¹ + 2² + 2¹ + 2³ + 2² + 2³ + 2²) = 6.

The distinct paintings are listed as follows:

W W    W R    W R    W W    W R    R R
W W    W W    R W    R R    R R    R R
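The count of 6 can also be confirmed by brute force, enumerating all 2⁴ paintings and sweeping out each G-orbit (the index encoding of the eight symmetries, with corners taken in their cyclic order 1, 2, 3, 4, is our own):

```python
from itertools import product

# The 8 symmetries of the square on corners 0..3, taken in cyclic order.
G = ([tuple((i + k) % 4 for i in range(4)) for k in range(4)] +   # rotations
     [tuple((k - i) % 4 for i in range(4)) for k in range(4)])    # flips

def count_paintings(colors):
    """Count G-orbits on colorings of the four corners by listing each
    orbit exactly once: every unseen painting starts a new orbit, and we
    then mark its whole orbit as seen."""
    seen, orbits = set(), 0
    for p in product(colors, repeat=4):
        if p not in seen:
            orbits += 1
            for g in G:
                seen.add(tuple(p[g[i]] for i in range(4)))
    return orbits
```

This returns 6 for two colors, in agreement with the computation above.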

Example 3.2.2 How many necklaces are there with n beads of m colors if two are the same provided one can be rotated into the other?

Let σ be the basic rotation σ = (1, 2, 3, . . . , n), so G = {σ^i : 1 ≤ i ≤ n}. The length of each cycle in σ^i is the order of σ^i, which is n/gcd(n, i).

So the number of cycles in σ^i is gcd(n, i). For each d such that d|n, there are φ(n/d) integers i with 1 ≤ i ≤ n and d = gcd(n, i). So the number of σ^i ∈ G with d cycles is φ(n/d), for each d with d|n. If Y is the set of m colors to be used, the number of distinct necklace patterns under the action of G is

(1/n) ∑_{g∈G} m^{c(g)} = (1/n) ∑_{d|n} φ(n/d) m^d = (1/n) ∑_{d|n} φ(d) m^{n/d}.

Note: If f(x) = (1/n) ∑_{d|n} φ(d) x^{n/d}, then f(x) is a polynomial of degree n with rational coefficients which lie between 0 and 1, but f(m) is a positive integer for each positive integer m.
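The closed form is easy to evaluate numerically; a short Python sketch (with a naive Euler φ, since none is defined in the text):

```python
from math import gcd

def necklaces(n, m):
    """(1/n) * sum over d|n of phi(d) * m^(n/d): the number of necklaces
    of n beads in m colors, counted up to rotation only."""
    def phi(d):   # naive Euler totient
        return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)
    return sum(phi(d) * m ** (n // d)
               for d in range(1, n + 1) if n % d == 0) // n
```

For example, necklaces(4, 2) = 6 and necklaces(6, 2) = 14.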

Exercise: 3.2.3 If G also has “flips,” so |G| = 2n, i.e., G is dihedral, how many necklaces of n beads in m colors are there? (Hint: Do the cases n odd, n even separately.)


Solution: If n is odd, each of the n additional permutations is a flip about a uniquely defined vertex (i.e., bead), and it has 1 + (n−1)/2 = (n+1)/2 cycles. If n is even, each of the n additional permutations is a flip: n/2 of them are about an axis through two opposite beads, and each has 2 + (n−2)/2 = (n+2)/2 cycles; the other n/2 flips are about an axis that misses each bead, and each has n/2 cycles. So the number of distinct necklace patterns under the action of G is

(1/2n) [ ∑_{d|n} φ(n/d) m^d + (n/2) m^{(n+2)/2} + (n/2) m^{n/2} ].

Answer to Exercise

(i) For n odd, the number of necklaces is:

(1/2n) [ ∑_{d|n} φ(n/d) m^d + n m^{(n+1)/2} ].

(ii) For n even, the number of necklaces is:

(1/2n) [ ∑_{d|n} φ(n/d) m^d + (1/2) n m^{n/2} (m + 1) ].

Example 3.2.4 A switching function f in n variables is a function f : Z2^n → Z2. Starting with a group G acting on the set [n] = {1, . . . , n} we can define an action of G on the set of all switching functions in n variables and then ask how many inequivalent switching functions in n variables there are.

As a first case, let G = Sn be the group of all permutations of the elements of [n], and define an action ν of G on

Z2^[n] = {x : [n] → Z2 = {0, 1}}

according to the following: for x ∈ Z2^[n] (write x = (x1, . . . , xn), xi = 0 or 1, where xi is the image of i under x), and g ∈ G, put


(ν(g))(x) = x ◦ g⁻¹ = (x_{g⁻¹(1)}, . . . , x_{g⁻¹(n)}).

Now let

Fn = {f : Z2^n → Z2} = Z2^{(Z2^n)}.

So there is an action µ of G on Fn defined by: for f ∈ Fn, g ∈ G,

(µ(g))(f) = f ◦ ν(g⁻¹),

i.e.,

(µ(g)(f))(x) = f(ν(g⁻¹)(x)) = f(x ◦ g) = f(x_{g(1)}, . . . , x_{g(n)}).

This last equation says that g ∈ Sn acts on Fn by (g(f))(x) = f(x ◦ g), i.e.,

g(f) : (x1, . . . , xn) 7→ f(x_{g(1)}, . . . , x_{g(n)}).

We say that f1 and f2 are equivalent if they are in the same G-orbit, and we would like to determine how many inequivalent switching functions in n variables there are. This is too difficult for us to do for general n, so we put n = 3. But first we rename the elements of Z2^3, essentially by listing them in a natural order, so that we can simplify notation in what follows.

(000) ↔ 0; (001) ↔ 1; (010) ↔ 2; (011) ↔ 3;

(100) ↔ 4; (101) ↔ 5; (110) ↔ 6; (111) ↔ 7.

We note that

G = S3 = {e, (123), (132), (12), (13), (23)}

effects an action ν on Z2^3 by ν(g)(x) = x ◦ g⁻¹:


e : (x1, x2, x3) 7→ (x1, x2, x3) ⇒ ν(e) = (0)(1)(2)(3)(4)(5)(6)(7)

(123) : (x1, x2, x3) 7→ (x3, x1, x2) ⇒ ν(123) = (0)(142)(356)(7)

(12) : (x1, x2, x3) 7→ (x2, x1, x3) ⇒ ν(12) = (0)(1)(24)(35)(6)(7)

(13) : (x1, x2, x3) 7→ (x3, x2, x1) ⇒ ν(13) = (0)(14)(2)(36)(5)(7)

(132) : (x1, x2, x3) 7→ (x2, x3, x1) ⇒ ν(132) = (0)(124)(365)(7)

(23) : (x1, x2, x3) 7→ (x1, x3, x2) ⇒ ν(23) = (0)(12)(3)(4)(56)(7)

F3 = {f : Z2^3 → Z2} = {f : {0, . . . , 7} → Z2}.

For g ∈ G, |(F3)µ(g)| = 2^{c(g)}, where c(g) is the number of cycles in ν(g). So the number of G-orbits in F3 is

(1/|G|) ∑_{g∈G} 2^{c(g)} = (1/6)[2⁸ + 2⁴ + 2⁶ + 2⁶ + 2⁴ + 2⁶] = 80

(whereas |F3| = 2^{2³} = 2⁸ = 256).
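The count of 80 can be confirmed by brute force over all 256 functions, sweeping out whole orbits under the action (g(f))(x) = f(x ◦ g). A Python sketch with our own names:

```python
from itertools import product, permutations

def count_switching_classes(n=3):
    """Count Sn-orbits on functions f: {0,1}^n -> {0,1}, where g acts by
    (g(f))(x) = f(x_g(1), ..., x_g(n))."""
    points = list(product((0, 1), repeat=n))       # the 2^n inputs
    seen, orbits = set(), 0
    for values in product((0, 1), repeat=2 ** n):  # one value per input
        if values in seen:
            continue
        orbits += 1                                # new orbit representative
        f = dict(zip(points, values))
        for g in permutations(range(n)):           # permute the variables
            gf = tuple(f[tuple(x[g[i]] for i in range(n))] for x in points)
            seen.add(gf)
    return orbits
```

For n = 3 this returns 80, matching the cycle-index computation above. (The same sweep with n = 2 answers Exercise 3.2.5.)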

Exercise: 3.2.5 Repeat this problem with n = 2.

Exercise: 3.2.6 Repeat this problem with n = 3, but extend G to a group of order 12 on Z2^3 by allowing complementation: x̄i = 1 + xi, with addition mod 2.

3.3 The Cycle Index: Polya’s Theorem

Let G act on a set X, with n = |X|. For each g ∈ G, let λt(g) be the number of cycles of length t in the cycle decomposition of g as a permutation on X, 1 ≤ t ≤ n. Let x1, . . . , xn be variables. Then the CYCLE INDEX of G (relative to the given action of G) is the polynomial

PG(x1, . . . , xn) = (1/|G|) ∑_{g∈G} x1^{λ1(g)} · · · xn^{λn(g)}.


If Y is a set with m = |Y |, then G induces an action on Y^X, as we have seen above. And the “ordinary” version of Polya's counting theorem is given as follows:

Theorem 3.3.1 The number of G-orbits in Y^X is

PG(m, . . . , m) = (1/|G|) ∑_{g∈G} m^{∑_{t=1}^n λt(g)}.

Proof: Of course, this is exactly what Theorem 3.1.8 says.

Example 3.3.2 THE GROUP OF RIGID MOTIONS OF THE CUBE

Consider a cube in 3-space. It has 8 vertices, 6 faces and 12 edges. Let G be the group of rigid motions (i.e., rotations) of the cube. G consists of the following rotations:

(a) The identity.

(b) Three rotations of 180 degrees about axes connecting centers of opposite faces.

(c) Six rotations of 90 degrees about axes connecting centers of opposite faces.

(d) Six rotations of 180 degrees about axes joining midpoints of opposite edges.

(e) Eight rotations of 120 degrees about axes connecting opposite vertices.

Exercise: 3.3.3 Compute the cycle index of G considered as a group of permutations on the vertices (resp., edges; resp., faces) of the cube.

CONVENTION: If A labels a vertex (edge, face, etc.) which is carried by a motion π to the position originally held by the vertex (edge, face, etc.) labeled B, we write π(A) = B.


3.4 Sylow Theory Via Group Actions

We begin with a preliminary result.

Theorem 3.4.1 Let n = p^α m where p is a prime number, and let p^r || m (i.e., p^r divides m, but p^{r+1} does not divide m). Then

p^r || C(p^α m, p^α).

Proof: The question is: What power of p divides

C(p^α m, p^α) = (p^α m)! / [(p^α)! (p^α m − p^α)!]

= [p^α m (p^α m − 1) · · · (p^α m − i) · · · (p^α m − p^α + 1)] / [p^α (p^α − 1) · · · (p^α − i) · · · (p^α − p^α + 1)] ?

Looking at this expression written out, one can see that except for the factor m in the numerator, the power of p dividing p^α m − i = (p^α − i) + p^α(m − 1) is the same as that dividing p^α − i, since 0 ≤ i ≤ p^α − 1; so all powers of p cancel out except the power which divides m.
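Theorem 3.4.1 is easy to sanity-check numerically. The sketch below (our own illustrative code, not from the text) compares the p-adic valuation of m with that of C(p^α m, p^α) for a range of small cases:

```python
from math import comb

def v(n, p):
    # exponent of the prime p in n (the p-adic valuation)
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

# p^r || m should force p^r || C(p^a m, p^a):
for p in (2, 3, 5):
    for a in (1, 2):
        for m in range(1, 30):
            assert v(m, p) == v(comb(p**a * m, p**a), p)
print("valuations agree for all small cases tried")
```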

Theorem 3.4.2 (Sylow Theorem 1) Let G be a finite group with |G| = n = p^α q, p^β || n. (So β ≥ α.) Then G has a subgroup of order p^α.

Proof: From the preceding result it follows that p^{β−α} || C(n, p^α). Put X = {S ⊆ G : |S| = p^α}. Then µ : G → S_X is an action of G on X, where µ is defined by

µ(g)S = gS = {gs : s ∈ S}.

Fix S = {g_1, . . . , g_{p^α}} ∈ X. If g ∈ G_S, then S = gS ⇒ {gg_1, . . . , gg_{p^α}} = {g_1, . . . , g_{p^α}}, which implies that gg_1 = g_k for some k, so that

g = g_k g_1^{−1} ∈ {g_1 g_1^{−1}, g_2 g_1^{−1}, . . . , g_{p^α} g_1^{−1}}.


From this it is clear that |G_S| ≤ p^α.

Let O_1, . . . , O_h be the distinct G-orbits in X. Then

C(n, p^α) = |X| = Σ_{t=1}^h |O_t|,

so that p^{β−α+1} does not divide |X|. Hence there is some t for which p^{β−α+1} does not divide |O_t|. Let O be any one of the orbits O_t for which p^{β−α+1} does not divide |O|. For any S ∈ O, O = {gS : g ∈ G}, so |G| = |G_S| · |O| = p^β d, where p does not divide d. Hence |O| = p^β d / |G_S|, forcing p^α to divide |G_S|. Putting this together with the result of the previous paragraph, |G_S| = p^α.

This shows that if O is an orbit for which p^{β−α+1} does not divide |O| (and there must be at least one such), then for each S ∈ O, |G_S| = p^α. So G has a subgroup of order p^α.

Theorem 3.4.3 (Sylow Theorem 2) If p^α is an exact divisor of |G|, then all subgroups of G of order p^α (i.e., all Sylow p-subgroups of G) are conjugate in G.

Proof: In Theorem 3.4.2 put β = α. So there is an orbit O for which p does not divide |O|, and |G_S| = p^α for each S ∈ O.

We claim that if H ≤ G with |H| = p^α, then there must be some T ∈ O for which H = G_T, so that H and G_S are conjugate for each S ∈ O.

Given H ≤ G with |H| = p^α, clearly H acts on O. Let Q_1, . . . , Q_m be all the H-orbits in O, so that |O| = Σ_{t=1}^m |Q_t|. Since p does not divide |O|, there must be some t for which p does not divide |Q_t|. Choose any T ∈ Q_t, so Q_t = {hT : h ∈ H}. Then |H_T| · |Q_t| = |H| = p^α and |Q_t| = p^α / |H_T|. Hence p^α = |H_T|, since p does not divide |Q_t|. Since H_T ≤ H and |H_T| = |H|, we have H = H_T. Also T ∈ O with p not dividing |O|, so |G_T| = p^α. Then H_T ≤ G_T and |H_T| = |G_T| imply that H_T = G_T. Hence H = G_T.

Theorem 3.4.4 (Sylow Theorem 3) Let p^α be an exact divisor of |G|, with |G| = p^α q. Let t_p be the number of Sylow p-subgroups in G. Then:

(a) t_p ≡ 1 (mod p);

(b) t_p | q.

Proof: Let H_1, . . . , H_r be the distinct Sylow p-subgroups of G. Recall that there is an orbit O with p not dividing |O|, and each H_i is the stabilizer of some element of O. In fact, each H_i is the stabilizer of the same number s of elements of O, so |O| = rs.

Let P_1 = {S ∈ O : H_1 = G_S}. So |P_1| = s. If T ∈ O \ P_1, then the H_1-orbit U containing T has more than one element. Now |(H_1)_T| · |U| = |H_1|. But |H_1| = p^α and |U| > 1 imply that p divides |U|. So rs = |O| = s + mp ≡ s ≢ 0 (mod p) for some integer m. So s ≢ 0 (mod p) and s(r − 1) ≡ 0 (mod p) imply that r ≡ 1 (mod p).

Finally, |O| = |G|/|G_S| = q for any S ∈ O, so that rs = q. With r = t_p the theorem is proved.
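The three Sylow theorems can be watched in action in a small group. The sketch below (our own illustration; S_4 has order 24 = 2^3 · 3) finds all Sylow subgroups of S_4 by brute-force closure and checks the counts 3 ≡ 1 (mod 2), 3 | 3, and 4 ≡ 1 (mod 3), 4 | 8:

```python
from itertools import permutations, product

S4 = list(permutations(range(4)))
identity = tuple(range(4))

def compose(a, b):
    # (a after b): i -> a[b[i]]
    return tuple(a[i] for i in b)

def generated(gens):
    # subgroup generated by gens (finite group, so closure suffices)
    elems = {identity}
    frontier = set(gens)
    while frontier:
        elems |= frontier
        frontier = {compose(x, g) for x in elems for g in gens} - elems
    return frozenset(elems)

sylow2, sylow3 = set(), set()
for pair in product(S4, repeat=2):
    H = generated(pair)
    if len(H) == 8:       # Sylow 2-subgroups (dihedral of order 8)
        sylow2.add(H)
    if len(H) == 3:       # Sylow 3-subgroups
        sylow3.add(H)
print(len(sylow2), len(sylow3))  # 3 4
```

Every order-8 subgroup of S_4 happens to be 2-generated, which is why scanning pairs of elements finds them all.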

3.5 Patterns and Weights

Let X and Y be finite, nonempty sets (with |X| = m), and let ν : G → S_X be an action of the group G on the set X. Then G induces an equivalence relation on Y^X, with equivalence classes being called patterns: viz., for f, h ∈ Y^X, f ∼ h if and only if there is some g ∈ G for which f = h ∘ ν(g^{−1}).

Let Ω be a commutative ring containing the rational numbers Q as a subring. Frequently, Ω is a polynomial ring over Q in finitely many variables. Also, we suppose there is some weight function w : Y → Ω. Y is called the store, and Σ_{y∈Y} w(y) is the store inventory. A weight function W : Y^X → Ω is then defined by

W(f) = Π_{x∈X} w(f(x)).

It is easy to see that if f and h belong to the same pattern, then W(f) = W(h). For, with h = f ∘ ν(g^{−1}),

W(h) = Π_{x∈X} w(h(x)) = Π_{x∈X} w[f(ν(g^{−1})(x))] = Π_{x∈X} w[f(x)] = W(f),

since ν(g^{−1})(x) varies over all elements of X as x varies over all elements of X. So the weight of a pattern may be defined as the weight of any function in that pattern. The inventory Σ_{f∈Y^X} W(f) of Y^X is equal to (Σ_{y∈Y} w(y))^{|X|}. This is a special case (with each |X_i| = 1) of the following result, whose proof is given.


Theorem 3.5.1 If X is partitioned into disjoint, nonempty subsets X = X_1 + · · · + X_k, put

S = {f ∈ Y^X : f is constant on each X_i, i = 1, . . . , k}.

Then the inventory of S is defined to be Σ_{f∈S} W(f) and is equal to

Π_{i=1}^k Σ_{y∈Y} (w(y))^{|X_i|}.

Proof: A term in the product is obtained by selecting one term in each factor and multiplying them together. This is equivalent to selecting a mapping φ of the set {1, . . . , k} into Y, yielding the term Π_{i=1}^k [w(φ(i))]^{|X_i|}. Let ψ : X → {1, . . . , k} be defined by ψ(x) = i if and only if x ∈ X_i. Put f = φ ∘ ψ. Then

w(φ(i))^{|X_i|} = Π_{x∈X_i} w(φ(i)) = Π_{x∈X_i} w((φ ∘ ψ)(x)) = Π_{x∈X_i} w(f(x)),

from which it follows that

Π_{i=1}^k [w(φ(i))]^{|X_i|} = Π_{x∈X} w(f(x)) = W(f).

Since each f ∈ S can be written uniquely in the form f = φ ∘ ψ for some φ : {1, . . . , k} → Y, the desired result is easily seen to hold; viz.:

Π_{i=1}^k Σ_{y∈Y} (w(y))^{|X_i|} = Σ_{φ : {1,...,k}→Y} Π_{i=1}^k w(φ(i))^{|X_i|} = Σ_{f∈S} W(f).

If each |X_i| = 1, we have S = Y^X and Σ_{f∈Y^X} W(f) = (Σ_{y∈Y} w(y))^{|X|}.

Theorem 3.5.2 (Polya–Redfield) The pattern inventory is given by:

Σ_F W(F) = P_G(Σ_{y∈Y} w(y), Σ_{y∈Y} [w(y)]^2, . . . , Σ_{y∈Y} [w(y)]^m),

where the summation is over all patterns F, and P_G is the cycle index. In particular, if all weights are chosen equal to 1, the number of patterns is P_G(|Y|, |Y|, . . . , |Y|). If f ∈ F where F is a given pattern, then W(F) = W(f) = Π_{x∈X} w(f(x)). If w(y_i) = x_i, where x_1, . . . , x_m are independent commuting variables, then W(f) = x_1^{b_1} x_2^{b_2} · · · x_m^{b_m}, where b_i is the number of times the color y_i appears in the coloring given by any f in the pattern F. Hence the coefficient of x_1^{b_1} x_2^{b_2} · · · x_m^{b_m} in P_G(Σ_{y∈Y} w(y), Σ_{y∈Y} [w(y)]^2, . . . , Σ_{y∈Y} [w(y)]^m) is the number of patterns in which the color y_i appears b_i times.

Proof: Let w be one of the possible values that the weight of a function may have. Put S = {f ∈ Y^X : W(f) = w}. If g ∈ G, then W(f ∘ ν(g^{−1})) = w. Hence for each g ∈ G, µ(g) : f ↦ f ∘ ν(g^{−1}) maps S into S. (And it is easy to see from earlier results that µ is an action of G on S.) Clearly, for f_1, f_2 ∈ S, f_1 and f_2 belong to the same pattern (in the sense mentioned at the beginning of this section) if and only if they are equivalent relative to the action µ of G on S. Now Burnside's Lemma applied to µ : G → S_S says that the number of patterns contained in S is equal to (1/|G|) Σ_{g∈G} ψ_w(g), where ψ_w(g) denotes the number of functions f with W(f) = w and f = µ(g)(f) = f ∘ ν(g^{−1}).

The patterns contained in S all have weight w. So if we multiply by w and sum over all possible values of w, we obtain the pattern inventory

Σ_F W(F) = (1/|G|) Σ_w Σ_{g∈G} ψ_w(g) · w.

Also,

Σ_w ψ_w(g) · w = Σ_f^{(g)} W(f),

where the right hand side is summed over all f ∈ Y^X with f = f ∘ ν(g^{−1}). It follows that

Σ_F W(F) = (1/|G|) Σ_{g∈G} Σ_f^{(g)} W(f).

Here ν(g) splits X into cycles. And f = f ∘ ν(g^{−1}) means

f(x) = f(ν(g^{−1})(x)) = · · · = f(ν(g^{−i})(x)),

i.e., f is constant on each cycle of ν(g^{−1}), and hence on each cycle of ν(g). Conversely, each f constant on each cycle of ν(g) automatically satisfies f = f ∘ ν(g^{−1}), since ν(g^{−1})(x) always belongs to the same cycle as x itself.

Thus if the cycles are X_1, . . . , X_k, then Σ_f^{(g)} W(f) is the inventory calculated by Theorem 3.5.1 to be

Σ_f^{(g)} W(f) = Π_{i=1}^k Σ_{y∈Y} [w(y)]^{|X_i|}.

Let (b_1, . . . , b_m) be the cycle type of ν(g). This means that among the numbers |X_1|, . . . , |X_k|, the number 1 occurs b_1 times, 2 occurs b_2 times, etc. Hence

Σ_f^{(g)} W(f) = (Σ_{y∈Y} w(y))^{b_1} · (Σ_{y∈Y} (w(y))^2)^{b_2} · · · (Σ_{y∈Y} (w(y))^m)^{b_m}.

Finally, Σ_F W(F) = (1/|G|) Σ_{g∈G} Σ_f^{(g)} W(f) is obtained by putting x_i = Σ_{y∈Y} (w(y))^i in P_G(x_1, . . . , x_m) = (1/|G|) Σ_{g∈G} x_1^{b_1} · · · x_m^{b_m}.

We close this section with some examples that partly duplicate some of those given earlier.

Example 3.5.3 Suppose we want to distribute m counters over three persons P_1, P_2, P_3 with the condition that P_1 obtain the same number as P_2. In how many ways is this possible?

We are not interested in the individual counters, but only in the number each person gets. Hence we want functions f defined on X = {P_1, P_2, P_3} with range Y = {0, 1, . . . , m} and with the restrictions f(P_1) = f(P_2) and Σ_{i=1}^3 f(P_i) = m. Put X_1 = {P_1, P_2} and X_2 = {P_3}. Define w : Y → Ω by w(i) = x^i. Thus the functions we are interested in have weight x^m, and they are the only ones with weight x^m. By Theorem 3.5.1 the inventory Σ_{f∈S} W(f) must be equal to

Π_{i=1}^2 Σ_{y∈Y} w(y)^{|X_i|} = (1 + x^2 + x^4 + · · · + x^{2m})(1 + x + x^2 + · · · + x^m).

But the coefficient of x^m in this product is the coefficient of x^m in

(1 − x^2)^{−1}(1 − x)^{−1} = (1/4)(1 + x)^{−1} + (1/2)(1 − x)^{−2} + (1/4)(1 − x)^{−1},

which is the coefficient of x^m in

(1/4)(1 − x + x^2 − x^3 + · · · + (−1)^m x^m + · · ·) + (1/2) Σ_{i=0}^∞ C(2 + i − 1, i) x^i + (1/4)(1 + x + x^2 + · · · + x^m + · · ·),

which is equal to

(1/2)(m + 1) + (1/4)((−1)^m + 1) = (1/2)m + 1 if m is even, and (1/2)(m + 1) if m is odd.
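Both branches of the answer collapse to ⌊m/2⌋ + 1, which a direct count confirms (illustrative Python, our own check):

```python
def distributions(m):
    # count triples (a, a, b) with 2a + b = m and a, b >= 0
    return sum(1 for a in range(m + 1) if m - 2 * a >= 0)

for m in range(50):
    closed = m // 2 + 1   # (1/2)m + 1 for m even, (1/2)(m + 1) for m odd
    assert distributions(m) == closed
print([distributions(m) for m in range(8)])  # [1, 1, 2, 2, 3, 3, 4, 4]
```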

For the next few examples let G be the group of rigid motions of a cube. The elements of G were given earlier, but this time we want to include the details giving the cycle indices. Recall the elements of G from Example 3.3.2.

Example 3.5.4 Let X be the set of vertices of the cube. The cycle types are indicated as follows:

(a) x_1^8   (b) x_2^4   (c) x_4^2   (d) x_2^4   (e) x_1^2 x_3^2

So P_G = (1/24)(x_1^8 + 9x_2^4 + 6x_4^2 + 8x_1^2 x_3^2).

Example 3.5.5 Let X be the set of edges of the cube. The cycle types are indicated as follows:

(a) x_1^{12}   (b) x_2^6   (c) x_4^3   (d) x_1^2 x_2^5   (e) x_3^4

So P_G = (1/24)(x_1^{12} + 3x_2^6 + 6x_4^3 + 6x_1^2 x_2^5 + 8x_3^4).


Example 3.5.6 Let X be the set of faces of the cube. Then

P_G = (1/24)(x_1^6 + 3x_1^2 x_2^2 + 6x_1^2 x_4 + 6x_2^3 + 8x_3^2).
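These cycle types can be checked mechanically. The sketch below (illustrative Python; the representation by integer rotation matrices is our own choice) generates the 24 rotations from two 90-degree generators and tallies the cycle types they induce on the six face centers; the same routine works for the vertices or edges:

```python
from itertools import product
from collections import Counter

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))   # 90 degrees about the z-axis
Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))   # 90 degrees about the x-axis
group = {((1, 0, 0), (0, 1, 0), (0, 0, 1))}
while True:
    new = {matmul(A, R) for A in group for R in (Rz, Rx)} - group
    if not new:
        break
    group |= new

faces = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cycle_type(A, points):
    # sorted cycle lengths of the permutation A induces on the given points
    seen, lengths = set(), []
    for p in points:
        if p not in seen:
            q, n = p, 0
            while q not in seen:
                seen.add(q)
                q = apply(A, q)
                n += 1
            lengths.append(n)
    return tuple(sorted(lengths))

tally = Counter(cycle_type(A, faces) for A in group)
print(len(group))            # 24
print(sorted(tally.items()))
# [((1, 1, 1, 1, 1, 1), 1), ((1, 1, 2, 2), 3), ((1, 1, 4), 6), ((2, 2, 2), 6), ((3, 3), 8)]
```

The tally matches x_1^6 + 3x_1^2x_2^2 + 6x_1^2x_4 + 6x_2^3 + 8x_3^2 above, and swapping in the eight vertices reproduces Example 3.5.4.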

Example 3.5.7 In how many ways can a cube be painted so that each face is red or blue? In other words, how many patterns are there?

Let X be the set of faces of the cube, and µ : G → S_X as in the preceding example. Put Y = {red, blue}, with the weight of each element being 1. Then the number of patterns is P_G(2, 2, . . .) = (1/24)(2^6 + 3·2^4 + 6·2^3 + 6·2^3 + 8·2^2) = 10.

(Summary: (a) All faces red; (b) five red, one blue; (c) two opposite faces blue, the others red; (d) two adjacent faces blue, the others red; (e) three faces at one vertex red, the others blue; (f) two opposite faces plus one other red, the remaining faces blue; (g), (h), (i) and (j) obtained from (d), (c), (b) and (a) upon interchanging red and blue.)

Example 3.5.8 In the preceding example, how many color patterns show four red faces and two blue?

Let w(red) = x, w(blue) = y. Then the pattern inventory is

Σ_F W(F) = (1/24)[(x + y)^6 + 3(x + y)^2(x^2 + y^2)^2 + 6(x + y)^2(x^4 + y^4) + 6(x^2 + y^2)^3 + 8(x^3 + y^3)^2].

The coefficient of x^4 y^2 is (1/24)(15 + 9 + 6 + 18 + 0) = 2.
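Both answers (10 patterns in all, 2 of them with four red faces) can be brute-forced by listing all 2^6 face colorings and grouping them into orbits under the 24 rotations, here regenerated from integer rotation matrices (our own illustrative setup):

```python
from itertools import product

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))
Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))
group = {((1, 0, 0), (0, 1, 0), (0, 0, 1))}
while True:
    new = {matmul(A, R) for A in group for R in (Rz, Rx)} - group
    if not new:
        break
    group |= new

faces = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
perms = [tuple(faces.index(apply(A, f)) for f in faces) for A in group]

orbits = {frozenset(tuple(c[p[i]] for i in range(6)) for p in perms)
          for c in product("RB", repeat=6)}
print(len(orbits))  # 10
print(sum(1 for o in orbits if next(iter(o)).count("R") == 4))  # 2
```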

Example 3.5.9 In how many ways can the eight vertices be painted with n colors?

Let X be the set of vertices, with µ : G → S_X as in Example 3.5.4. Let Y = {c_1, . . . , c_n}, with w(c_i) = x_i. Then the pattern inventory PI is given by

PI = (1/24)[(x_1 + · · · + x_n)^8 + 9(x_1^2 + · · · + x_n^2)^4 + 6(x_1^4 + · · · + x_n^4)^2 + 8(x_1 + · · · + x_n)^2 (x_1^3 + · · · + x_n^3)^2].

If the total number of patterns is all that is sought, putting x_i = 1 shows this number to be (1/24) n^2 (n^6 + 17n^2 + 6).
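The polynomial (1/24) n^2 (n^6 + 17n^2 + 6) is easy to test against a direct orbit count for small n (illustrative Python, regenerating the rotation group as in the earlier cube checks):

```python
from itertools import product

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

Rz = ((0, -1, 0), (1, 0, 0), (0, 0, 1))
Rx = ((1, 0, 0), (0, 0, -1), (0, 1, 0))
group = {((1, 0, 0), (0, 1, 0), (0, 0, 1))}
while True:
    new = {matmul(A, R) for A in group for R in (Rz, Rx)} - group
    if not new:
        break
    group |= new

vertices = [v for v in product((-1, 1), repeat=3)]
perms = [tuple(vertices.index(apply(A, v)) for v in vertices) for A in group]

def vertex_patterns(n):
    return len({frozenset(tuple(c[p[i]] for i in range(8)) for p in perms)
                for c in product(range(n), repeat=8)})

for n in (1, 2, 3):
    assert vertex_patterns(n) == n**2 * (n**6 + 17 * n**2 + 6) // 24
print(vertex_patterns(2), vertex_patterns(3))  # 23 333
```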

Example 3.5.10 Let G be a finite group of order m. For each a ∈ G put λ_a(g) = ag. So λ_a ∈ S_G. Then G_λ = {λ_a : a ∈ G} is a subgroup of S_G, and Λ : G → G_λ : a ↦ λ_a is an isomorphism called the left regular representation of G. We now calculate the cycle index of G relative to its left regular representation.

Let k(a) be the order of a for each a ∈ G. Then λ_a splits G into m/k(a) cycles of length k(a). So

P_G = (1/m) Σ_{a∈G} x_{k(a)}^{m/k(a)} = (1/m) Σ_{d|m} ν(d) x_d^{m/d},

where ν(d) is the number of elements a in G of order k(a) = d. If G is cyclic of order m, then ν(d) = φ(d) for each d such that d|m, so

P_G = (1/m) Σ_{d : d|m} φ(d) x_d^{m/d}.
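For the cyclic group, evaluating this cycle index at x_d = c counts necklaces: circular strings of length m in c colors, up to rotation. A quick check of the formula against brute force (illustrative Python, our own code):

```python
from math import gcd
from itertools import product

def phi(d):
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)

def necklaces(m, colors=2):
    # cycle index of the cyclic group of order m, evaluated at x_d = colors
    return sum(phi(d) * colors ** (m // d)
               for d in range(1, m + 1) if m % d == 0) // m

def necklaces_brute(m, colors=2):
    strings = product(range(colors), repeat=m)
    return len({frozenset(s[i:] + s[:i] for i in range(m)) for s in strings})

for m in range(1, 9):
    assert necklaces(m) == necklaces_brute(m)
print([necklaces(m) for m in range(1, 7)])  # [2, 3, 4, 6, 8, 14]
```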

3.6 The Symmetric Group

Let X be a finite set with m elements. Let G = S_X ≅ S_m. Let b = (b_1, . . . , b_m) be a permissible cycle type of some g ∈ G, i.e., b_i ≥ 0 and b_1 + 2b_2 + 3b_3 + · · · + m b_m = m. Then:

Theorem 3.6.1 The number #(b) of permutations in S_m having type b is

#(b) = m! / (b_1! 1^{b_1} b_2! 2^{b_2} b_3! 3^{b_3} · · · b_m! m^{b_m}).

Proof: There are C(m, b_1) ways to form the 1-cycles. Suppose we have taken care of the 1-cycles, 2-cycles, . . . , (k−1)-cycles and are about to form the k-cycles. We have η_{k−1} = m − b_1 − 2b_2 − · · · − (k−1)b_{k−1} elements at our disposal (η_0 = m). The first k-cycle can be formed in (k−1)! C(η_{k−1}, k) ways, the second k-cycle in (k−1)! C(η_{k−1} − k, k) ways, . . . , the b_k-th k-cycle in (k−1)! C(η_{k−1} − (b_k − 1)k, k) ways. Hence the k-cycles can be formed in

(1/b_k!) · (k−1)! C(η_{k−1}, k) · (k−1)! C(η_{k−1} − k, k) · (k−1)! C(η_{k−1} − 2k, k) · · · (k−1)! C(η_{k−1} − (b_k − 1)k, k)

= (1/b_k!) · [(k−1)! η_{k−1}!] / [k! (η_{k−1} − k)!] · [(k−1)! (η_{k−1} − k)!] / [k! (η_{k−1} − 2k)!] · [(k−1)! (η_{k−1} − 2k)!] / [k! (η_{k−1} − 3k)!] · · · [(k−1)! (η_{k−1} − k(b_k − 1))!] / [k! (η_{k−1} − b_k k)!]

= η_{k−1}! / (b_k! k^{b_k} η_k!)

ways. (The initial factor 1/b_k! arises from the fact that the k-cycles may be written in any order.)

From Theorem 3.6.1 it follows readily that:

Theorem 3.6.2 The cycle index of S_m is given by

P_{S_m} = Σ_b [x_1^{b_1} · · · x_m^{b_m}] / [b_1! 1^{b_1} b_2! 2^{b_2} · · · b_m! m^{b_m}],

where the sum is over all b = (b_1, . . . , b_m) for which b_i ≥ 0 and b_1 + 2b_2 + · · · + m b_m = m.
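The counting formula of Theorem 3.6.1 is easy to verify exhaustively for small m (illustrative Python, our own check):

```python
from itertools import permutations
from math import factorial
from collections import Counter

def cycle_type(p):
    # (b_1, ..., b_m): b_i is the number of i-cycles of the permutation p
    m = len(p)
    seen, b = set(), [0] * m
    for i in range(m):
        if i not in seen:
            j, n = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                n += 1
            b[n - 1] += 1
    return tuple(b)

def count(b):
    # Theorem 3.6.1: m! / (b_1! 1^{b_1} ... b_m! m^{b_m})
    m = sum((i + 1) * bi for i, bi in enumerate(b))
    denom = 1
    for i, bi in enumerate(b, start=1):
        denom *= factorial(bi) * i ** bi
    return factorial(m) // denom

for m in (3, 4, 5):
    for b, n in Counter(cycle_type(p) for p in permutations(range(m))).items():
        assert n == count(b)
print(count((0, 2, 0, 0)))  # 3 double transpositions in S_4
```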


Now let M = {1, 2, . . . , m}, and let X be the set of unordered pairs {i, j} of distinct elements of M. The symmetric group S_m acting on M also has a natural action on X. For each σ ∈ S_m, define σ* ∈ S_X by

σ* : {i, j} ↦ {σ(i), σ(j)}.

Then µ : σ ↦ σ* is an action of S_m on X, and

µ : S_m → S_m* = {σ* : σ ∈ S_m}

is an isomorphism. We need to calculate the cycle index of S_m*. This is feasible because the cycle type of each σ determines the cycle type of the corresponding σ*. Specifically, each factor x_t^{b_t} in a term x_1^{b_1} · · · x_m^{b_m} of P_{S_m} corresponding to a σ of cycle type (b_1, . . . , b_m) yields specific factors in the term corresponding to σ*.

Let σ ∈ S_m have cycle type (b_1, . . . , b_m), Σ_{i=1}^m i b_i = m. Then for {i, j} ∈ X we ask: what is the length of the cycle of σ* to which {i, j} belongs? First suppose i and j belong to the same cycle of σ, of length t. If t is odd, (σ*)^k : {i, j} ↦ {σ^k(i), σ^k(j)} = {i, j} if and only if 1) σ^k(i) = i and σ^k(j) = j, or 2) σ^k(i) = j and σ^k(j) = i. Since σ^k(i) = j and σ^k(j) = i imply σ^{2k}(i) = i, it must be that t divides 2k; and since t is odd, t then divides k. Hence σ^k(i) = i and σ^k(j) = j. So the σ*-cycle of {i, j} has length t also. Hence one cycle of σ having odd length t produces (1/t) C(t, 2) = (t − 1)/2 cycles of length t.

Now suppose i and j belong to the same cycle of σ of even length t = 2k. This cycle produces one cycle of σ* of length k (on the k "antipodal" pairs {i, σ^k(i)}) and

(1/2k)[C(2k, 2) − k] = (1/2k)[2k(2k − 1)/2 − 2k/2] = (t − 2)/2

cycles of length t.

Since there are b_t cycles of length t, the pairs {i, j} belonging to common cycles of σ yield cycles of σ* in such a way that the terms of the cycle index of S_m yield terms of the cycle index of S_m* as follows:

x_t^{b_t} ∈ P_{S_m} ↦ x_t^{b_t (t−1)/2} for t odd;   [x_{t/2} x_t^{(t−2)/2}]^{b_t} for t even.


Now we suppose that i and j come from distinct cycles c_r and c_t of σ, of lengths r and t, respectively. (Note: (r, t) and [r, t] denote the greatest common divisor and the least common multiple, respectively, of r and t. Also, rt = (r, t)[r, t].) The cycles c_r and c_t induce, on the pairs of elements one each from c_r and c_t, exactly (r, t) cycles of length [r, t], since {σ^k(i), σ^k(j)} = {i, j} if and only if k ≡ 0 (mod [r, t]). In particular, if r = t = k, there are k cycles of length k. Hence we have the following:

If r ≠ t,  x_r^{b_r} x_t^{b_t} ∈ P_{S_m} ↦ x_{[r,t]}^{(r,t) b_r b_t} ∈ P_{S_m*}.

If r = t = k,  x_k^{b_k} ∈ P_{S_m} ↦ x_k^{k C(b_k, 2)}.

Multiplying over appropriate cases, and summing over permissible cycle types b, we finally obtain the cycle index of S_m*:

P_{S_m*}(x_1, . . . , x_m) = (1/m!) Σ_b [m! / (b_1! · · · b_m! 1^{b_1} 2^{b_2} · · · m^{b_m})] ·

· Π_{k=0}^{⌊(m−1)/2⌋} x_{2k+1}^{k b_{2k+1}} · Π_{k=1}^{⌊m/2⌋} (x_k x_{2k}^{k−1})^{b_{2k}} · Π_{k=1}^{⌊m/2⌋} x_k^{k C(b_k, 2)} · Π_{1≤r<t≤m−1} x_{[r,t]}^{(r,t) b_r b_t}.

Example 3.6.3 For m = 4 we have

P_{S_4} = (1/4!)(x_1^4 + 6x_1^2 x_2 + 8x_1 x_3 + 3x_2^2 + 6x_4).

P_{S_4*} = (1/4!)(x_1^6 + 6x_1^2 x_2^2 + 8x_3^2 + 3x_1^2 x_2^2 + 6x_2 x_4).


For future reference we calculate:

P_{S_4*}(1 + x, 1 + x^2, 1 + x^3, 1 + x^4) = 1 + x + 2x^2 + 3x^3 + 2x^4 + x^5 + x^6.
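This substitution is mechanical polynomial arithmetic, so it can be replayed with coefficient lists (illustrative Python; we combine the two x_1^2 x_2^2 terms, 6 + 3 = 9):

```python
def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = pmul(out, a)
    return out

def padd(*ps):
    out = [0] * max(len(p) for p in ps)
    for p in ps:
        for i, c in enumerate(p):
            out[i] += c
    return out

def pscale(a, c):
    return [c * x for x in a]

# substitute x_t = 1 + x^t into P_{S_4*} = (1/24)(x1^6 + 9 x1^2 x2^2 + 8 x3^2 + 6 x2 x4)
x1, x2, x3, x4 = [1, 1], [1, 0, 1], [1, 0, 0, 1], [1, 0, 0, 0, 1]
total = padd(ppow(x1, 6),
             pscale(pmul(ppow(x1, 2), ppow(x2, 2)), 9),
             pscale(ppow(x3, 2), 8),
             pscale(pmul(x2, x4), 6))
print([c // 24 for c in total])  # [1, 1, 2, 3, 2, 1, 1]
```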

3.7 Counting Graphs

We count the graphs Γ on m vertices with q edges. Let G denote the set of graphs Γ on the vertices M = {1, 2, . . . , m}. Such a Γ is a function from the set X of unordered pairs {i, j} of distinct elements of M to the set Y = {0, 1}, where Γ({i, j}) is 1 or 0 according as {i, j} is an edge or a nonedge of the graph Γ.

Two such graphs Γ_1 and Γ_2 are equivalent if and only if there is a relabeling of the vertices of Γ_1 so that it has the same edges as Γ_2, i.e., iff there is a permutation σ ∈ S_m so that as functions Γ_2 = Γ_1 ∘ σ*, where for each σ ∈ S_m, σ* acts on X as usual: σ* : {i, j} ↦ {σ(i), σ(j)}.

Let w : Y → {1, x} be defined by w(i) = x^i. Then the weight of a graph Γ ∈ Y^X is W(Γ) = x^q, where q is the number of edges of Γ. The pattern inventory of G = Y^X is P_{S_m*}(1 + x, 1 + x^2, 1 + x^3, . . .) by Polya's theorem. So the number of graphs on m vertices with q edges is the coefficient of x^q in P_{S_m*}(1 + x, 1 + x^2, 1 + x^3, . . .).

In particular, putting m = 4, we see from the preceding section that there are 11 graphs on 4 vertices.
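For m = 4 this is small enough to brute-force: list all 2^6 edge sets, group them into relabeling orbits, and tally by edge count (illustrative Python, our own check against the coefficients 1, 1, 2, 3, 2, 1, 1):

```python
from itertools import combinations, permutations, product

def graph_counts(m):
    # counts[q] = number of nonisomorphic graphs on m vertices with q edges
    pairs = list(combinations(range(m), 2))
    index = {p: i for i, p in enumerate(pairs)}
    perms = list(permutations(range(m)))
    counts = [0] * (len(pairs) + 1)
    seen = set()
    for g in product((0, 1), repeat=len(pairs)):
        orbit = frozenset(
            tuple(g[index[tuple(sorted((s[a], s[b])))]] for a, b in pairs)
            for s in perms)
        if orbit not in seen:
            seen.add(orbit)
            counts[sum(g)] += 1
    return counts

print(graph_counts(4))  # [1, 1, 2, 3, 2, 1, 1] -> 11 graphs in all
```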


Chapter 4

Formal Power Series as Generating Functions

4.1 Using Power Series to Count Objects

Suppose we are interested in all combinations of three objects O_1, O_2, O_3:

∅, {O_1}, {O_2}, {O_3}, {O_1, O_2}, {O_1, O_3}, {O_2, O_3}, {O_1, O_2, O_3}.

Consider the "generating function"

C_3(x) = (1 + O_1 x)(1 + O_2 x)(1 + O_3 x)

= 1 + (O_1 + O_2 + O_3)x + (O_1 O_2 + O_1 O_3 + O_2 O_3)x^2 + (O_1 O_2 O_3)x^3.

This is readily generalized to

C_n(x) = (1 + O_1 x)(1 + O_2 x) · · · (1 + O_n x) = 1 + a_1 x + · · · + a_n x^n,

where a_k is the "k-th elementary symmetric function" of the n variables O_1 to O_n. C_n(x), after multiplication of its factors, contains the actual exhibition of combinations. If only the number of combinations is of interest, the object labels may be ignored and the generating function becomes an enumerating generating function (sometimes just called an enumerator), i.e.,

C_n(x) = (1 + x)^n = Σ_{k=0}^∞ C(n, k) x^k.


As a simple (and familiar!) example of the way in which generating functions are used, consider the following:

C_n(x) = (1 + x)^n = (1 + x) C_{n−1}(x) = (1 + x) Σ_{k=0}^{n−1} C(n−1, k) x^k

= C(n−1, 0) + [C(n−1, 0) + C(n−1, 1)]x + · · · + [C(n−1, n−2) + C(n−1, n−1)]x^{n−1} + C(n−1, n−1) x^n,

where for 1 ≤ k ≤ n−1, the coefficient of x^k is

C(n−1, k−1) + C(n−1, k).

But

C_n(x) = Σ_{k=0}^n C(n, k) x^k,

so

C(n, k) = C(n−1, k−1) + C(n−1, k),

a basic relation for binomial coefficients.

Now look again at the generating function for combinations of n distinct objects. In its factored form, each object is represented by a binomial and each binomial spells out the fact that its object has two possibilities in any combination: either it is absent (the term 1) or it is present (the term O_i x for the object O_i). So C_n(x) = (1 + O_1 x) · · · (1 + O_n x) is the generating function for combinations without repetition. For repetitions of a certain kind, we use a special generating function. For example, if an object may appear in a combination zero, one or two times, then the function is the polynomial 1 + Ox + O^2 x^2. If the number of repetitions is unlimited, it is the function 1 + Ox + O^2 x^2 + O^3 x^3 + · · · = (1 − Ox)^{−1}. If the number of repetitions is even, the generating function is 1 + O^2 x^2 + O^4 x^4 + · · · + O^{2k} x^{2k} + · · ·. Moreover, the specification of repetitions may be made arbitrarily for each object. The generating function is a representation of this specification in its factored form as well as a representation of the corresponding combinations in its developed (multiplied out) form.

EXAMPLE: For combinations of n objects with no restrictions on the number of repetitions for any object, the enumerating generating function is

(1 + x + x^2 + · · ·)^n = (1 − x)^{−n} = Σ_{k=0}^∞ C(−n, k)(−x)^k

= Σ_{k=0}^∞ (−n)(−n − 1) · · · (−n − k + 1)(−x)^k / k! = Σ_{k=0}^∞ C(n + k − 1, k) x^k.

This is worth stating as a theorem.

Theorem 4.1.1 The number of combinations, with arbitrary repetition, of n objects k at a time is the same as the number of combinations without repetition of n + k − 1 objects, k at a time. The corresponding generating function is given by:

(1 + x + x^2 + · · ·)^n = (1 − x)^{−n} = Σ_{k=0}^∞ C(n + k − 1, k) x^k.  (4.1)

For the problem in the above example with the added specification that each object must appear at least once, the enumerator is

(x + x^2 + · · ·)^n = x^n (1 − x)^{−n} = Σ_{k=0}^∞ C(n + k − 1, k) x^{k+n} = Σ_{k=n}^∞ C(k − 1, k − n) x^k.

Hence the number of combinations of n objects taken k at a time, with the restriction that each object appear at least once, is the same as the number of combinations without repetition of k − 1 objects taken k − n at a time.
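Both identities can be checked directly by enumerating multisets (illustrative Python, our own check):

```python
from itertools import combinations_with_replacement
from math import comb

n = 4
for k in range(8):
    multisets = list(combinations_with_replacement(range(n), k))
    # Theorem 4.1.1: combinations of n objects, k at a time, arbitrary repetition
    assert len(multisets) == comb(n + k - 1, k)
    # each object at least once
    onto = sum(1 for c in multisets if len(set(c)) == n)
    if k >= n:
        assert onto == comb(k - 1, k - n)
    else:
        assert onto == 0
print("both identities hold for n = 4, k = 0..7")
```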

In this section we have seen power series manipulated without any concern for whether or not they converge in an analytic sense. We want to establish a firm foundation for the theory of formal power series so that we can continue to work "magic" with series, so a significant part of this chapter will be devoted to developing those properties of formal power series that have proved to be most useful. Before starting this development we give one more example to illustrate the power of these methods.


4.2 A famous example: Stirling numbers of the 2nd kind

We have let S(n, k) denote the number of partitions of [n] into k nonempty classes, for integers n, k with 0 < k ≤ n. (These are the Stirling numbers of the second kind, often written with braces.) Also, we put S(n, k) = 0 if k > n or n < 0 or k < 0. Further, we take S(n, 0) = 0 for n ≠ 0. Hence we have

S(n, k) = S(n − 1, k − 1) + k · S(n − 1, k)  (4.2)

for (n, k) ≠ (0, 0). And S(0, 0) = 1. (Recall: We proved this earlier for 0 < k ≤ n.)

For each integer k ≥ 0 put

B_k(x) = Σ_n S(n, k) x^n.  (4.3)

Multiply Eq. 4.2 by x^n and sum over n:

B_k(x) = x B_{k−1}(x) + kx B_k(x)  (k ≥ 1; B_0(x) = 1).  (4.4)

This leads to:

B_k(x) = [x/(1 − kx)] B_{k−1}(x) = [x^2 / ((1 − kx)(1 − (k−1)x))] B_{k−2}(x).

So B_1(x) = [x/(1 − x)] · 1; B_2(x) = [x/(1 − 2x)] B_1(x) = x^2/((1 − x)(1 − 2x)); B_3(x) = [x/(1 − 3x)] B_2(x) = x^3/((1 − x)(1 − 2x)(1 − 3x)). And in general,

B_k(x) = Σ_n S(n, k) x^n = x^k / ((1 − x)(1 − 2x) · · · (1 − kx)),  k ≥ 0.  (4.5)

The partial fraction expansion of Eq. 4.5 has the form

1 / ((1 − x)(1 − 2x) · · · (1 − kx)) = Σ_{j=1}^k α_j / (1 − jx).

To find the α's, fix r, 1 ≤ r ≤ k, and multiply both sides by 1 − rx:

1 / ((1 − x) · · · (1 − (r−1)x)(1 − (r+1)x) · · · (1 − kx)) = Σ_{j=1; j≠r}^k α_j (1 − rx) / (1 − jx) + α_r.

Now putting x = 1/r gives:

α_r = 1 / [(1 − 1/r)(1 − 2/r) · · · (1 − (r−1)/r)(1 − (r+1)/r) · · · (1 − k/r)]

= r^{k−1} / [(r − 1)(r − 2) · · · (1)(−1) · · · (−(k − r))] = r^{k−1} (−1)^{k−r} / [(r − 1)!(k − r)!],

which implies

α_r = (−1)^{k−r} r^{k−1} / [(r − 1)!(k − r)!],  1 ≤ r ≤ k.

Notation: If f(x) = Σ_{n=0}^∞ a_n x^n, then [x^n]f = a_n. Clearly in general we have [x^n]f = [x^{n+r}] x^r f.

We now have for k ≥ 1:

S(n, k) = [x^n] x^k / ((1 − x) · · · (1 − kx)) = [x^{n−k}] 1 / ((1 − x) · · · (1 − kx))

= [x^{n−k}] Σ_{r=1}^k α_r / (1 − rx) = Σ_{r=1}^k α_r [x^{n−k}] 1/(1 − rx)

= Σ_{r=1}^k α_r r^{n−k} = Σ_{r=1}^k (−1)^{k−r} r^{n−1} / [(r − 1)!(k − r)!]

= [1/(k − 1)!] Σ_{r=1}^k (−1)^{k−r} r^{n−1} C(k − 1, r − 1).

This gives a closed form formula for S(n, k):

S(n, k) = [1/(k − 1)!] Σ_{r=0}^{k−1} (−1)^{k−r−1} (r + 1)^{n−1} C(k − 1, r).  (4.6)
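The closed form (4.6) can be tested against the recurrence (4.2) (illustrative Python, our own check):

```python
from math import comb, factorial
from functools import lru_cache

def closed(n, k):
    # Eq. 4.6, valid for n >= k >= 1
    s = sum((-1) ** (k - r - 1) * (r + 1) ** (n - 1) * comb(k - 1, r)
            for r in range(k))
    return s // factorial(k - 1)

@lru_cache(maxsize=None)
def rec(n, k):
    # recurrence (4.2) with the stated boundary values
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0 or k > n:
        return 0
    return rec(n - 1, k - 1) + k * rec(n - 1, k)

for n in range(1, 12):
    for k in range(1, n + 1):
        assert closed(n, k) == rec(n, k)
print(closed(5, 2), closed(7, 3))  # 15 301
```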


4.3 Ordinary Generating Functions

Now that we have seen some utility for the use of power series as "generating functions," we want to introduce formal power series and lay a sound theoretical foundation for their use. However, this material is rather technical, and we believe it makes better pedagogical sense to use power series informally as "ordinary" generating functions first. This will permit the reader to get used to working with these objects before having to deal with the abstract definitions and theorems.

Def. The symbol f ↔ops {a_n}_0^∞ means that f is the ordinary power series generating function for the sequence {a_n}_0^∞, i.e., f(x) = Σ_{i=0}^∞ a_i x^i.

Rule 1. If f ↔ops {a_n}_0^∞, then for a positive integer h,

[f − a_0 − a_1 x − · · · − a_{h−1} x^{h−1}] / x^h ↔ops {a_{n+h}}_{n=0}^∞.

This follows immediately, since the L.H.S. equals (a_h x^h + a_{h+1} x^{h+1} + · · ·)/x^h = a_h + a_{h+1} x + · · · + a_{h+n} x^n + · · ·.

Def. The derivative of f = Σ a_n x^n is f′ = Σ n a_n x^{n−1}.

Proposition 1. f′ = 0 iff f = a_0, i.e., a_i = 0 for i ≥ 1.

Proposition 2. If f′ = f, then f = c e^x.

Proof: f = f′ iff Σ_{n=0}^∞ a_n x^n = Σ_{n=1}^∞ n a_n x^{n−1} = Σ_{n=0}^∞ (n + 1) a_{n+1} x^n iff a_n = (n + 1) a_{n+1} for all n ≥ 0, i.e., a_{n+1} = a_n/(n + 1) for n ≥ 0. By induction on n, a_n = a_0/n! for all n ≥ 0. So f = a_0 Σ x^n/n! = a_0 e^x.

Starting with f = Σ_{n=0}^∞ a_n x^n, we clearly have

f′ = Σ_{n=0}^∞ (n + 1) a_{n+1} x^n.

Then we see that

x f′ = Σ_{n=0}^∞ (n + 1) a_{n+1} x^{n+1} = Σ_{n=0}^∞ n a_n x^n.

So if we let D be the operator on f that gives f′, then

f ↔ops {a_n}_0^∞ ⇒ (xD)f ↔ops {n a_n}_0^∞.


Then clearly (xD)^2 f ↔ops {n^2 a_n}_0^∞. And (2 − 3(xD) + 5(xD)^2)f ↔ops {(2 − 3n + 5n^2) a_n}_0^∞. More generally, we obtain

Rule 2: If f ↔ops {a_n}_0^∞ and P is a polynomial, then

(P(xD))f ↔ops {P(n) a_n}_{n≥0}.

The next Rule is just the product definition.

Rule 3: If f ↔ops {a_n}_0^∞ and g ↔ops {b_n}_0^∞, then

fg ↔ops {Σ_{r=0}^n a_r b_{n−r}}_0^∞.

An immediate generalization is:

Rule 4: If f ↔ops {a_n}_0^∞, then

f^k ↔ops {Σ a_{n_1} a_{n_2} · · · a_{n_k}}_{n=0}^∞,

where the sum is over all (n_1, . . . , n_k) for which n_1 + n_2 + · · · + n_k = n and n_i ≥ 0.

Example: Let f(n, k) be the number of weak k-compositions of n. Since 1/(1 − x) ↔ops {1}, by Rule 4, 1/(1 − x)^k ↔ops {f(n, k)}_{n=0}^∞. And as we have already seen,

(1 − x)^{−k} = Σ_{n=0}^∞ C(−k, n)(−1)^n x^n

implies that f(n, k) = (−1)^n C(−k, n) = C(n + k − 1, n).

We now ask what happens when we multiply by 1/(1 − x):

f(x)/(1 − x) = (a_0 + a_1 x + a_2 x^2 + · · ·)(1 + x + x^2 + · · ·) = a_0 + (a_0 + a_1)x + (a_0 + a_1 + a_2)x^2 + · · ·.

This leads to Rule 5:

Rule 5: If f ↔ops {a_n}_0^∞, then f/(1 − x) ↔ops {Σ_{j=0}^n a_j}_{n≥0}.

Exercise: 4.3.1 D^n [1/(1 − x)^{m+1}] = (m + n)! / [m! (1 − x)^{m+n+1}];  m, n ≥ 0.

Recall that there is a "formal" Taylor's formula. If you did not prove it when we met it earlier, do it now.


Exercise: 4.3.2 If f(x) = Σ_{n=0}^∞ a_n x^n, then a_n = (1/n!) D^n(f(x))|_{x=0}.

We can put together the two previous exercises to obtain

[x^n] [1/(1 − x)^{m+1}] = (m + n)! / (m! n!).  (4.7)

Example 4.3.3 Start with f = 1/(1 − x) ↔ops {1}_{n≥0}. By Rule 2 we have (xD)^2 [1/(1 − x)] ↔ops {n^2}_{n≥0}. Then by Rule 5,

[1/(1 − x)] (xD)^2 [1/(1 − x)] ↔ops {Σ_{j=0}^n j^2}_{n≥0}.

This implies

Σ_{j=0}^n j^2 = [x^n] ([1/(1 − x)] (xD)^2 [1/(1 − x)]) = [x^n] x(1 + x)/(1 − x)^4,

after some calculation. From Eq. 4.7 with m = 3, [x^n] 1/(1 − x)^4 = (n + 3)!/(3! n!) = C(n + 3, 3). Hence

Σ_{j=0}^n j^2 = [x^n] (x/(1 − x)^4 + x^2/(1 − x)^4) = [x^{n−1}] 1/(1 − x)^4 + [x^{n−2}] 1/(1 − x)^4

= C(n + 2, 3) + C(n + 1, 3) = n(n + 1)(2n + 1)/6.
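The whole computation can be replayed on truncated coefficient lists, with (xD) acting as multiplication of the n-th coefficient by n and Rule 5 as multiplication by the all-ones series (illustrative Python, our own check):

```python
N = 12

def mul(a, b):
    # Cauchy product of coefficient lists, truncated to degree N
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N + 1)]

ones = [1] * (N + 1)                    # 1/(1-x) <-> {1}
xD = lambda a: [n * c for n, c in enumerate(a)]

squares = xD(xD(ones))                  # (xD)^2 : {n^2}
sums = mul(squares, ones)               # Rule 5: multiply by 1/(1-x)
assert all(sums[n] == n * (n + 1) * (2 * n + 1) // 6 for n in range(N + 1))
print(sums)  # [0, 1, 5, 14, 30, 55, 91, 140, 204, 285, 385, 506, 650]
```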

Example 4.3.4 (Harmonic numbers) Put H_n = 1 + 1/2 + 1/3 + · · · + 1/n, n ≥ 1. Start with f = Σ_{n≥1} x^n/n = −log(1 − x). (Check: f′ = Σ_{n≥1} x^{n−1} = Σ_{n≥0} x^n = 1/(1 − x), and (−log(1 − x))′ = (−1) · [1/(1 − x)] · (−1) = 1/(1 − x).) So f ↔ops {1/n}_{n≥1}. By Rule 5, [1/(1 − x)] f ↔ops {H_n}_{n=1}^∞. So

[1/(1 − x)] log[1/(1 − x)] ↔ops {H_n}_{n=1}^∞.

In the first three sections of this chapter we gave examples of how to use "ordinary power series" as generating functions. There are other ways to associate a possibly infinite sequence with a power series. In later sections we explore a couple more types, the more important of which are the exponential generating functions. But first we begin our exposition of formal power series.

Exercise: 4.3.5 Given three types of objects O_1, O_2 and O_3, let a_n be the number of combinations of these objects, n at a time, so that O_1 is selected at most 2 times, O_2 is selected an odd number of times, and O_3 is selected at least once. Determine the ordinary generating function of the sequence {a_n}_{n=0}^∞ and determine a recursion that is satisfied by the sequence. Compute a closed form formula for a_n. As always, be sure to include enough details so that I can see a proof that your answers are correct.


Exercise: 4.3.6 If O_1 can appear any number of times and O_2 can appear any multiple of k times, use Rule 5 to show that the number of combinations of n objects must be a_n = 1 + ⌊n/k⌋.

Exercise: 4.3.7 Put a_n = |{(i, j) : i ≥ 0; j ≥ 0; i + 3j = n}|. Determine the ordinary generating function for the sequence {a_n}_{n=0}^∞ and determine an explicit value in closed form for a_n.

Exercise: 4.3.8 If O_1 can appear an even number of times and O_2 a multiple of 4 times, determine the number of combinations of n of the objects.

Exercise: 4.3.9 If O_1 can appear an odd number of times, O_2 an even number of times, and O_3 either 0 or 1 time, how many combinations of n of these three objects are there? (Find the enumerating generating function.)

4.4 Formal Power Series

If we want to be fairly general we define formal power series in one indeterminate z over a commutative ring A with unity 1. Even though usually in our applications A is the field of complex numbers or perhaps even just the field of real numbers, it really does not take much additional effort to assume merely that A is a commutative ring with unity. On the other hand, often when power series are defined and their theory developed, series in n indeterminates are discussed. For simplicity, we will use only one indeterminate in our "formal" development and in many applications. But the extension to several indeterminates is conceptually natural and will be more or less taken for granted after a certain stage.

A power series in one indeterminate $x$ over the commutative ring $A$ with unity 1 is a formal sum

$$\sum_{i=0}^{\infty} a_i x^i = a_0 + a_1 x + \cdots + a_n x^n + \cdots, \qquad a_i \in A.$$

To begin with, $\{a_n\}_{n=0}^\infty$ is an arbitrary sequence in $A$; no question of convergence is considered. Two power series $\sum a_i x^i$ and $\sum b_i x^i$ are equal iff $a_i = b_i$ for all $i = 0, 1, 2, \ldots$.

Define addition and multiplication of two power series as follows:

(i) $\sum a_i x^i + \sum b_i x^i = \sum (a_i + b_i) x^i$, and

(ii) $\left(\sum a_i x^i\right)\left(\sum b_i x^i\right) = \sum c_i x^i$, where $c_i = \sum_{j=0}^{i} a_j b_{i-j}$.

As an example of multiplication, consider the product $(1-x)\sum_{n=0}^{\infty} x^n = 1$. Be sure to check this out! It says that $(1-x)^{-1} = \sum_{n=0}^{\infty} x^n$.

The ring of all formal power series in $x$ over the commutative ring $A$ is denoted by $A[[x]]$.
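The Cauchy product in (ii) is easy to experiment with on truncated coefficient lists. The sketch below (the helper name `ps_mul` is ours, not from the text) multiplies $(1-x)$ by the geometric series and confirms that the product is 1 up to the truncation order:

```python
def ps_mul(a, b, N):
    """Cauchy product c_i = sum_{j=0}^{i} a_j b_{i-j}, truncated to degree < N."""
    return [sum((a[j] if j < len(a) else 0) * (b[i - j] if i - j < len(b) else 0)
                for j in range(i + 1)) for i in range(N)]

N = 10
geom = [1] * N                   # 1 + x + x^2 + ... (truncated)
print(ps_mul([1, -1], geom, N))  # -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Truncation is harmless here because the coefficient of $x^i$ in a product depends only on coefficients of degree at most $i$ in each factor.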

Exercise: 4.4.1 Prove that with these definitions, if $A$ is an integral domain, then the set $A[[x]]$ of all formal power series in $x$ over $A$ is an integral domain with unity equal to $\sum_{i=0}^{\infty} a_i x^i$, where $a_0 = 1$ and $a_i = 0$ for $i > 0$.

Proof: All the details are fairly routine. We include here only the details for associativity of multiplication. So, to show

$$\left[\left(\sum a_i x^i\right)\left(\sum b_i x^i\right)\right]\left(\sum c_i x^i\right) = \left(\sum a_i x^i\right)\left[\left(\sum b_i x^i\right)\left(\sum c_i x^i\right)\right]$$

we must show that the coefficient of $x^j$ on the left hand side is equal to the coefficient of $x^j$ on the right hand side. The coefficient of $x^j$ on the L.H.S. is $\sum_{k=0}^{j} d_k c_{j-k}$, where $d_k = \sum_{i=0}^{k} a_i b_{k-i}$. This is $\sum_{k=0}^{j}\left(\sum_{i=0}^{k} a_i b_{k-i}\right) c_{j-k} = \sum a_i b_k c_l$, where this last sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$. And the last equality is obtained by observing that each term of the R.H.S. of this last equality appears exactly once on the left, and vice versa. Similarly, the coefficient of $x^j$ on the R.H.S. of the main equality to be established is also equal to $\sum a_i b_k c_l$, where this sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.

We have established the following:

$$[x^j]\left(\sum a_i x^i\right)\left(\sum b_i x^i\right)\left(\sum c_i x^i\right) = \sum a_i b_k c_l \qquad (4.8)$$

where the sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.

This suggests more general formulas, but we write down only one special case:

$$[x^n]\left(\sum_{i=0}^{\infty} a_i x^i\right)^k = \sum a_{n_1} a_{n_2} \cdots a_{n_k} \qquad (4.9)$$

where the sum is over all $(n_1, n_2, \ldots, n_k)$ for which $n_i \ge 0$ and $n_1 + n_2 + \cdots + n_k = n$.
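Identity (4.9) can be spot-checked numerically: compute $[x^n]$ of a $k$th power by repeated Cauchy products, and compare with a brute-force sum over compositions. The helper names below are ours:

```python
from itertools import product
from math import prod

def coeff_of_power(a, k, n):
    """[x^n] of (sum_i a_i x^i)^k, via k truncated Cauchy products."""
    c = [1] + [0] * n                       # the constant series 1
    for _ in range(k):
        c = [sum(c[j] * (a[i - j] if i - j < len(a) else 0)
                 for j in range(i + 1)) for i in range(n + 1)]
    return c[n]

def rhs_4_9(a, k, n):
    """Right side of (4.9): sum of a_{n1} ... a_{nk} over n1 + ... + nk = n."""
    return sum(prod((a[t] if t < len(a) else 0) for t in tup)
               for tup in product(range(n + 1), repeat=k) if sum(tup) == n)

a = [1, 2, 3, 5, 7]                         # arbitrary coefficients a_0..a_4
print(all(coeff_of_power(a, k, n) == rhs_4_9(a, k, n)
          for k in range(1, 4) for n in range(5)))  # -> True
```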

Def. Let $f(x) = \sum a_i x^i$ be a nonzero element of $A[[x]]$. Then there must be a smallest integer $n$ for which $a_n \ne 0$. This $n$ is called the order of $f(x)$ and will be denoted by $o(f(x))$. If $n = o(f(x))$, then $a_n$ will be called the initial term of $f(x)$. We say that the order of the zero power series (i.e., the zero element of the ring $A[[x]]$) is $+\infty$.

Theorem 4.4.2 If $f, g \in A[[x]]$, then

1. $o(f + g) \ge \min\{o(f), o(g)\}$;

2. $o(fg) \ge o(f) + o(g)$.

Furthermore, if $A$ is an integral domain, then $A[[x]]$ is an integral domain and $o(fg) = o(f) + o(g)$.

Proof: Easy exercise.

Theorem 4.4.3 If $f(x) = \sum_{i=0}^{\infty} a_i x^i \in A[[x]]$, then $f(x)$ is a unit in $A[[x]]$ iff $a_0$ is a unit in $A$.

Proof: If $f(x)g(x) = 1$, then (with $g(x) = \sum b_i x^i$), $a_0 b_0 = 1$, implying $a_0$ is a unit in $A$. Conversely, suppose $a_0$ is a unit in $A$. Then there is a $b_0 \in A$ for which $a_0 b_0 = 1$. And there is a (unique!) $b_1 \in A$ for which $a_0 b_1 + a_1 b_0 = 0$, i.e., $b_1 = a_0^{-1}(-a_1 b_0) = -a_1 b_0^2$. Proceed inductively. Suppose $b_0, b_1, \ldots, b_n$ have been determined so that $\sum_{i=0}^{j} a_i b_{j-i} = 0$ for $j = 1, 2, \ldots, n$. Then put $b_{n+1} = -a_0^{-1}(a_1 b_n + \cdots + a_{n+1} b_0) = -b_0(a_1 b_n + \cdots + a_{n+1} b_0)$. By induction, $g(x) = \sum_{i=0}^{\infty} b_i x^i \in A[[x]]$ is constructed (uniquely!) so that $f(x)g(x) = 1$.
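The recursion in the proof is directly computable. A small sketch (helper name `ps_inverse` is ours): inverting $f(x) = 1 - x - x^2$ should produce the familiar Fibonacci generating function.

```python
from fractions import Fraction

def ps_inverse(a, N):
    """First N coefficients of g = 1/f, from the recursion in the proof:
    b_n = -b_0 (a_1 b_{n-1} + ... + a_n b_0), with b_0 = 1/a_0."""
    b = [Fraction(1) / a[0]]
    for n in range(1, N):
        s = sum((a[i] if i < len(a) else 0) * b[n - i] for i in range(1, n + 1))
        b.append(-b[0] * s)
    return b

# f(x) = 1 - x - x^2: 1/f is the generating function of the Fibonacci numbers
inv = ps_inverse([1, -1, -1], 8)
print([int(c) for c in inv])  # -> [1, 1, 2, 3, 5, 8, 13, 21]
```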

Theorem 4.4.4 If $K$ is a field, then the units of $K[[x]]$ are the power series of order 0, i.e., with nonzero constant term.

The next result is inserted here for the algebraists. It will not be needed in this course.


Theorem 4.4.5 Let $K$ be a field. Then the ring $K[[x]]$ has a unique maximal ideal (generated by $x$), and the nontrivial ideals are all powers of this one.

Theorem 4.4.6 Let $K$ be a field, and let $f(x), g(x) \in K[[x]]$ with $f(x) \ne 0$. Then there is a unique power series $q(x) \in K[[x]]$ and there is a unique polynomial $r(x) \in K[x]$ such that $g(x) = q(x)f(x) + r(x)$ and either $\deg(r(x)) < o(f(x))$ or $r(x) = 0$.

Proof: Let $g(x) = \sum a_i x^i$, $f(x) = \sum b_i x^i$, where $h = o(f(x)) < \infty$. Put $r(x) = a_0 + a_1 x + \cdots + a_{h-1} x^{h-1}$. Then $g(x) - r(x) = a_n x^n + a_{n+1} x^{n+1} + \cdots$, where $a_n \ne 0$ and $n \ge h$. Now $f(x) = b_h x^h + b_{h+1} x^{h+1} + \cdots = x^h(b_h + b_{h+1} x + \cdots)$, where $b_h + b_{h+1} x + \cdots$ is invertible with a unique inverse $v(x) \in K[[x]]$. Then $g(x) - r(x) = x^n(a_n + a_{n+1} x + \cdots) = x^{n-h}(a_n + a_{n+1} x + \cdots) x^h = x^{n-h}(a_n + a_{n+1} x + \cdots) x^h (b_h + b_{h+1} x + \cdots) v(x) = x^{n-h}(a_n + a_{n+1} x + \cdots) v(x) f(x)$. So we put $q(x) = x^{n-h}(a_n + a_{n+1} x + \cdots) v(x)$. Then $g(x) = q(x)f(x) + r(x)$ where $r(x) = 0$ or $\deg(r(x)) < o(f(x))$. Now suppose there could be two power series $q_1(x), q_2(x)$ and two polynomials $r_1(x), r_2(x)$, with $g(x) = q_i(x)f(x) + r_i(x)$, where $\deg(r_i(x)) < o(f(x))$ or $r_i = 0$, $i = 1, 2$. Then $(q_1(x) - q_2(x))f(x) = r_2(x) - r_1(x)$. Clearly $q_1(x) = q_2(x)$ iff $r_1(x) = r_2(x)$, since $K$ is a field and $f(x) \ne 0$. So suppose $q_2(x) - q_1(x) \ne 0 \ne r_1(x) - r_2(x)$. Then $\deg(r_2(x) - r_1(x)) < o(f(x)) \le o((q_1(x) - q_2(x))f(x)) = o(r_2(x) - r_1(x))$. This is impossible, since no nonzero polynomial can have degree less than its order. Hence $q_1(x) = q_2(x)$ and $r_1(x) = r_2(x)$.

A very common technique used to obtain identities that at first glance look impossible to prove is to calculate the coefficient on $x^n$ in two different expressions for the same formal power series. Here are some exercises that offer practice using this method.

Exercise: 4.4.7 Show that

$$\sum_{r=0}^{m+1} (-1)^r \binom{m+1}{r}\binom{m+n-r}{m} = \begin{cases} 1, & \text{if } n = 0; \\ 0, & \text{if } n > 0. \end{cases}$$

Hint: Recall that

$$1 = (1-z)^{m+1} \cdot \sum_{k=0}^{\infty} \binom{m+k}{k} z^k,$$

and compute the coefficient on $z^n$.

Exercise: 4.4.8 Compute the coefficient of $x^{2n}$ in both sides of

$$(1 - x^2)^{2n} = \sum_{r=0}^{2n} (-1)^r \binom{2n}{r} x^{2r}$$

to show that

$$\sum_{k=0}^{2n} (-1)^k \binom{2n}{k}^2 = (-1)^n \binom{2n}{n}.$$

Exercise: 4.4.9 Show that

$$\sum_{k=0}^{n+1}\left[\binom{n}{k} - \binom{n}{k-1}\right]^2 = \frac{2}{n+1}\binom{2n}{n}.$$

Hint: Compute the coefficient of $x^n$ on both sides of

$$\left[(1-x)(1+x)^n\right] \cdot \left[\left(1 - \frac{1}{x}\right)(1+x)^n\right] = \left(-\frac{1}{x} + 2 - x\right)(1+x)^{2n}.$$

4.5 Composition of Power Series

For each $n \ge 0$ let $f_n(x) = \sum_{i=0}^{\infty} c_{ni} x^i$ be a formal power series. Suppose that for fixed $i$, only finitely many $c_{ni}$ could be nonzero. Say $c_{ni} = 0$ for $n > n_i$. Then we can formally define

$$\sum_{n=0}^{\infty} f_n(x) = \sum_{i=0}^{\infty}\left(\sum_{n=0}^{n_i} c_{ni}\right) x^i.$$

By hypothesis this formal sum of infinitely many power series involves the sum of only finitely many terms in computing the coefficient of any given power of $x$. This definition allows the introduction of substitution of one power series $b(x)$ for the "variable" $x$ of a second power series $a(x)$, at least when $o(b(x)) \ge 1$.

If $a(x) = \sum_{n=0}^{\infty} a_n x^n$, and if $b(x) = \sum_{n=1}^{\infty} b_n x^n$, i.e., $b_0 = 0$, then the powers $b^n(x) := (b(x))^n$ satisfy the condition for formal addition, i.e.,

$$a(b(x)) := \sum_{n=0}^{\infty} a_n b^n(x).$$

As an example, let $a(x) := (1-x)^{-1}$, $b(x) = 2x - x^2$. Then formally

$$h(x) := a(b(x)) = 1 + (2x - x^2) + (2x - x^2)^2 + \cdots = 1 + 2x + 3x^2 + 4x^3 + \cdots = (1-x)^{-2}.$$

The middle equality is a bit mysterious. It follows from (legitimate) algebraic manipulation:

$$a(b(x)) = (1 - (2x - x^2))^{-1} = (1-x)^{-2}.$$

If we try to verify it directly we find that the coefficient on $x^n$ in $1 + (2x - x^2) + (2x - x^2)^2 + \cdots$ is

$$\sum_{j=0}^{n} (-1)^{n-j}\, 2^{2j-n} \binom{j}{2j-n},$$

which we must then show is equal to $n + 1$. This is quite tricky to show directly, say by induction.

From analysis we know that $1 + 2x + 3x^2 + 4x^3 + \cdots$ is the power series expansion of $(1-x)^{-2}$, so this must be true in $C[[x]]$. Indeed we have

$$(1-x)h(x) = \sum_{n=0}^{\infty} x^n = (1-x)^{-1}.$$
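Although the binomial sum above is tricky to evaluate by hand, it is trivial to check numerically. A small sketch (`coeff` is our name for the sum in the text):

```python
from math import comb

def coeff(n):
    """The sum from the text for [x^n] in 1 + (2x - x^2) + (2x - x^2)^2 + ...:
    sum over j of (-1)^(n-j) 2^(2j-n) C(j, 2j-n), where C(j, 2j-n) = 0 unless
    n/2 <= j <= n."""
    return sum((-1) ** (n - j) * 2 ** (2 * j - n) * comb(j, 2 * j - n)
               for j in range((n + 1) // 2, n + 1))

print([coeff(n) for n in range(8)])  # -> [1, 2, 3, 4, 5, 6, 7, 8], i.e. n + 1
```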

If $b(x) = \sum_{n=0}^{\infty} b_n x^n$ is a power series with $o(b(x)) = 1$, i.e., $b_0 = 0$ and $b_1 \ne 0$, we can also find the (unique!) inverse function as a power series. We "solve" the equation $b(a(x)) = x$ by substitution, assuming that $a(x) = \sum_{n=1}^{\infty} a_n x^n$ with $a_0 = 0$. Then

$$x = b_1\left(\sum_{n=1}^{\infty} a_n x^n\right) + b_2\left(\sum_{n=1}^{\infty} a_n x^n\right)^2 + b_3\left(\sum_{n=1}^{\infty} a_n x^n\right)^3 + \cdots.$$

From this we find $b_1 a_1 = 1$, $b_1 a_2 + b_2 a_1^2 = 0$, $b_1 a_3 + b_2(a_1 a_2 + a_2 a_1) + b_3(a_1 a_1 a_1) = 0$, etc. In general the zero coefficient on $x^n$ (for $n > 1$) must equal an expression that starts with $b_1 a_n$ and for which the other terms involve the coefficients $b_1, \ldots, b_n$ and coefficients $a_k$ with $1 \le k < n$. Hence recursively we may solve for $a_n$ starting with $a_1$. This compositional inverse of $b(x)$ will be denoted $b^{[-1]}(x)$ to distinguish it from the multiplicative inverse $b^{-1}(x)$ (sometimes denoted $b(x)^{-1}$) of $b(x)$. Note that for $b(x) \in A[[x]]$, $b^{[-1]}(x)$ exists iff $o(b(x)) = 1$ and $b_1^{-1}$ exists in $A$, and $b(x)^{-1}$ exists iff $o(b(x)) = 0$ and $b(0)^{-1}$ exists in $A$. Also note that if $b(a(x)) = x$, and we put $y = a(x)$, then $a(b(y)) = a(b(a(x))) = a(x) = y$, i.e., $a(b(y)) = y$; so if $b$ is a 'left' inverse of $a$, it is also the unique 'right' inverse of $a$.
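The coefficient-by-coefficient solution of $b(a(x)) = x$ can be sketched as code (helper names are ours). For $b(x) = x - x^2$ the inverse is the generating function of the Catalan numbers, which gives a nice sanity check:

```python
from fractions import Fraction

def ps_mul(a, b, N):
    return [sum(a[j] * b[i - j] for j in range(i + 1)
                if j < len(a) and i - j < len(b)) for i in range(N)]

def ps_compose(b, a, N):
    """b(a(x)) mod x^N, assuming a[0] == 0 so the composition is defined."""
    out = [Fraction(b[0])] + [Fraction(0)] * (N - 1)
    p = [Fraction(1)] + [Fraction(0)] * (N - 1)     # a(x)^k, starting at k = 0
    for k in range(1, len(b)):
        p = ps_mul(p, a, N)
        out = [out[i] + b[k] * p[i] for i in range(N)]
    return out

def comp_inverse(b, N):
    """a = b^[-1] with b(a(x)) = x, solved coefficient by coefficient
    (b[0] must be 0, b[1] a unit), as described in the text."""
    a = [Fraction(0), Fraction(1) / b[1]]           # b_1 a_1 = 1
    for n in range(2, N):
        known = ps_compose(b, a, n + 1)[n]          # [x^n] b(a) with a_n still 0
        a.append(-known / b[1])
    return a

b = [Fraction(0), Fraction(1), Fraction(-1)]        # b(x) = x - x^2
a = comp_inverse(b, 7)
print([int(c) for c in a[1:]])                  # -> [1, 1, 2, 5, 14, 42] (Catalan)
print(ps_compose(b, a, 7) == [0, 1, 0, 0, 0, 0, 0])  # -> True
```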

Note that certain substitutions that make perfect sense in analysis are forbidden within the present theory. For suppose $b(x) = 1 + x$, i.e., $o(b(x)) = 0$. If $a(x) = e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, then we are not allowed to substitute $1 + x$ in place of $x$ in the formula for $e^x$ to find the power series for $e^{1+x}$. If we try to do this anyway, we see that $e^{1+x}$ would appear as $\sum_{n=0}^{\infty} \frac{(1+x)^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} x^j$, which has infinitely many nonzero contributions to the coefficient on $x^j$ for each $j$. This is not defined.

4.6 The Formal Derivative and Integral

Let $f(x) = \sum_{i \ge 0} c_i x^i$ and $g(x) = \sum_{j \ge 0} d_j x^j$ be two power series. We define the formal derivative $f'(x)$ by:

$$f'(x) = \sum_{i \ge 0} (i+1)\, c_{i+1} x^i. \qquad (4.10)$$

It is now easy to show that the Sum Rule holds:

$$(f(x) + g(x))' = f'(x) + g'(x), \qquad (4.11)$$

and

$$(f(x)g(x))' = \sum_{i,j \ge 0} (i+j)\, c_i d_j x^{i+j-1} = \sum_{i,j \ge 0} i\, c_i d_j x^{i-1+j} + \sum_{i,j \ge 0} j\, c_i d_j x^{j-1+i},$$

which proves the Product Rule:

$$(f(x)g(x))' = g(x)f'(x) + f(x)g'(x). \qquad (4.12)$$

Suppose that $o(g(x)) \ge 1$, so that the composition $f(g(x))$ is defined. The Chain Rule

$$(f(g(x)))' = f'(g(x))\, g'(x) \qquad (4.13)$$

is established first for $f(x) = x^n$, $n \ge 1$, by using induction on $n$ along with the product rule, and then by linear extension for any power series $f(x)$.

If $f(0) = 1$, start with the equation $f^{-1}(x)f(x) = 1$. Take the formal derivative of both sides, applying the product rule. It follows that $(f^{-n}(x))' = -n f^{-n-1}(x) f'(x)$ for $n \ge 1$. But now given $f(x)$ and $g(x)$ as above with $g(0) = 1$, we can use this formula for $(g^{-1}(x))' = -g'(x)/g^2(x)$ with the product rule to establish the Quotient Rule:

$$\left(\frac{f(x)}{g(x)}\right)' = \frac{g(x)f'(x) - f(x)g'(x)}{g^2(x)}. \qquad (4.14)$$

If $A$ has characteristic 0 (so we can divide by any positive integral multiple of 1), the usual Taylor's Formula in one variable (well, MacLaurin's formula) holds:

$$f(x) = \sum_{n \ge 0} f^{(n)}(0) \frac{x^n}{n!}. \qquad (4.15)$$

If $A$ has characteristic zero and $f, g \in A[[x]]$, then

$$f' = g' \text{ and } f(0) = g(0) \iff f(x) = g(x). \qquad (4.16)$$

In order to define the integral of $f(x)$ we need to assume that the characteristic of $A$ is zero. In that case define the formal integral $I_x f(x)$ by

$$I_x f(x) = \int_0^x f(t)\,dt = \sum_{i \ge 1} i^{-1} c_{i-1} x^i. \qquad (4.17)$$

It is easy to see that $\int_0^x (f(t) + g(t))\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt$. The following also follow easily:

$$(I_x f(x))' = f(x); \qquad \text{if } F'(x) = f(x), \text{ then } \int_0^x f(t)\,dt = F(x) - F(0). \qquad (4.18)$$
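Both operations act on coefficient lists by a simple index shift, so $(I_x f)' = f$ is a one-line check. A minimal sketch (helper names ours):

```python
from fractions import Fraction

def deriv(a):
    """Formal derivative (4.10): [x^i] f' = (i+1) c_{i+1}."""
    return [(i + 1) * a[i + 1] for i in range(len(a) - 1)]

def integ(a):
    """Formal integral (4.17): [x^i] I_x f = i^{-1} c_{i-1} for i >= 1."""
    return [Fraction(0)] + [Fraction(a[i - 1], i) for i in range(1, len(a) + 1)]

f = [3, 1, 4, 1, 5]
print(deriv(integ(f)) == f)  # -> True: (I_x f)' = f
```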


4.7 Log, Exp and Binomial Power Series

In this section we assume that $R$ is an integral domain with characteristic 0. The exponential series is

$$e^x = \exp(x) = \sum_{j \ge 0} \frac{x^j}{j!}. \qquad (4.19)$$

The logarithmic series is

$$\log((1-x)^{-1}) = \sum_{j \ge 1} \frac{x^j}{j}. \qquad (4.20)$$

If $y$ is also an indeterminate, then the binomial series is

$$(1+x)^y = 1 + \sum_{j \ge 1} \frac{y(y-1)\cdots(y-j+1)\, x^j}{j!} = \sum_{j \ge 0} \binom{y}{j} x^j \in (R[y])[[x]]. \qquad (4.21)$$

If $o(f(x)) \ge 1$, the compositions of these functions with $f$ are defined. So $\exp(f)$, $\log(1+f)$ and $(1+f)^y$ are defined. Also, any element of $R[[x]]$ may be substituted for $y$ in $(1+x)^y$. Many of the usual properties of these 'analytic' functions over $C$ hold in the formal setting. For example:

$$(e^x)' = e^x$$
$$(\log((1-x)^{-1}))' = (1-x)^{-1}$$
$$((1+x)^y)' = y(1+x)^{y-1}$$

The application of the chain rule to these is immediate except possibly in the case of the logarithm.

We ask: When is it permissible to compute $\log(f(x))$ for $f(x) \in R((x))$? To determine this, note that

$$\log(f(x)) = \log\left(\frac{1}{1 - (1 - f^{-1})}\right) = \sum_{j \ge 1} \frac{(1 - f^{-1})^j}{j},$$

where this latter expression is well-defined provided $o(1 - f^{-1}) \ge 1$. So in particular we need $o(f(x)) = 0$ and $f(0) = 1$. In this case $D_x \log f(x) = f(x)\, D_x(1 - f^{-1}(x))$, and

$$(\log(f(x)))' = f'(x)/f(x). \qquad (4.22)$$

By the chain rule, $D_x \log(e^x) = (e^x)^{-1} e^x = 1 = D_x x$, and $\log(e^x)|_{x=0} = 0 = x|_{x=0}$, which implies that

$$\log(e^x) = x. \qquad (4.23)$$

Similarly, using both the product and chain rules,

$$D_x\left[(1-x)\exp(\log((1-x)^{-1}))\right] = -\exp(\log((1-x)^{-1})) + (1-x)(1-x)^{-1}\exp(\log((1-x)^{-1})) = 0,$$

so that

$$(1-x)\exp(\log((1-x)^{-1})) = 1,$$

and

$$\exp(\log((1-x)^{-1})) = (1-x)^{-1}. \qquad (4.24)$$

Again, this is because both $(1-x)\exp(\log((1-x)^{-1}))$ and 1 have derivative 0 and constant term 1.
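Identity (4.24) can also be confirmed by brute-force truncated arithmetic: compose the exponential series with the logarithmic series (4.20) and compare coefficients with $1 + x + x^2 + \cdots$. All helper names below are ours:

```python
from fractions import Fraction
from math import factorial

N = 10
exp_s = [Fraction(1, factorial(n)) for n in range(N)]           # e^x
log_s = [Fraction(0)] + [Fraction(1, n) for n in range(1, N)]   # log((1-x)^{-1})

def ps_mul(a, b):
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(N)]

def ps_compose(outer, inner):
    """outer(inner(x)) mod x^N; requires inner[0] == 0."""
    out = [outer[0]] + [Fraction(0)] * (N - 1)
    p = [Fraction(1)] + [Fraction(0)] * (N - 1)
    for k in range(1, N):
        p = ps_mul(p, inner)
        out = [out[i] + outer[k] * p[i] for i in range(N)]
    return out

# Eq. (4.24): exp(log((1-x)^{-1})) = (1-x)^{-1} = 1 + x + x^2 + ...
print(ps_compose(exp_s, log_s) == [Fraction(1)] * N)  # -> True
```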

Now consider properties of the binomial series. We have already seen that for positive integers $n$:

$$(1+x)^n = \sum_{j \ge 0} \binom{n}{j} x^j. \qquad (4.25)$$

This is the binomial expansion for positive integers. Thus for positive integers $m$ and $n$, $[x^k]$ can be applied to the binomial series expansion of the identity $(1+x)^m(1+x)^n = (1+x)^{m+n}$, giving the Vandermonde Convolution

$$\sum_{i=0}^{k} \binom{n}{i}\binom{m}{k-i} = \binom{m+n}{k}. \qquad (4.26)$$
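The Vandermonde Convolution (4.26) is easy to verify numerically over a range of parameters (`vandermonde` is our helper name):

```python
from math import comb

def vandermonde(m, n, k):
    """Left side of (4.26): sum_i C(n, i) C(m, k - i)."""
    return sum(comb(n, i) * comb(m, k - i) for i in range(k + 1))

print(all(vandermonde(m, n, k) == comb(m + n, k)
          for m in range(8) for n in range(8) for k in range(m + n + 1)))  # -> True
```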

If $f(x)$ is a polynomial in $x$ of degree $k$, and the equation $f(x) = 0$ has more than $k$ roots, then $f(x) = 0$ identically. Thus the polynomial

$$\binom{y+z}{k} - \sum_{i=0}^{k} \binom{y}{i}\binom{z}{k-i}$$

in indeterminates $y$ and $z$ must be identically 0, since it has an infinite number of roots, namely all positive integers. Accordingly we have the binomial series identity

$$(1+x)^y (1+x)^z = (1+x)^{y+z}. \qquad (4.27)$$

Substitution of $-y$ for $z$ yields $(1+x)^y(1+x)^{-y} = (1+x)^0 = 1$, so

$$((1+x)^y)^{-1} = (1+x)^{-y}. \qquad (4.28)$$

This allows us to prove that

$$\log((1+x)^y) = y \log(1+x) \qquad (4.29)$$

by the following differential argument:

$$D_x(\log((1+x)^y)) = (1+x)^{-y}\, y (1+x)^{y-1} = y(1+x)^{-1} = D_x(y \log(1+x)),$$

and

$$\log((1+0)^y) = 0 = y \log(1+0).$$

Combining these results gives

$$((1+x)^y)^z = \exp(\log(((1+x)^y)^z)) = \exp(z \log((1+x)^y)) = \exp(zy \log(1+x)) = \exp(\log((1+x)^{yz})),$$

so

$$((1+x)^y)^z = (1+x)^{yz}. \qquad (4.30)$$

By the binomial theorem,

$$\exp(x+y) = \sum_{n \ge 0} \frac{(x+y)^n}{n!} = \sum_{n \ge 0} \sum_{i=0}^{n} \frac{x^i}{i!} \cdot \frac{y^{n-i}}{(n-i)!},$$

so

$$\exp(x+y) = (\exp x)(\exp y). \qquad (4.31)$$

The substitution of $-x$ for $y$ yields $\exp(0) = (\exp(x))(\exp(-x))$, and we have

$$(\exp(x))^{-1} = \exp(-x). \qquad (4.32)$$

By making the substitution $x = f$, for $f \in R[[x]]$ with $o(f(x)) \ge 1$, and $y = g$ for any $g \in R[[x]]$, in the preceding results, we obtain many of the results that are familiar to us in terms of the corresponding analytic functions. The only results that do not hold for the formal power series are those that correspond to making inadmissible substitutions. For example, it is not the case that $\exp(\log(x)) = x$, since $\log(x)$ does not exist as a formal power series.

HERE ARE TWO MORE OFTEN USED POWER SERIES:

$$\sin(x) = \sum_{k=0}^{\infty} (-1)^k x^{2k+1}/(2k+1)!$$

$$\cos(x) = \sum_{k=0}^{\infty} (-1)^k x^{2k}/(2k)!$$

It is a good exercise to check out the usual properties of these formal power series.

4.8 Exponential Generating Functions

Recall that $P(n, k)$ is the number of $k$-permutations of an $n$-set, and $P(n, k) = n!/(n-k)! = n(n-1)\cdots(n-k+1)$. The ordinary generating function of the sequence $P(n, 0), P(n, 1), \ldots$ is

$$G(x) = P(n,0)x^0 + P(n,1)x^1 + \cdots + P(n,n)x^n.$$

Also recall the similar binomial expansion

$$C(x) = C(n,0)x^0 + \cdots + C(n,n)x^n = (1+x)^n.$$

But we can't find a nice closed form for $G(x)$. On the other hand, $P(n, r) = C(n, r)\, r!$, so the equation for $C(x)$ can be written

$$P(n,0)x^0/0! + P(n,1)x^1/1! + P(n,2)x^2/2! + \cdots + P(n,n)x^n/n! = (1+x)^n,$$

i.e.,

$$\sum_{k=0}^{n} P(n,k)\, x^k/k! = (1+x)^n.$$

So $P(n, k)$ is the coefficient of $x^k/k!$ in $(1+x)^n$. This suggests another kind of generating function — to be called the exponential generating function — as follows: If $\{a_n\}$ is a sequence, the exponential generating function for this sequence is

$$H(x) = \sum_{n=0}^{\infty} a_n x^n/n!.$$

EXAMPLE 1. $a_k = 1, 1, 1, \ldots$ has $H(x) = \sum x^k/k! = e^x$ as its exponential generating function.

EXAMPLE 2. From above, we already see that $(1+x)^n$ is the exponential generating function of the sequence $P(n,0), P(n,1), \ldots$. The exponential generating function of $1, \alpha, \alpha^2, \ldots$ is

$$H(x) = \sum_{k=0}^{\infty} \alpha^k x^k/k! = \sum_{k=0}^{\infty} (\alpha x)^k/k! = e^{\alpha x}.$$

Now suppose we have $k$ kinds of letters in an alphabet. We want to form a word with $n$ letters using $i_1$ of the 1st kind, $i_2$ of the second kind, ..., $i_k$ of the $k$th kind. The number of such words is

$$p(n; i_1, \ldots, i_k) = \binom{n}{i_1, \ldots, i_k} = n!/(i_1! \cdots i_k!).$$

Consider the product:

$$(1 + O_1 x/1! + O_1^2 x^2/2! + O_1^3 x^3/3! + \cdots) \cdots (1 + O_k x/1! + O_k^2 x^2/2! + \cdots).$$

The term involving $O_1^{i_1} O_2^{i_2} \cdots O_k^{i_k}$ is (if we put $n = i_1 + i_2 + \cdots + i_k$)

$$(O_1^{i_1} x^{i_1}/i_1!)(O_2^{i_2} x^{i_2}/i_2!) \cdots (O_k^{i_k} x^{i_k}/i_k!) = O_1^{i_1} \cdots O_k^{i_k}\, x^{i_1 + \cdots + i_k}/(i_1! \cdots i_k!) = O_1^{i_1} \cdots O_k^{i_k}\, (x^n/n!)(n!/(i_1! \cdots i_k!))$$
$$= O_1^{i_1} \cdots O_k^{i_k}\, p(n; i_1, \ldots, i_k)\, x^n/n!.$$

The complete coefficient on $x^n/n!$ is $\sum_{i_1 + \cdots + i_k = n} O_1^{i_1} \cdots O_k^{i_k} \binom{n}{i_1, \ldots, i_k}$, provided that there is no restriction on the number of repetitions of any given object, except that $i_1 + \cdots + i_k = n$. And the various $O_j$ really do not illustrate the permutations, so we place each $O_j$ equal to 1. Also, for the object $O_j$, if there are restrictions on the number of times $O_j$ can appear, then for its generating function we include only those terms of the form $x^m/m!$ if $O_j$ can appear $m$ times. Specifically, let $O$ be an object (i.e., a letter) to be used. For the exponential generating function of $O$, use $\sum_{k=0}^{\infty} a_k x^k/k!$, where $a_k$ is 1 or 0 according as $O$ can appear exactly $k$ times or not.

EXAMPLE 3: Suppose $O_1$ can appear 0, 2 or 3 times, $O_2$ can appear 4 or 5 times, and $O_3$ can be used without restriction. Then the product of the exponential generating functions for $O_1$, $O_2$, $O_3$ is:

$$(1 + x^2/2! + x^3/3!)(x^4/4! + x^5/5!)(1 + x + x^2/2! + \cdots)$$

Theorem 4.8.1 Suppose we have $k$ kinds of objects $O_1, \ldots, O_k$. Let $f_j(x)$ be the exponential generating function of the object $O_j$ determined as above by whatever restrictions there are on the number of occurrences allowed for $O_j$. Then the number of distinct permutations using $n$ of the objects (i.e., words of length $n$), subject to the restrictions used in determining $f_1(x), f_2(x), \ldots, f_k(x)$, is the coefficient of $x^n/n!$ in $f_1(x) \cdots f_k(x)$.

EXAMPLE 4. Suppose there are $k$ objects with no restrictions on repetitions. So each individual exponential generating function is $\sum_{n=0}^{\infty} x^n/n! = e^x$. The complete exponential generating function is then

$$(e^x)^k = e^{kx} = \sum_{n=0}^{\infty} (kx)^n/n! = \sum_{n=0}^{\infty} k^n (x^n/n!).$$


But we already know that $k^n$ is the number of words of length $n$ with $k$ types of objects and all possible repetitions allowed.

EXAMPLE 5. Again suppose there are $k$ types of object, but that each object must appear at least once. So the exponential generating function is $(e^x - 1)^k$. The coefficient of $x^n/n!$ in

$$(e^x - 1)^k = \sum_{j=0}^{k} \binom{k}{j} e^{jx} (-1)^{k-j} = \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \sum_{n=0}^{\infty} (jx)^n/n! = \sum_{n=0}^{\infty}\left[\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n\right] x^n/n!$$

is

$$\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n.$$

This proves the strange result: The number of permutations of $n$ objects of $k$ types, each type appearing at least once, is

$$\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n.$$
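The "strange result" of Example 5 can be checked against a brute-force enumeration of words (helper names ours):

```python
from itertools import product
from math import comb

def onto_words(n, k):
    """Words of length n over k letters using every letter at least once,
    via the alternating-sum formula from Example 5."""
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))

def brute(n, k):
    """Direct count over all k^n words."""
    return sum(1 for w in product(range(k), repeat=n) if len(set(w)) == k)

print(all(onto_words(n, k) == brute(n, k)
          for k in range(1, 4) for n in range(1, 7)))  # -> True
```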

Def. The symbol $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^\infty$ means that the power series $f$ is the exponential generating function of the sequence $\{a_n\}_0^\infty$, i.e., that $f = \sum_{n \ge 0} a_n \frac{x^n}{n!}$.

So suppose $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^\infty$. Then $f' = \sum_{n=1}^{\infty} a_n \frac{x^{n-1}}{(n-1)!} = \sum_{n=0}^{\infty} a_{n+1} \frac{x^n}{n!}$, i.e., $f' \stackrel{egf}{\leftrightarrow} \{a_{n+1}\}_0^\infty$. By induction we have an analogue to Rule 1:

Rule 1′: If $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^\infty$, then for $h \in N$, $D^h f \stackrel{egf}{\leftrightarrow} \{a_{n+h}\}_{n=0}^\infty$.

4.9 Famous Example: Bernoulli Numbers

Define $B_n$, $n \ge 0$, by

$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n \frac{x^n}{n!}.$$

The defining equation for Bn is equivalent to

$$1 = \left(\sum_{k=0}^{\infty} \frac{x^k}{(k+1)!}\right)\left(\sum_{k=0}^{\infty} B_k \frac{x^k}{k!}\right).$$

Recursively we can solve for the $B_k$ using this equation. But first notice the following: Replace $x$ by $-x$ in the egf for $B_n$:

$$\sum_{k=0}^{\infty} B_k \frac{(-x)^k}{k!} = \frac{-x}{e^{-x} - 1} = \frac{x e^x}{e^x - 1}.$$

So

$$\frac{x}{e^x - 1} - \frac{x e^x}{e^x - 1} = -x = \sum_{k=0}^{\infty} B_k \left[\frac{1 - (-1)^k}{k!}\right] x^k.$$

This implies that

$$-x = B_0 \cdot 0 + B_1 \cdot \frac{2}{1} x + B_2 \cdot 0 \cdot x^2 + B_3 \cdot \frac{2}{3!} x^3 + B_4 \cdot 0 \cdot x^4 + \cdots,$$

which implies that

$$B_1 = -\frac{1}{2} \quad \text{and} \quad B_{2k+1} = 0 \text{ for } k \ge 1.$$

Then recursively from above we find $B_0 = 1$; $B_1 = -\frac{1}{2}$; $B_2 = \frac{1}{6}$; $B_4 = -\frac{1}{30}$; $B_6 = \frac{1}{42}$, ....
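The recursion hiding in the defining equation is: for $n \ge 1$, $\sum_{k=0}^{n} B_k/(k!\,(n-k+1)!) = 0$, which can be solved for $B_n$. A minimal sketch with exact rational arithmetic (`bernoulli` is our helper name):

```python
from fractions import Fraction
from math import factorial

def bernoulli(N):
    """B_0..B_N from 1 = (sum_k x^k/(k+1)!)(sum_k B_k x^k/k!):
    for n >= 1, [x^n] gives sum_{k=0}^{n} B_k/(k! (n-k+1)!) = 0."""
    B = [Fraction(1)]
    for n in range(1, N + 1):
        s = sum(B[k] / (factorial(k) * factorial(n - k + 1)) for k in range(n))
        B.append(-s * factorial(n))      # B_n enters with coefficient 1/(n! 1!)
    return B

B = bernoulli(8)
print([str(b) for b in B])
# -> ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30']
```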

A famous result of Euler is the following:

$$\zeta(2k) = \sum_{n=1}^{\infty} \frac{1}{n^{2k}} = \frac{(-1)^k \pi^{2k} \cdot 2^{2k-1}}{(2k-1)!} \left(\frac{-B_{2k}}{2k}\right), \qquad k = 1, 2, \ldots.$$

In particular,

$$\zeta(2) = \frac{\pi^2}{6}.$$

Bernoulli originally introduced the $B_n$ to give a closed form formula for

$$S_n(m) = 1^n + 2^n + 3^n + \cdots + m^n.$$

On the one hand

$$\frac{x(e^{mx} - 1)}{e^x - 1} = \left(\frac{x}{e^x - 1}\right)(e^{mx} - 1) = \left(\sum_{k=0}^{\infty} B_k \frac{x^k}{k!}\right)\left(\sum_{j=1}^{\infty} \frac{m^j x^j}{j!}\right) =$$

$$= \sum_{n=0}^{\infty}\left[\sum_{i=1}^{n} \frac{B_{n-i}}{(n-i)!} \cdot \frac{m^i}{i!}\right] x^n = \sum_{n=0}^{\infty}\left(\sum_{i=1}^{n} \binom{n}{i} B_{n-i} m^i\right) \frac{x^n}{n!}.$$

(The coefficient on $\frac{x^0}{0!}$ is 0.)

On the other hand:

$$\frac{x(e^{mx} - 1)}{e^x - 1} = x\left(\frac{e^{mx} - 1}{e^x - 1}\right) = x(e^{(m-1)x} + e^{(m-2)x} + \cdots + e^x + 1) =$$

$$= x \sum_{j=0}^{m-1} \sum_{r=0}^{\infty} \frac{j^r x^r}{r!} = \sum_{r=0}^{\infty} \frac{x^{r+1}}{r!} \sum_{j=0}^{m-1} j^r = \sum_{r=0}^{\infty} S_r(m-1) \frac{x^{r+1}}{r!} = \sum_{n=1}^{\infty} S_{n-1}(m-1) \frac{n x^n}{n!}.$$

Equating the coefficients of $\frac{x^n}{n!}$ we get:

$$\sum_{i=1}^{n} \binom{n}{i} B_{n-i} m^i = S_{n-1}(m-1) \cdot n, \qquad n \ge 1,$$

or

$$\sum_{i=1}^{n+1} \binom{n+1}{i} B_{n+1-i} (m+1)^i = S_n(m) \cdot (n+1), \qquad n \ge 0.$$

So Bernoulli's formula is:

$$S_n(m) = 1^n + 2^n + \cdots + m^n = \sum_{i=1}^{n+1} \binom{n+1}{i} B_{n+1-i} \frac{(m+1)^i}{n+1}.$$
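Bernoulli's formula is easy to verify against the direct sum, using the values $B_0, \ldots, B_6$ computed earlier in this section (`S` is our helper name):

```python
from fractions import Fraction
from math import comb

# the first few Bernoulli numbers, as computed in the text
B = [Fraction(1), Fraction(-1, 2), Fraction(1, 6), Fraction(0),
     Fraction(-1, 30), Fraction(0), Fraction(1, 42)]

def S(n, m):
    """Bernoulli's closed form for 1^n + ... + m^n (uses B_0..B_n)."""
    return sum(comb(n + 1, i) * B[n + 1 - i] * (m + 1) ** i
               for i in range(1, n + 2)) / (n + 1)

print(all(S(n, m) == sum(j ** n for j in range(1, m + 1))
          for n in range(6) for m in range(1, 10)))  # -> True
```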

4.10 Famous Example: Fibonacci Numbers

Let $F_{n+1} = F_n + F_{n-1}$ for $n \ge 0$, and put $F_{-1} = 0$, $F_0 = 1$. Put $f \stackrel{egf}{\leftrightarrow} \{F_n\}_0^\infty$, i.e., $f = \sum_{n=0}^{\infty} F_n \frac{x^n}{n!}$. By Rule 1′ we have $f' \stackrel{egf}{\leftrightarrow} \{F_{n+1}\}_{n=0}^\infty$ and $f'' \stackrel{egf}{\leftrightarrow} \{F_{n+2}\}_{n=0}^\infty$. Use the recursion given in the form $F_{n+2} = F_{n+1} + F_n$, $n \ge 0$. So by Rule 1′ we have $f'' = f' + f$. From the theory of differential equations we see that $f(x) = c_1 e^{r_+ x} + c_2 e^{r_- x}$, where $r_\pm = \frac{1 \pm \sqrt{5}}{2}$, and where $c_1$ and $c_2$ are to be determined by the initial conditions: $f(0) = F_0 = 1$, and $f'(0) = F_1 = 1$. Then $f(0) = c_1 + c_2 = 1$ and $f'(0) = r_+ c_1 + r_- c_2 = 1$. So

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ r_+ & r_- \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \Rightarrow c_1 = \frac{\begin{vmatrix} 1 & 1 \\ 1 & r_- \end{vmatrix}}{\begin{vmatrix} 1 & 1 \\ r_+ & r_- \end{vmatrix}} = \frac{r_- - 1}{r_- - r_+} = \frac{\frac{-1-\sqrt{5}}{2}}{-\sqrt{5}} = \frac{1+\sqrt{5}}{2\sqrt{5}} = \frac{r_+}{\sqrt{5}}.$$

Similarly,

$$c_2 = \frac{\begin{vmatrix} 1 & 1 \\ r_+ & 1 \end{vmatrix}}{r_- - r_+} = \frac{-r_-}{\sqrt{5}}.$$

So $f = \frac{1}{\sqrt{5}}(r_+ e^{r_+ x} - r_- e^{r_- x}) = \frac{1}{\sqrt{5}}\left(r_+ \sum_{n=0}^{\infty} (r_+)^n \frac{x^n}{n!} - r_- \sum_{n=0}^{\infty} (r_-)^n \frac{x^n}{n!}\right)$. Then

$$F_n = \left[\frac{x^n}{n!}\right] f = \frac{1}{\sqrt{5}}(r_+^{n+1} - r_-^{n+1}). \qquad (4.33)$$
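Formula (4.33) is easy to confirm against the recursion; note the indexing here has $F_{-1} = 0$, $F_0 = 1$, so the exponent is $n+1$ (helper name `fib_closed` is ours):

```python
from math import sqrt

def fib_closed(n):
    """Eq. (4.33) with this section's indexing F_{-1} = 0, F_0 = 1:
    F_n = (r_+^{n+1} - r_-^{n+1}) / sqrt(5)."""
    rp, rm = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    return round((rp ** (n + 1) - rm ** (n + 1)) / sqrt(5))

F = [1, 1]                       # F_0, F_1
for _ in range(18):
    F.append(F[-1] + F[-2])
print(all(fib_closed(n) == F[n] for n in range(20)))  # -> True
```

Floating-point arithmetic plus `round` is exact here for small $n$; for large $n$ one would use exact arithmetic instead.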

Suppose $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^\infty$. Then $Df = f' \stackrel{egf}{\leftrightarrow} \{a_{n+1}\}_{n=0}^\infty$ and $(xD)f \stackrel{egf}{\leftrightarrow} \{n a_n\}_0^\infty$. Indeed, $f = \sum a_n \frac{x^n}{n!} \Rightarrow f' = \sum a_n \frac{x^{n-1}}{(n-1)!} \Rightarrow x f' = (xD)f = \sum a_n \frac{x^n}{(n-1)!} = \sum n a_n \frac{x^n}{n!}$. So $x f' \stackrel{egf}{\leftrightarrow} \{n a_n\}_{n=0}^\infty$.

4.11 Roots of a Power Series

We continue to assume that $R$ is an integral domain with characteristic 0.

Occasionally we need to solve a polynomial equation for a power series. As an example we consider the $n$th root. Let $g(x) \in R[[x]]$ satisfy $g(0) = \alpha^n$, $\alpha \in R$, $\alpha^{-1} \in R$. We want to determine $f \in R[[x]]$ such that

$$f^n(x) = g(x) \text{ with } f(0) = \alpha. \qquad (4.34)$$

Then the unique such power series is (write $\alpha^{-n} g(x) = ((\alpha^{-n} g(x) - 1) + 1)$)

$$f(x) = \alpha(\alpha^{-n} g(x))^{1/n} = \alpha \sum_{i \ge 0} \binom{1/n}{i} (\alpha^{-n} g(x) - 1)^i, \qquad (4.35)$$

since $\alpha^{-n} g(x) - 1 \in R[[x]]$ with $o(\alpha^{-n} g(x) - 1) \ge 1$. This is a solution since

$$f^n(x) = \alpha^n ((\alpha^{-n} g(x))^{1/n})^n = \alpha^n (\alpha^{-n} g(x))^1 = g(x),$$

from Eq. 4.30, and since $f(0) = \alpha$. To establish uniqueness, suppose that $f$ and $h$ are both solutions to Eq. 4.34, so that

$$0 = f^n - h^n = (f - h)(f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1}).$$

Since $R$, and therefore $R[[x]]$, has no zero divisors, either $f - h = 0$ or $f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1} = 0$. But

$$(f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1})|_{x=0} = n \alpha^{n-1} \ne 0$$

since $\alpha \ne 0$ and $R$ has characteristic 0 and no zero divisors. Thus $f = h$ and the $f$ of Eq. 4.35 is the unique solution to Eq. 4.34.

This result is used most frequently when $f(x)$ satisfies a quadratic equation with a given initial condition.
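Formula (4.35) is computable: sum the binomial series $\sum_i \binom{1/n}{i}(g-1)^i$ truncated to a fixed order. The sketch below (helper names ours) takes the square root of $g(x) = (1+x)^2$ in the $\alpha = 1$ case and checks that the answer squares back to $g$:

```python
from fractions import Fraction

N = 8

def ps_mul(a, b):
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(N)]

def ps_root(g, n):
    """f with f^n = g and f(0) = 1: the alpha = 1 case of (4.35),
    f = sum_i C(1/n, i) (g - 1)^i, truncated mod x^N."""
    u = [g[0] - 1] + list(g[1:])               # g - 1, which has order >= 1
    f = [Fraction(0)] * N
    p = [Fraction(1)] + [Fraction(0)] * (N - 1)
    c = Fraction(1)                            # C(1/n, 0)
    for i in range(N):
        f = [f[k] + c * p[k] for k in range(N)]
        p = ps_mul(p, u)
        c *= (Fraction(1, n) - i) / (i + 1)    # step C(1/n, i) -> C(1/n, i+1)
    return f

g = [Fraction(x) for x in (1, 2, 1)] + [Fraction(0)] * (N - 3)   # (1 + x)^2
f = ps_root(g, 2)
print([int(v) for v in f[:3]], ps_mul(f, f) == g)  # -> [1, 1, 0] True
```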

4.12 Laurent Series and Lagrange Inversion

Again in this section we assume that $R$ is a field with characteristic 0, so that the notation and results of the preceding sections will apply. Usually this field is the field of quotients of an integral domain obtained by starting with the complex numbers and adjoining polynomials and/or power series in sets of commuting independent variables.

The quotient field of $R[[x]]$ may be identified with the set $R((x))$ of so-called Laurent series $f(x) = \sum_{n=k}^{\infty} a_n x^n$, where $k \in Z$ is the order $o(f(x))$ provided $a_k \ne 0$. When $k \ge 0$ this agrees with the earlier definition of order. We give the coefficient of $x^{-1}$ a name familiar from complex analysis. If $a(x) = \sum_{n=k}^{\infty} a_n x^n$, we say that $a_{-1}$ is the residue of $a(x)$. This will be written as $\mathrm{Res}\, a(x) = [x^{-1}]\, a(x)$.

For a Laurent series $f$, the multiplicative inverse exists iff $o(f) < \infty$. If $o(f) = k$, then $f = x^k g$ where $g \in C((x))$ has $o(g) = 0$. In this case we define

$$f^{-1} = x^{-k} g^{-1}.$$

Since the usual quotient formula for derivatives of power series holds, it is straightforward to carry the theory of the derivative over to Laurent series.

The following facts are then easily proved (Exercises!):

Exercise: 4.12.1 If $w(x)$ is a Laurent series, then

(R1) $\mathrm{Res}(w'(x)) = 0$; and

(R2) $[x^{-1}] \dfrac{w'(x)}{w(x)} = o(w(x))$.

We have already mentioned the idea of an "inverse function" of a power series with order 1 and have shown how to compute its coefficients recursively. The next theorem gives an expression for the coefficients.

Theorem 4.12.2 Let $W(x) = w_1 x + w_2 x^2 + \cdots$ be a power series with $w_1 \ne 0$. Let $Z(w) = c_1 w + c_2 w^2 + \cdots$ be a power series in $w$ such that $Z(W(x)) = x$. Then

$$c_n = \mathrm{Res}\left(\frac{1}{n W^n(x)}\right). \qquad (4.36)$$

Proof: From our computations above we see that $c_1 = w_1^{-1}$. Now apply formal derivation to $Z(W(x)) = x$. This yields:

$$1 = \sum_{k=1}^{\infty} k c_k W^{k-1}(x) W'(x). \qquad (4.37)$$

Consider the series obtained by dividing this equation by $n W^n(x)$:

$$[x^{-1}] \frac{1}{n W^n(x)} = [x^{-1}] \frac{c_n W'(x)}{W(x)} + [x^{-1}] \sum_{k \ge 1,\ k \ne n} \frac{k c_k}{n} W^{k-1-n}(x) W'(x).$$

If $n \ne k$, then the term $W^{k-1-n}(x) W'(x)$ is a derivative by the chain rule and hence has residue 0 by (R1). Now apply (R2) to the term with $n = k$ to see that the residue of the R.H.S. is equal to $c_n \cdot o(W(x)) = c_n$, proving the theorem.
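Formula (4.36) can be computed directly with truncated series arithmetic: $c_n = \frac{1}{n}[z^{n-1}](z/W(z))^n$. The sketch below (helper names ours) does this for $W(z) = z + z^2 + \cdots = z/(1-z)$, whose inverse is $Z(w) = w/(1+w) = w - w^2 + w^3 - \cdots$ (this is Exercise 4.12.3(i)):

```python
from fractions import Fraction

N = 8

def ps_mul(a, b):
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(N)]

def ps_inv(a):
    """1/a for a power series with invertible constant term (Theorem 4.4.3)."""
    b = [Fraction(1) / a[0]]
    for n in range(1, N):
        b.append(-b[0] * sum(a[i] * b[n - i] for i in range(1, n + 1)))
    return b

def inverse_coeffs(w):
    """c_n = Res(1/(n W^n)) = (1/n) [z^{n-1}] (z/W(z))^n, per Theorem 4.12.2.
    w[i] holds the coefficient of z^i in W(z)/z, i.e. w_{i+1}."""
    g = ps_inv([Fraction(c) for c in w])       # z/W(z) as a power series
    cs = [Fraction(0)]
    p = [Fraction(1)] + [Fraction(0)] * (N - 1)
    for n in range(1, N):
        p = ps_mul(p, g)                       # now p = g^n
        cs.append(p[n - 1] / n)
    return cs

cs = inverse_coeffs([1] * N)                   # W(z) = z/(1-z)
print([str(c) for c in cs[1:]])  # -> ['1', '-1', '1', '-1', '1', '-1', '1']
```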

Practice using the previous theorem on the following exercises:

Exercise: 4.12.3 (i) If $w = W(z) = \sum_{n=1}^{\infty} z^n$, use the previous theorem to compute $z = Z(w)$. Check your result by expressing $z$ and $w$ as simple rational functions of each other.

(ii) Put $w = W(z) = \frac{z}{(1-z)^2}$. Use the previous theorem to compute $z = Z(w)$.

(iii) Put $w = W(z) = \frac{z}{(1-z)^2}$. Use the "quadratic formula" to solve for $z$ as a function of $w$. Then use the binomial expansion for $(1+t)^{\frac{1}{2}}$ with an appropriate $t$ to solve for $z$ as a function of $w$.

(iv) Show that the two answers you get for parts (ii) and (iii) are in fact equal.

Before proving our next result we need to recall the so-called Rule of Leibniz:

Exercise: 4.12.4 Let $D$ denote the derivative operator, and for a function $f$ (in our case a formal power series or Laurent series) let $f^{(j)}$ denote the $j$th derivative of $f$, i.e., $D^j(f) = f^{(j)}$.

(i) Prove that

$$D^n(f \cdot g) = \sum_{i=0}^{n} \binom{n}{i} f^{(i)} g^{(n-i)}.$$

(ii) Derive as a corollary to part (i) the fact that

$$D^j(f^2) = \sum_{i_1 + i_2 = j} \binom{j}{i_1, i_2} f^{(i_1)} f^{(i_2)}.$$

(iii) Now use part (i) and induction on $n$ to prove that

$$D^j(f^n) = \sum_{i_1 + \cdots + i_n = j} \binom{j}{i_1, \ldots, i_n} f^{(i_1)} \cdots f^{(i_n)}.$$

Theorem 4.12.5 (Special Case of Lagrange Inversion Formula) Let $f(z) = \sum_{i=0}^{\infty} f_i z^i$ with $f_0 \ne 0$, so in particular $f(z)^{-1} \in C[[z]]$. If $w = W(z) = \frac{z}{f(z)}$ (which is a power series with $o(W(z)) = 1$), then we can solve for $z = Z(w)$ as a power series in $w$ with order 1. Specifically,

$$z = Z(w) = \sum_{n=1}^{\infty} c_n w^n, \quad \text{with} \quad c_n = \mathrm{Res}\left(\frac{f^n(z)}{n z^n}\right) = \frac{1}{n!}\left(D^{n-1} f^n\right)(0).$$

Proof: Since $f(z) = \sum_{i=0}^{\infty} f_i z^i$ with $f_0 \ne 0$, in $C[[z]]$,

$$W(z) = \frac{z}{\sum_{i=0}^{\infty} f_i z^i} = \frac{1}{\sum_{i=0}^{\infty} f_i z^{i-1}} = (f_0 z^{-1} + f_1 + f_2 z + f_3 z^2 + \cdots)^{-1} = \sum_{n=1}^{\infty} w_n z^n.$$

Here $f_0 \ne 0$ implies that $w_1 \ne 0$. By Theorem 4.12.2, $z = \sum_{n=1}^{\infty} c_n w^n$, where

$$c_n = \mathrm{Res}\,\frac{1}{n\left(\frac{z^n}{f^n(z)}\right)} = \mathrm{Res}\left(\frac{f^n(z)}{n z^n}\right) = \frac{1}{n}[z^{n-1}] f^n(z) = \frac{1}{n} \sum f_{i_1} \cdots f_{i_n}.$$

Here the sum is over all $(i_1, \ldots, i_n)$ for which $i_1 + \cdots + i_n = n - 1$ and $i_j \ge 0$ for $1 \le j \le n$. But now we need to evaluate

$$\frac{1}{n!}\left(D^{n-1} f^n\right)(0) = \frac{1}{n!} \sum \binom{n-1}{i_1, \ldots, i_n} f^{(i_1)} \cdots f^{(i_n)}(0),$$

where the sum is over all $(i_1, \ldots, i_n)$ for which $i_1 + \cdots + i_n = n - 1$ and $i_j \ge 0$ for $1 \le j \le n$. In this expression $f^{(i_j)}$ is the $(i_j)$th derivative of $f$ and when evaluated at 0 yields $f_{i_j} \cdot i_j!$ by Taylor's formula. Hence the sum in question equals

$$\frac{1}{n!} \sum \frac{(n-1)!\, i_1! f_{i_1} \cdots i_n! f_{i_n}}{i_1! \cdots i_n!} = \frac{1}{n} \sum f_{i_1} \cdots f_{i_n},$$

as desired.
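As a quick check of Theorem 4.12.5, take the (assumed, illustrative) choice $f(z) = (1-z)^{-1}$, so $w = z(1-z)$; the inverse coefficients are then the Catalan numbers. Since $[z^j](1-z)^{-n} = \binom{n+j-1}{j}$ by the binomial series, the residue formula gives $c_n = \frac{1}{n}\binom{2n-2}{n-1}$:

```python
from math import comb

def catalan_via_lagrange(n):
    """c_n = (1/n) [z^{n-1}] f(z)^n with f(z) = (1-z)^{-1}, i.e. w = z(1-z).
    [z^{n-1}] (1-z)^{-n} = C(2n-2, n-1), so c_n = C(2n-2, n-1)/n (exact)."""
    return comb(2 * n - 2, n - 1) // n

print([catalan_via_lagrange(n) for n in range(1, 8)])
# -> [1, 1, 2, 5, 14, 42, 132]
```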

Let

$$f(x) = \sum_{i=1}^{j} a_{-i} x^{-i} + a_0 + \sum_{i \ge 1} a_i x^i; \qquad f'(x) = \sum_{i=1}^{j} -i a_{-i} x^{-i-1} + 0 + \sum_{i \ge 1} i a_i x^{i-1}.$$

Similarly, let

$$g(x) = \sum_{i=1}^{k} b_{-i} x^{-i} + b_0 + \sum_{i \ge 1} b_i x^i; \qquad g'(x) = \sum_{i=1}^{k} -i b_{-i} x^{-i-1} + 0 + \sum_{i \ge 1} i b_i x^{i-1}.$$

Then it is easy to compute the following:

$$[x^{-1}]\, f(x) g'(x) = \sum_{i=1}^{j} i a_{-i} b_i + \sum_{i=1}^{k} -i a_i b_{-i};$$

$$[x^{-1}]\, f'(x) g(x) = \sum_{i=1}^{j} -i a_{-i} b_i + \sum_{i=1}^{k} i a_i b_{-i} = -[x^{-1}]\, f(x) g'(x).$$

Note that neither $a_0$ nor $b_0$ affects this value. Hence we may write for $f, g \in R((x))$,

$$[x^{-1}] f g' = -[x^{-1}] f'(g(x) - g(0)). \qquad (4.38)$$

When we use this a little later, $g(w) = \log(\phi(w))$, so $g'(w) = \phi'(w)/\phi(w)$, and Eq. 4.38 then appears as

$$[w^{-1}]\left(f(w) \cdot \phi'(w) \cdot \phi^{-1}(w)\right) = -[w^{-1}]\left(f'(w)\left[\log\left(\frac{\phi(w)}{\phi(0)}\right)\right]\right). \qquad (4.39)$$

The next result allows us to change variables when computing residues,and in some ways is the main result of this section, since the full LagrangeInversion formula follows from it.

Theorem 4.12.6 (Residue Composition) Let $f(x), r(x) \in C((x))$ and suppose that $\alpha = o(r(x)) > 0$. We want to make the substitution $x = r(z)$. Then

$$\alpha\, [x^{-1}] f(x) = [z^{-1}] f(r(z))\, r'(z).$$

Proof: First consider $f(x) = x^n$, $-1 \ne n \in Z$. Then $[z^{-1}] r^n(z) r'(z) = (n+1)^{-1} [z^{-1}]\left(\frac{d}{dz}\right) r^{n+1}(z) = 0$ by (R1), since $r^{n+1}(z) \in C((z))$. Also, $\alpha [x^{-1}] x^n = 0$.

On the other hand, if $n = -1$, then $[z^{-1}] r' r^{-1} = o(r(z)) = \alpha > 0$. It follows that for all integers $n$, $[z^{-1}] r^n(z) r'(z) = \alpha \delta_{n,-1} = \alpha [x^{-1}] x^n$. Now let $f(x) = \sum_{n \ge k} a_n x^n$ (with $o(f(x)) = k < \infty$). Since $o(r(z)) > 0$, $f(r(z))$ exists, and we have

$$\alpha [x^{-1}] f(x) = [z^{-1}] \sum_{n \ge k} a_n r^n(z) r'(z) = [z^{-1}] f(r(z)) r'(z).$$

As an application of Residue Composition we present the following problem: Find a closed form formula for the sum

$$S = \sum_{k=0}^{n} \binom{2n+1}{2k+1}\binom{j+k}{2n}.$$

We give the major steps as “little” exercises.

1. Put $f(x) = \frac{1}{2x}\left[(1+x)^{2n+1} - (1-x)^{2n+1}\right]$. Show that

$$f(x) = \sum_{k=0}^{n} \binom{2n+1}{2k+1} x^{2k}.$$

2. $f((1+y)^{1/2}) = \sum_{k=0}^{n} (1+y)^k\, [x^{2k}] f(x)$.

3. $[y^{2n}]\left\{(1+y)^j \sum_{k=0}^{n} (1+y)^k \binom{2n+1}{2k+1}\right\} = \sum_k \binom{2n+1}{2k+1}\binom{j+k}{2n} = S$. (Hint: At one stage you will have to use the fact that $\sum_m \binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}$ for appropriate choices of $a, b, m, n$. You might want to prove this as a separate step if you have not already done so.)

So at this stage we have

$$S = [y^{2n}]\left\{(1+y)^j \sum_{k=0}^{n} (1+y)^k\, [x^{2k}] f(x)\right\} = [y^{-1}]\left\{y^{-(2n+1)} (1+y)^j f((1+y)^{1/2})\right\}.$$

At this point we want to use Residue Composition with the substitution $y = y(z) = z^2(z^2-2)$, so $o(y(z)) = 2$, and $y'(z) = 4z(z^2-1)$. Also, $(1+y)^{1/2} = 1-z^2$. Now use
$$f((1+y)^{1/2}) = f(1-z^2) = \frac{1}{2(1-z^2)}\left[(2-z^2)^{2n+1} - z^{4n+2}\right]$$
and Residue Composition to obtain
$$S = \frac{[z^{-1}]}{2}\left\{z^{-(4n+2)}(z^2-2)^{-(2n+1)}(1-z^2)^{2j}\cdot\frac{(2-z^2)^{2n+1} - z^{4n+2}}{2(1-z^2)}\cdot 4z(z^2-1)\right\},$$
which simplifies to
$$[z^{-1}]\left[(z^2-1)^{2j}\left(\frac{1}{(z^2-2)^{2n+1}} + \frac{1}{z^{4n+2}}\right)z\right] = [z^{-1}]\left[(z^2-1)^{2j}z^{-(4n+1)}\right] + 0,$$
since $\frac{1}{(z^2-2)^{2n+1}}$ is a power series, so when multiplied by $(z^2-1)^{2j}$ it contributes nothing to $[z^{-1}]$. Hence
$$S = [z^{4n}](z^2-1)^{2j} = \binom{2j}{2n}.$$


Theorem 4.12.7 (Lagrange Inversion) Let $\varphi(x) \in \mathbb{C}[[x]]$ with $o(\varphi(x)) = 0$. Hence $\varphi^{-1}(x)$ exists, and $w\cdot\varphi^{-1}(w)$ has order 1 in $w$. Put $t = w\cdot\varphi^{-1}(w)$, i.e., $w = t\varphi(w)$, and write $w = W(t)$ for the unique solution of this equation. We want to generalize the earlier special case of Lagrange Inversion that found $w$ as a function of $t$. Here we let $f$ be some Laurent series and find the coefficients on powers of $t$ in $f(W(t))$. Specifically, we have the following:

1. If $f(\lambda) \in \mathbb{C}((\lambda))$, then
$$[t^n]f(W(t)) = \begin{cases}\dfrac{1}{n}[\lambda^{n-1}]f'(\lambda)\varphi^n(\lambda), & \text{for } 0 \ne n \ge o(f);\\[2mm] [\lambda^0]f(\lambda) + [\lambda^{-1}]f'(\lambda)\log\left(\dfrac{\varphi(\lambda)}{\varphi(0)}\right), & \text{for } n = 0.\end{cases}$$

2. If $F(\lambda) \in \mathbb{C}[[\lambda]]$, then
$$F(w)\left[1 - \frac{w\varphi'(w)}{\varphi(w)}\right]^{-1} = \sum_{n\ge 0}c_nt^n, \quad \text{where } c_n = [\lambda^n]F(\lambda)\varphi^n(\lambda).$$

Proof: Let $\Phi(w) = \frac{w}{\varphi(w)}$, so $t = \Phi(w)$ and $o(\Phi(w)) = 1$, which implies that the compositional inverse $\Phi^{[-1]}(\lambda)$ exists. Here $w = \Phi^{[-1]}(t)$ is the unique solution $w$ of $w = t\varphi(w)$. For any integer $n$: $[t^n]f(W(t)) = [t^{-1}]\,t^{-(n+1)}f(\Phi^{[-1]}(t))$. Now use Residue Composition to substitute $t = \Phi(w)$, with $\alpha = 1 = o(\Phi)$, where the $f(x)$ of the Residue Composition theorem is now $t^{-(n+1)}f(\Phi^{[-1]}(t))$. Hence, for $n \ne 0$,
$$[t^n]f(W(t)) = -\frac{1}{n}[w^{-1}]f(w)\left(\Phi^{-n}(w)\right)' = \frac{1}{n}[w^{-1}]f'(w)\Phi^{-n}(w) = \frac{1}{n}[w^{-1}]\,\frac{f'(w)\varphi^n(w)}{w^n}.$$

If $n = 0$, then
$$[t^0]f(W(t)) = [w^{-1}]\Phi^{-1}(w)f(w)\Phi'(w) = [w^{-1}]f(w)\frac{\varphi(w)}{w}\left(\frac{\varphi(w)-w\varphi'(w)}{\varphi^2(w)}\right)$$
$$= [w^{-1}]\left[\frac{f(w)}{w} - f(w)\frac{\varphi'(w)}{\varphi(w)}\right] = [w^0]f(w) + [w^{-1}]f'(w)\log\left(\frac{\varphi(w)}{\varphi(0)}\right),$$
where the last equality uses Eq. 4.39.

This completes the proof of 1. Now let $F(\lambda) \in \mathbb{C}[[\lambda]]$. It follows that $F(\lambda)\varphi^{-1}(\lambda) \in \mathbb{C}[[\lambda]]$. Hence we may put $f(w) = \int_0^w F(\lambda)\varphi^{-1}(\lambda)\,d\lambda$ and know that $f(w) \in \mathbb{C}[[w]]$. Also, since $f'(\lambda) = F(\lambda)\varphi^{-1}(\lambda)$, we see that $F(w) = f'(w)\varphi(w)$. By 1., $f(w) = f(0) + \sum_{n\ge 1}\frac{1}{n}[\lambda^{n-1}]\varphi^n(\lambda)f'(\lambda)\,t^n$. Differentiate this latter equality with respect to $t$:
$$f'(w)\cdot\frac{dw}{dt} = \sum_{n\ge 1}[\lambda^{n-1}]\varphi^n(\lambda)f'(\lambda)\,t^{n-1} = \sum_{n\ge 0}[\lambda^n]\varphi^{n+1}(\lambda)f'(\lambda)\,t^n.$$
But $w = t\cdot\varphi(w)$ implies that $\frac{dw}{dt} = \varphi(w) + t\cdot\varphi'(w)\cdot\frac{dw}{dt}$, from which we see that
$$\frac{dw}{dt} = \varphi(w)\left[1 - t\varphi'(w)\right]^{-1}.$$
Putting this all together, we find
$$f'(w)\varphi(w)\left[1-t\varphi'(w)\right]^{-1} = F(w)\left[1-t\varphi'(w)\right]^{-1} = \sum_{n\ge 0}[\lambda^n]\varphi^{n+1}(\lambda)f'(\lambda)\,t^n = \sum_{n\ge 0}[\lambda^n]\varphi^n(\lambda)F(\lambda)\,t^n.$$
We write this finally in the form:
$$\frac{F(W(t))}{1-t\varphi'(W(t))} = \sum_{n\ge 0}[\lambda^n]\varphi^n(\lambda)F(\lambda)\,t^n.$$
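Part 1 can be sanity-checked with truncated power series. Taking $\varphi(w) = e^w$ and $f(\lambda) = \lambda$, the formula predicts $[t^n]W(t) = \frac{1}{n}[\lambda^{n-1}]e^{n\lambda} = n^{n-1}/n!$ (the EGF coefficients of rooted labeled trees). The sketch below (helper names are ours) solves $w = te^w$ by fixed-point iteration on truncated series:

```python
from fractions import Fraction
from math import factorial

N = 8  # truncation order

def mul(a, b):
    # product of truncated power series (lists of coefficients, length N+1)
    c = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

def exp_of(w):
    # e^w as a truncated series, valid since w has zero constant term
    result = [Fraction(0)] * (N + 1)
    result[0] = Fraction(1)
    term = list(result)
    for k in range(1, N + 1):
        term = [x / k for x in mul(term, w)]   # w^k / k!
        result = [r + s for r, s in zip(result, term)]
    return result

# Solve w = t * exp(w) by fixed-point iteration on truncated series.
w = [Fraction(0)] * (N + 1)
for _ in range(N + 1):
    w = [Fraction(0)] + exp_of(w)[:N]          # multiply exp(w) by t

# Lagrange inversion (part 1, f = identity) predicts [t^n]W = n^(n-1)/n!.
for n in range(1, N + 1):
    assert w[n] == Fraction(n ** (n - 1), factorial(n))
```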

The following example illustrates the above ideas and gives some idea of the power of the method.

Example of Inversion Formula. Suppose that for all $n \in \mathbb{Z}$ we have the following relation:
$$b_n = \sum_k\binom{k}{n-k}a_k. \qquad (4.40)$$
Then we want to show that
$$n\,a_n = \sum_k\binom{2n-k-1}{n-k}(-1)^{n-k}\,k\,b_k. \qquad (4.41)$$

The latter formula says nothing for $n = 0$, but the former says that $a_0 = b_0$. Multiply Eq. 4.40 by $w^n$ and sum over $n$:
$$B(w) = \sum_nb_nw^n = \sum_n\left(\sum_k\binom{k}{n-k}a_k\right)w^n = \sum_ka_k\sum_n\binom{k}{n-k}w^n = \sum_ka_kw^k\sum_n\binom{k}{n-k}w^{n-k}$$
$$= \sum_ka_kw^k\sum_{n=0}^{k}\binom{k}{n}w^n = \sum_ka_kw^k(1+w)^k = A(w+w^2) = A(t),$$
where we have put $t = w + w^2 = w(1+w)$, or $w = t\cdot\left(\frac{1}{1+w}\right)$. So in the notation of the Theorem of Lagrange, $\varphi(w) = \frac{1}{1+w}$. So if $w = W(t)$ we want to find $A(t) = B(W(t))$, i.e., we want the coefficients of $A$ in terms of the coefficients of $B$: $\sum a_nt^n = \sum b_k(W(t))^k$. At this stage we can say:
$$a_n = [t^n]\sum_kb_k(W(t))^k = \sum_kb_k\,[t^n](W(t))^k.$$

In the notation of the theorem of Lagrange, put $f(u) = u^k$, so that $f'(u) = ku^{k-1}$ and $f(W(t)) = (W(t))^k$. So for $n > 0$,
$$[t^n](W(t))^k = \frac{1}{n}[\lambda^{n-1}]f'(\lambda)\varphi^n(\lambda) = \frac{1}{n}[\lambda^{n-1}]\frac{k\lambda^{k-1}}{(1+\lambda)^n} = \frac{k}{n}[\lambda^{n-k}]\frac{1}{(1+\lambda)^n}$$
$$= \frac{k}{n}[\lambda^{n-k}]\sum_{i=0}^{\infty}\binom{-n}{i}\lambda^i = \frac{k}{n}\binom{-n}{n-k} = \frac{k}{n}(-1)^{n-k}\binom{2n-k-1}{n-k}.$$
This implies that $a_n = \sum_k\frac{k}{n}(-1)^{n-k}\binom{2n-k-1}{n-k}b_k$, as desired.

Central Trinomial Numbers. We shall use the second statement in the Theorem of Lagrange to find the generating function of the "central trinomial numbers" $c_n$ defined by $c_n = [\lambda^n](1+\lambda+\lambda^2)^n$. Clearly $c_n = [\lambda^n]F(\lambda)\varphi^n(\lambda)$ where $F(\lambda) = 1$, $\varphi(\lambda) = 1+\lambda+\lambda^2$. So $\varphi'(\lambda) = 1+2\lambda$. Part 2 of the Theorem of Lagrange says that
$$\sum_{n\ge 0}c_nt^n = F(w)\left[1-t\varphi'(w)\right]^{-1},$$
where $w = t\varphi(w) = t(1+w+w^2)$. Hence $tw^2 + (t-1)w + t = 0$, implying that $w = \frac{1-t-\sqrt{1-2t-3t^2}}{2t}$. It is easy to compute that $1-t\varphi'(w) = \sqrt{1-2t-3t^2}$. Now it follows that
$$\sum_{n\ge 0}c_nt^n = (1-2t-3t^2)^{-1/2}.$$
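The two descriptions of $c_n$ — as $[\lambda^n](1+\lambda+\lambda^2)^n$ and as a coefficient of $(1-2t-3t^2)^{-1/2}$ — can be checked against each other (a sketch; the helper name `trinomial_central` is ours):

```python
from fractions import Fraction
from math import comb

N = 12

def trinomial_central(n):
    # c_n = [lambda^n] (1 + lambda + lambda^2)^n by repeated convolution
    poly = [0] * (2 * n + 1)
    poly[0] = 1
    for _ in range(n):
        new = [0] * (2 * n + 1)
        for i, ci in enumerate(poly):
            for d in range(3):
                if i + d <= 2 * n:
                    new[i + d] += ci
        poly = new
    return poly[n]

# Coefficients of (1 - 2t - 3t^2)^(-1/2) via (1-u)^(-1/2) = sum C(2m,m)/4^m u^m,
# with u = 2t + 3t^2.
coeffs = [Fraction(0)] * (N + 1)
for m in range(N + 1):
    cm = Fraction(comb(2 * m, m), 4 ** m)
    for j in range(m + 1):          # choose the 3t^2 factor j times
        if m + j <= N:
            coeffs[m + j] += cm * comb(m, j) * 2 ** (m - j) * 3 ** j

for n in range(N + 1):
    assert coeffs[n] == trinomial_central(n)
```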

Exercise: 4.12.8 Show that
$$c_n = \sum_{\frac{n}{2}\le i\le n}\frac{1\cdot 3\cdot 5\cdots(2i-1)\,3^{n-i}}{(n-i)!\,(2i-n)!\,2^{n-i}} = \sum_{\frac{n}{2}\le i\le n}\binom{n}{i}\binom{i}{n-i}.$$
(Hint: Remember that you now have $c_n$ described in two different ways as a coefficient of a certain term in a power series expansion of some ordinary generating function.)

4.13 EGF: A Second Look

Let $M$ denote a "type" of combinatorial structure. Let $m_k$ be the number of ways of giving a labeled $k$-set such a structure. In each separate case we shall specify whether we take $m_0 = 0$ or $m_0 = 1$. Then define
$$M(x) = \sum_{k=0}^{\infty}m_k\frac{x^k}{k!}.$$

Consider a few examples. If $T$ denotes the structure "labeled tree," then as we saw above, $T(x) = \sum_{k=0}^{\infty}k^{k-2}\frac{x^k}{k!}$. Similarly, if $S$ denotes the structure "a set" (often called the "uniform structure"), then $s_k = 1$ for all $k \ge 0$, so $S(x) = \sum_{k=0}^{\infty}\frac{x^k}{k!} = e^x$. If $C$ denotes "oriented circuit," then $c_k = (k-1)!$ for $k \ge 1$. Put $c_0 = 0$. Then $C(x) = \sum_{k=1}^{\infty}\frac{x^k}{k} = \log\left(\frac{1}{1-x}\right) = -\log(1-x)$. If $\Pi$ denotes the structure "permutation," then $\Pi(x) = \sum_{k=0}^{\infty}k!\frac{x^k}{k!} = \sum_{k=0}^{\infty}x^k = \frac{1}{1-x}$.

Suppose we wish to consider the number of ways a labeled $n$-set can be partitioned into two parts, one with a structure of type $A$ and the other with a structure of type $B$. The number of ways to do this is clearly $\sum_{k=0}^{n}\binom{n}{k}a_kb_{n-k}$. It follows that if we call this a structure of type $A\cdot B$, then
$$(A\cdot B)(x) = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\binom{n}{k}a_kb_{n-k}\right)\frac{x^n}{n!} = A(x)\cdot B(x).$$

Famous Example: Derangements again. Let $D$ denote the structure "derangement." Any permutation consists of a set of fixed points (interpreted as a set) and a derangement on the remaining points. Hence we have $\Pi(x) = S(x)\cdot D(x)$, i.e., $(1-x)^{-1} = D(x)\cdot e^x$, implying
$$D(x) = e^{-x}(1-x)^{-1} = \sum_{k=0}^{\infty}\frac{(-1)^kx^k}{k!}\cdot\sum_{j=0}^{\infty}x^j = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\frac{(-1)^k}{k!}\cdot 1\right)x^n = \sum_{n=0}^{\infty}\left(n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}\right)\frac{x^n}{n!}.$$
It follows that we get the usual formula:
$$d_n = n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}.$$
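A quick check of the $d_n$ obtained from the EGF product $e^{-x}\cdot(1-x)^{-1}$ against the standard derangement recursion $d_n = (n-1)(d_{n-1}+d_{n-2})$ (our own sketch, not part of the text):

```python
from fractions import Fraction
from math import factorial

N = 10

# Multiply the EGF factors e^{-x} and 1/(1-x), then read off d_n = n! [x^n] D(x).
exp_neg = [Fraction((-1) ** k, factorial(k)) for k in range(N + 1)]
geom = [Fraction(1)] * (N + 1)
D = [sum(exp_neg[i] * geom[n - i] for i in range(n + 1)) for n in range(N + 1)]
d = [int(factorial(n) * D[n]) for n in range(N + 1)]

assert d[:7] == [1, 0, 1, 2, 9, 44, 265]
for n in range(2, N + 1):
    assert d[n] == (n - 1) * (d[n - 1] + d[n - 2])
```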

EXAMPLE 6. In how many ways can a labeled $n$-set be split into a number of pairs and a number of singletons?

First, let $p_n$ be the number of ways to split an $n$-set into pairs. Clearly, if $n$ is odd, $p_n = 0$. By convention we say that $p_0 = 1$. Suppose $n = 2k \ge 2$. Pick a first element $a_1$ in $2k$ ways, then the second element in $2k-1$ ways, the third in $2k-2$ ways, etc., so that the list $a_1, a_2, a_3, a_4, \ldots, a_{2k-1}, a_{2k}$, giving the pairs $\{a_1,a_2\}, \{a_3,a_4\}, \ldots, \{a_{2k-1},a_{2k}\}$, is chosen in $(2k)!$ ways. But the same pairs could be chosen in $k!$ orders, and each pair in two ways, so that
$$p_{2k} = \frac{(2k)!}{2^kk!} = \frac{(2k)(2k-1)(2k-2)(2k-3)\cdots 1}{2^kk!} = \frac{2^kk!\,(2k-1)!!}{2^kk!} = (2k-1)!!$$
Here $(2k-1)!! = (2k-1)(2k-3)(2k-5)\cdots 1$, with $(2\cdot 0-1)!! = 1$ by convention. Then we find
$$P(x) := \sum_{n=0}^{\infty}p_n\frac{x^n}{n!} = \sum_{k=0}^{\infty}p_{2k}\frac{x^{2k}}{(2k)!} = \sum_{k=0}^{\infty}(2k-1)!!\frac{x^{2k}}{(2k)!} = \sum_{k=0}^{\infty}\frac{x^{2k}}{2^kk!} = \sum_{k=0}^{\infty}\frac{\left(\frac{x^2}{2}\right)^k}{k!} = e^{x^2/2}.$$

The number of ways to pick $n$ singletons from an $n$-set is 1, i.e., the corresponding egf is $S(x) = e^x$. Hence
$$(P\cdot S)(x) = P(x)\cdot S(x) = \exp\left(\tfrac{1}{2}x^2\right)\cdot\exp(x) = \exp\left(x+\tfrac{1}{2}x^2\right).$$

We can also obtain the same result by using a recursion relation. Denote the structure $P\cdot S$ by $B$. In the set $\{1, \ldots, n\}$ we can either let $n$ be a singleton or make a pair $\{x, n\}$ with $1 \le x \le n-1$. So $b_n = b_{n-1} + (n-1)b_{n-2}$, $n \ge 1$. As $b_1 = 1$ by definition, and $b_1 = b_0$ according to the recursion, it must be that $b_0 = 1$. Multiply the recursion by $\frac{x^{n-1}}{(n-1)!}$ for $n \ge 1$ and sum:
$$\sum_{n=1}^{\infty}b_n\frac{x^{n-1}}{(n-1)!} = \sum_{n=1}^{\infty}b_{n-1}\frac{x^{n-1}}{(n-1)!} + \sum_{n=1}^{\infty}(n-1)b_{n-2}\frac{x^{n-1}}{(n-1)!}.$$
Also $B(x) = \sum_{n=0}^{\infty}b_n\frac{x^n}{n!}$ implies
$$B'(x) = \sum_{n=1}^{\infty}b_n\frac{x^{n-1}}{(n-1)!}.$$
This implies that
$$B'(x) = B(x) + xB(x) = (1+x)B(x).$$
Since $B(0) = 1$, the theory of differential equations shows that $B(x) = \exp\left(x+\tfrac{1}{2}x^2\right)$.

Example 7. Recall (Theorem 1.7.1) that the Stirling number $S(n,k)$ of the second kind, the number of partitions of an $n$-set into $k$ nonempty blocks, satisfies the following recursion:
$$S(n,k) = kS(n-1,k) + S(n-1,k-1),\ n \ge k;\quad S(n,k) = 0 \text{ for } n < k. \qquad (4.42)$$

Multiply Eq. 4.42 by $\frac{x^{n-1}}{(n-1)!}$ and sum over $n \ge k$:
$$\sum_{n\ge k}S(n,k)\frac{x^{n-1}}{(n-1)!} = \sum_{n\ge k}kS(n-1,k)\frac{x^{n-1}}{(n-1)!} + \sum_{n\ge k}S(n-1,k-1)\frac{x^{n-1}}{(n-1)!}.$$

Put $F_k(x) = \sum_{n\ge k}S(n,k)\frac{x^n}{n!}$. Then $F_k'(x) = \sum_{n\ge k}S(n,k)\frac{x^{n-1}}{(n-1)!}$ and
$$\sum_{n\ge k}kS(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n\ge k+1}S(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n\ge k}S(n,k)\frac{x^n}{n!}.$$
Also
$$\sum_{n\ge k}S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = \sum_{n-1\ge k-1}S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = F_{k-1}(x).$$
The preceding says that
$$F_k'(x) = kF_k(x) + F_{k-1}(x). \qquad (4.43)$$

We now use induction on $k$ in Eq. 4.43 to prove the following:

Theorem 4.13.1
$$\sum_{n\ge k}S(n,k)\frac{x^n}{n!} = \frac{1}{k!}(e^x-1)^k.$$

Proof: For $n \ge 1$, $S(n,1) = 1$, and $\sum_{n\ge 1}1\cdot\frac{x^n}{n!} = \frac{1}{1!}(e^x-1)^1$. So the theorem is true for $k = 1$.

The induction hypothesis is that for $1 \le t < k$, $F_t(x) = \frac{1}{t!}(e^x-1)^t$. Then $F_k'(x) = kF_k(x) + F_{k-1}(x)$ implies $F_k'(x) = kF_k(x) + \frac{1}{(k-1)!}(e^x-1)^{k-1}$. Also,
$$[x^k]F_k(x) = [x^k]\sum_{n\ge k}S(n,k)\frac{x^n}{n!} = \frac{S(k,k)}{k!} = \frac{1}{k!}.$$
Put $G_k(x) = \frac{1}{k!}(e^x-1)^k$. Then $[x^k]G_k(x) = \frac{1}{k!}$ and $G_k'(x) = \frac{1}{(k-1)!}(e^x-1)^{k-1}\cdot e^x$. Also
$$kG_k(x) + G_{k-1}(x) = \frac{k}{k!}(e^x-1)^k + \frac{1}{(k-1)!}(e^x-1)^{k-1} = \frac{1}{(k-1)!}(e^x-1)^{k-1}\left[e^x-1+1\right] = G_k'(x).$$
So $F_k$ and $G_k$ satisfy the same differential equation and agree on the coefficient of $x^k$; this is enough to guarantee that $F_k(x) = G_k(x)$.
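The theorem can also be checked numerically: expanding $(e^x-1)^k = \sum_i\binom{k}{i}(-1)^{k-i}e^{ix}$ gives $S(n,k) = \frac{1}{k!}\sum_i\binom{k}{i}(-1)^{k-i}i^n$, which the sketch below (helper names ours) compares with the recursion:

```python
from fractions import Fraction
from math import factorial, comb

N, K = 10, 5

# S(n, k) from the recursion S(n,k) = k S(n-1,k) + S(n-1,k-1).
S = [[0] * (K + 1) for _ in range(N + 1)]
S[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, K + 1):
        S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]

# Theorem 4.13.1: S(n,k) = (n!/k!) [x^n] (e^x - 1)^k, and
# [x^n](e^x - 1)^k = sum_i C(k,i)(-1)^(k-i) i^n / n!.
for k in range(1, K + 1):
    for n in range(k, N + 1):
        coeff = sum(comb(k, i) * (-1) ** (k - i) * Fraction(i ** n, factorial(n))
                    for i in range(k + 1))
        assert S[n][k] == factorial(n) * coeff / factorial(k)
```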

Let $n$ be a positive integer. For each $k$-tuple $(b_1, \ldots, b_k)$ of nonnegative integers for which $b_1 + 2b_2 + \cdots + kb_k = n$, we count how many ways there are to partition an $n$-set into $b_1$ parts of size 1, $b_2$ parts of size 2, \ldots, $b_k$ parts of size $k$. Imagine the elements of the $n$-set are to be placed in $n$ positions. The positions are grouped from left to right in bunches: the first $b_1$ bunches have one position each, the next $b_2$ bunches have two positions each, etc. There are $n!$ ways to order the integers in the positions. Within each bunch of $j$ positions there are $j!$ ways to permute the integers within those positions, so we divide by $(1!)^{b_1}(2!)^{b_2}\cdots(k!)^{b_k}$. But the bunches of the same cardinality can be permuted without affecting the partition, so we divide by $b_1!b_2!\cdots b_k!$. Hence the number of partitions is:
$$\frac{n!}{b_1!\cdots b_k!\,(1!)^{b_1}\cdots(k!)^{b_k}}.$$

Now suppose that each $j$-set can have $n_j$ "structures" of type $N$ on it. So each partition gives $(n_1)^{b_1}\cdots(n_k)^{b_k}$ configurations. Hence the total number of such configurations is
$$\frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k}.$$

It follows that the number of configurations on an $n$-set is
$$a_n = \sum\frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},$$
where the sum is over $k$-tuples $(b_1, \ldots, b_k)$ with $b_1+2b_2+\cdots+kb_k = n$, $b_i \ge 0$, $k \ge 0$. Among the tuples $(b_1, \ldots, b_k)$ for which $b_1+2b_2+\cdots+kb_k = n$, we lump together those for which $b_1+\cdots+b_k$ is a constant, say $b_1+\cdots+b_k = m$, $m = 0, 1, \ldots$. If we let $A(x) = \sum_{n=0}^{\infty}a_n\frac{x^n}{n!}$, we see that the coefficient on $x^n$ equals
$$\sum\frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},$$

where the sum is as above, but we think of it as coming in parts, "part $m$" being the sum of those terms with $b_1+b_2+\cdots+b_k = m$.

Put $N(x) = \sum_{i=1}^{\infty}n_i\frac{x^i}{i!}$. What does $\frac{N(x)^m}{m!}$ contribute to the coefficient of $x^n$ in $\sum_{m=0}^{\infty}\frac{N(x)^m}{m!}$? In expanding $\frac{1}{m!}N(x)N(x)\cdots N(x)$ ($m$ factors), there are $\binom{m}{b_1, \ldots, b_k}$ ways to choose terms of degree $i$ exactly $b_i$ times, $1 \le i \le n$, where $b_1+\cdots+b_k = m$. This gives a term of degree $1b_1+2b_2+\cdots+kb_k$. So the contribution to the term of degree $n$ is
$$\sum\frac{1}{m!}\binom{m}{b_1, \ldots, b_k}\left(\frac{n_1x}{1!}\right)^{b_1}\cdots\left(\frac{n_kx^k}{k!}\right)^{b_k} = \sum\frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k}x^{b_1+2b_2+\cdots+kb_k}.$$
The sum is over all $k \ge 0$ and over all $(b_1, \ldots, b_k)$ with $b_i \ge 0$, $\sum ib_i = n$, $\sum b_i = m$. Now sum over all $m$. (Of course the contribution is zero unless $m \le n$.) It is clear that $A(x) = \sum_{n=0}^{\infty}a_n\frac{x^n}{n!} = \exp(N(x))$, and we have proved the following theorem.

Theorem 4.13.2 If the compound structure $S(N)$ is obtained by splitting a set into parts, each of which then gets a structure of type $N$, and if a $k$-set gets $n_k$ structures of type $N$, so $N(x) = \sum_{k=1}^{\infty}n_k\frac{x^k}{k!}$, then
$$S(N)(x) = \exp(N(x)).$$
(Keep in mind that $S(x) = \sum 1\cdot\frac{x^k}{k!}$, since there is only 1 way to impose the structure of set on a set.)

Example 8. If we substitute the structure "oriented cycle" into the uniform structure (set), then we are considering the compound structure consisting of a partition of an $n$-set into oriented cycles, i.e., the structure $\Pi$ with $\pi_0 = 1$. So we must have $\Pi(x) = \exp(C(x))$. Indeed, above we determined that $\Pi(x) = (1-x)^{-1}$ and $C(x) = -\log(1-x)$.

Exercise: 4.13.3 A directed tree with all edges pointing toward one vertex called the root is called an arborescence. Let $T(x) = \sum_{n=1}^{\infty}t_n\frac{x^n}{n!}$, where $t_n$ is the number of labeled trees on $n$ vertices. And let $A(x) = \sum_{n=1}^{\infty}a_n\frac{x^n}{n!}$, where $a_n$ is the number of arborescences on $n$ vertices. Since a labeled tree on $n$ vertices can be rooted in $n$ ways and turned into an arborescence, and the process is reversible, clearly $a_n = nt_n$, i.e., $A(x) = xT'(x)$. Consider a labeled tree on $n+1$ vertices as an arborescence with vertex $n+1$ as its root. Then delete the root and all incident edges. The result is a "rooted forest" on $n$ vertices, with the roots of the individual trees being exactly the vertices that were originally adjacent to the root $n+1$. If $F(x) = \sum_n f_n\frac{x^n}{n!}$, where $f_n$ is the number of rooted forests on $n$ vertices (and $f_0 = 1$ by convention), then by Theorem 4.13.2, $\exp(A(x)) = F(x)$. Hence we have
$$\exp(A(x)) = \sum_{n=0}^{\infty}f_n\frac{x^n}{n!} = \sum_{n=0}^{\infty}t_{n+1}\frac{x^n}{n!} = T'(x) = x^{-1}A(x).$$
Use the special case of Lagrange Inversion to find $c_n$ if $A(x) = \sum_{n=1}^{\infty}c_nx^n = \sum_{n=1}^{\infty}a_n\frac{x^n}{n!}$, and complete another proof of Cayley's Theorem.

4.14 Dirichlet Series - The Formal Theory

In this brief section we just introduce the notion of Dirichlet series.

Def'n. Given a sequence $\{a_n\}_{n=1}^{\infty}$, the formal series
$$f(s) = \sum_{n=1}^{\infty}\frac{a_n}{n^s}$$
is the Dirichlet series generating function (Dsgf) of the given sequence:
$$f(s) \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{a_n\}.$$

Suppose $A(s) = \sum\frac{a_n}{n^s}$ and $B(s) = \sum\frac{b_n}{n^s}$. The Dirichlet convolution product is
$$A(s)B(s) = \sum_{m,n=1}^{\infty}\frac{a_n}{n^s}\cdot\frac{b_m}{m^s} = \sum_{n=1}^{\infty}\left(\sum_{d|n}a_db_{n/d}\right)\frac{1}{n^s}.$$

Rule 1$''$: $A(s)B(s) \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \left\{\sum_{d|n}a_db_{n/d}\right\}_{n=1}^{\infty}$.

Rule 2$''$: $A(s)^k \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \left\{\sum_{(n_1,\ldots,n_k):\,n_1\cdots n_k=n}a_{n_1}a_{n_2}\cdots a_{n_k}\right\}_{n=1}^{\infty}$.

A most famous example is given by the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s} \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{1\}_{n=1}^{\infty}.$$

Theorem 4.14.1 Let $f$ be a multiplicative arithmetic function. Then

(i) $L(f,s) = \sum_{n=1}^{\infty}\frac{f(n)}{n^s} = \prod_{p\text{ prime}}\left(\sum_{i=0}^{\infty}\frac{f(p^i)}{p^{is}}\right)$.

(ii) If $f$ is completely multiplicative, then
$$L(f,s) = \prod_{p\text{ prime}}\left(1-\frac{f(p)}{p^s}\right)^{-1}.$$

Proof: If the unique factorization of $n$ is $n = p_1^{e_1}\cdots p_r^{e_r}$, then there is a unique term in the product that looks like $\prod_{i=1}^{r}\frac{f(p_i^{e_i})}{(p_i^{e_i})^s} = \frac{f(n)}{n^s}$.

Since $U$ defined by $U(n) = 1$ is completely multiplicative, we may write $\zeta(s) = \sum\frac{1}{n^s} = \prod_{p\text{ prime}}\left(1-\frac{1}{p^s}\right)^{-1} = \prod_{p\text{ prime}}(1-p^{-s})^{-1}$. Hence
$$\zeta(s)^{-1} = \prod_{p\text{ prime}}(1-p^{-s}). \qquad (4.44)$$

$$L(\mu,s) = \sum_{n=1}^{\infty}\frac{\mu(n)}{n^s} = \prod_p\left(\sum_{i=0}^{\infty}\frac{\mu(p^i)}{p^{is}}\right) = \prod_p(1-p^{-s}) = \zeta(s)^{-1}. \qquad (4.45)$$

In other words,
$$\frac{1}{\zeta(s)} \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{\mu(n)\}_{n=1}^{\infty}. \qquad (4.46)$$

In the present context we give another proof of the usual Mobius Inversion Formula.

Theorem 4.14.2 Let $F$ and $f$ be arithmetic functions. Then
$$F(n) = \sum_{d|n}f(d) \text{ for all } n\in\mathbb{N} \iff f(n) = \sum_{d|n}F(d)\mu(n/d) = \sum_{d|n}\mu(d)F(n/d) \text{ for all } n\in\mathbb{N}.$$

Proof: Suppose $F(s) \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{F(n)\}$ and $f(s) \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{f(n)\}$. Then
$$F(s) = f(s)\cdot\zeta(s) \iff F(s)(\zeta(s))^{-1} = f(s).$$
So $F = f * U$ iff $f = F * \mu$.

Recall the following commonly used multiplicative arithmetic functions in this context.

$I(n) = \begin{cases}1, & n = 1;\\ 0, & n > 1.\end{cases}$ So $L(I,s) = \sum\frac{I(n)}{n^s} = 1$ is the multiplicative identity.

$U(n) = 1$ for all $n\in\mathbb{N}$. So $L(U,s) = \zeta(s)$.

$E(n) = n$ for all $n\in\mathbb{N}$. So $L(E,s) = \sum\frac{n}{n^s} = \sum\frac{1}{n^{s-1}} = \zeta(s-1)$.

$\tau(n) = \sum_{d|n}1$, so $\tau = U*U$ is multiplicative, and $\zeta^2(s) = \sum_n\left(\sum_{d|n}1\cdot 1\right)\frac{1}{n^s} = \sum_n\frac{\tau(n)}{n^s}$.

$\sigma(n) = \sum_{d|n}d = \sum_{d|n}E(d)U(n/d) = (E*U)(n)$. Hence $\sigma = E*U$ is multiplicative.

Since $\mu = U^{-1}$, $E = \sigma*\mu$, which says $n = \sum_{d|n}\mu(d)\sigma(n/d)$. And
$$\zeta(s)\cdot\zeta(s-1) = \left(\sum\frac{1}{n^s}\right)\cdot\left(\sum\frac{n}{n^s}\right) = \sum_n\left(\sum_{k|n}1\cdot\frac{n}{k}\right)\frac{1}{n^s} = \sum\frac{\sigma(n)}{n^s}.$$
Similarly,
$$\zeta(s)\cdot\zeta(s-q) = \sum\frac{1}{n^s}\cdot\sum\frac{n^q}{n^s} = \sum_n\left(\sum_{d|n}1\cdot\left(\frac{n}{d}\right)^q\right)\frac{1}{n^s} = \sum_n\frac{\sum_{d|n}d^q}{n^s}.$$

We give some more examples.


Example 4.14.3
$$\zeta'(s) = \sum_{n=1}^{\infty}\left(\frac{1}{n^s}\right)' = \sum\frac{-(n^s)'}{n^{2s}} = \sum\frac{-n^s\log(n)}{n^{2s}} = -\sum\frac{\log(n)}{n^s} \;\Rightarrow\; \zeta'(s) \stackrel{\mathrm{Dsgf}}{\longleftrightarrow} \{-\log(n)\}.$$

Example 4.14.4 The familiar identity
$$\sum_{d|n}\varphi(d) = n$$
says that $\varphi*U = E$, from which we see $\varphi = \mu*E$, i.e.,
$$\varphi(n) = \sum_{d|n}\mu(d)\cdot\frac{n}{d} = \sum_{d|n}d\cdot\mu\left(\frac{n}{d}\right),$$
which is the same thing as:
$$\frac{\zeta(s-1)}{\zeta(s)} = L(\varphi,s).$$

Example 4.14.5 Put $f(n) = |\mu(n)|$ for all $n\in\mathbb{N}$. Clearly $f$ is multiplicative. So
$$\sum\frac{f(n)}{n^s} = \prod_p\left(1+\frac{f(p)}{p^s}+\frac{f(p^2)}{p^{2s}}+\cdots\right) = \prod_p\left(1+\frac{1}{p^s}\right) = \prod_p\frac{1-\frac{1}{p^{2s}}}{1-\frac{1}{p^s}} = \prod_p\left(1-\frac{1}{p^{2s}}\right)\cdot\prod_p\frac{1}{1-\frac{1}{p^s}}.$$
Also,
$$\zeta(2s)^{-1} = \prod_p\left(1-\frac{1}{p^{2s}}\right).$$
Hence,
$$\sum\frac{|\mu(n)|}{n^s} = \frac{\zeta(s)}{\zeta(2s)}.$$

Example 4.14.6
$$1 = \zeta(s)\cdot\frac{1}{\zeta(s)} = \left(\sum\frac{1}{n^s}\right)\left(\sum\frac{\mu(n)}{n^s}\right) = \sum_n\left(\sum_{d|n}1\cdot\mu\left(\frac{n}{d}\right)\right)\frac{1}{n^s} \;\Rightarrow\; \sum_{d|n}\mu(d) = \begin{cases}1, & n = 1;\\ 0, & n > 1.\end{cases}$$

Example 4.14.7
$$\zeta(s) = \frac{1}{\zeta(s)}\cdot\zeta^2(s) \;\Rightarrow\; 1 = \sum_{d|n}\mu(d)\cdot\tau\left(\frac{n}{d}\right).$$
This also follows from doing Mobius inversion on $\tau(n) = \sum_{d|n}1$.
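Several identities of this section amount to statements about Dirichlet convolution of arithmetic functions, which are easy to test numerically (a sketch; the function names are ours):

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # square factor => mu(n) = 0
            result = -result
        p += 1
    if m > 1:
        result = -result
    return result

def conv(f, g, n):
    # Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d)
    return sum(f(d) * g(n // d) for d in divisors(n))

U = lambda n: 1                       # U <-> zeta(s)
E = lambda n: n                       # E <-> zeta(s-1)
tau = lambda n: len(divisors(n))      # tau = U * U
phi = lambda n: sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

for n in range(1, 150):
    assert conv(mobius, U, n) == (1 if n == 1 else 0)   # mu = U^{-1}
    assert conv(mobius, tau, n) == 1                    # Example 4.14.7
    assert conv(mobius, E, n) == phi(n)                 # phi = mu * E
```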

4.15 Rational Generating Functions

In this section we consider the simplest general class of generating functions, namely, the rational generating functions in one variable, and their connection with homogeneous linear recursions. These are generating functions of the form
$$U(x) = \sum_{n\ge 0}u_nx^n$$
for which there are $p(x), q(x) \in \mathbb{C}[x]$ with
$$U(x) = \frac{p(x)}{q(x)}.$$
Here we assume $q(0) \ne 0$, so $q(x)^{-1}$ exists in $\mathbb{C}[[x]]$. Before considering the connection between rational generating functions and homogeneous linear recursions, we recall the notion of the reverse of a polynomial.

Let $f(x) = a_n + a_{n-1}x + a_{n-2}x^2 + \cdots + a_0x^n \in \mathbb{C}[x]$. The reverse $\hat{f}(x)$ of $f(x)$ is defined by
$$\hat{f}(x) = x^nf\left(\frac{1}{x}\right) = a_0 + a_1x + \cdots + a_nx^n.$$
If $n_0$ is the multiplicity of 0 as a zero of $f(x)$, i.e., $a_n = a_{n-1} = \cdots = a_{n-n_0+1} = 0$ but $a_{n-n_0} \ne 0$, and if $w_1, \ldots, w_q$ are the nonzero roots of $f(x) = 0$, then $\frac{1}{w_1}, \ldots, \frac{1}{w_q}$ are the roots of $\hat{f}(x) = 0$, and $\hat{f}(x) = a_0(1-w_1x)\cdots(1-w_qx)$. So $\deg(\hat{f}(x)) = n - n_0$.

Alternatively, if $f(x) = (x-\alpha_1)^{m_1}\cdots(x-\alpha_s)^{m_s}$, where $m_1+\cdots+m_s = n$ and $\alpha_1, \ldots, \alpha_s$ are distinct, then
$$\hat{f}(x) = x^nf\left(\frac{1}{x}\right) = (1-\alpha_1x)^{m_1}\cdots(1-\alpha_sx)^{m_s}.$$
If $a_0\cdot a_n \ne 0$, so neither $f(x)$ nor $\hat{f}(x)$ has $x = 0$ as a zero, then $\hat{\hat{f}} = f$, and $f(\alpha) = 0$ if and only if $\hat{f}(\alpha^{-1}) = 0$.

Suppose that $U(x) = \sum_{n\ge 0}u_nx^n = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function. We assume $q(0) \ne 0$ in order that $q(x)^{-1}$ exist in $\mathbb{C}[[x]]$, so we may assume without loss of generality that $q(0) = 1$. Hence $q(x) = 1 + a_1x + a_2x^2 + \cdots + a_kx^k$, $p(x) = p_0 + p_1x + \cdots + p_dx^d$, $d < k$. From this it follows that
$$p_0 + \cdots + p_dx^d = (1 + a_1x + \cdots + a_kx^k)(u_0 + u_1x + \cdots + u_nx^n + \cdots).$$
The right hand side of this equality expands to
$$u_0 + (u_1 + a_1u_0)x + (u_2 + a_1u_1 + a_2u_0)x^2 + \cdots + (u_{k-1} + a_1u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1} + \cdots.$$
And for $n \ge k$,

$$u_n + a_1u_{n-1} + \cdots + a_ku_{n-k} = 0, \qquad (4.47)$$
which is the coefficient on $x^n$. If $u_0, \ldots, u_{k-1}$ are given, then $u_n$ is determined recursively for $n \ge k$.

Put $f(x) = \hat{q}(x)$. Then for the complex number $\alpha$, it is easily checked that $f(\alpha) = 0$ if and only if $u_n = \alpha^n$ is a solution of the recurrence of Eq. 4.47. The polynomial $f(x)$ is the auxiliary polynomial of the recurrence of Eq. 4.47.

Theorem 4.15.1 If $U(x) = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function, then the sequence $\{u_n\}_{n=0}^{\infty}$, where $U(x) = \sum_{n\ge 0}u_nx^n$, satisfies a homogeneous linear recurrence, and the denominator $q(x)$ is the reverse of the auxiliary polynomial of the corresponding recurrence.

Now take the converse point of view. Let $c_0, c_1, \ldots, c_{k-1}$ be given complex constants, and let $a_1, \ldots, a_k$ also be given. Let $U = \{u_n\}$, $n \ge 0$, be the unique sequence determined by the following initial conditions and homogeneous linear recursion:
$$u_0 = c_0,\ u_1 = c_1,\ \ldots,\ u_{k-1} = c_{k-1}; \qquad \text{[HLR]}$$
$$u_{n+k} + a_1u_{n+k-1} + a_2u_{n+k-2} + \cdots + a_ku_n = 0,\ n \ge 0.$$

Theorem 4.15.2 The ordinary generating function for the sequence $\{u_n\}$ defined by [HLR] is
$$U(x) = \sum_{n=0}^{\infty}u_nx^n = R(x)/(1 + a_1x + \cdots + a_kx^k),$$
where $R(x)$ is a polynomial with degree less than $k$.

Proof: Consider the product:
$$(1 + a_1x + \cdots + a_kx^k)(u_0 + u_1x + \cdots).$$
The coefficient on $x^{n+k}$ is
$$u_{n+k} + a_1u_{n+k-1} + a_2u_{n+k-2} + \cdots + a_ku_n.$$
And this equals 0 for $n \ge 0$ by [HLR], so the only coefficients that are possibly nonzero in the product are those on $1, x, \ldots, x^{k-1}$.

Note that the coefficients of $R(x)$ may be obtained from multiplying out the two factors (just as we did above):
$$R(x) = u_0 + (u_1 + a_1u_0)x + (u_2 + a_1u_1 + a_2u_0)x^2 + \cdots + (u_{k-1} + a_1u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1}.$$
As $u_0, \ldots, u_{k-1}$ are given by the initial conditions, $R(x)$ is determined.
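The correspondence between $p(x)/q(x)$ and the recursion (4.47) can be exercised in code: the coefficients of $p(x)/q(x)$ are computed directly from $q(x)U(x) = p(x)$ (a sketch; the helper name is ours):

```python
from fractions import Fraction

def series_from_rational(p, q, n_max):
    # Coefficients of p(x)/q(x) in C[[x]], assuming q[0] == 1:
    # u_n = p_n - sum_{i>=1} q_i u_{n-i}, from q(x) U(x) = p(x).
    u = []
    for n in range(n_max + 1):
        pn = Fraction(p[n]) if n < len(p) else Fraction(0)
        u.append(pn - sum(Fraction(q[i]) * u[n - i]
                          for i in range(1, min(n, len(q) - 1) + 1)))
    return u

# Example: q(x) = 1 - x - x^2, p(x) = 1 gives the Fibonacci numbers
# (with the F_0 = F_1 = 1 convention used in Example 4.15.4 below),
# satisfying u_n - u_{n-1} - u_{n-2} = 0 for n >= 2.
u = series_from_rational([1], [1, -1, -1], 15)
assert u[:8] == [1, 1, 2, 3, 5, 8, 13, 21]
for n in range(2, 16):
    assert u[n] - u[n - 1] - u[n - 2] == 0
```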

Theorem 4.15.3 Suppose $(u_n)$ is given by [HLR] and the auxiliary polynomial has the form
$$f(t) = (t-\alpha_1)^{m_1}\cdots(t-\alpha_s)^{m_s}.$$
Then
$$u_n = \sum_{i=1}^{s}P_i(n)\alpha_i^n,$$
where $P_i$ is a polynomial with degree at most $m_i - 1$, $1 \le i \le s$.

Proof: By the theory of partial fractions, $U(x)$ can be written as the sum of $s$ expressions of the form
$$(*)\qquad \gamma_1/(1-\alpha x) + \gamma_2/(1-\alpha x)^2 + \cdots + \gamma_m/(1-\alpha x)^m,$$
where in each such expression $\alpha = \alpha_i$, $m = m_i$, for some $i$ in the range $1 \le i \le s$. Recall:
$$(1-x)^{-n} = \sum_{k=0}^{\infty}\binom{n+k-1}{k}x^k.$$
So the coefficient of $x^k$ in (*) is
$$\gamma_1\binom{1+k-1}{k}\alpha^k + \gamma_2\binom{2+k-1}{k}\alpha^k + \cdots + \gamma_m\binom{m+k-1}{k}\alpha^k$$
$$= \left[\gamma_1\binom{k}{0} + \gamma_2\binom{k+1}{1} + \cdots + \gamma_m\binom{m+k-1}{m-1}\right]\alpha^k = P(k)\alpha^k.$$
The formula
$$\binom{k+l}{l} = (k+l)(k+l-1)\cdots(k+1)/l(l-1)\cdots 1$$
shows that $\binom{k+l}{l}$ is a polynomial in $k$ with degree $l$. Hence $P(k)$ is a polynomial in $k$ with degree at most $m-1$. The theorem follows.

In practice we assume the form of the result for $u_n$ and obtain the coefficients of the polynomials $P_i(n)$ by substituting in the initial values of $u_0, u_1, \ldots, u_{k-1}$ and solving $k$ equations in $k$ unknowns.

Example 4.15.4 The Fibonacci Sequence again. Put $F_0 = F_1 = 1$ and $F_{n+2} - F_{n+1} - F_n = 0$ for $n \ge 0$. So the auxiliary equation is $0 = f(t) = t^2 - t - 1 = (t-\alpha_1)(t-\alpha_2)$, where $\alpha_1 = \frac{1+\sqrt{5}}{2}$, $\alpha_2 = \frac{1-\sqrt{5}}{2}$. Put $F(x) = \sum_{n\ge 0}F_nx^n$, and compute $F(x)(1-x-x^2) = 1$, so $F(x) = \frac{1}{1-x-x^2} \stackrel{\mathrm{ops}}{\longleftrightarrow} \{F_n\}_{n=0}^{\infty}$. Then $F(x) = \frac{A}{1-\alpha_1x} + \frac{B}{1-\alpha_2x} = \frac{1}{1-x-x^2}$ leads to
$$F(x) = \frac{2}{5-\sqrt{5}}\sum_i(\alpha_1)^ix^i + \frac{2}{5+\sqrt{5}}\sum_i(\alpha_2)^ix^i.$$
Hence
$$F_n = [x^n]F(x) = \frac{2(1+\sqrt{5})^n}{(5-\sqrt{5})2^n} + \frac{2(1-\sqrt{5})^n}{(5+\sqrt{5})2^n}.$$

Exercise: 4.15.5 Let $\{u_n\}_{n=0}^{\infty}$ be the sequence satisfying the recurrence
$$u_{n+4} = 2u_{n+3} - 2u_{n+1} + u_n,\ n \ge 0,$$
and satisfying the initial conditions
$$u_0 = -1,\ u_1 = +1,\ u_2 = 0,\ u_3 = 1.$$
Find a formula for $u_n$. Also find the generating function for the sequence $\{u_n\}_{n=0}^{\infty}$.

4.16 More Practice with Generating Functions

Theorem 4.16.1
$$[y^j]\frac{1}{1-x-xy} = \sum_k\binom{k}{j}x^k = \frac{x^j}{(1-x)^{j+1}}.$$

Proof: For $j \ge 0$, put $g_j(x) = \sum_k\binom{k}{j}x^k$. Note that $g_0(x) = \frac{1}{1-x}$. We claim $g_{j+1}(x) = \frac{x}{1-x}g_j(x)$ for $j \ge 0$. For $j \ge 1$,
$$xg_{j-1}(x) + xg_j(x) = \sum_{k\ge j-1}\binom{k}{j-1}x^{k+1} + \sum_{k\ge j}\binom{k}{j}x^{k+1} = \binom{j-1}{j-1}x^j + \sum_{k\ge j}\binom{k+1}{j}x^{k+1}$$
$$= x^j + \sum_{k\ge j+1}\binom{k}{j}x^k = \sum_{k\ge j}\binom{k}{j}x^k = g_j(x).$$
Hence for $j \ge 1$, $g_j(x) = \frac{x}{1-x}g_{j-1}(x)$. Now put $H(x,y) = \sum_{j=0}^{\infty}g_j(x)y^j$. Then $\sum_{j\ge 1}g_j(x)y^j = \frac{x}{1-x}\sum_{j\ge 1}g_{j-1}(x)y^j$, implying $H(x,y) - g_0(x) = \frac{xy}{1-x}\sum_{j\ge 0}g_j(x)y^j = \frac{xy}{1-x}H(x,y)$. Hence $H(x,y)\left(1-\frac{xy}{1-x}\right) = g_0(x) = \frac{1}{1-x}$, and thus
$$H(x,y) = \frac{1}{1-x-xy}.$$
This forces
$$g_j(x) = \sum_k\binom{k}{j}x^k = [y^j]H(x,y) = [y^j]\frac{1}{1-x-xy} = [y^j]\frac{\frac{1}{1-x}}{1-\left(\frac{x}{1-x}\right)y} = [y^j]\frac{1}{1-x}\sum_i\left(\frac{x}{1-x}\right)^iy^i = \frac{x^j}{(1-x)^{j+1}}.$$
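Theorem 4.16.1 can be verified by brute force: expand $1/(1-x-xy) = \sum_m(x+xy)^m$ as a truncated polynomial in two variables and compare coefficients with $\binom{k}{j}$ (a sketch; array names are ours):

```python
from math import comb

K = 10  # work modulo x^(K+1) and y^(K+1)

# Build 1/(1 - x - xy) = sum_{m>=0} (x + xy)^m by accumulating powers.
coeff = [[0] * (K + 1) for _ in range(K + 1)]   # coeff[k][j] = [x^k y^j]
power = [[0] * (K + 1) for _ in range(K + 1)]
power[0][0] = 1                                  # (x + xy)^0 = 1
for m in range(K + 1):
    for k in range(K + 1):
        for j in range(K + 1):
            coeff[k][j] += power[k][j]
    # multiply the current power by (x + xy)
    new = [[0] * (K + 1) for _ in range(K + 1)]
    for k in range(K):
        for j in range(K + 1):
            new[k + 1][j] += power[k][j]          # times x
            if j + 1 <= K:
                new[k + 1][j + 1] += power[k][j]  # times xy
    power = new

# Theorem 4.16.1: [x^k y^j] 1/(1 - x - xy) = C(k, j).
for k in range(K + 1):
    for j in range(K + 1):
        assert coeff[k][j] == comb(k, j)
```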

Theorem 4.16.2
$$\sum_k\binom{k}{n-k}x^{n-k} = \sum_k\binom{n-k}{k}x^k = [y^n]\frac{1}{1-y-xy^2}.$$

Proof: For $n \ge 0$, put $f_n(x) = \sum_k\binom{n-k}{k}x^k$ ($0 \le k \le \frac{n}{2}$). We claim that $f_{n+2}(x) = xf_n(x) + f_{n+1}(x)$. For,
$$x\sum_{0\le k\le\frac{n}{2}}\binom{n-k}{k}x^k + \sum_{0\le k\le\frac{n+1}{2}}\binom{n+1-k}{k}x^k = \sum_{0\le k\le\frac{n}{2}}\binom{n-k}{k}x^{k+1} + \binom{n+1}{0} + \sum_{1\le k\le\frac{n+1}{2}}\binom{n+1-k}{k}x^k$$
$$= \binom{n+2}{0} + \sum_{1\le t\le\frac{n+1}{2}}\left[\binom{n+1-t}{t-1}+\binom{n+1-t}{t}\right]x^t + \varepsilon_n = \sum_{0\le t\le\frac{n+2}{2}}\binom{n+2-t}{t}x^t = f_{n+2}(x),$$
where we have reindexed the first sum with $t = k+1$, used Pascal's identity $\binom{n+1-t}{t-1}+\binom{n+1-t}{t} = \binom{n+2-t}{t}$, and $\varepsilon_n$ denotes the top term $x^{\frac{n+2}{2}}$ of the first sum, present only when $n$ is even.

Note that $f_0(x) = 1$; $f_1(x) = 1$; $f_2(x) = 1+x$. Put $G(x,y) = \sum_{n=0}^{\infty}f_n(x)y^n$. Multiply the recursion just established for $f_n(x)$ by $y^{n+2}$, $n \ge 0$, and sum over $n$:
$$\sum_{n=0}^{\infty}xf_n(x)y^{n+2} + \sum_{n=0}^{\infty}f_{n+1}(x)y^{n+2} = \sum_{n=0}^{\infty}f_{n+2}(x)y^{n+2}$$
$$\Rightarrow\quad xy^2G(x,y) + y\left(G(x,y) - f_0(x)\right) = G(x,y) - f_0(x) - f_1(x)y$$
$$\Rightarrow\quad G(x,y)\left[xy^2 + y - 1\right] = yf_0(x) - f_0(x) - f_1(x)y = y - 1 - y = -1$$
$$\Rightarrow\quad G(x,y) = \frac{1}{1-y-xy^2}.$$

Note that $f_n(1) = \sum_k\binom{n-k}{k} = [y^n]\frac{1}{1-y-y^2} = F_n$, the $n$th Fibonacci number.

Exercise: 4.16.3 (Ex. 10F, p. 77 of van Lint & Wilson) Show that
$$\sum_{k=0}^{n}(-1)^k\binom{2n-k}{k}2^{2n-2k} = 2n+1.$$

Exercise: 4.16.4 Evaluate the sum $\sum_k\binom{n-k}{k}(-1)^k$.

Exercise: 4.16.5 Evaluate $\sum_k\binom{n-k}{k}$.

Theorem 4.16.6 Sylvia's Problem. Establish the identity
$$\sum_{j=2n}^{k}(-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = \begin{cases}4^n\binom{\lfloor k/2\rfloor}{n}, & n \ge 1;\\ 0, & n = 0.\end{cases} \qquad (4.48)$$

Proof: It is clear that the L.H.S. in the desired equality equals 0 when $n = 0$. So assume $n \ge 1$ and note that
$$\sum_{j=2n}^{k}(-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = 4^n\sum_{j=2n}^{k}\binom{k}{j}\binom{j-n-1}{n-1}(-2)^{j-2n}.$$
So we may restate the desired result as:
$$\sum_{j=2n}^{k}\binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n} = \begin{cases}\binom{\lfloor k/2\rfloor}{n}, & n \ge 1;\\ 0, & n = 0.\end{cases} \qquad (4.49)$$

Put
$$T^*(x,y) = \sum_{k,n}\binom{\lfloor k/2\rfloor}{n}x^ky^n = \sum_k\left(\sum_n\binom{\lfloor k/2\rfloor}{n}y^n\right)x^k = \sum_k(1+y)^{\lfloor k/2\rfloor}x^k$$
$$= (1+x) + (1+y)(x^2+x^3) + (1+y)^2(x^4+x^5) + \cdots = (1+x)\sum_{i=0}^{\infty}(1+y)^ix^{2i} = \frac{1+x}{1-(1+y)x^2}.$$
Note that $[y^0]T^*(x,y) = \frac{1}{1-x}$. Now put
$$T(x,y) = T^*(x,y) - \frac{1}{1-x} = \frac{1+x}{1-(1+y)x^2} - \frac{1}{1-x} = \frac{1-x^2-1+x^2+yx^2}{(1-x)(1-(1+y)x^2)} = \frac{x^2y}{(1-x)(1-(1+y)x^2)}.$$
So
$$[x^ky^n]\frac{x^2y}{(1-x)(1-(1+y)x^2)} = \begin{cases}\binom{\lfloor k/2\rfloor}{n}, & n \ge 1;\\ 0, & n = 0.\end{cases}$$
Hence $T(x,y) = \frac{x^2y}{(1-x)(1-(1+y)x^2)}$ is the generating function for the doubly-infinite sequence of terms on the R.H.S. of Eq. 4.49.

Put
$$S(x,y) = \sum_{k,n,j}\binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n}x^ky^n.$$
Then $[x^ky^n]S(x,y)$ is the desired sum (on the L.H.S. of Eq. 4.49). Hence our task is equivalent to showing that $S(x,y) = T(x,y)$.

Make the invertible substitution (change of variables):
$$\begin{pmatrix}t\\ s\end{pmatrix} = \begin{pmatrix}1 & -2\\ 1 & -1\end{pmatrix}\begin{pmatrix}j\\ n\end{pmatrix},$$
i.e., $t = j-2n$, $s = j-n$, with inverse $j = 2s-t$, $n = s-t$. Hence we have
$$S(x,y) = \sum_{k,s,t}\binom{k}{2s-t}\binom{s-1}{t}(-2)^tx^ky^{s-t} = \sum_s\left(\sum_t\left(\sum_k\binom{k}{2s-t}x^k\right)y^{s-t}(-2)^t\binom{s-1}{t}\right)$$
(now use Theorem 4.16.1)
$$= \sum_s\left(\sum_t\frac{x^{2s-t}}{(1-x)^{2s-t+1}}\,y^{s-t}(-2)^t\binom{s-1}{t}\right) = \sum_s\sum_t\binom{s-1}{t}\left(\frac{(1-x)(-2)}{xy}\right)^t\frac{x^{2s}y^s}{(1-x)^{2s+1}}$$
$$= \sum_{s\ge 1}\left(1-\frac{2(1-x)}{xy}\right)^{s-1}\left(\frac{x^2y}{(1-x)^2}\right)^s\frac{1}{1-x} = \sum_{j\ge 0}\left(\frac{xy-2(1-x)}{xy}\right)^j\left(\frac{x^2y}{(1-x)^2}\right)^{j+1}\frac{1}{1-x}$$
$$= \frac{x^2y}{(1-x)^3}\sum_{j\ge 0}\left(\frac{(xy-2(1-x))x}{(1-x)^2}\right)^j = \frac{x^2y}{(1-x)^3}\cdot\frac{1}{1-\frac{(xy-2(1-x))x}{(1-x)^2}}$$
$$= \frac{x^2y}{(1-x)(1-2x+x^2-x^2y+2x-2x^2)} = \frac{x^2y}{(1-x)(1-x^2-x^2y)} = T(x,y).$$
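Sylvia's identity (4.48) also checks out numerically (a sketch; `lhs` is our own helper, and the convention $\binom{a}{-1} = 0$ handles the $n = 0$ case):

```python
from math import comb

def lhs(k, n):
    # sum_{j=2n}^{k} (-2)^j C(k,j) C(j-n-1, n-1), with C(a, -1) = 0
    total = 0
    for j in range(2 * n, k + 1):
        c2 = comb(j - n - 1, n - 1) if n >= 1 else 0
        total += (-2) ** j * comb(k, j) * c2
    return total

for k in range(14):
    for n in range(k // 2 + 2):
        expected = 4 ** n * comb(k // 2, n) if n >= 1 else 0
        assert lhs(k, n) == expected
```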

4.17 The Transfer Matrix Method

The Transfer Matrix Method, when applicable, is often used to show that a given sequence has a rational generating function. Sometimes that knowledge helps one to compute the generating function using other information.

Let $A$ be a $p\times p$ matrix over the complex numbers $\mathbb{C}$. Let $f(\lambda) = \det(\lambda I-A) = a_{p-n_0}\lambda^{n_0} + \cdots + a_1\lambda^{p-1} + \lambda^p$ be the characteristic polynomial of $A$ with $a_{p-n_0} \ne 0$. So the reverse polynomial $\hat{f}$ (cf. Section 4.15) is given by $\hat{f}(\lambda) = 1 + a_1\lambda + \cdots + a_{p-n_0}\lambda^{p-n_0}$. Hence $\det(I-\lambda A) = \lambda^p\det\left(\frac{1}{\lambda}I-A\right) = \hat{f}(\lambda)$. We have essentially proved the following:

Lemma 4.17.1 If $f(\lambda) = \det(\lambda I-A)$, then $\hat{f}(\lambda) = \det(I-\lambda A)$. Moreover, if $A$ is invertible, so $n_0 = 0$, then $\hat{\hat{f}} = f$, and $f(\lambda) = \det(\lambda I-A)$ iff $\hat{f}(\lambda) = \det(I-\lambda A)$.

For $1 \le i,j \le p$, define the generating function
$$F_{ij}(A,\lambda) = \sum_{n\ge 0}(A^n)_{ij}\lambda^n. \qquad (4.50)$$
Here $A^0 = I$ even if $A$ is not invertible.

Theorem 4.17.2 $F_{ij}(A,\lambda) = \dfrac{(-1)^{i+j}\det[(I-\lambda A):j,i]}{\det(I-\lambda A)}$.

Proof: Here $(B:i,j)$ denotes the matrix obtained from $B$ by deleting the $i$th row and the $j$th column. Recall that $(B^{-1})_{ij} = \frac{(-1)^{i+j}\det(B:j,i)}{\det(B)}$. Suppose that $B = I-\lambda A$, so $B^{-1} = (I-\lambda A)^{-1} = \sum_{n=0}^{\infty}A^n\lambda^n$, and $\frac{(-1)^{i+j}\det(B:j,i)}{\det(B)} = (B^{-1})_{ij} = \sum_{n=0}^{\infty}(A^n)_{ij}\lambda^n = F_{ij}(A,\lambda)$, proving the theorem.

Corollary 4.17.3 $F_{ij}$ is a rational function of $\lambda$ whose degree is strictly less than the multiplicity $n_0$ of 0 as an eigenvalue of $A$.

Proof: Let $f(\lambda) = \det(\lambda I-A)$ as in the paragraph preceding the statement of Lemma 4.17.1, so $\hat{f}(\lambda) = \det(I-\lambda A)$ has degree $p-n_0$, and $\deg(\det((I-\lambda A):j,i)) \le p-1$. Hence $\deg(F_{ij}(A,\lambda)) \le (p-1)-(p-n_0) = n_0-1 < n_0$.

Now write q(λ) = det(I−λA) = f(λ). If w1, . . . , wq are the nonzero eigen-

values ofA, then 1w1, . . . , 1

wqare the zeros of q(λ), so q(λ) = a

(λ− 1

w1

)· · ·

(λ− 1

wq

)for some nonzero a. From the definition of q(λ) we see that q(0) = det(I) = 1,so

q(λ) = (−1)qw1 · · ·wq(λ− 1

w1

)· · ·

(λ− 1

wq

). (4.51)

Then after computing the derivative q′(λ) we see easily that

−λq′(λ)

q(λ)= −λ

1

λ− 1w1

+ · · ·+ 1

λ− 1wq

(4.52)

=w1λ

1− w1λ+

w2λ

1− w2λ+

wqλ

1− wqλ

Page 155: Applied Combinatorics – Math 6409

4.17. THE TRANSFER MATRIX METHOD 169

=q∑i=1

∞∑n=1

wni λn =

∞∑n=1

( q∑i=1

wni

)λn =

∞∑n=1

tr(An)λn.

We have proved the following corollary:

Corollary 4.17.4 If q(λ) = det(I − λA), then∑∞n=1 tr(A

n)λn = −λq′(λ)q(λ)

.

Let D = (V,E, φ) be a finite digraph, where V = v1, . . . , vp is the set ofvertices, E is a set of (directed) edges or arcs, and φ : E → V ×V determinesthe edges. If φ(e) = (u, v), then e is an edge from u to v, with initial vertexint(e) = u and final vertex fin(e) = v. If u = v, then e is a loop. A walkΓ in D of length n from u to v is a sequence e1e2 · · · en of n edges such thatint(e1) = u, fin(en) = v, and fin(ei) = int(ei+1) for 1 ≤ i < n. If alsou = v, then Γ is called a closed walk based at u. (Note: If Γ is a closed walk,then eiei+1 · · · ene1 · · · ei−1 is in general a different closed walk.)

Now let $w : E \to R$ be a weight function on $E$ ($R$ is some commutative ring; usually $R = \mathbb{C}$ or $R = \mathbb{C}[x]$). If $\Gamma = e_1e_2\cdots e_n$ is a walk, then the weight of $\Gamma$ is defined by $w(\Gamma) = w(e_1)w(e_2)\cdots w(e_n)$. Fix $i$ and $j$, $1 \le i, j \le p$. Put $A_{ij}(n) = \sum_\Gamma w(\Gamma)$, where the sum is over all walks $\Gamma$ in $D$ of length $n$ from $v_i$ to $v_j$. In particular, $A_{ij}(0) = \delta_{ij}$. The fundamental problem treated by the transfer matrix method (TMM) is the evaluation of $A_{ij}(n)$, or at least the determination of some generating function for the $A_{ij}(n)$.

Define a $p \times p$ matrix $A = (A_{ij})$ by
$$A_{ij} = \sum_e w(e),$$
where the sum is over all edges $e$ with $\mathrm{int}(e) = v_i$ and $\mathrm{fin}(e) = v_j$. So $A_{ij} = A_{ij}(1)$. $A$ is the adjacency matrix of $D$ with respect to the weight function $w$.

Theorem 4.17.5 Let $n \in \mathbb{N}$. Then the $(i, j)$-entry of $A^n$ is equal to $A_{ij}(n)$. (By convention, $A^0 = I$ even if $A$ is not invertible.)

Proof: $(A^n)_{ij} = \sum A_{ii_1}A_{i_1i_2}\cdots A_{i_{n-1}j}$, where the sum is over all sequences $(i_1, \ldots, i_{n-1}) \in [p]^{n-1}$. (Here $i = i_0$ and $j = i_n$.) The summand is zero unless there is a walk $e_1\cdots e_n$ from $v_i$ to $v_j$ with $\mathrm{int}(e_k) = v_{i_{k-1}}$ and $\mathrm{fin}(e_k) = v_{i_k}$ ($1 \le k \le n$). If such a walk exists, then the summand is equal to the sum of the weights of all such walks.

We give a special case that occasionally works out in a very satisfying way. Let $C_D(n) = \sum_\Gamma w(\Gamma)$, where the sum is over all closed walks $\Gamma$ in $D$ of length $n$. In this case we have the following.


Corollary 4.17.6 $\sum_{n\ge 1} C_D(n)\lambda^n = -\lambda\frac{q'(\lambda)}{q(\lambda)}$, where $q(\lambda) = \det(I - \lambda A)$.

Proof: Clearly $C_D(1) = \mathrm{tr}(A)$, and by Theorem 4.17.5 we have $C_D(n) = \mathrm{tr}(A^n)$. Hence by Cor. 4.17.4 we have $\sum_{n\ge 1} C_D(n)\lambda^n = -\lambda\frac{q'(\lambda)}{q(\lambda)}$.

Often an enumeration problem can be represented as counting the number of sequences $a_1a_2\cdots a_n \in [p]^n$ of integers $1, \ldots, p$ subject to certain restrictions on the subsequences $a_ia_{i+1}$ that may appear. In this case we form a digraph $D$ with vertices $v_i = i$, $1 \le i \le p$, and put an arc $e = (i, j)$ from $i$ to $j$ provided the subsequence $ij$ is permitted. So a permitted sequence $a_{i_1}a_{i_2}\cdots a_{i_n}$ corresponds to a walk $\Gamma = (i_1, i_2)(i_2, i_3)\cdots(i_{n-1}, i_n)$ in $D$ of length $n-1$ from $i_1$ to $i_n$. If $w(e) = 1$ for all edges in $D$ and if $A$ is the adjacency matrix of $D$ with respect to this particular weight function, then clearly $f(n) := \sum_{i,j=1}^p A_{ij}(n-1)$ is the number of sequences $a_1a_2\cdots a_n \in [p]^n$ subject to the restrictions used in defining $D$. Put $q(\lambda) = \det(I - \lambda A)$ and $q_{ij}(\lambda) = \det((I - \lambda A) : j, i)$. Then by Theorem 4.17.2

$$F(\lambda) := \sum_{n\ge 0} f(n+1)\lambda^n = \sum_{n\ge 0}\left(\sum_{i,j=1}^p A_{ij}(n)\right)\lambda^n \qquad (4.53)$$

$$= \sum_{i,j=1}^p\sum_{n\ge 0} A_{ij}(n)\lambda^n = \sum_{i,j=1}^p F_{ij}(A, \lambda) = \sum_{i,j=1}^p\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}.$$

We state this as a corollary.

Corollary 4.17.7 If $w(e) = 1$ for all edges in $D$ and $f(n)$ is the number of sequences $a_1a_2\cdots a_n \in [p]^n$ subject to the restrictions used in defining $D$, then
$$\sum_{n\ge 0} f(n+1)\lambda^n = \sum_{i,j=1}^p\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}. \qquad (4.54)$$

We give an easy example that can be checked by other more elementarymeans.

Example 1. Let $f(n)$ be the number of sequences $a_1a_2\cdots a_n \in [3]^n$ with the property that $a_1 = a_n$ and $a_i \ne a_{i+1}$ for $1 \le i \le n-1$. Then the adjacency matrix $A$ for this example is
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$


We apply Cor. 4.17.6. It is easy to check that $q(\lambda) = \det(I - \lambda A) = (1+\lambda)^2(1-2\lambda)$, and $q'(\lambda) = -6\lambda(1+\lambda)$. Using partial fractions, etc., we find that

$$-\lambda\frac{q'(\lambda)}{q(\lambda)} = -3 + \frac{2}{1+\lambda} + \frac{1}{1-2\lambda}$$

$$= -3 + \sum_{n=0}^\infty 2(-\lambda)^n + \sum_{n=0}^\infty 2^n\lambda^n$$

$$= -3 + \sum_{n=0}^\infty\left(2^n + (-1)^n 2\right)\lambda^n.$$

Here $n = 3$ gives $8 - 2 = 6$. The six sequences are of the form $aba$ with $a$ and $b$ arbitrary but distinct elements of $\{1, 2, 3\}$.
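The coefficient of $\lambda^n$ here is $\mathrm{tr}(A^n)$, the number of closed walks of length $n$ in $K_3$, i.e., of sequences $a_0a_1\cdots a_n$ over $\{1,2,3\}$ with $a_0 = a_n$ and consecutive terms distinct. A brute-force Python check (an illustration, not part of the notes):

```python
from itertools import product

# Check: coefficient 2^n + 2(-1)^n equals tr(A^n) = the number of closed
# walks of length n in K3, i.e. sequences a_0 ... a_n over {1,2,3} with
# a_0 = a_n and a_i != a_{i+1}.
def closed_walks(n):
    return sum(1 for a in product(range(3), repeat=n + 1)
               if a[0] == a[-1] and all(a[i] != a[i + 1] for i in range(n)))

for n in range(1, 9):
    assert closed_walks(n) == 2**n + 2 * (-1)**n
```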

Example 2. Let $D$ be the complete (weighted) digraph on two vertices, i.e., $p = 2$, $V = \{v_0, v_1\}$, and the weight $w(e_{ij})$ of the edge $e_{ij}$ from $v_i$ to $v_j$ is the indeterminate $x_{ij}$, $0 \le i, j \le 1$. A sequence $a_0a_1a_2\cdots a_n$ of $n+1$ 0's and 1's corresponds to a walk of length $n$ along edges $a_0a_1, a_1a_2, \ldots, a_{n-1}a_n$, and has weight $x_{a_0a_1}x_{a_1a_2}\cdots x_{a_{n-1}a_n}$. The adjacency matrix is
$$A = \begin{pmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \end{pmatrix}.$$
Then $(A^n)_{ij} = \sum_\Gamma w(\Gamma)$, where the summation is over all walks $\Gamma$ of length $n$ from $i$ to $j$, $0 \le i, j \le 1$. At this level of generality we are in a position to consider several different problems.

Problem 2.1. Let $f(n)$ be the number of sequences of $n$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$, i.e., $x_{11} = 0$. Then as in Cor. 4.17.7 we put $x_{00} = x_{01} = x_{10} = 1$ and we have $\sum_{n\ge 0} f(n+1)\lambda^n = \sum_{i,j=1}^2\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}$, where
$$q(\lambda) = \det\left(I - \lambda\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\right) = 1 - \lambda - \lambda^2.$$
A quick computation shows that $q_{11} = 1$, $q_{12} = -\lambda$, $q_{21} = -\lambda$, $q_{22} = 1 - \lambda$. Hence $\sum_{n\ge 0} f(n+1)\lambda^n = \frac{2+\lambda}{1-\lambda-\lambda^2}$. We recognize that this denominator gives a Fibonacci type sequence. If we solve
$$\frac{2+\lambda}{1-\lambda-\lambda^2} = \frac{b}{1-\alpha\lambda} + \frac{c}{1-\beta\lambda}, \qquad \alpha = \frac{1-\sqrt 5}{2}, \quad \beta = \frac{1+\sqrt 5}{2},$$
for $b$ and $c$, we eventually find that
$$\frac{2+\lambda}{1-\lambda-\lambda^2} = \sum_{n\ge 0} f(n+1)\lambda^n$$
if and only if
$$f(n+1) = \left(\frac{\sqrt 5 - 2}{\sqrt 5}\right)\left(\frac{1-\sqrt 5}{2}\right)^n + \left(\frac{\sqrt 5 + 2}{\sqrt 5}\right)\left(\frac{1+\sqrt 5}{2}\right)^n.$$
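The closed form can be checked against a direct enumeration of binary strings avoiding 11 (a Python sketch, rounding away floating-point error):

```python
from itertools import product
from math import sqrt

# Check of Problem 2.1: count 0/1-strings of length n with no "11" and
# compare with the closed form derived from the transfer matrix.
def no11(n):
    return sum(1 for a in product('01', repeat=n)
               if '11' not in ''.join(a))

r5 = sqrt(5.0)
alpha, beta = (1 - r5) / 2, (1 + r5) / 2
for n in range(0, 16):
    closed_form = (r5 - 2) / r5 * alpha**n + (r5 + 2) / r5 * beta**n
    assert no11(n + 1) == round(closed_form)
```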


Problem 2.2. Find the number of sequences of $n+1$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$, i.e., $x_{11} = 0$ as above, but this time consider only those sequences starting with a fixed $i \in \{0, 1\}$ and ending with a fixed $j \in \{0, 1\}$.

For this situation we need to find the $ij$ entry of the $n$th power of $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$. Here we diagonalize the matrix $A$ to find its powers:

$$A^n = \frac{1}{4\sqrt 5}\begin{pmatrix} 2 & -2 \\ \sqrt 5 - 1 & \sqrt 5 + 1 \end{pmatrix}\begin{pmatrix} \left(\frac{1+\sqrt 5}{2}\right)^n & 0 \\ 0 & \left(\frac{1-\sqrt 5}{2}\right)^n \end{pmatrix}\begin{pmatrix} 1 + \sqrt 5 & 2 \\ 1 - \sqrt 5 & 2 \end{pmatrix}$$

$$= \frac{1}{4\sqrt 5}\begin{pmatrix} \dfrac{(1+\sqrt 5)^{n+1} - (1-\sqrt 5)^{n+1}}{2^{n-1}} & \dfrac{(1+\sqrt 5)^n - (1-\sqrt 5)^n}{2^{n-2}} \\[2ex] \dfrac{(\sqrt 5 - 1)(1+\sqrt 5)^{n+1} + (\sqrt 5 + 1)(1-\sqrt 5)^{n+1}}{2^n} & \dfrac{(\sqrt 5 - 1)(1+\sqrt 5)^n + (\sqrt 5 + 1)(1-\sqrt 5)^n}{2^{n-1}} \end{pmatrix}.$$

For example, the 12 entry of this matrix is the number of sequences of $n+1$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$ and starting with 0 and ending with 1. A little routine computation gives

$$\mathrm{tr}(A^n) = \left(\frac{1+\sqrt 5}{2}\right)^n + \left(\frac{1-\sqrt 5}{2}\right)^n.$$

We could also have used Cor. 4.17.6 and calculated
$$-\lambda\frac{q'(\lambda)}{q(\lambda)} = -2 + \frac{2-\lambda}{1-\lambda-\lambda^2} = -2 + \frac{1}{1-\alpha\lambda} + \frac{1}{1-\beta\lambda}.$$
This agrees with the above for $n \ge 1$, but in the proof of Cor. 4.17.6 the term $C_D(0)$ is not accounted for.
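It is standard that this $A$ satisfies $A^n = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}$ with Fibonacci numbers $F_n$ ($F_1 = F_2 = 1$), which is what the explicit entries above simplify to; the following Python sketch (not part of the notes) checks this and the trace formula:

```python
from math import sqrt

# Check: for A = [[1,1],[1,0]], A^n = [[F_{n+1}, F_n], [F_n, F_{n-1}]] and
# tr(A^n) = ((1+sqrt5)/2)^n + ((1-sqrt5)/2)^n.
def fib(n):                       # F_1 = F_2 = 1, F_0 = 0
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def mat_pow(n):
    M = [[1, 0], [0, 1]]
    for _ in range(n):            # multiply by A = [[1,1],[1,0]] on the right
        M = [[M[0][0] + M[0][1], M[0][0]],
             [M[1][0] + M[1][1], M[1][0]]]
    return M

r5 = sqrt(5.0)
for n in range(1, 15):
    M = mat_pow(n)
    assert M == [[fib(n + 1), fib(n)], [fib(n), fib(n - 1)]]
    assert M[0][0] + M[1][1] == round(((1 + r5) / 2)**n + ((1 - r5) / 2)**n)
```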

Problem 2.3. Suppose we still require that two 1's never appear together, but now we want to count sequences with prescribed numbers of 0's and 1's. Return to the situation where $A = \begin{pmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \end{pmatrix}$. Then

$$\sum_{n=0}^\infty A^n = (I - A)^{-1} = \begin{pmatrix} 1 - x_{00} & -x_{01} \\ -x_{10} & 1 - x_{11} \end{pmatrix}^{-1}$$

$$= \frac{1}{(1-x_{00})(1-x_{11}) - x_{01}x_{10}}\begin{pmatrix} 1 - x_{11} & x_{01} \\ x_{10} & 1 - x_{00} \end{pmatrix}$$

$$= \begin{pmatrix} 1 - x_{11} & x_{01} \\ x_{10} & 1 - x_{00} \end{pmatrix}\cdot\frac{1}{(1-x_{00})(1-x_{11})}\cdot\frac{1}{1 - \dfrac{x_{01}x_{10}}{(1-x_{00})(1-x_{11})}}$$

$$= \begin{pmatrix} 1 - x_{11} & x_{01} \\ x_{10} & 1 - x_{00} \end{pmatrix}\sum_{i=0}^\infty\frac{x_{01}^i x_{10}^i}{(1-x_{00})^{i+1}(1-x_{11})^{i+1}}$$


$$= \begin{pmatrix} 1 - x_{11} & x_{01} \\ x_{10} & 1 - x_{00} \end{pmatrix}\sum_{i=0}^\infty x_{01}^i x_{10}^i\sum_{j=0}^\infty\sum_{k=0}^\infty\binom{i+j}{j}x_{00}^j\binom{i+k}{k}x_{11}^k.$$

If we suppose that the pair 11 never appears, so $x_{11} = 0$, then $x_{11}^k = \delta_{k,0}$. And
$$\sum_{n=0}^\infty A^n = \begin{pmatrix} 1 & x_{01} \\ x_{10} & 1 - x_{00} \end{pmatrix}\sum_{i,j=0}^\infty\binom{i+j}{j}x_{01}^i x_{10}^i x_{00}^j.$$

We now consider what this equation implies for the (i, j) position, 1 ≤i, j ≤ 2.

Case 1. (1,1) position: $\sum_{n=0}^\infty(A^n)_{11} = \sum_{i,j=0}^\infty\binom{i+j}{j}x_{01}^i x_{10}^i x_{00}^j$. So there must be $\binom{i+j}{j}$ ways of forming walks of length $2i+j$ using the edges $x_{01}$ and $x_{10}$ each $i$ times and the edge $x_{00}$ $j$ times. This corresponds to a sequence of length $2i+j+1$ with exactly $i$ 1's (and $i+j+1$ 0's), starting and ending with a 0, and never having two 1's next to each other. Another way to view this is as needing to fill $i+1$ boxes with 0's (the boxes before and after each 1) so that each box has at least one 0. This is easily seen to be the same as an $(i+1)$-composition of $i+j+1$, of which there are $\binom{i+j}{i} = \binom{i+j}{j}$. (See pages 15-16 of our class notes.)

Case 2. (1,2) position: $\sum_{n=0}^\infty(A^n)_{12} = \sum_{i,j=0}^\infty\binom{i+j}{j}x_{01}^{i+1}x_{10}^i x_{00}^j$. Here there must be $\binom{i+j}{j}$ walks of length $2i+j+1$ using the edge $x_{01}$ $i+1$ times, the edge $x_{10}$ $i$ times, and the edge $x_{00}$ $j$ times. This corresponds to a sequence of length $2i+j+2$ with exactly $i+1$ 1's (and $i+j+1$ 0's), starting with a 0 and ending with a 1, and never having two 1's next to each other. It is clear that this kind of sequence is just one from Case 1 with a 1 appended at the end.

Case 3. (2,1) position: This is the same as Case 2, with the roles of $x_{01}$ and $x_{10}$ interchanged, and the 1 appended at the beginning of the sequence.

Case 4. (2,2) position:

$$\sum_{n=0}^\infty(A^n)_{22} = \sum_{i,j=0}^\infty\binom{i+j}{j}x_{01}^i x_{10}^i x_{00}^j - \sum_{i,j=0}^\infty\binom{i+j}{j}x_{01}^i x_{10}^i x_{00}^{j+1}$$

$$= \sum_{i=0}^\infty x_{01}^i x_{10}^i + \sum_{i\ge 0,\ j\ge 1} x_{01}^i x_{10}^i x_{00}^j\left[\binom{i+j}{j} - \binom{i+j-1}{j-1}\right]$$

$$= 1 + \sum_{i\ge 1,\ j\ge 0} x_{01}^i x_{10}^i x_{00}^j\binom{i+j-1}{j},$$

after some computation (using Pascal's rule $\binom{i+j}{j} - \binom{i+j-1}{j-1} = \binom{i+j-1}{j}$). A term $x_{01}^i x_{10}^i x_{00}^j$ corresponds to a sequence of length $2i+j+1 = n$, and $n$ must be at least 3 before anything interesting shows up. Here $i+j-1 = n-3-(i-1)$ and $j = n-2i-1$, so $(n-3-(i-1)) - (n-2i-1) = i-1$. Hence the number of sequences of 0's and 1's of length $n \ge 3$, starting and ending with a 1, and with 11 never appearing is
$$\sum_{1\le i\le\frac{n-1}{2}}\binom{n-3-(i-1)}{i-1} = \sum_{0\le k\le\frac{n-3}{2}}\binom{n-3-k}{k}.$$
We recognize this as $F_{n-3}$, the $(n-3)$th Fibonacci number.
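Assuming the Case 4 count refers to strings that start and end with 1 (the $(2,2)$ entry counts walks from vertex 1 back to itself), the binomial sum can be checked by brute force:

```python
from itertools import product
from math import comb

# Check of Case 4: strings of length n that start and end with 1 and never
# contain "11", against sum_k C(n-3-k, k).
def case4(n):
    return sum(1 for a in product('01', repeat=n)
               if a[0] == a[-1] == '1' and '11' not in ''.join(a))

for n in range(3, 17):
    rhs = sum(comb(n - 3 - k, k) for k in range((n - 3) // 2 + 1))
    assert case4(n) == rhs
```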

Example 3. Let $f(n)$ be the number of sequences $a_1\cdots a_n \in [3]^n$ such that neither 11 nor 23 appears as two consecutive terms $a_ia_{i+1}$. Determine $f(n)$, or at least a generating function for $f(n)$.

Solution: Let $D$ be the digraph on $V = [3]$ with an edge $(i, j)$ if and only if $j$ is allowed to follow $i$ in the sequence. Also let $w(e) = 1$ for each edge $e$ of $D$. The corresponding adjacency matrix is
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
So $f(n) = \sum_{i,j=1}^3 A_{ij}(n-1)$. Put $q(\lambda) = \det(I - \lambda A)$ and $q_{ij}(\lambda) = \det((I - \lambda A) : j, i)$. By Theorem 4.17.2,
$$F(\lambda) := \sum_{n\ge 0} f(n+1)\lambda^n = \sum_{n\ge 0}\left(\sum_{i,j=1}^3 A_{ij}(n)\right)\lambda^n = \sum_{i,j=1}^3\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}.$$
It is easy to work out $q(\lambda) = 1 - 2\lambda - \lambda^2 + \lambda^3$. Then $\det[(I - \lambda A)^{-1}] = [\det(I - \lambda A)]^{-1}$. By Cor. 4.17.3, each $F_{ij}(A, \lambda)$, and hence $F(\lambda)$, is a rational function of $\lambda$ of degree less than the multiplicity $n_0$ of 0 as an eigenvalue of $A$. Since $q(\lambda)$ has degree $3 = p - n_0$ with $p = 3$, we get $n_0 = 0$, i.e., $A$ is nonsingular. Since the denominator of $F(\lambda)$ is $q(\lambda)$, which has degree 3, the numerator of $F(\lambda)$ has degree at most 2, so is determined by its values at three points. Note: we need
$$A^2 = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 2 & 1 \\ 2 & 3 & 2 \end{pmatrix}.$$

Then
$$f(1) = \sum_{i,j=1}^3 A_{ij}(0) = \mathrm{tr}(I) = 3, \qquad f(2) = \sum_{i,j=1}^3 A_{ij}(1) = 7.$$


$$f(3) = \sum_{i,j=1}^3 A_{ij}(2) = 16.$$

Then for some $a_0, a_1, a_2 \in \mathbb{Q}$, $F(\lambda) = \frac{a_0 + a_1\lambda + a_2\lambda^2}{1 - 2\lambda - \lambda^2 + \lambda^3} = \sum_{n\ge 0} f(n+1)\lambda^n = f(1) + f(2)\lambda + f(3)\lambda^2 + \cdots$, which implies that

$$a_0 + a_1\lambda + a_2\lambda^2 = (1 - 2\lambda - \lambda^2 + \lambda^3)(3 + 7\lambda + 16\lambda^2 + \cdots) = 3 + \lambda - \lambda^2.$$

Hence
$$F(\lambda) = \sum_{n\ge 0} f(n+1)\lambda^n = \frac{3 + \lambda - \lambda^2}{1 - 2\lambda - \lambda^2 + \lambda^3},$$
from which it follows that
$$\sum_{n\ge 0} f(n+1)\lambda^{n+1} = \frac{3\lambda + \lambda^2 - \lambda^3}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$
Now add $f(0) = 1$ to both sides to get
$$\sum_{n=0}^\infty f(n)\lambda^n = \frac{1 + \lambda}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$
The above generating function for $f(n)$ implies that
$$f(n+3) = 2f(n+2) + f(n+1) - f(n). \qquad (4.55)$$
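The initial values and the recurrence (4.55) can be confirmed by brute force (a Python sketch, not part of the notes):

```python
from itertools import product

# Check of Example 3: sequences over {1,2,3} with no consecutive 11 or 23.
BAD = {(1, 1), (2, 3)}

def f(n):
    return sum(1 for a in product((1, 2, 3), repeat=n)
               if all((a[i], a[i + 1]) not in BAD for i in range(n - 1)))

assert (f(1), f(2), f(3)) == (3, 7, 16)
for n in range(1, 7):
    assert f(n + 3) == 2 * f(n + 2) + f(n + 1) - f(n)   # recurrence (4.55)
```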

For a variation on the problem, let $g(n)$ be the number of sequences $a_1\cdots a_n$ such that neither 11 nor 23 appears as two consecutive terms $a_ia_{i+1}$ or as $a_na_1$. So $g(n) = C_D(n)$, the number of closed walks in $D$ of length $n$. So we just need to compute
$$-\lambda\frac{q'(\lambda)}{q(\lambda)} = \frac{\lambda(2 + 2\lambda - 3\lambda^2)}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$

The preceding example is Example 4.7.4 from R. P. Stanley, EnumerativeCombinatorics, Vol. 1., Wadsworth & Brooks/Cole, 1986. See that referencefor further examples of applications of the transfer matrix method.

Exercise: 4.17.8 Let $f(n)$ be the number of sequences $a_1a_2\cdots a_n \in [3]^n$ with $[3] = \{0, 1, 2\}$ and with the property that $a_1 = a_n$ and 0 and 2 are never next to each other. Use the transfer matrix method to find a generating function for the sequence $\{f(n)\}_{n=1}^\infty$, and then find a formula for $f(n)$.


4.18 A Famous NONLINEAR Recurrence

For $n \ge 3$ let $u_n$ be the number of ways to associate (i.e., parenthesize) a finite sequence $x_1, \ldots, x_n$. As a first example,
$$u_3 = |\{x_1(x_2x_3),\ (x_1x_2)x_3\}| = 2.$$
Similarly,
$$u_4 = |\{x_1(x_2(x_3x_4)),\ x_1((x_2x_3)x_4),\ (x_1x_2)(x_3x_4),\ (x_1(x_2x_3))x_4,\ ((x_1x_2)x_3)x_4\}| = 5.$$
By convention, $u_1 = u_2 = 1$.

A given associated product always looks like $(x_1\cdots x_r)(x_{r+1}\cdots x_n)$, where $1 \le r \le n-1$. So $u_n = u_1u_{n-1} + u_2u_{n-2} + \cdots + u_{n-1}u_1$ for $n \ge 2$. Hence
$$u_n = \sum_{i=1}^{n-1} u_iu_{n-i}.$$

Put $f(x) = \sum_{n=1}^\infty u_nx^n$. Then
$$(f(x))^2 = \sum_{n=2}^\infty\left(\sum_{i=1}^{n-1} u_iu_{n-i}\right)x^n = \sum_{n=2}^\infty u_nx^n = f(x) - x.$$

It follows that $[f(x)]^2 - f(x) + x = 0$, so $f(x) = \frac{1}{2}\left[1 \pm (1 - 4x)^{1/2}\right]$. We must use the minus sign, since the constant term of $f(x)$ is $f(0) = 0$. This leads to
$$f(x) = \frac{1}{2} - \frac{1}{2}(1 - 4x)^{1/2} = \frac{1}{2} - \frac{1}{2}\sum_{n=0}^\infty\binom{1/2}{n}(-4x)^n.$$

Then a little computation shows that
$$u_n = \left(-\frac{1}{2}\right)\cdot\frac{(-1)^{n-1}\,(1\cdot 3\cdot 5\cdots(2n-3))}{2^n\, n!}\,(-1)^n 4^n = \frac{(2n-2)!}{n!\,(n-1)!} = \frac{1}{n}\binom{2(n-1)}{n-1} = C_{n-1}.$$
These numbers $C_n = \frac{1}{n+1}\binom{2n}{n}$ are the famous Catalan numbers. See Chapter 14 of van Lint and Wilson for a great deal more about them.
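The recurrence and the closed form agree, as a quick Python check (illustrative only) confirms:

```python
from math import comb

# u_1 = u_2 = 1 and u_n = sum_{i=1}^{n-1} u_i u_{n-i}; check that
# u_n = C_{n-1} = (1/n) * C(2n-2, n-1), the (n-1)st Catalan number.
u = [0, 1, 1]                      # u[0] is unused padding
for n in range(3, 16):
    u.append(sum(u[i] * u[n - i] for i in range(1, n)))
for n in range(1, 16):
    assert u[n] == comb(2 * n - 2, n - 1) // n
```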


4.19 MacMahon’s Master Theorem

4.19.1 Preliminary Results on Determinants

Theorem 4.19.2 Let $R$ be a commutative ring with 1, and let $A$ be an $n \times n$ matrix over $R$. The characteristic polynomial of $A$ is given by
$$f(x) = \det(xI - A) = \sum_{i=0}^n c_ix^{n-i}, \qquad (4.56)$$
where $c_0 = 1$, and for $1 \le i \le n$, $c_i = \sum\det(B)$, where $B$ ranges over all the $i \times i$ principal submatrices of $-A$.

Proof: Clearly $\det(xI - A)$ is a monic polynomial of degree $n$, i.e., $c_0 = 1$, with constant term $\det(-A) = (-1)^n\det(A)$. Suppose $1 \le i \le n-1$ and consider the coefficient $c_i$ of $x^{n-i}$ in the polynomial $\det(xI - A)$. Recall that in general, if $D = (d_{ij})$ is an $n \times n$ matrix over a commutative ring with 1, then
$$\det(D) = \sum_{\pi\in S_n}\mathrm{sgn}(\pi)\, d_{1,\pi(1)}d_{2,\pi(2)}\cdots d_{n,\pi(n)}.$$

So to get a term of degree $n-i$ in $\det(xI - A) = \sum_{\pi\in S_n}\mathrm{sgn}(\pi)(xI - A)_{1,\pi(1)}\cdots(xI - A)_{n,\pi(n)}$ we first select $n-i$ indices $j_1, \ldots, j_{n-i}$, with complementary indices $k_1, \ldots, k_i$. Then in expanding the product $(xI - A)_{1,\pi(1)}\cdots(xI - A)_{n,\pi(n)}$ when $\pi$ fixes $j_1, \ldots, j_{n-i}$, we select the term $x$ from the factors $(xI - A)_{j_1,j_1}, \ldots, (xI - A)_{j_{n-i},j_{n-i}}$, and the terms $(-A)_{k_1,\pi(k_1)}, \ldots, (-A)_{k_i,\pi(k_i)}$ otherwise. So if $A(k_1, \ldots, k_i)$ is the principal submatrix of $A$ indexed by rows and columns $k_1, \ldots, k_i$, then $\det(-A(k_1, \ldots, k_i))$ is the associated contribution to the coefficient of $x^{n-i}$. It follows that $c_i = \sum\det(B)$ where $B$ ranges over all the principal $i \times i$ submatrices of $-A$.

Suppose the permutation $\pi \in S_n$ consists of $k$ permutation cycles of sizes $l_1, \ldots, l_k$, respectively, where $\sum l_i = n$. Then $\mathrm{sgn}(\pi)$ can be computed by
$$\mathrm{sgn}(\pi) = (-1)^{(l_1-1)+(l_2-1)+\cdots+(l_k-1)} = (-1)^{n-k} = (-1)^n(-1)^k.$$
We record this formally as:
$$\mathrm{sgn}(\pi) = (-1)^n(-1)^k \quad\text{if } \pi \in S_n \text{ is the product of } k \text{ disjoint cycles.} \qquad (4.57)$$
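Theorem 4.19.2 can be checked mechanically for a small matrix. The Python sketch below (the test matrix is arbitrary, not from the notes) expands $\det(xI - A)$ via the Leibniz formula with polynomial entries and compares each $c_i$ with the sum of the $i \times i$ principal minors of $-A$:

```python
from itertools import permutations, combinations

def sign(perm):
    """Sign of a permutation given in one-line form, via its cycle lengths."""
    s, seen = 1, set()
    for i in range(len(perm)):
        if i in seen:
            continue
        j, length = i, 0
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        s *= (-1) ** (length - 1)
    return s

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def det(M):
    """Integer determinant via the Leibniz formula (fine for tiny matrices)."""
    return sum(sign(perm) *
               __import__('math').prod(M[i][perm[i]] for i in range(len(M)))
               for perm in permutations(range(len(M))))

def charpoly(A):
    """Coefficients of det(xI - A), lowest degree first."""
    n = len(A)
    entries = [[([-A[i][j], 1] if i == j else [-A[i][j]]) for j in range(n)]
               for i in range(n)]
    total = [0] * (n + 1)
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for i in range(n):
            term = poly_mul(term, entries[i][perm[i]])
        total = [a + (term[k] if k < len(term) else 0)
                 for k, a in enumerate(total)]
    return total

A = [[2, -1, 0], [3, 1, 4], [0, 5, -2]]    # arbitrary test matrix
n = len(A)
c = charpoly(A)                            # c[k] = coefficient of x^k
assert c[n] == 1                           # monic: c_0 = 1 in the theorem
for i in range(1, n + 1):
    minors = sum(det([[-A[r][s] for s in S] for r in S])
                 for S in combinations(range(n), i))
    assert c[n - i] == minors              # c_i = sum of principal minors of -A
```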


4.19.3 Permutation Digraphs

Let $A = (a_{ij})$ be an $n \times n$ matrix over the commutative ring $R$ with 1. Let $D_n$ be the complete digraph of order $n$ with vertices $1, 2, \ldots, n$, and for which each ordered pair $(i, j)$ is an arc of $D_n$. Assign to each arc $(i, j)$ the weight $a_{ij}$ to obtain a weighted digraph. The weight of a directed cycle $\gamma : i_1 \mapsto i_2 \mapsto \cdots \mapsto i_k \mapsto i_1$ is defined to be
$$wt(\gamma) = -a_{i_1i_2}\cdot a_{i_2i_3}\cdots a_{i_{k-1}i_k}\cdot a_{i_ki_1}, \qquad (4.58)$$
which is the negative of the product of the weights of its arcs.

Let $\pi \in S_n$. The permutation digraph $D(\pi)$ has vertices $1, \ldots, n$ and arcs $(i, \pi(i))$, $1 \le i \le n$. So $D(\pi)$ is a spanning subgraph of $D_n$. The directed cycles of the graph $D(\pi)$ are in 1-1 correspondence with the permutation cycles of $\pi$. Also, the arc sets of the directed cycles of $D(\pi)$ partition the set of arcs of $D(\pi)$.

The weight $wt(D(\pi))$ of the permutation digraph $D(\pi)$ is defined to be the product of the weights of its directed cycles. Hence if $\pi$ has $k$ permutation cycles,
$$wt(D(\pi)) = (-1)^k a_{1,\pi(1)}a_{2,\pi(2)}\cdots a_{n,\pi(n)}. \qquad (4.59)$$
Then using Equations 4.56 and 4.57 we obtain
$$\det(-A) = \sum wt(D(\pi)), \qquad (4.60)$$
where $D(\pi)$ ranges over all permutation digraphs of order $n$.

Fix $X \subseteq [n] = \{1, 2, \ldots, n\}$ and let $\sigma \in S_X$. The permutation digraph $D(\sigma)$ has vertex set $X$ and is a (not necessarily spanning) subgraph of $D_n$ with weight equal to the product of the weights of its cycles. (If $X = \emptyset$, the corresponding weight is defined to be 1.) If $B$ is the principal submatrix of $-A$ whose rows and columns are the (intersections of the) rows and columns of $-A$ indexed by $X \subseteq [n]$, then $\det(B) = \sum_{\sigma\in S_X} wt(D(\sigma))$. If we put $x = 1$ in Eq. 4.56 we obtain
$$\det(I_n - A) = \sum_{X\subseteq[n]}\sum_{\sigma\in S_X} wt(D(\sigma)). \qquad (4.61)$$

Let $y_1, \ldots, y_n$ be independent commuting variables over $R$, and put $R^* = R[y_1, \ldots, y_n]$. Replace $A$ in the preceding discussion with $AY$, where $Y$ is the diagonal matrix with diagonal entries $y_1, \ldots, y_n$. So $(AY)_{ij} = a_{ij}y_j$. So if $\pi \in S_n$ has $k$ permutation cycles, $D(\pi)$ has $k$ directed cycles. And
$$wt(D(\pi)) = (-1)^k a_{1,\pi(1)}y_{\pi(1)}\cdots a_{n,\pi(n)}y_{\pi(n)}. \qquad (4.62)$$

From a different point of view, let $\mathcal{H}$ be the set of all digraphs $H$ of order $n$ for which each vertex has the same indegree and outdegree, and this common value is either 0 or 1. Then each $H \in \mathcal{H}$ consists of a number of pairwise disjoint directed cycles, and hence is a permutation digraph on a subset of $[n]$. The weight $wt(H)$ of a digraph $H \in \mathcal{H}$ is defined to be $wt(H) = (-1)^{c(H)}\times$ (the product of the weights of its arcs), where $c(H)$ is the number of directed cycles of $H$ and the weight of an arc $(i, j)$ of $H$ is $wt(i, j) = a_{ij}y_j$. So if $H \in \mathcal{H}$ satisfies $H = D(\pi)$, $\pi \in S_X$, $X \subseteq [n]$, then $wt(H)$ is given by Eq. 4.62. Moreover, if $wt(\mathcal{H}) = \sum_{H\in\mathcal{H}} wt(H)$, by Eq. 4.61 we have
$$wt(\mathcal{H}) = \sum_{H\in\mathcal{H}} wt(H) = \det(I_n - AY). \qquad (4.63)$$

4.19.4 A Class of General Digraphs

We now consider the set $\mathcal{D}$ of general digraphs $D$ on vertices in $[n]$, for which the arcs having $i$ as initial vertex are linearly ordered, and such that for each $i$, $1 \le i \le n$, there is a nonnegative integer $m_i$ such that $m_i$ equals both the indegree and the outdegree of the vertex $i$. Recall that a loop on $i$ contributes 1 to both the indegree and the outdegree of $i$. We still have the $n \times n$ matrix $A = (a_{ij})$ and the independent indeterminates $y_1, \ldots, y_n$. If $D$ is a general digraph, and if $(i, j)$ is the $t$th arc with $i$ as initial vertex, let $a^t_{ij}y_j$ be the weight of the arc $(i, j)$ ($a^t_{ij}$ is the $(i, j)$ entry of $A$ with a superscript $t$ adjoined). The weight $wt(D)$ of $D$ is the product of the weights of its arcs. Each $D$ is uniquely identified by $wt(D)$.

Moreover, suppose that the variables $y_1, \ldots, y_n$ commute with all the entries of $A$, but do not commute with each other. We show that each $D$ is identified uniquely by the word in $y_1, \ldots, y_n$ associated with $wt(D)$. As an example, suppose that
$$wt(D) = a^1_{11}y_1\, a^2_{13}y_3\, a^3_{13}y_3\, a^1_{22}y_2\, a^1_{31}y_1\, a^2_{31}y_1\, a^3_{33}y_3 \qquad (4.64)$$
$$= a^1_{11}a^2_{13}a^3_{13}a^1_{22}a^1_{31}a^2_{31}a^3_{33}\; y_1y_3y_3y_2y_1y_1y_3. \qquad (4.65)$$
Here in $D$ vertex 1 has outdegree 3 and indegree 3, i.e., $m_1 = 3$. Similarly, $m_2 = 1$ and $m_3 = 3$. Notice that the word $y_1y_3^2y_2y_1^2y_3$ is sufficient to


recreate the digraph $D$ along with the linear order on its arcs. To see this, start with $y_1y_3y_3y_2y_1y_1y_3$ and work from the left. For each $j$, $1 \le j \le 3$, the number of $y_j$'s appearing in the word is the indegree $m_j$ of $j$. Since $m_1 = 3$, $m_2 = 1$, and $m_3 = 3$, the first 3 arcs have initial vertex 1, the 4th arc has initial vertex 2, and the last 3 arcs have initial vertex 3.

As another example, consider the word $y_2y_1y_2y_3^2y_1^2$, and let $D$ be the associated digraph. Here $m_1 = 3$, $m_2 = 2$, $m_3 = 2$. It follows that
$$wt(D) = a^1_{12}a^2_{11}a^3_{12}a^1_{23}a^2_{23}a^1_{31}a^2_{31}\; y_2y_1y_2y_3^2y_1^2.$$

Two digraphs $D_1$ and $D_2$ in $\mathcal{D}$ are considered the same if and only if for each $i$, $1 \le i \le n$, and for each $t$, $1 \le t \le m_i$, the $t$th arc of $D_1$ having initial vertex $i$ and the $t$th arc of $D_2$ having initial vertex $i$ both have the same terminal vertex.

Consider the product
$$\prod_{i=1}^n(a_{i1}y_1 + \cdots + a_{in}y_n)^{m_i}. \qquad (4.66)$$
Label the factors in each power, say,
$$(a_{i1}y_1 + \cdots + a_{in}y_n)^{m_i} = (a_{i1}y_1 + \cdots + a_{in}y_n)_1(a_{i1}y_1 + \cdots + a_{in}y_n)_2\cdots(a_{i1}y_1 + \cdots + a_{in}y_n)_{m_i},$$
and then write $a^t_{ij}$ in place of $a_{ij}$ in the $t$th factor. Then the product appears as
$$(a^1_{i1}y_1 + \cdots + a^1_{in}y_n)(a^2_{i1}y_1 + \cdots + a^2_{in}y_n)\cdots(a^{m_i}_{i1}y_1 + \cdots + a^{m_i}_{in}y_n). \qquad (4.67)$$

Consider the product as $i$ goes from 1 to $n$ of the product in Eq. 4.67. Each summand of the expanded product that involves a word in the $y$'s using $m_j$ of the $y_j$'s, $1 \le j \le n$, corresponds to (i.e., is the weight of) a unique general digraph in which vertex $i$ has both indegree and outdegree equal to $m_i$. If we remove the superscript $t$ on the element $a^t_{ij}$ and now assume that the $y$'s commute, we see that if $B(m_1, \ldots, m_n)$ is the coefficient of $y_1^{m_1}y_2^{m_2}\cdots y_n^{m_n}$ in the product as $i$ goes from 1 to $n$ of the product in Eq. 4.67, then
$$wt(\mathcal{D}) = \sum_{D\in\mathcal{D}} wt(D) = \sum_{(m_1,\ldots,m_n)\ge(0,\ldots,0)} B(m_1, \ldots, m_n)\, y_1^{m_1}\cdots y_n^{m_n}. \qquad (4.68)$$


To see this, let
$$\mathcal{D}_{(m_1,\ldots,m_n)} = \{D \in \mathcal{D} : m_i = \mathrm{outdeg}(i) = \mathrm{indeg}(i) \text{ in } D\}.$$
Clearly,
$$wt(\mathcal{D}_{(m_1,\ldots,m_n)}) = \sum_{D\in\mathcal{D}_{(m_1,\ldots,m_n)}} wt(D) = B(m_1, \ldots, m_n)\, y_1^{m_1}\cdots y_n^{m_n}.$$

4.19.5 MacMahon’s Master Theorem for Permutations

Continue with the same use of notation for A and Y .

Theorem 4.19.6 Let $A(m_1, \ldots, m_n)$ be the coefficient of $y_1^{m_1}y_2^{m_2}\cdots y_n^{m_n}$ in the formal inverse $\det(I_n - AY)^{-1}$ of the polynomial $\det(I_n - AY)$. Let $B(m_1, \ldots, m_n)$ be the coefficient of $y_1^{m_1}y_2^{m_2}\cdots y_n^{m_n}$ in the product
$$\prod_{i=1}^n(a_{i1}y_1 + a_{i2}y_2 + \cdots + a_{in}y_n)^{m_i}.$$
Then $A(m_1, \ldots, m_n) = B(m_1, \ldots, m_n)$.

Proof: Put $\mathcal{G} = \mathcal{D}\times\mathcal{H} = \{(D, H) : D \in \mathcal{D},\, H \in \mathcal{H}\}$, and define the weight of the pair $(D, H)$ by $wt(D, H) = wt(D)\cdot wt(H)$. Then
$$wt(\mathcal{G}) := \sum_{(D,H)\in\mathcal{G}} wt(D, H) = wt(\mathcal{D})\cdot wt(\mathcal{H}).$$
This implies (by Eqs. 4.63 and 4.68) that
$$wt(\mathcal{G}) = \left(\sum_{(m_1,\ldots,m_n)\ge(0,\ldots,0)} B(m_1, \ldots, m_n)\, y_1^{m_1}\cdots y_n^{m_n}\right)\cdot\det(I_n - AY). \qquad (4.69)$$
If we can show that $wt(\mathcal{G}) = 1$, we will have proved MacMahon's Master Theorem.

Let $\emptyset$ denote the digraph on vertices $1, \ldots, n$ with an empty set of arcs. Then $wt(\emptyset, \emptyset) = 1$. We want to define an involution on the set $\mathcal{G}\setminus\{(\emptyset, \emptyset)\}$ which is sign-reversing on weights.

Given a pair $(D, H) \in \mathcal{G}\setminus\{(\emptyset, \emptyset)\}$, we determine the first vertex $u$ whose outdegree in either $D$ or $H$ is positive. Beginning at that vertex $u$ we walk along the arcs of $D$, always choosing the topmost arc (the arc $a^t_{ij}$ from $i$ with $t$ the largest available), until one of the following occurs:

(i) We encounter a previously visited vertex (and have thus located a directed cycle $\gamma$ of $D$).

(ii) We encounter a vertex which has a positive outdegree in $H$ (and thus is a vertex on a directed cycle $\delta$ of $H$).

We note that if $u$ is a vertex with positive outdegree in $H$ then we are immediately in case (ii). We also note that cases (i) and (ii) cannot occur simultaneously. If case (i) occurs, we form a new element of $\mathcal{G}$ by removing $\gamma$ from $D$ and putting it in $H$. If case (ii) occurs, we remove $\delta$ from $H$ and put it in $D$ in such a way that each arc of $\delta$ is put in front (in the linear order) of those with the same initial vertex. Let $(D', H')$ be the pair obtained in this way. Then $D'$ is in $\mathcal{D}$ and $H'$ is in $\mathcal{H}$, and hence $(D', H')$ is in $\mathcal{G}$. Moreover, since the number of directed cycles in $H'$ differs from the number in $H$ by one, it follows that $wt(D', H') = -wt(D, H)$. Define $\sigma(D, H) = (D', H')$ and note that $\sigma(D', H') = (D, H)$. Thus $\sigma$ is an involution on $\mathcal{G}\setminus\{(\emptyset, \emptyset)\}$ which is sign-reversing on weights. It follows that $wt(\mathcal{G}) = wt(\emptyset, \emptyset) = 1$. Hence the proof is complete.
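The theorem itself can be tested numerically for a $2\times 2$ integer matrix, comparing the coefficient of $y_1^{m_1}y_2^{m_2}$ on both sides (a Python sketch with an arbitrarily chosen matrix, not from the notes):

```python
from math import comb

# Check of the Master Theorem for a 2x2 integer matrix: compare the
# coefficient of y1^m1 y2^m2 in det(I - AY)^{-1} with its coefficient in
# (a11 y1 + a12 y2)^m1 (a21 y1 + a22 y2)^m2.
a11, a12, a21, a22 = 2, 3, 1, -1         # arbitrary test entries
delta = a11 * a22 - a12 * a21            # det(A)

def B(m1, m2):
    """Coefficient of y1^m1 y2^m2 in the labeled product."""
    return sum(comb(m1, i) * comb(m2, m1 - i)
               * a11**i * a12**(m1 - i) * a21**(m1 - i) * a22**(m2 - m1 + i)
               for i in range(m1 + 1) if 0 <= m1 - i <= m2)

def A_coeff(m1, m2):
    """Coefficient of y1^m1 y2^m2 in the geometric expansion of
    (1 - a11 y1 - a22 y2 + delta y1 y2)^{-1}."""
    total = 0
    for k in range(m1 + m2 + 1):
        r = m1 + m2 - k                  # copies of -delta*y1*y2
        p, q = m1 - r, m2 - r            # copies of a11*y1 and a22*y2
        if r < 0 or p < 0 or q < 0:
            continue
        total += comb(k, r) * comb(k - r, p) * a11**p * a22**q * (-delta)**r
    return total

for m1 in range(6):
    for m2 in range(6):
        assert A_coeff(m1, m2) == B(m1, m2)
```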

We give two examples to help the reader be sure that the above proof is understood. Let $D$ be the general digraph with arcs
$$D : a^1_{15}\; a^1_{23}\; a^1_{32}\; a^2_{35}\; a^3_{31}\; a^1_{53}\; a^2_{53}.$$
Let $X = \{2, 3, 4, 6\} \subseteq [6]$. Let $\pi = (2, 4, 6)(3) \in S_X$, and let $H = D(\pi) : a_{24}a_{33}a_{46}a_{62}$. Since the first vertex 1 has positive outdegree in $D$, we start walking along arcs in $D$: first is $a^1_{15}$. As 5 does not have positive outdegree in $H$, the next arc is $a^2_{53}$. Now 3 has positive outdegree in $H$ and belongs to the directed cycle (which is a loop) $\delta = a_{33}$. We put this loop into $D$ as the arc $a^4_{33}$, and remove it from $H$ to obtain $H' = a_{24}a_{46}a_{62}$. So $\sigma(D, H) = (D', H')$. We now check that $\sigma(D', H') = (D, H)$.

So let $D$ be the same as $D'$ above, and suppose $X = \{2, 4, 6\}$ and $\pi = (2, 4, 6)$. So
$$(D, H) = \left(a^1_{15}\; a^1_{23}\; a^1_{32}\; a^2_{35}\; a^3_{31}\; a^4_{33}\; a^1_{53}\; a^2_{53},\ \ a_{24}a_{46}a_{62}\right).$$

We start our walk with $a^1_{15}$, moving to $a^2_{53}$, then to $a^4_{33}$. Since 3 is a repeated vertex, the loop $\gamma : 3 \mapsto 3$ represented by $a^4_{33}$ is removed from $D$ and adjoined to $H$ as the loop $a_{33}$. We have now obtained the original element of $\mathcal{G}$.


When we specialize to $n = 2$ we obtain the following: if $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, then

$$\det(I - AY)^{-1} = \left(1 - a_{11}y_1 - a_{22}y_2 + (a_{11}a_{22} - a_{12}a_{21})y_1y_2\right)^{-1}$$

$$= \sum_{(m_1,m_2)\ge(0,0)}\left(\sum_i\binom{m_1}{i}\binom{m_2}{m_1-i}a_{11}^i a_{12}^{m_1-i}a_{21}^{m_1-i}a_{22}^{m_2-m_1+i}\right)y_1^{m_1}y_2^{m_2}. \qquad (4.70)$$

Note: If some aij = 0, then to get a nonzero contribution the power onaij must be zero.

Computing $\det(I - AY)^{-1}$ directly, we get
$$\sum_{k=0}^\infty\left[a_{11}y_1 + a_{22}y_2 - (a_{11}a_{22} - a_{12}a_{21})y_1y_2\right]^k.$$
Then computing the coefficient of $y_1^{m_1}y_2^{m_2}$ in this sum (and writing $\Delta$ in place of $a_{11}a_{22} - a_{12}a_{21}$) we get
$$\sum_{k=0}^\infty\binom{k}{k-m_2,\ k-m_1,\ m_1+m_2-k}a_{11}^{k-m_2}a_{22}^{k-m_1}\Delta^{m_1+m_2-k}(-1)^{m_1+m_2-k}. \qquad (4.71)$$

This gives a variety of equalities. In particular, suppose each $a_{ij} = 1$. Hence $\Delta = 0$, so $k = m_1 + m_2$ for a nonzero contribution. Then the Master Theorem yields the familiar equality:
$$\sum_i\binom{m_1}{i}\binom{m_2}{m_1-i} = \binom{m_1+m_2}{m_1,\ m_2,\ 0} = \binom{m_1+m_2}{m_1}. \qquad (4.72)$$
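Eq. 4.72 is Vandermonde's identity; a quick Python check (illustrative only):

```python
from math import comb

# Check of Eq. 4.72 (Vandermonde): sum_i C(m1,i) C(m2,m1-i) = C(m1+m2, m1).
# math.comb(n, k) returns 0 when k > n, which matches the combinatorics.
for m1 in range(8):
    for m2 in range(8):
        lhs = sum(comb(m1, i) * comb(m2, m1 - i) for i in range(m1 + 1))
        assert lhs == comb(m1 + m2, m1)
```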

Exercise: 4.19.7 Prove that
$$\sum_k\binom{k}{k-m,\ k-n,\ m+n-k}(-1)^{m+n-k} = 1.$$
(Hint: Compute the coefficient of $a_{11}^{m_1}a_{22}^{m_2}$ in the two expressions Eq. 4.70 and Eq. 4.71, which must be equal by the Master Theorem.)


4.19.8 Dixon's Identity as an Application of the Master Theorem

Problem: Evaluate the sum $S = \sum_{k=0}^n(-1)^k\binom{n}{k}^3$.

Since each summand is the product of three binomial coefficients with upper index $n$, we are led to consider the expression:
$$\left(1 - \frac{x}{y}\right)^n\left(1 - \frac{y}{z}\right)^n\left(1 - \frac{z}{x}\right)^n = \sum_{0\le i,j,k\le n}\binom{n}{i}\binom{n}{j}\binom{n}{k}(-1)^{i+j+k}x^{i-k}y^{j-i}z^{k-j}.$$

To force the lower indices in the binomial coefficients to be equal, we apply the operator $[x^0y^0z^0]$. From the above we see that
$$S = [x^0y^0z^0]\left(1 - \frac{x}{y}\right)^n\left(1 - \frac{y}{z}\right)^n\left(1 - \frac{z}{x}\right)^n = \sum_{0\le i\le n}\binom{n}{i}^3(-1)^{3i}.$$
We can see directly that this is equal to
$$[x^ny^nz^n]\,(y-x)^n(z-y)^n(x-z)^n,$$
but the point of this exercise is to get it from the Master Theorem.

Now let
$$A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} x & 0 & 0 \\ 0 & y & 0 \\ 0 & 0 & z \end{pmatrix}.$$
A simple calculation shows that
$$I - AY = \begin{pmatrix} 1 & -y & z \\ x & 1 & -z \\ -x & y & 1 \end{pmatrix},$$
and
$$\det(I - AY)^{-1} = (1 + xy + yz + zx)^{-1} = \sum_{i,j,k\ge 0}(-1)^{i+j+k}\binom{i+j+k}{i,\ j,\ k}(xy)^i(yz)^j(zx)^k. \qquad (4.73)$$


MacMahon's Master Theorem with $m_1 = m_2 = m_3 = n$ applied to $I - AY$ says that
$$[x^ny^nz^n]\det(I - AY)^{-1} = [x^ny^nz^n]\,(y-z)^n(z-x)^n(x-y)^n, \qquad (4.74)$$
from which we obtain
$$S = [x^ny^nz^n]\,(y-z)^n(z-x)^n(x-y)^n = [x^ny^nz^n]\sum_{i,j,k\ge 0}(-1)^{i+j+k}\binom{i+j+k}{i,\ j,\ k}(xy)^i(yz)^j(zx)^k$$
$$= \sum(-1)^{i+j+k}\binom{i+j+k}{i,\ j,\ k},$$
where the sum is over all $(i, j, k)$ for which $i + j = j + k = k + i = n$. Hence $i = j = k = n/2$, and $i$, $j$ and $k$ are integers. From this it follows that
$$S = \begin{cases}(-1)^m(3m)!\,(m!)^{-3} & \text{if } n = 2m, \\ 0 & \text{otherwise.}\end{cases}$$
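Dixon's identity is easy to verify numerically (a Python sketch, illustrative only):

```python
from math import comb, factorial

# Check of Dixon's identity: S(n) = sum_k (-1)^k C(n,k)^3 equals
# (-1)^m (3m)!/(m!)^3 for n = 2m, and 0 for odd n.
def S(n):
    return sum((-1)**k * comb(n, k)**3 for k in range(n + 1))

for m in range(7):
    assert S(2 * m) == (-1)**m * factorial(3 * m) // factorial(m)**3
    assert S(2 * m + 1) == 0
```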

Exercise: 4.19.9 Apply the Master Theorem to the matrix $B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$. Show that
$$\sum_i\binom{m}{i}^3 = \sum_n\binom{m+n}{m-2n,\ n,\ n,\ n}\cdot 2^{m-2n}.$$
Show that this is the number of permutations of the letters in the sequence $x_1^mx_2^mx_3^m$ such that no letter is placed in a position originally occupied by itself.


Chapter 5

Möbius Inversion on Posets

This chapter deals with locally finite partially ordered sets (posets), their incidence algebras, and Möbius inversion on these algebras.

5.1 Introduction

Recall first that we have proved the following (see Theorem 1.5.5):
$$x^{(n)} := x(x+1)\cdots(x+n-1) = \sum_{k=0}^n c(n,k)x^k, \qquad (5.1)$$
where $c(n,k) = \left[{n\atop k}\right]$ is the number of $\sigma \in S_n$ with $k$ cycles. Replacing $x$ with $-x$ and observing that $(-x)^{(n)} = (-1)^n(x)_n$, we obtained
$$(x)_n = \sum_{k=0}^n s(n,k)x^k, \qquad (5.2)$$
where $s(n,k) = (-1)^{n-k}c(n,k)$ is a Stirling number of the first kind.

Let $\Pi_n$ be the set of all partitions of the set $[n]$, and $S(n,k)$ the number of partitions of $[n]$ with exactly $k$ parts. For each function $f : [n] \to [m]$, let $\pi_f$ denote the partition of $[n]$ determined by $f$ (two elements lie in the same block iff they have the same image under $f$). For $\sigma \in \Pi_n$, let $\chi_\sigma(m) = |\{f : [n] \to [m] : \sigma = \pi_f\}| = |\{f : [\nu(\sigma)] \to [m] : f \text{ is one-to-one}\}| = (m)_{\nu(\sigma)}$, where $\nu(\sigma)$ denotes the number of parts of $\sigma$. Given any $f : [n] \to [m]$, there is a unique $\sigma \in \Pi_n$ for which $f$ is one of the maps counted by $\chi_\sigma(m)$, i.e., $\sigma = \pi_f$. And $m^n = |\{f : [n] \to [m]\}|$. So $m^n = \sum_{\sigma\in\Pi_n}\chi_\sigma(m) = \sum_{\sigma\in\Pi_n}(m)_{\nu(\sigma)} = \sum_{k=0}^n S(n,k)(m)_k$ for all $n \ge 0$. Hence

$$x^n = \sum_{k=0}^n S(n,k)(x)_k, \quad n \ge 0, \qquad (5.3)$$
where $S(n,k)$ is a Stirling number of the second kind. If we use the same trick of replacing $x$ with $-x$ again, we get
$$x^n = \sum_{k=0}^n(-1)^{n-k}S(n,k)x^{(k)}. \qquad (5.4)$$

Here we can see that Eq. 5.1 and Eq. 5.4 are "inverses" of each other, and Eq. 5.2 and Eq. 5.3 are "inverses" of each other. We proceed to make this a little more formal.

Let $P_n$ be the set of all polynomials of degree $k$, $0 \le k \le n$ (along with the zero polynomial), with coefficients in $\mathbb{C}$. Then $P_n$ is an $(n+1)$-dimensional vector space, and
$$B_1 = \{1, x, x^2, \ldots, x^n\}, \quad B_2 = \{(x)_0 = 1, (x)_1, \ldots, (x)_n\}, \quad B_3 = \{x^{(0)} = 1, x^{(1)}, \ldots, x^{(n)}\}$$
are three ordered bases of $P_n$. Recall that if $B = \{v_1, \ldots, v_m\}$ and $B' = \{w_1, \ldots, w_m\}$ are two bases of the same vector space over $\mathbb{C}$ (or over any field $K$), then there are unique scalars $a_{ij}$, $1 \le i, j \le m$, for which $w_j = \sum_{i=1}^m a_{ij}v_i$, and unique scalars $a'_{ij}$, $1 \le i, j \le m$, for which $v_j = \sum_{i=1}^m a'_{ij}w_i$. And the matrices $A = (a_{ij})$ and $A' = (a'_{ij})$ are inverses of each other.

So put:
$$A = (a_{ij}), \quad 0 \le i, j \le n, \quad a_{ij} = c(j, i);$$
$$B = (b_{ij}), \quad 0 \le i, j \le n, \quad b_{ij} = s(j, i);$$
$$C = (c_{ij}), \quad 0 \le i, j \le n, \quad c_{ij} = S(j, i);$$
$$D = (d_{ij}), \quad 0 \le i, j \le n, \quad d_{ij} = (-1)^{j-i}S(j, i).$$


Then $A$ and $D$ are inverses of each other, and $B$ and $C$ are inverses. So
$$\sum_{k=0}^n S(j,k)s(k,i) = \sum_{k=0}^n b_{ik}c_{kj} = (BC)_{ij} = \delta_{ij}, \qquad (5.5)$$
$$\sum_{k=0}^n(-1)^{j-k}S(j,k)c(k,i) = \sum_{k=0}^n a_{ik}d_{kj} = (AD)_{ij} = \delta_{ij}. \qquad (5.6)$$
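The inverse relations (5.5) and (5.6) can be checked numerically, generating the Stirling numbers from their standard recurrences (a Python sketch, illustrative only):

```python
# Check of (5.5) and (5.6): the Stirling matrices are inverse pairs.
N = 9
c = [[0] * (N + 1) for _ in range(N + 1)]   # c(n,k): unsigned first kind
S2 = [[0] * (N + 1) for _ in range(N + 1)]  # S(n,k): second kind
c[0][0] = S2[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, n + 1):
        c[n][k] = c[n - 1][k - 1] + (n - 1) * c[n - 1][k]
        S2[n][k] = S2[n - 1][k - 1] + k * S2[n - 1][k]

for i in range(N + 1):
    for j in range(N + 1):
        delta = 1 if i == j else 0
        # (5.5): sum_k S(j,k) s(k,i), with s(k,i) = (-1)^(k-i) c(k,i)
        assert sum(S2[j][k] * (-1)**(k - i) * c[k][i]
                   for k in range(N + 1)) == delta
        # (5.6): sum_k (-1)^(j-k) S(j,k) c(k,i)
        assert sum((-1)**(j - k) * S2[j][k] * c[k][i]
                   for k in range(N + 1)) == delta
```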

We want to see Eq. 5.5 expressed in the context of "Möbius inversion over a finite partially ordered set." Also, when two matrices, such as $A$ and $D$ above, are recognized as being inverses of each other, then for row vectors $a$ and $b$ we have $b = aA$ iff $a = bD$.

Consider a second example. Let $A$, $B$, $C$ be three subsets of a universal set $E$. Then
$$|E\setminus(A\cup B\cup C)| = |E| - (|A| + |B| + |C|) + (|A\cap B| + |A\cap C| + |B\cap C|) - |A\cap B\cap C|.$$
This is a very special case of the general principle of inclusion-exclusion that we met much earlier and which we now want to view as Möbius inversion over a certain finite partially ordered set.

As a third example, recall "Möbius inversion" as we studied it earlier: $f(n) = \sum_{d|n} g(d)$ for all $n \in \mathbb{N}$ iff $g(n) = \sum_{d|n}\mu(d)f(n/d)$ for all $n \in \mathbb{N}$, where $\mu$ is the classical Möbius function of elementary number theory. The general goal is to introduce the abstract theory of Möbius inversion over finite posets and look at special applications that yield the above results and more as special examples of this general theory. As usual, we just scratch the surface of this broad subject. An interesting observation, however, is that although special examples have been appearing at least since the 1930's, the general theory has been developed primarily by G.-C. Rota and his students, starting with Rota's 1964 paper, On the foundations of combinatorial theory I. Theory of Möbius functions, Z. Wahrsch. Verw. Gebiete 2 (1964), 340–368.
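Classical Möbius inversion is easy to exercise computationally; the Python sketch below (the test function $g$ is arbitrary) builds $f$ from $g$ and recovers $g$ again:

```python
# Check of classical Mobius inversion: for an arbitrary g, define
# f(n) = sum_{d|n} g(d) and recover g(n) = sum_{d|n} mu(d) f(n/d).
def mu(n):
    """Classical Mobius function, by trial factorization."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0            # repeated prime factor
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

LIMIT = 60
g = {n: n * n - 3 * n + 7 for n in range(1, LIMIT + 1)}   # arbitrary values
f = {n: sum(g[d] for d in divisors(n)) for n in range(1, LIMIT + 1)}
for n in range(1, LIMIT + 1):
    assert g[n] == sum(mu(d) * f[n // d] for d in divisors(n))
```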

5.2 POSETS

A partially ordered set $P$ (i.e., a poset $P$) is a set $P$ together with a relation "$\le$" on $P$ for which $(P, \le)$ satisfies the following:


PO1. ≤ is reflexive (x ≤ x for all x ∈ P );

PO2. ≤ is transitive (x ≤ y and y ≤ z ⇒ x ≤ z ∀x, y, z ∈ P );

PO3. ≤ is antisymmetric (x ≤ y and y ≤ x⇒ x = y ∀x, y ∈ P ).

A poset (P,≤) is a chain (or is linearly ordered) provided

PO4. For all x, y ∈ P, either x ≤ y or y ≤ x.

Given a poset (P,≤), an interval of P is a set of the form

[x, y] = {z ∈ P : x ≤ z ≤ y},

where x ≤ y. So [x, x] = {x}, but ∅ is NOT an interval. P is called locally finite provided |[x, y]| < ∞ whenever x, y ∈ P, x ≤ y. An element of P is called a zero (resp., one) of P and denoted 0 (resp., 1) provided 0 ≤ x for all x ∈ P (resp., x ≤ 1 for all x ∈ P). Finally, we write x < y provided x ≤ y but x ≠ y.

EXAMPLES OF LOCALLY FINITE POSETS

Example 5.2.1 P = {1, 2, . . .}, with the usual linear order. Here P is a chain with 0 = 1. For each n ∈ P, let [n] = {1, 2, . . . , n} with the usual linear order.

Example 5.2.2 For each n ∈ N, Bn consists of the subsets of [n] ordered by inclusion (recall that [0] = ∅). So we usually write Bn = 2^[n], with S ≤ T in Bn iff ∅ ⊆ S ⊆ T ⊆ [n].

Example 5.2.3 In general any collection of sets can be ordered by inclusion to form a poset. For example, let Ln(q) consist of all subspaces of an n-dimensional vector space Vn(q) over the field F = GF(q), ordered by inclusion.

Example 5.2.4 Put D = P with ≤ defined by: for i, j ∈ D, i ≤ j iff i|j. For each n ∈ P, let Dn be the interval [1, n] = {d : 1 ≤ d ≤ n and d|n}. For i, j ∈ Dn, i ≤ j iff i|j.


Example 5.2.5 Let n ∈ P. The set Πn of all partitions of [n] is made into a poset by defining π ≤ σ (for π, σ ∈ Πn) iff each block of π is contained in some block of σ. In that case we say π is a refinement of σ.

Example 5.2.6 A linear partition λ of [n] is a partition of [n] with a linear order on each block of λ. The blocks themselves are unordered, and ν(λ) denotes the number of blocks of λ. Ln is the set of linear partitions of [n] with partial order “≤” defined by: η ≤ λ, for η, λ ∈ Ln, iff each block of λ can be obtained by the juxtaposition of blocks of η.

Example 5.2.7 For n ∈ P, let Sn denote the set of permutations of the elements of [n] with the following partial order: given σ, τ ∈ Sn, we say σ ≤ τ iff each cycle of σ (written with smallest element first) is composed of a string of consecutive integers from some cycle of τ (also written with the smallest element first).

For example, (12)(3) ≤ (123) and (1)(23) ≤ (123), but (13)(2) ≰ (123). The 0 of Sn is 0 = (1)(2) · · · (n). As an example, for σ = (12435) ∈ S5, we give the Hasse diagram of the interval [0, σ]. Note, for example, that (12)(435) is not in the interval since it would appear as (12)(354).


[Figure: Hasse diagram of the interval [0, σ] in S5, σ = (12435). Its twelve elements, by rank, are: 0 = (1)(2)(3)(4)(5); then (12)(3)(4)(5), (1)(2)(35)(4), (1)(24)(3)(5); then (124)(3)(5), (12)(35)(4), (1)(24)(35), (1)(243)(5); then (124)(35), (1243)(5), (1)(2435); and finally (12435) at the top.]

5.3 Vector Spaces and Algebras

Let K be any field (but K = C is the usual choice for us). Let P be any (nonempty) set. The standard way to make K^P = {f : P → K} into a vector space over K is to define vector addition by (f + g)(p) = f(p) + g(p) for all p ∈ P and any f, g ∈ K^P. And then scalar multiplication is defined by (af)(p) = a · f(p), for all a ∈ K, f ∈ K^P and p ∈ P. The usual axioms for a vector space are then easily verified.


If V is any vector space over K, V is an algebra over K if there is also a vector product which is bilinear over K. This means that for each pair (x, y) of elements of V, there is a product vector xy ∈ V for which the following bilinearity conditions hold:

(1) (x+ y)z = xz + yz and x(y + z) = xy + xz, ∀x, y, z ∈ V ;

(2) a(xy) = (ax)y = x(ay) for all a ∈ K; x, y ∈ V.

In these notes we shall be interested only in finite dimensional (linear) algebras, i.e., algebras in which the vector space is finite dimensional over K. So suppose V has a basis e1, . . . , en as a vector space over K. Then each eiej is to be an element of V, so

eiej = ∑_{k=1}^{n} c_{ijk} ek for unique scalars c_{ijk} ∈ K.

The n³ elements c_{ijk} are called the multiplication constants of the algebra relative to the chosen basis. They give the value of each product eiej, 1 ≤ i, j ≤ n. Moreover, these products determine every product in V. For suppose x = ∑_{i=1}^{n} ai ei and y = ∑_{j=1}^{n} bj ej are any two elements of V. Then

xy = (∑_i ai ei)(∑_j bj ej) = ∑_{i,j} (ai ei)(bj ej) = ∑_{i,j} ai (ei (bj ej)) = · · · = ∑_{i,j} ai bj (ei ej),

and hence xy is completely determined by the products eiej. In fact, if we define eiej (any way we please!) to be some vector of V, 1 ≤ i, j ≤ n, and then define xy = ∑_{i,j=1}^{n} ai bj (ei ej) for x and y as above, then it is an easy exercise to show that conditions (1) and (2) hold, so that V with this product is an algebra.

An algebra V over K is said to be associative provided its multiplication satisfies the associative law (xy)z = x(yz) for all x, y, z ∈ V.

Theorem 5.3.1 An algebra V over K with finite basis e1, . . . , en as a vector space over K is associative iff (eiej)ek = ei(ejek) for 1 ≤ i, j, k ≤ n.

Proof: If V is associative, clearly (eiej)ek = ei(ejek) for all i, j, k = 1, . . . , n. Conversely, suppose this holds. Let x = ∑ ai ei, y = ∑ bj ej, z = ∑ ck ek be any three elements of V. Then (xy)z = ∑ ai bj ck (eiej)ek and x(yz) = ∑ ai bj ck ei(ejek). Hence (xy)z = x(yz) and V is associative.

The algebras we study here are finite dimensional linear associative algebras.


5.4 The Incidence Algebra I(P,K)

Let (P, ≤) be a locally finite poset, and let Int(P) denote the set of intervals of P. Let K be any field. If f ∈ K^{Int(P)}, i.e., f : Int(P) → K, write f(x, y) for f([x, y]).

Here is an example we will find of interest. Let P = Dn = {d : 1 ≤ d ≤ n and d|n}. An interval of P is a set of the form [i, j] = {k : i|k and k|j}, where 1 ≤ i ≤ j ≤ n and i|j, i|n, j|n. Define µn : Int(P) → C by µn(i, j) = µ(j/i), where µ is the classical Mobius function we studied earlier and [i, j] is any interval of P.

Definition. The incidence algebra I(P,K) of P over K is the K-algebra of all functions f : Int(P) → K, with the usual structure of a vector space over K, and where the algebra multiplication (called convolution) is defined by

(f ∗ g)(x, y) = ∑_{z : x ≤ z ≤ y} f(x, z) g(z, y),   (5.7)

for all intervals [x, y] of P and all f, g ∈ I(P,K). The sum in Eq. 5.7 is finite (so f ∗ g really is defined), since P is locally finite.
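To make the convolution of Eq. 5.7 concrete, here is a small computational sketch (not part of the text; the poset D12 and all names are illustrative choices of ours) that stores an element of I(P,K) as a dictionary on intervals and checks that δ, defined below, acts as a two-sided identity.

```python
from itertools import product

# Divisors of 12 ordered by divisibility -- an illustrative locally finite poset.
P = [1, 2, 3, 4, 6, 12]
def leq(x, y):
    return y % x == 0                      # x <= y in P iff x divides y
intervals = [(x, y) for x, y in product(P, P) if leq(x, y)]

def convolve(f, g):
    """(f * g)(x, y) = sum over z with x <= z <= y of f(x, z) * g(z, y)  (Eq. 5.7)."""
    return {(x, y): sum(f[(x, z)] * g[(z, y)]
                        for z in P if leq(x, z) and leq(z, y))
            for (x, y) in intervals}

delta = {(x, y): 1 if x == y else 0 for (x, y) in intervals}   # claimed identity
zeta = {(x, y): 1 for (x, y) in intervals}                     # zeta function

# delta is a two-sided identity for convolution:
assert convolve(delta, zeta) == zeta and convolve(zeta, delta) == zeta
```

Incidentally, convolving ζ with itself gives interval cardinalities, a fact exploited in the optional section below.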

Theorem 5.4.1 I(P,K) is an associative K-algebra with two-sided identity denoted δ (or sometimes denoted 1) and defined on intervals [x, y] by

δ(x, y) = 1 if x = y, and δ(x, y) = 0 if x ≠ y.

Proof: This is a straightforward and worthwhile (but tedious!) exercise. It is probably easier to establish associativity by showing (f ∗ g) ∗ h = f ∗ (g ∗ h) in general than it is to establish associativity for some specific basis and then to use Theorem 5.3.1. And to establish that δ ∗ f = f ∗ δ = f for all f ∈ I(P,K) is almost trivial. The bilinearity conditions are also easily established.

Note: It is quite helpful to think of I(P,K) as the set of all formal expressions

f = ∑_{[x,y] ∈ Int(P)} f(x, y) [x, y]   (allowing infinite linear combinations).


Then convolution is defined by requiring that

[x, y] ∗ [z, w] = [x, w] if y = z, and [x, y] ∗ [z, w] = 0 if y ≠ z,

for all [x, y], [z, w] ∈ Int(P), and then extending to all of I(P,K) by bilinearity. This shows that when Int(P) is finite, one basis of I(P,K) may be obtained by setting 1_{[x,y]} equal to the function from Int(P) to K defined by

1_{[x,y]}(z, w) = 1 if [z, w] = [x, y], and 0 if [z, w] ≠ [x, y] (with [x, y], [z, w] ∈ Int(P)).

Then the set {1_{[x,y]} : [x, y] ∈ Int(P)} is a basis for I(P,K) and the multiplication constants are given by

1_{[x,y]} ∗ 1_{[z,w]} = δ_{y,z} 1_{[x,w]}, where δ_{y,z} = 1 if y = z and 0 otherwise.   (5.8)

Exercise: 5.4.2 Show that if P is any finite poset, its elements can be labeled as x1, x2, . . . , xn so that xi < xj in P implies that i < j.

Suppose that P is a finite poset, say P = {x1, . . . , xn}. Let f = ∑_{xi ≤ xj} f(xi, xj)[xi, xj] ∈ K^{Int(P)}. Then define the n × n matrix Mf by

(Mf)_{i,j} = f(xi, xj) if xi ≤ xj, and 0 otherwise.

We claim that the map f ↦ Mf is an algebra isomorphism from I(P,K) to the algebra of all n × n matrices over K with (i, j) entries equal to zero if xi ≰ xj. It is almost obvious that f ↦ Mf is an isomorphism of vector spaces, but we have to work a little to see that multiplication is preserved.

So suppose f, g ∈ K^{Int(P)}. Using [x, y] ∗ [z, w] = δ_{y,z}[x, w] we have

f ∗ g = (∑_{xi ≤ xj} f(xi, xj)[xi, xj]) ∗ (∑_{xk ≤ xl} g(xk, xl)[xk, xl])

= ∑_{xi ≤ xj} ∑_{xk ≤ xl} f(xi, xj) g(xk, xl) [xi, xj] ∗ [xk, xl]

= ∑_{xi ≤ xl} ( ∑_{xj : xi ≤ xj ≤ xl} f(xi, xj) g(xj, xl) ) [xi, xl].

So

(f ∗ g)([xi, xl]) = ∑_{xj : xi ≤ xj ≤ xl} f(xi, xj) g(xj, xl).

Now with matrices Mf and Mg defined as above:

(Mf · Mg)_{i,l} = ∑_{j=1}^{n} (Mf)_{i,j} (Mg)_{j,l} = ∑_{xj : xi ≤ xj ≤ xl} (Mf)_{i,j} (Mg)_{j,l}

= ∑_{xj : xi ≤ xj ≤ xl} f(xi, xj) g(xj, xl) if xi ≤ xl, and 0 otherwise.

Note that if P is ordered so that xi < xj implies i < j, then the matrices are exactly all upper triangular matrices M = (m_{ij}) over K, 1 ≤ i, j ≤ n, with m_{ij} = 0 if xi ≰ xj.

For example, if P is the poset on {x1, x2, x3, x4, x5} whose (Hasse) diagram has cover relations x1 < x3, x2 < x3, x2 < x4, x3 < x5, and x4 < x5, then I(P,K) is isomorphic to the algebra of all matrices of the form

⎡ ∗ 0 ∗ 0 ∗ ⎤
⎢ 0 ∗ ∗ ∗ ∗ ⎥
⎢ 0 0 ∗ 0 ∗ ⎥
⎢ 0 0 0 ∗ ∗ ⎥
⎣ 0 0 0 0 ∗ ⎦.
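The zero pattern of these matrices can be checked numerically. The sketch below (our own illustration, built on the five-element poset just displayed) fills the allowed positions with arbitrary integers and verifies that matrix products again vanish off the pattern, which is exactly the closure needed for the image of f ↦ Mf to be an algebra.

```python
import numpy as np

# allowed[i][j] == 1 exactly when x_{i+1} <= x_{j+1} in the five-element poset.
allowed = np.array([[1, 0, 1, 0, 1],
                    [0, 1, 1, 1, 1],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 1],
                    [0, 0, 0, 0, 1]])

rng = np.random.default_rng(0)
# Two arbitrary elements of the subalgebra: free entries on the pattern, 0 off it.
Mf = rng.integers(1, 10, (5, 5)) * allowed
Mg = rng.integers(1, 10, (5, 5)) * allowed

prod = Mf @ Mg
# The product also vanishes wherever x_i <= x_j fails (by transitivity of <=),
# so these pattern-respecting matrices really do form an algebra, as claimed.
assert np.all(prod[allowed == 0] == 0)
```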


Theorem 5.4.3 Let f ∈ I(P,K). Then the following are equivalent:

(i) f has a left inverse;

(ii) f has a right inverse;

(iii) f has a two-sided inverse f⁻¹ (which is necessarily the unique left and right inverse of f);

(iv) f(x, x) ≠ 0 for all x ∈ P.

Moreover, if f⁻¹ exists, then f⁻¹(x, y) depends only on the poset [x, y].

Proof: f ∗ g = δ iff f(x, x)g(x, x) = 1 for all x ∈ P and 0 = ∑_{z : x ≤ z ≤ y} f(x, z)g(z, y) whenever x < y, x, y ∈ P. The last equation is equivalent to

g(x, y) = −f(x, x)⁻¹ ∑_{z : x < z ≤ y} f(x, z)g(z, y)

whenever x < y, x, y ∈ P, and also to

f(x, y) = −g(y, y)⁻¹ ∑_{z : x ≤ z < y} f(x, z)g(z, y).

It follows that if f has a right inverse g then f(x, x) ≠ 0 for all x ∈ P, and in this case g(x, y) = f⁻¹(x, y) depends only on [x, y]. For the converse, suppose this condition holds. (Note: If xi < xj implies i < j, so that Mf is upper triangular, then Mf is invertible iff it has no zero on the main diagonal, which holds iff f(x, x) ≠ 0 for all x ∈ P. And of course f is invertible iff Mf is invertible. But we give the general proof.) First define g(y, y) = f(y, y)⁻¹ for all y ∈ P. Then if [x, y] = {x, y} with x ≠ y, put g(x, y) = −(f(x, x))⁻¹ f(x, y)g(y, y). If the maximum length of any chain in [x, y] is 3, say [x, y] = {x, z1, . . . , zk, y} with x < zi < y and zi ≰ zj for all i ≠ j, put

g(x, y) = −(f(x, x))⁻¹ [ ∑_{i=1}^{k} f(x, zi)g(zi, y) + f(x, y)g(y, y) ].


Now suppose that the maximum length of any chain in [x, y] is 4, say x < w < z < y is a maximal chain in [x, y]. For each u ∈ [x, y] \ {x}, either [u, y] = {y}, or [u, y] = {u, y}, or [u, y] = {u, w, y} for some w ∈ [x, y]. In any case, g(u, y) is already defined for all u ∈ [x, y] \ {x}. So we may define

g(x, y) = −(f(x, x))⁻¹ ∑_{u ∈ [x,y] \ {x}} f(x, u)g(u, y).

Proceed “downward” by induction on the maximum length of a chain contained in [x, y]. Since each interval [x, y] is finite, this process will terminate in a finite number of steps. Clearly if f⁻¹ exists, then f⁻¹(x, y) depends only on the poset [x, y].

Similarly, g has a left inverse f iff g(y, y) ≠ 0 for all y ∈ P, and in this case f(x, y) = g⁻¹(x, y) depends only on [x, y]. But here we define f(x, y) by an upward induction. Applying this argument to f (instead of g), we see f has a left inverse h iff f(x, x) ≠ 0 for all x ∈ P iff f has a right inverse g. But since ∗ is associative, from f ∗ g = δ = h ∗ f, we have g = (h ∗ f) ∗ g = h ∗ (f ∗ g) = h.

The zeta function ζ of P is defined by

ζ(x, y) = 1 for all [x, y] ∈ Int(P).   (5.9)

The zeta function is of interest in its own right (we include an optional section dealing with ζ), but for us the main interest in ζ is that by Theorem 5.4.3 it has an inverse µ, called the Mobius function of the poset P.

One can define µ inductively without reference to the incidence algebra I(P,K). Namely, µ ∗ ζ = δ is equivalent to

µ(x, x) = 1 for all x ∈ P,

and

µ(x, y) = −∑_{z : x ≤ z < y} µ(x, z) whenever x < y.   (5.10)

Similarly, ζ ∗ µ = δ is equivalent to

µ(x, x) = 1 for all x ∈ P,

and

µ(x, y) = −∑_{z : x < z ≤ y} µ(z, y) whenever x < y.   (5.11)
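The recursion of Eq. 5.10 computes µ directly on any finite poset. As an illustrative sketch (the divisor poset D12 is our own choice of example, and the function names are ours), the values it produces already agree with the classical Mobius function µ(y/x), a coincidence explained in general in Section 5.7.

```python
from functools import lru_cache

P = [1, 2, 3, 4, 6, 12]                    # divisors of 12 (illustrative)
def leq(x, y):
    return y % x == 0

@lru_cache(maxsize=None)
def mu(x, y):
    """Eq. 5.10: mu(x, x) = 1 and mu(x, y) = -sum_{z : x <= z < y} mu(x, z)."""
    if x == y:
        return 1
    return -sum(mu(x, z) for z in P if leq(x, z) and leq(z, y) and z != y)

# Spot checks against the classical Mobius function mu(y/x):
assert mu(1, 2) == -1     # 2 is prime
assert mu(1, 6) == 1      # 6 = 2 * 3, two distinct primes
assert mu(1, 4) == 0      # 4 has a squared prime factor
assert mu(2, 12) == 1     # 12/2 = 6
```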

5.5 Optional Section on ζ

Start with ζ(x, y) = 1 for all [x, y] ∈ Int(P). Then ζ²(x, y) = ∑_{z : x ≤ z ≤ y} 1 = |[x, y]| if x ≤ y. More generally, for k ∈ P,

ζ^k(x, y) = ∑_{(x0,...,xk) : x = x0 ≤ x1 ≤ ··· ≤ xk = y} 1

is the number of multichains of length k from x to y.

Theorem 5.5.1 (ζ − δ)(x, y) = 1 if x < y, and 0 if x = y.

Proof: Clear.

Hence for k ∈ P, (ζ − δ)^k(x, y) is the number of chains x = x0 < x1 < ··· < xk = y of length k from x to y.

Theorem 5.5.2 (2δ − ζ)(x, y) = 1 if x = y, and −1 if x < y. So (2δ − ζ)⁻¹ exists, and (2δ − ζ)⁻¹(x, y) is equal to the total number of chains x = x0 < x1 < ··· < xk = y from x to y.

Proof: Let l be the length of the longest chain in the interval [x, y]. Then (ζ − δ)^{l+1}(u, v) = 0 whenever x ≤ u ≤ v ≤ y. Thus for x ≤ u ≤ v ≤ y,

(2δ − ζ)[1 + (ζ − 1) + (ζ − 1)² + ··· + (ζ − 1)^l](u, v) = (1 − (ζ − 1))[1 + (ζ − 1) + ··· + (ζ − 1)^l](u, v) = (1 − (ζ − 1)^{l+1})(u, v) = δ(u, v).

Hence (2δ − ζ)⁻¹ = 1 + (ζ − 1) + ··· + (ζ − 1)^l, when restricted to the interval [x, y]. But by the definition of l and Theorem 5.5.1 it is clear that (1 + (ζ − 1) + ··· + (ζ − 1)^l)(x, y) is the total number of chains from x to y.


Theorem 5.5.3 Define η : Int(P) → K by

η(x, y) = 1 if y covers x, and 0 otherwise.

Then (1 − η)⁻¹(x, y) is equal to the total number of maximal chains in [x, y].

Proof: η^k(x, y) = ∑ 1, where the sum is over all (z0, . . . , zk) with x = z0, y = zk, and z_{i+1} covering z_i for i = 0, . . . , k − 1. So ∑_{k=0}^{∞} η^k(x, y) is the total number of maximal chains in [x, y]. If l is the length of the longest maximal chain in [x, y], then (1 − η)⁻¹ = ∑_{k=0}^{∞} η^k, which equals ∑_{k=0}^{l} η^k on [x, y].

5.6 The Action of I(P,K) and Mobius Inversion

The Mobius function plays a central role in Mobius inversion, as does the incidence algebra I(P,K). But before we can make this precise we need to see how I(P,K) acts on the vector space K^P = {f : P → K}. Clearly K^P is a vector space over K in the usual way.

For each ξ ∈ I(P,K), ξ acts in two ways as a linear transformation on K^P. On the right ξ acts by

(f · ξ)(x) = ∑_{y ≤ x} f(y) ξ(y, x) for all x ∈ P and f ∈ K^P.   (5.12)

In particular (f · δ)(x) = ∑_{y ≤ x} f(y) δ(y, x) = f(x), implying that f · δ = f.

On the left ξ acts by

(ξ · f)(x) = ∑_{y ≥ x} ξ(x, y) f(y), for all x ∈ P, f ∈ K^P.   (5.13)

Similarly, δ · f = f .

For these to be actions in the usual sense, it must be true that the following hold:

(f · ξ1) · ξ2 = f · (ξ1 ∗ ξ2), for all f ∈ K^P; ξ1, ξ2 ∈ I(P,K),   (5.14)

and

ξ1 · (ξ2 · f) = (ξ1 ∗ ξ2) · f for all f ∈ K^P; ξ1, ξ2 ∈ I(P,K).   (5.15)

We verify Eq. 5.14 and leave Eq. 5.15 as a similar exercise. So for each x ∈ P,

((f · ξ1) · ξ2)(x) = ∑_{y ≤ x} (f · ξ1)(y) ξ2(y, x) = ∑_{w,y : w ≤ y ≤ x} f(w) ξ1(w, y) ξ2(y, x)

= ∑_{w : w ≤ x} f(w) ( ∑_{y : w ≤ y ≤ x} ξ1(w, y) ξ2(y, x) ) = ∑_{w : w ≤ x} f(w) (ξ1 ∗ ξ2)(w, x)

= (f · (ξ1 ∗ ξ2))(x).

Theorem 5.6.1 (Mobius Inversion Formula) Let P be a poset in which every principal order ideal Λx = {y ∈ P : y ≤ x} is finite. Let f, g : P → K. Then g(x) = ∑_{y ≤ x} f(y) for all x ∈ P iff f(x) = ∑_{y ≤ x} g(y) µ(y, x) for all x ∈ P. Dually, if each principal dual order ideal Vx = {y ∈ P : y ≥ x} is finite and f, g : P → K, then g(x) = ∑_{y ≥ x} f(y) for all x ∈ P iff f(x) = ∑_{y ≥ x} µ(x, y) g(y) for all x ∈ P.

Proof: The first version of Mobius inversion is just that f · ζ = g iff f = g · µ. The second is that ζ · f = g iff f = µ · g. These follow easily from the above. For example, if f · ζ = g we have g · µ = (f · ζ) · µ = f · (ζ ∗ µ) = f · δ = f.

Example 5.6.2 Consider the chain P = N with the usual linear ordering, so 0 = 0. Here µ(x, x) = 1 for all x ∈ P, and for x < y, µ(x, y) = −∑_{z : x ≤ z < y} µ(x, z). So if y covers x, µ(x, y) = −1. If y covers z and z covers x, then µ(x, y) = −(µ(x, x) + µ(x, z)) = −(1 − 1) = 0. If y covers z, z covers w, and w covers x, then µ(x, y) = −(µ(x, x) + µ(x, w) + µ(x, z)) = −(1 + (−1) + 0) = 0. By induction,

µ(i, j) = 1 if i = j; −1 if j = i + 1; 0 otherwise.


Then Mobius inversion takes the following form: for f, g : P → K, g(n) = ∑_{i=0}^{n} f(i) for all n ≥ 0 iff f(n) = ∑_{i=0}^{n} g(i) µ(i, n) = g(n) − g(n − 1). So the sum operator ∑ (defined by (∑ · f)(n) = ∑_{i=0}^{n} f(i)) and the difference operator Δ (defined by (Δ · f)(n) = f(n) − f(n − 1)) are inverses of each other. This may be viewed as a finite difference analogue of the fundamental theorem of calculus.
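In code, this inverse pair of operators looks as follows (a small illustration of ours; the variable names are arbitrary):

```python
from itertools import accumulate

f = [3, 1, 4, 1, 5, 9, 2, 6]                      # arbitrary values f(0..7)
g = list(accumulate(f))                           # g(n) = sum_{i <= n} f(i)
# the difference operator recovers f: f(n) = g(n) - g(n-1), with f(0) = g(0)
recovered = [g[0]] + [g[i] - g[i - 1] for i in range(1, len(g))]
assert recovered == f
```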

5.7 Evaluating µ: the Product Theorem

One obstacle to applying Mobius inversion is that even when the poset (P, ≤) is fairly well understood, evaluating the Mobius function µ can be quite difficult. In this section we prove the Product Theorem and apply it to three well known posets. One corollary is the famous Principle of Inclusion-Exclusion.

Let P and Q be posets. Then the direct (or Cartesian) product of P and Q is the poset P × Q = {(x, y) : x ∈ P and y ∈ Q}, with (x, y) ≤ (x′, y′) in P × Q iff x ≤ x′ in P and y ≤ y′ in Q. The direct product P × P × ··· × P with n factors is then defined in the natural way and is denoted by P^n. There are three examples of interest at this point.

Example 5.7.1 For integers m, n ∈ P, let Bm, Bn be the posets of all subsets of [m], [n], respectively, ordered by inclusion. Then Bm × Bn ≅ Bm+n. If we identify Bm with [2]^[m] = {f : [m] → [2]} in the usual way, then we have [2]^[m] × [2]^[n] ≅ [2]^[m+n]. On the other hand, if 2 = {1, 2} with partial order defined by 1 ≤ 1 ≤ 2 ≤ 2 (i.e., 2 is the set [2] with the usual linear order), then B1 ≅ 2, so Bn ≅ B1 × ··· × B1 ≅ 2 × ··· × 2 ≅ 2^n. Hence 2^m × 2^n ≅ 2^{m+n}.

Example 5.7.2 Recall that for a positive integer k, k denotes [k] with the usual linear order. Let n1, . . . , nk ∈ N and put P = (n1 + 1) × ··· × (nk + 1). We may identify the elements of P with the set of k-tuples (a1, . . . , ak) ∈ N^k with 0 ≤ ai ≤ ni, ordered componentwise, i.e., (a1, . . . , ak) ≤ (b1, . . . , bk) iff ai ≤ bi for i = 1, 2, . . . , k. When this relation holds, the interval [(a1, . . . , ak), (b1, . . . , bk)] is isomorphic to (b1 − a1 + 1) × (b2 − a2 + 1) × ··· × (bk − ak + 1).


A little thought reveals that Example 5.7.2 is a straightforward generalization of Example 5.7.1.

Example 5.7.3 Recall that Πn is the poset of partitions of [n], where two partitions σ, π ∈ Πn satisfy σ ≤ π iff each block of σ is contained in a single block of π. Now suppose that π = {A1, . . . , Ak} and that Ai is partitioned into λi blocks in σ. Then in Πn, the interval [σ, π] is isomorphic to Πλ1 × Πλ2 × ··· × Πλk.

Note that Π2 ≅ 2 and (Π2)^k ≅ 2^k. Hence if σ and π are partitions of [n] for which π = {A1, . . . , Ak} has k parts, each of which breaks into two parts in σ, then the interval [σ, π] in Πn is isomorphic to Bk.

Theorem 5.7.4 (The Product Theorem) Let P and Q be locally finite posets with Mobius functions µP and µQ, respectively, and let P × Q be their direct product with Mobius function µ_{P×Q}. Then whenever (x, y) ≤ (x′, y′) in P × Q, µ_{P×Q}((x, y), (x′, y′)) = µP(x, x′) · µQ(y, y′).

Proof:

∑_{(u,v) : (x,y) ≤ (u,v) ≤ (x′,y′)} µP(x, u) µQ(y, v) = ( ∑_{u : x ≤ u ≤ x′} µP(x, u) ) ( ∑_{v : y ≤ v ≤ y′} µQ(y, v) ) = δ_{x,x′} · δ_{y,y′} = δ_{(x,y),(x′,y′)}.

Also

∑_{(u,v) : (x,y) ≤ (u,v) ≤ (x′,y′)} µ_{P×Q}((x, y), (u, v)) = δ_{(x,y),(x′,y′)}.

But since the condition ∑_{z : x ≤ z ≤ y} µ(x, z) = δ_{x,y} in a poset with Mobius function µ determines µ uniquely, it must be that µ_{P×Q} = µP · µQ.

Using the product theorem we can say a great deal about the Mobius functions of the three examples above.


Theorem 5.7.5 If µ is the Mobius function of Bn, then µ(S, T) = (−1)^{|T−S|} whenever S ≤ T, for all S, T ∈ Bn.

Proof: A subset S (of [n], say) corresponds to an n-tuple (s1, . . . , sn) with each si equal to 0 or 1. Similarly, T ↔ (t1, . . . , tn). Then S ≤ T iff si ≤ ti for all i = 1, . . . , n. Then (with a natural abuse of notation) µi(si, ti) = 1 or −1 according as si = ti or si ≠ ti. And µ(S, T) = ∏_{i=1}^{n} µi(si, ti) = ∏_{i=1}^{n} (−1)^{ti−si} = (−1)^{|T−S|}.

It follows that the two versions of Mobius inversion for Bn become:

Theorem 5.7.6 Let f, g : Bn → K. Then

(i) g(S) = ∑_{T : T ⊆ S} f(T) ∀S ∈ Bn iff f(S) = ∑_{T : T ⊆ S} (−1)^{|S−T|} g(T) ∀S ∈ Bn;

and

(ii) g(S) = ∑_{T : T ⊇ S} f(T) ∀S ∈ Bn iff f(S) = ∑_{T : T ⊇ S} (−1)^{|T−S|} g(T) ∀S ∈ Bn.

Either of these two statements is called the Principle of Inclusion-Exclusion.

The following is a standard combinatorial situation involving inclusion-exclusion. We are given a set E of objects and a set P = {P1, . . . , Pn} of properties. Each object in E either does or does not have each of the properties. (We may actually think of Pi as just being a subset of E, but without the assumption that Pi ≠ Pj when i ≠ j.) For each collection T = {Pi1, . . . , Pik} of the properties (i.e., for T ⊆ P), let f(T) = |{x ∈ E : x ∈ Pi iff Pi ∈ T}|, and let g(T) = |{x ∈ E : x ∈ Pi whenever Pi ∈ T}|. Then g(T) = ∑_{S : S ⊇ T} f(S). (This just says that an object x of E has all the properties in T iff it has precisely the properties in S for some S ⊇ T.) So by the second version of Mobius inversion, we have

f(T) = ∑_{S : S ⊇ T} (−1)^{|S−T|} g(S).   (5.16)


In particular, the number of objects x of E having none of the properties in P is given by

f(∅) = ∑_{Y ⊆ P} (−1)^{|Y|} g(Y).   (By convention g(∅) = |E|.)   (5.17)

We look at three classical applications.

Application 1. Let A1, . . . , An be subsets of some (finite) set E. Then by Eq. 5.17 the number of objects of E in none of the Ai is

|E \ (A1 ∪ ··· ∪ An)| = |E| − ∑_{i=1}^{n} |Ai| + ∑_{1 ≤ i < j ≤ n} |Ai ∩ Aj| − ∑_{1 ≤ i < j < k ≤ n} |Ai ∩ Aj ∩ Ak| + ··· + (−1)^n |A1 ∩ A2 ∩ ··· ∩ An|.   (5.18)

Application 2 (Derangements). Let E = Sn, the set of all permutations of the elements of [n]. Let Ai = {π ∈ Sn : π(i) = i}, i = 1, . . . , n. If T is a set of j of the Ai's, then g(T) = |{π ∈ Sn : π(i) = i for all Ai ∈ T}| = (n − j)!. It follows that the number D(n) of derangements in Sn (i.e., π ∈ Sn with π(i) ≠ i for all i, so π ∈ Sn \ (A1 ∪ ··· ∪ An)) is given by

D(n) = ∑_{T : T ⊆ {A1,...,An}} (−1)^{|T|} g(T) = ∑_{i=0}^{n} (−1)^i (n choose i) (n − i)! = n! ∑_{i=0}^{n} (−1)^i / i!.   (5.19)
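Eq. 5.19 is easy to check against a brute-force count over Sn for small n (an illustrative sketch of ours; the function names are arbitrary):

```python
from itertools import permutations
from math import comb, factorial

def derangements(n):
    """D(n) by the inclusion-exclusion formula Eq. 5.19."""
    return sum((-1) ** i * comb(n, i) * factorial(n - i) for i in range(n + 1))

def brute_force(n):
    """Direct count of fixed-point-free permutations in S_n."""
    return sum(all(pi[i] != i for i in range(n))
               for pi in permutations(range(n)))

assert [derangements(n) for n in range(1, 7)] == [brute_force(n) for n in range(1, 7)]
assert derangements(4) == 9
```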

This is a special case of a general situation: suppose f(T) = f(T′) whenever |T| = |T′|. (As above, f(T) is the number of objects of E having a property Pi if and only if Pi ∈ T.) Then g(T) = ∑_{S : S ⊇ T} f(S) also depends only on |T|. So for each i, 0 ≤ i ≤ n, if |T| = i, let a(n − i) = f(T) and b(n − i) = g(T). Then g(T) = ∑_{S : S ⊇ T} f(S) becomes

b(n − i) = ∑_{j=i}^{n} ((n − i) choose (j − i)) a(n − j),

or, writing m = n − i and k = j − i, we have


b(m) = ∑_{k=0}^{m} (m choose k) a(m − k) = ∑_{i=0}^{m} (m choose i) a(i).   (5.20)

And f(T) = ∑_{S : S ⊇ T} (−1)^{|S−T|} g(S) becomes

a(n − i) = ∑_{j=i}^{n} (−1)^{j−i} ((n − i) choose (j − i)) b(n − j),

which we rewrite as

a(m) = ∑_{k=0}^{m} (−1)^k (m choose k) b(m − k) = ∑_{k=0}^{m} (−1)^{m−k} (m choose k) b(k).   (5.21)

Hence the Mobius inversion formula says:

b(m) = ∑_{i=0}^{m} (m choose i) a(i) for 0 ≤ m ≤ n iff a(m) = ∑_{i=0}^{m} (−1)^{m−i} (m choose i) b(i), 0 ≤ m ≤ n.   (5.22)

Exercise: 5.7.7 If A is the matrix whose (i, j) entry is (j choose i), 0 ≤ i, j ≤ n, then A⁻¹ is the matrix whose (i, j) entry is (−1)^{j−i} (j choose i). (Hint: Try putting b(m) = (x + 1)^m and a(m) = x^m for 0 ≤ m ≤ n.)

We give an explicit example of the above to illustrate the simple nature of the statement of the result when viewed in matrix form:

⎡ 1 1 1 1 1 ⎤⁻¹   ⎡ 1 −1  1 −1  1 ⎤
⎢ 0 1 2 3 4 ⎥     ⎢ 0  1 −2  3 −4 ⎥
⎢ 0 0 1 3 6 ⎥  =  ⎢ 0  0  1 −3  6 ⎥
⎢ 0 0 0 1 4 ⎥     ⎢ 0  0  0  1 −4 ⎥
⎣ 0 0 0 0 1 ⎦     ⎣ 0  0  0  0  1 ⎦.
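The matrix identity of Exercise 5.7.7 can be verified directly (a sketch of ours; `comb(j, i)` is the binomial coefficient, which vanishes for j < i):

```python
from math import comb
import numpy as np

n = 4
# A has (i, j) entry C(j, i); B has (i, j) entry (-1)^(j-i) C(j, i).
A = np.array([[comb(j, i) for j in range(n + 1)] for i in range(n + 1)])
B = np.array([[(-1) ** (j - i) * comb(j, i) if j >= i else 0
               for j in range(n + 1)] for i in range(n + 1)])

# These are exactly the two 5 x 5 matrices displayed above, and each
# is the inverse of the other:
assert np.array_equal(A @ B, np.eye(n + 1, dtype=int))
assert np.array_equal(B @ A, np.eye(n + 1, dtype=int))
```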

Application 3 (Euler's phi-function φ again). Let n ∈ P and suppose p1, . . . , pk are the distinct prime divisors of n. Put E = [n] and Ai = {x ∈ E : pi | x}, i = 1, . . . , k. First note that |Ai1 ∩ Ai2 ∩ ··· ∩ Aij| = n/(pi1 ··· pij). Then the principle of inclusion-exclusion gives

φ(n) = n − ∑_{1 ≤ i ≤ k} n/pi + ∑_{1 ≤ i < j ≤ k} n/(pi pj) − ··· + (−1)^k n/(p1 ··· pk) = n ∏_{i=1}^{k} (1 − 1/pi).

It is easy to show that this agrees with our formula developed quite some time ago.

Note:

1 − ∑ 1/pi + ∑ 1/(pi pj) − ··· + (−1)^k 1/(p1 ··· pk) = ∑_{d|n} µ(d)/d,

where µ is the classical Mobius function. So φ(n) = ∑_{d|n} µ(d) (n/d). Now using classical Mobius inversion (in reverse), n = ∑_{d|n} φ(d), a familiar equality.
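The product formula for φ is straightforward to implement and to check against both the gcd definition and the identity n = ∑_{d|n} φ(d) (an illustrative sketch; the factorization helper is our own):

```python
from math import gcd

def prime_divisors(n):
    """Distinct prime divisors of n by trial division (illustrative helper)."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def phi(n):
    """Euler's phi via the product formula phi(n) = n * prod(1 - 1/p)."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)
    return result

# Agreement with the gcd definition, and the identity n = sum of phi(d), d | n:
assert phi(12) == sum(1 for x in range(1, 13) if gcd(x, 12) == 1) == 4
assert sum(phi(d) for d in range(1, 13) if 12 % d == 0) == 12
```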

We now propose to illustrate the connection between classical Mobius inversion and our new version over posets. Recall Example 5.7.2 from the beginning of this section: P = (n1 + 1) × ··· × (nk + 1), as well as the Mobius function for chains as worked out in Section 5.6. By the product theorem, if [(a1, . . . , ak), (b1, . . . , bk)] is an interval in P,

µ((a1, . . . , ak), (b1, . . . , bk)) = (−1)^{∑(bi − ai)} if each bi − ai = 0 or 1, and 0 otherwise.   (5.23)

Now suppose n is a positive integer of the form n = p1^{n1} ··· pk^{nk}, where p1, . . . , pk are distinct primes. Let Dn be the poset of positive integral divisors of n, ordered by division (i.e., i ≤ j in Dn iff i|j). We identify P above with Dn according to the following scheme: (a1, . . . , ak) ∈ P corresponds to p1^{a1} ··· pk^{ak} in Dn. (Here it is convenient to let the elements of nk + 1 be 0, 1, . . . , nk.) Then Eq. 5.23, when interpreted for Dn, becomes: for r, s ∈ Dn,

µ(r, s) = (−1)^t if s/r is a product of t distinct primes, and 0 otherwise.   (5.24)

Page 193: Applied Combinatorics – Math 6409

224 CHAPTER 5. MOBIUS INVERSION ON POSETS

In other words, µ(r, s) is just the classical Mobius function µ(s/r). Then our new Mobius inversion formula in Dn looks like:

g(m) = ∑_{d|m} f(d) ∀m|n, iff f(m) = ∑_{d|m} g(d) µ(d, m) ∀m|n.

Writing µ(m/d) in place of µ(d, m) gives:

g(m) = ∑_{d|m} f(d) ∀m|n iff f(m) = ∑_{d|m} µ(m/d) g(d) ∀m|n.

As n is arbitrary, this is just the classical Mobius inversion formula.

At this point we have seen that the classical Principle of Inclusion-Exclusion is just Mobius inversion over Bn and the classical Mobius inversion formula is just Mobius inversion over Dn.
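The classical inversion formula can be tested for an arbitrary function f (a sketch of ours, with a trial-division implementation of the classical µ):

```python
def mobius(n):
    """Classical mu(n): (-1)^t if n is a product of t distinct primes, else 0."""
    t, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0                  # repeated prime factor
            t += 1
        d += 1
    if n > 1:
        t += 1
    return (-1) ** t

def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]

# Take any f, form g(m) = sum_{d|m} f(d), and recover f by classical inversion.
f = {m: m * m + 1 for m in range(1, 31)}          # an arbitrary test function
g = {m: sum(f[d] for d in divisors(m)) for m in f}
f_back = {m: sum(mobius(m // d) * g[d] for d in divisors(m)) for m in f}
assert f_back == f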

Exercise: 5.7.8 (The Mobius Function of the Poset Πn, n ≥ 1.) Recall that to make Πn into a poset, for σ, π ∈ Πn, we defined σ ≤ π iff each part of σ is contained in some part of π. For σ ∈ Πn, define ν(σ) to be the number of parts of σ. Example: If σ = {{1, 3, 5}, {7, 8}, {2, 4, 6}}, then ν(σ) = 3. The goal of this exercise is to compute the Mobius function of Πn. The underlying field of coefficients is denoted by K. The exercise is broken into ten small steps.

Step 1. Πn has a 0 and a 1, with ν(σ) = n iff σ = 0 and ν(σ) = 1 iff σ = 1.

Step 2. Let σ ≤ π = {B1, . . . , Bk} ∈ Πn. Suppose that Bi is partitioned into λi blocks in σ. The interval [σ, π] in Πn is isomorphic to the direct product Πλ1 × Πλ2 × ··· × Πλk. Illustrate this with π = 1, 2, 3, 4, 5, 6, 7, 8, 9, σ = 1, 2, 3, 4, 5, 6, 7, 8, 9. As a special case, for σ ∈ Πn,

[σ, 1] ≅ Π_{ν(σ)}.

Step 3. For each positive integer n, let µn = µ(0, 1), where µ is the Mobius function of Πn. Then using the notation of Step 2, i.e., [σ, π] ≅ Πλ1 × Πλ2 × ··· × Πλk, we have µ(σ, π) = µλ1 µλ2 ··· µλk.

Step 4. Recall (from where?) that x^n = ∑_{k=0}^{n} S(n, k) (x)k. Then for each positive integer m,

m^n = ∑_{σ ∈ Πn} (m)_{ν(σ)}.

Define f : Πn → K : π ↦ (m)_{ν(π)}, and g : Πn → K : π ↦ m^{ν(π)}. Then g(0) = ∑_{σ ≥ 0} f(σ).

Step 5. For each σ ∈ Πn (briefly justify each step):

g(σ) = m^{ν(σ)} = ∑_{π ∈ Π_{ν(σ)}} (m)_{ν(π)} = ∑_{π ∈ Πn : π ≥ σ} (m)_{ν(π)} = ∑_{π ∈ Πn : π ≥ σ} f(π).

For each σ ∈ Πn, the poset Pσ = {π ∈ Πn : π ≥ σ} = [σ, 1] is isomorphic to Π_{ν(σ)}. So σ in Πn is 0 in Pσ. And g(0) = ∑_{σ ≥ 0} f(σ), stated for Pσ, says that for all σ ∈ Πn, g(σ) = ∑_{π ≥ σ} f(π). To this we may apply Mobius inversion to obtain

f(σ) = ∑_{π ≥ σ} µ(σ, π) g(π).

Step 6. For each σ ∈ Πn,

f(σ) = ∑_{π ∈ Πn : π ≥ σ} µ(σ, π) g(π) = ∑_{π ∈ Πn : π ≥ σ} µ(σ, π) m^{ν(π)}.

In this put σ = 0 to obtain

(m)n = f(0) = ∑_{π ∈ Πn} µ(0, π) m^{ν(π)} = ∑_k ( ∑_{π ∈ Πn : ν(π) = k} µ(0, π) ) m^k.

As this holds for infinitely many m, we have a polynomial identity:


(x)n = ∑_{k=0}^{n} ( ∑_{π ∈ Πn : ν(π) = k} µ(0, π) ) x^k.

Recall (from where?) that (x)n = ∑_{k=0}^{n} s(n, k) x^k, where s(n, k) = (−1)^{n−k} c(n, k) is a Stirling number of the first kind. So comparing the coefficients of x^k we find once again that s(n, k) = ∑_{π ∈ Πn : ν(π) = k} µ(0, π) = w_{n−k} (which is called the (n − k)th Whitney number of Πn of the first kind).

Step 7. For each positive integer m,

(m)n = ∑_{k=1}^{n} ( ∑_{π ∈ Πn : ν(π) = k} µ(0, π) ) m^k.

Step 8. As polynomials in x we have

(x)n = ∑_{k=1}^{n} ( ∑_{π ∈ Πn : ν(π) = k} µ(0, π) ) x^k,

and

∑_{π ∈ Πn : ν(π) = k} µ(0, π) = (−1)^{n−k} c(n, k), 1 ≤ k ≤ n.

Step 9. µ(0, 1) = (−1)^{n−1} (n − 1)!.

Step 10. If π = {B1, . . . , Bk} and Bi breaks into λi parts in σ, then

µ(σ, π) = ∏_{i=1}^{k} (−1)^{λi−1} (λi − 1)!.

5.8 More Applications of Mobius Inversion

Consider the following three familiar sequences of polynomials.

(1) The power sequence: x^n, n = 0, 1, . . ..

(2) The falling factorial sequence: (x)n = x(x − 1) ··· (x − n + 1), n = 0, 1, . . ..

(3) The rising factorial sequence: (x)^(n) = x(x + 1) ··· (x + n − 1), n = 0, 1, . . ..

For n = 0 we have the following conventions: x^0 = (x)0 = (x)^(0) = 1.

Theorem 5.8.1 For m, n ∈ P we have the following:

(i) m^n = |{f : [n] → [m]}| = |[m]^[n]|.

(ii) (m)n = |{f : [n] → [m] : f is one-to-one}|.

(iii) (m)^(n) = |{f : [n] → [m] : f is a disposition, i.e., for each d ∈ [m], f⁻¹(d) is assigned a linear order}|.

Proof: The first two identities need no further explanation, but the third one probably does. A disposition may be visualized as a placing of n distinguishable flags on m distinguishable flagpoles. The poles are not ordered, but the flags on each pole are ordered. For the first flag there are m choices of flagpole. For the second flag there are m − 1 choices of pole other than the one flag 1 is on; on that pole there are two choices (above or below flag 1), giving a total of m + 1 choices for flag 2. Similarly, it is easy to see that there is one more choice for flag k + 1 than there was for flag k. Hence the number of ways to assign all n flags is m(m + 1) ··· (m + n − 1) = (m)^(n).

Theorem 5.8.2 For each n ∈ N we have the following:

(i) x^n = ∑_{k=0}^{n} S(n, k) (x)k. This is Theorem 1.7.2.

(i)′ (x)n = ∑_{k=0}^{n} s(n, k) x^k. This is Corollary 1.5.6.

(ii) (x)^(n) = ∑_{k=1}^{n} (n!/k!) ((n − 1) choose (k − 1)) (x)k.

(ii)′ (x)n = ∑_{k=1}^{n} (−1)^{n−k} (n!/k!) ((n − 1) choose (k − 1)) (x)^(k).

(iii) (x)^(n) = ∑_k c(n, k) x^k. This is Theorem 1.5.5.

(iii)′ x^n = ∑_k (−1)^{n−k} S(n, k) (x)^(k). This is Corollary 1.7.3.


Proof: The only two parts that need proving are (ii) and (ii)′, and we now establish (ii).

A linear partition λ of [n] is a partition of [n] together with a total order on the numbers in each part of λ; the parts themselves are unordered. Let L_n denote the collection of all linear partitions of [n], and let ν(λ) denote the number of blocks of λ. Each disposition from [n] to [m] may be thought of as a pair consisting of a linear partition λ of [n] into k blocks (the nonempty fibers, each with its linear order) and a one-to-one function g mapping [k] to [m]. Since ⟨m⟩_n counts the total number of dispositions of [n] and (m)_k counts the one-to-one functions from [k] to [m], we have

⟨m⟩_n = \sum_{λ∈L_n} (m)_{ν(λ)}.     (5.25)

To obtain the number of linear partitions of [n] into k blocks, note that there are n!\binom{n−1}{k−1} linear partitions of [n] with k ordered blocks. Visualize this as a placing of k − 1 slashes into the n − 1 interior spaces of a permutation (ordered array) of [n], at most one slash per space. Then divide by k! to get unordered blocks:

(n!/k!) \binom{n−1}{k−1} = # linear partitions of [n] with k blocks.     (5.26)

These numbers (n!/k!)\binom{n−1}{k−1} are called Lah numbers. Then (ii) follows immediately from Eqs. 5.25 and 5.26. Now replacing x with −x interchanges (ii) and (ii)′.
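Both Eq. 5.26 and identity (ii) can be verified mechanically: a linear partition with k blocks is an ordinary set partition with k blocks together with an ordering of each block, so the brute-force count weights each set partition by \prod |block|!. A sketch under those definitions (names are ours):

```python
from math import comb, factorial

def set_partitions(elems):
    """Yield every partition of the list `elems` as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def lah(n, k):
    """Lah number (n!/k!) * C(n-1, k-1), as in Eq. 5.26."""
    return factorial(n) // factorial(k) * comb(n - 1, k - 1)

def linear_partitions_brute(n, k):
    """Count linear partitions of [n] with k blocks: each block of a set
    partition can be linearly ordered in |block|! ways."""
    total = 0
    for part in set_partitions(list(range(1, n + 1))):
        if len(part) == k:
            weight = 1
            for block in part:
                weight *= factorial(len(block))
            total += weight
    return total

def falling(x, n):
    out = 1
    for i in range(n):
        out *= x - i
    return out

def rising(x, n):
    out = 1
    for i in range(n):
        out *= x + i
    return out

# identity (ii): <x>_n = sum_k L(n,k) (x)_k, checked at integer points
for n in range(1, 6):
    for x in range(1, 6):
        assert rising(x, n) == sum(lah(n, k) * falling(x, k)
                                   for k in range(1, n + 1))
```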

We now do Mobius inversion on each of three carefully chosen posets to explore the relationship between (a) and (a)′, for a = i, ii, and iii.

Let Π_n be the set of all partitions of [n], made into a poset by: for σ, π ∈ Π_n, σ ≤ π iff each part of σ is contained in some part of π. In the proof of Eq. 5.3 (which is equivalent to (i)) we obtained the following:

m^n = \sum_{σ∈Π_n} (m)_{ν(σ)}.     (5.27)

Define f : Π_n → K by f(π) = (m)_{ν(π)} and define g : Π_n → K by g(π) = m^{ν(π)}. Since 0̂ in Π_n is 0̂ = {{1}, {2}, {3}, . . . , {n}}, and ν(σ) = n iff σ = 0̂, we have


g(0̂) = m^n = \sum_{σ∈Π_n} (m)_{ν(σ)} = \sum_{σ≥0̂} f(σ).     (5.28)

For each σ ∈ Π_n, the poset P_σ = {π ∈ Π_n : π ≥ σ} = [σ, 1̂] is isomorphic to Π_{ν(σ)}. So σ is the 0̂ of P_σ, and Eq. 5.28 applied to P_σ says:

g(σ) = \sum_{π≥σ} f(π), for all σ ∈ Π_n.     (5.29)

Apply Mobius inversion to Eq. 5.29 to obtain

f(σ) = \sum_{π≥σ} µ(σ, π)g(π) = \sum_{π≥σ} µ(σ, π)m^{ν(π)}.     (5.30)

Putting σ = 0̂ yields

(m)_n = f(0̂) = \sum_{π∈Π_n} µ(0̂, π)m^{ν(π)} = \sum_{k=1}^n \left( \sum_{π∈Π_n : ν(π)=k} µ(0̂, π) \right) m^k.     (5.31)

As this holds for infinitely many m, we have a polynomial identity:

(x)_n = \sum_{k=1}^n \left( \sum_{π∈Π_n : ν(π)=k} µ(0̂, π) \right) x^k.     (5.32)

Comparing Eq. 5.32 with (i)′ we see that

s(n, k) = \sum_{π∈Π_n : ν(π)=k} µ(0̂, π) = w_{n−k},     (5.33)

the (n − k)th Whitney number of Π_n of the first kind. This shows that (i) and (i)′ are related by Mobius inversion on Π_n.

Putting k = 1 in Eq. 5.32 (note ν(π) = 1 iff π = {{1, 2, . . . , n}} = 1̂) yields

µ(0̂, 1̂) is the coefficient of x in (x)_n, which is (−1)^{n−1}(n − 1)!.     (5.34)

If π has type (a_1, . . . , a_k), i.e., π has a_i parts of size i, then [0̂, π] ≅ (Π_1)^{a_1} × (Π_2)^{a_2} × · · · × (Π_k)^{a_k}. Hence µ(0̂, π) = \prod_{i=1}^k [(−1)^{i−1}(i − 1)!]^{a_i}. Putting this in Eq. 5.33 yields a rather strange formula for s(n, k).
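The "strange formula" is concrete enough to test: sum the product formula for µ(0̂, π) over all partitions π of [n] with k blocks and compare with the known values of s(n, k). A sketch (our own names, assuming the product formula above):

```python
from math import factorial

def set_partitions(elems):
    """Yield every partition of the list `elems` as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def mu_0_pi(part):
    """mu(0,pi) in the partition lattice via the product formula:
    a block of size i contributes (-1)^(i-1) * (i-1)!."""
    out = 1
    for block in part:
        out *= (-1) ** (len(block) - 1) * factorial(len(block) - 1)
    return out

def s_via_mobius(n, k):
    """Eq. 5.33: s(n,k) = sum of mu(0,pi) over partitions of [n] with k blocks."""
    return sum(mu_0_pi(p) for p in set_partitions(list(range(n)))
               if len(p) == k)

# (x)_4 = x^4 - 6x^3 + 11x^2 - 6x, so s(4,2) = 11 and s(4,1) = -6
assert s_via_mobius(4, 2) == 11 and s_via_mobius(4, 1) == -6
```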

For our second example, turn to the set L_n of all linear partitions of [n]. For η, λ ∈ L_n, say η ≤ λ iff each block of λ can be obtained by juxtaposition of blocks of η. Then L_n is a finite poset.

Fix m ∈ P. Define f : L_n → K by

f(λ) = (m)_{ν(λ)},     (5.35)

and define g : L_n → K by

g(λ) = ⟨m⟩_{ν(λ)}.     (5.36)

Note that for λ ∈ L_n, λ = 0̂ iff ν(λ) = n. Then Eq. 5.25 implies that

⟨m⟩_n = g(0̂) = \sum_{λ≥0̂} (m)_{ν(λ)} = \sum_{λ≥0̂} f(λ).     (5.37)

Exercise: 5.8.3 Show that P_η = {λ ∈ L_n : λ ≥ η} is isomorphic to L_{ν(η)}.

So Eq. 5.37 generalizes to

g(η) = \sum_{λ≥η} f(λ), for all η ∈ L_n.     (5.38)

Then Mobius inversion gives

f(η) = \sum_{λ≥η} µ(η, λ)g(λ), for all η ∈ L_n.     (5.39)

Putting η = 0̂ in Eq. 5.39 yields

(m)_n = \sum_{λ∈L_n} µ(0̂, λ)⟨m⟩_{ν(λ)}.     (5.40)

For each λ ∈ L_n, the interval B_λ = [0̂, λ] is Boolean. For example, if λ = {(1, 2), (3, 4, 5, 6), (7)}, there are 1 + 3 + 0 = 4 places to put slashes between members of one (ordered) part to obtain "lower" linear partitions. So the set whose subsets form the Boolean poset is the set of positions between members of a same part of λ, and µ(0̂, λ) must then be (−1)^k, where k is the total number of positions between members of a same part of λ. If λ has ν(λ) parts, then k = n − ν(λ). Hence from Eq. 5.40 we have

(m)_n = \sum_{λ∈L_n} µ(0̂, λ)⟨m⟩_{ν(λ)} = \sum_{λ∈L_n} (−1)^{n−ν(λ)}⟨m⟩_{ν(λ)} = \sum_{k=1}^n (−1)^{n−k} (n!/k!) \binom{n−1}{k−1} ⟨m⟩_k.     (5.41)

This holds for all m ∈ P, so yields a polynomial identity:

(x)_n = \sum_{k=1}^n (−1)^{n−k} (n!/k!) \binom{n−1}{k−1} ⟨x⟩_k.     (5.42)

So Eq. 5.42, which is (ii)′, is related to (ii) by Mobius inversion on L_n.

For the third example, make S_n into a poset as follows. Always write a permutation σ ∈ S_n as a product of disjoint cycles so that in each cycle the smallest element comes first (furthest to the left in the cycle). Then given σ, τ ∈ S_n, say σ ≤ τ iff each cycle of σ is composed of a string of consecutive integers from some cycle of τ. For example, (12)(3) ≤ (123) and (1)(23) ≤ (123), but (13)(2) ≰ (123). See Example 5.2.7, where we gave the Hasse diagram of the interval [0̂, σ].

Equation (iii) of Theorem 5.8.1 can be written as

⟨m⟩_n = \sum_{σ≥0̂} m^{c(σ)}, where c(σ) is the number of cycles of σ.     (5.43)

Here 0̂ is the identity permutation, so c(σ) = n iff σ = 0̂.


Fix τ = c_1 c_2 · · · c_k ∈ S_n (where, of course, each cycle c_j is written with its smallest element first, and if i < j, the smallest element of c_i is less than the smallest element of c_j). Then σ ≥ τ iff each cycle of σ is made up of the juxtaposition of some cycles of τ. It follows that P_τ = {σ ∈ S_n : σ ≥ τ} is isomorphic to S_k. So Eq. 5.43 generalizes to

⟨m⟩_{c(τ)} = \sum_{σ≥τ} m^{c(σ)}.     (5.44)

Define f : S_n → K and g : S_n → K by

f(σ) = ⟨m⟩_{c(σ)} and g(σ) = m^{c(σ)}, for all σ ∈ S_n.     (5.45)

Then Eq. 5.44 says

f(τ) = \sum_{σ≥τ} g(σ), for all τ ∈ S_n.     (5.46)

So by Mobius inversion we have

g(τ) = \sum_{σ≥τ} µ(τ, σ)f(σ), for all τ ∈ S_n.     (5.47)

Putting τ = 0̂ in Eq. 5.47 gives

m^n = \sum_{σ∈S_n} µ(0̂, σ)⟨m⟩_{c(σ)}.     (5.48)

We now wish to evaluate µ(0̂, σ). Say σ ∈ S_n is increasing if each of its cycles increases, i.e., if (i_1, . . . , i_s) is a cycle of σ, then i_1 < i_2 < · · · < i_s.

Lemma 5.8.4 The Mobius function µ for S_n satisfies the following: for each σ ∈ S_n,

µ(0̂, σ) = (−1)^{n−c(σ)}, if σ is increasing; 0, otherwise.     (5.49)


Proof: Given σ ∈ S_n, consider the interval [0̂, σ]. The atoms of [0̂, σ] correspond to transpositions (i_r, i_{r+1}) where i_r, i_{r+1} is a substring of a cycle of σ and i_r < i_{r+1}. Thus if σ is increasing, the atoms of I_σ = [0̂, σ] correspond to all of the possible n − c(σ) transpositions. In that case I_σ is Boolean, and µ(0̂, σ) = (−1)^{n−c(σ)}. So suppose σ is not increasing. Then some cycle of σ has a consecutive pair (· · · , i_r, i_{r+1}, · · ·) with i_r > i_{r+1}. Form a new permutation σ* from σ by inserting a pair )( of parentheses between i_r and i_{r+1} for every consecutive pair of every cycle where i_r > i_{r+1}. Then σ* ≥ τ for every atom τ of [0̂, σ], and σ* < σ. It follows that in the upper Mobius algebra A_V([0̂, σ], K), if X is the set of atoms of [0̂, σ], the product

\prod_{x∈X} x = \sum_{t : t≥x for all x∈X} σ_t     (5.50)

has σ_{σ*} as a summand. Hence no (nonempty) product of atoms ever equals σ. Then by the dual of Theorem 5.10.4 (with σ the 1̂ of [0̂, σ]), µ(0̂, σ) = 0.

Now Eqs. 5.48 and 5.49 give

m^n = \sum_{σ∈S_n, σ increasing} (−1)^{n−c(σ)}⟨m⟩_{c(σ)} = \sum_k (−1)^{n−k}⟨m⟩_k \left( \sum_{σ increasing, c(σ)=k} 1 \right).     (5.51)

Since the number of increasing permutations with k cycles is easily seen to equal S(n, k) (the number of partitions of [n] with k blocks), we have derived (iii)′ from (iii) by Mobius inversion on S_n.
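The closing claim, that increasing permutations with k cycles are counted by S(n, k), can be checked by enumerating S_n and decomposing each permutation into min-first cycles. A small sketch (helper names are ours):

```python
from itertools import permutations

def cycles(perm):
    """Disjoint cycles of perm (a tuple mapping i -> perm[i]), each cycle
    listed starting from its smallest element."""
    seen, out = set(), []
    for i in range(len(perm)):
        if i not in seen:
            cyc, j = [], i
            while j not in seen:
                seen.add(j)
                cyc.append(j)
                j = perm[j]
            out.append(cyc)
    return out

def count_increasing(n, k):
    """Number of sigma in S_n with k cycles, each cycle increasing."""
    return sum(1 for p in permutations(range(n))
               for cs in [cycles(p)]
               if len(cs) == k and all(c == sorted(c) for c in cs))

# matches the Stirling numbers of the second kind: S(4,2) = 7, S(4,3) = 6
assert count_increasing(4, 2) == 7 and count_increasing(4, 3) == 6
```

An increasing permutation with k cycles is nothing but a set partition of [n] into k blocks, each block written in increasing order, which is why the count is S(n, k).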

Recapitulation: (i) and (i)′ are related by Mobius inversion on Π_n; (ii) and (ii)′ are related by Mobius inversion on L_n; and (iii) and (iii)′ are related by Mobius inversion on S_n.

5.9 Lattices and Gaussian Coefficients

A lattice L is a poset with the property that any finite subset S ⊆ L has a meet (or greatest lower bound), that is, an element b ∈ L for which

1. b ≤ a for all a ∈ S, and


2. if c ≤ a for all a ∈ S, then c ≤ b.

And dually, there is a join (or least upper bound), i.e., an element b ∈ L for which

1.′ a ≤ b for all a ∈ S, and

2.′ if a ≤ c for all a ∈ S, then b ≤ c.

The meet and join of a two-element set S = {x, y} are denoted, respectively, by x ∧ y and x ∨ y. It is easily seen that ∧ and ∨ are commutative, associative, idempotent binary operations. Moreover, if all 2-element subsets have meets and joins, then any finite subset has a meet and a join.

The lattices we will consider have the property that there are no infinite chains. Such a lattice has a (unique) least element (denoted 0̂ or 0̂_L), because the no-infinite-chain condition allows us to find a minimal element m, and any minimal element m is a minimum, since if m ≰ a then m ∧ a would be less than m. Similarly, there is a unique largest element 1̂_L (or 1̂).

For elements a and b of a poset, we say a covers b, written a ·> b, provided a > b but there is no element c with a > c > b. For example, when U and W are linear subspaces of a vector space, then U ·> W iff U ⊇ W and dim(U) = dim(W) + 1. A point of a lattice with 0̂ is an element that covers 0̂. A copoint of a lattice with 1̂ is an element covered by 1̂.

Theorem 5.9.1 (L. Weisner, 1935) Let µ be the Mobius function of a finite lattice L, and let a ∈ L with a > 0̂. Then

\sum_{x : x∨a=1̂} µ(0̂, x) = 0.

Proof: Fix a and put

S := \sum_{x,y∈L} µ(0̂, x) ζ(x, y) ζ(a, y) µ(y, 1̂),

where ζ(u, v) = 1 if u ≤ v and 0 otherwise. Now compute S in two different ways. First,

S = \sum_{x∈L} µ(0̂, x) · \sum_{y : y≥x, y≥a} µ(y, 1̂) = \sum_x µ(0̂, x) \sum_{y≥x∨a} µ(y, 1̂)

= \sum_x µ(0̂, x) · \sum_{x∨a≤y≤1̂} µ(y, 1̂) = \sum_x µ(0̂, x) · [1 if x ∨ a = 1̂; 0 otherwise]

= \sum_{x : x∨a=1̂} µ(0̂, x), which is the sum in the theorem.

Also,

S = \sum_{y≥a} µ(y, 1̂) \sum_{0̂≤x≤y} µ(0̂, x) = \sum_{y≥a} µ(y, 1̂) · 0 = 0,

since each y ≥ a > 0̂.

Let V_n(q) denote an n-dimensional vector space over F_q = GF(q). The term k-subspace will denote a k-dimensional subspace. It is fairly easy to see that the poset L_n(q) of all subspaces of V_n(q) is a lattice with 0̂ = {0} and 1̂ = V_n(q). We begin with some counting.

Exercise: 5.9.2 The number of ordered bases for a k-subspace of V_n(q) is (q^k − 1)(q^k − q)(q^k − q^2) · · · (q^k − q^{k−1}). How many ordered, linearly independent subsets of size k are there in V_n(q)?

To obtain a maximal chain (i.e., a chain of size n + 1 containing one subspace of each possible dimension) in the poset L_n(q) of all subspaces of V_n(q), we start with the 0-subspace. After we have chosen an i-subspace U_i, 0 ≤ i < n, we can choose an (i + 1)-subspace U_{i+1} that contains U_i in (q^n − q^i)/(q^{i+1} − q^i) ways, since we can take the span of U_i and any of the q^n − q^i vectors not in U_i, but an (i + 1)-subspace arises exactly q^{i+1} − q^i times in this manner. Hence the number of maximal chains of subspaces in V_n(q) is:

M(n, q) = [(q^n − q^0)/(q^1 − q^0)] · [(q^n − q^1)/(q^2 − q^1)] · · · [(q^n − q^{n−1})/(q^n − q^{n−1})]

= [(q^n − 1) · q(q^{n−1} − 1) · q^2(q^{n−2} − 1) · · · q^{n−1}(q − 1)] / [(q − 1) · q(q − 1) · q^2(q − 1) · · · q^{n−1}(q − 1)]

= (q^n − 1)(q^{n−1} − 1) · · · (q − 1) / (q − 1)^n.

This implies that

M(n, q) = (q^{n−1} + q^{n−2} + · · · + q + 1)(q^{n−2} + · · · + q + 1) · · · (q + 1).
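The chain count can be verified directly in the smallest interesting case: in V_3(2) = F_2^3 a maximal chain is {0} < point < plane < V, so the number of chains equals the number of incident point–plane pairs. A sketch encoding vectors as 3-bit integers (all names are ours):

```python
from itertools import product

def span(gens):
    """GF(2)-span of the given vectors, each encoded as a 3-bit int."""
    sub = {0}
    for v in gens:
        sub |= {x ^ v for x in sub}
    return frozenset(sub)

# every subspace of F_2^3 arises from a generating set of size <= 3
subspaces = {span(g) for r in range(4) for g in product(range(8), repeat=r)}
by_dim = {d: [U for U in subspaces if len(U) == 2 ** d] for d in range(4)}

# a maximal chain is determined by an incident (point, plane) pair
chains = sum(1 for P in by_dim[1] for H in by_dim[2] if P <= H)

def M(n, q):
    """Product formula above: M(n,q) = prod_{i<=n} (q^i - 1)/(q - 1)."""
    out = 1
    for i in range(2, n + 1):
        out *= (q ** i - 1) // (q - 1)
    return out

assert chains == 21 == M(3, 2)   # 7 planes, 3 points on each
```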


We may consider M(n, q) as a polynomial in q for each integer n. When the indeterminate q is replaced by a prime power, we have the number of maximal chains in the subspace lattice L_n(q).

Note: When q is replaced by 1, we have M(n, 1) = n!, which is the number of maximal chains in the poset of subsets of an n-set.

The Gaussian number (or Gaussian coefficient) [n k]_q can be defined as the number of k-subspaces of V_n(q). This holds for 0 ≤ k ≤ n, where [n 0]_q = 1.

To evaluate [n k]_q, count the number N of pairs (U, C) where U is a k-subspace and C is a maximal chain that contains U. Since every maximal chain contains exactly one subspace of dimension k, clearly N = M(n, q). On the other hand, we get each such pair uniquely by appending to a maximal chain in the poset of subspaces of U (of which there are M(k, q)) a maximal chain in the poset of all subspaces of V_n(q) that contain U. There are M(n − k, q) of these, since the poset {W : U ⊆ W ⊆ V} is isomorphic to the poset of subspaces of V/U, and dim(V/U) = n − k. Hence

M(n, q) = [n k]_q · M(k, q) · M(n − k, q),

which implies that

[n k]_q = M(n, q) / (M(k, q) M(n − k, q)) = [n n−k]_q

= (q^{n−1} + q^{n−2} + · · · + q + 1)(q^{n−2} + q^{n−3} + · · · + q + 1) · · · (q + 1) / [(q^{k−1} + · · · + 1) · · · (q + 1) · (q^{n−k−1} + · · · + q + 1) · · · (q + 1)]

= (q^{n−1} + · · · + 1)(q^{n−2} + · · · + 1) · · · (q^{n−k} + · · · + 1) / [(q^{k−1} + · · · + 1) · · · (q + 1)]

= (q^n − 1)(q^{n−1} − 1) · · · (q^{n−k+1} − 1) / [(q^k − 1)(q^{k−1} − 1) · · · (q − 1)].


In fact there is a satisfactory way to generalize the notion and the notation of Gaussian coefficient to the multinomial case. (See the book by R. Stanley for this.) However, for our present purposes it suffices to consider just the binomial case. Define (0)_q = 1, and for a positive integer j put (j)_q = 1 + q + q^2 + · · · + q^{j−1}. Then put (0)!_q = 1, and for a positive integer k put (k)!_q = (1)_q(2)_q · · · (k)_q. So (n)!_q = M(n, q). With this notation we have

[n k]_q = (n)!_q / ((k)!_q (n − k)!_q).     (5.52)
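Eq. 5.52 translates directly into code, and at integer q the quotient is exact. The sketch below (our own names) also checks the symmetry [n k]_q = [n n−k]_q and both q-Pascal recurrences stated in the next two exercises:

```python
def qint(j, q):
    """(j)_q = 1 + q + ... + q^(j-1)."""
    return sum(q ** i for i in range(j))

def qfact(n, q):
    """(n)!_q = (1)_q (2)_q ... (n)_q."""
    out = 1
    for j in range(1, n + 1):
        out *= qint(j, q)
    return out

def gauss(n, k, q):
    """Gaussian coefficient [n k]_q via Eq. 5.52; exact at integer q."""
    if k < 0 or k > n:
        return 0
    return qfact(n, q) // (qfact(k, q) * qfact(n - k, q))

for q in (2, 3, 5):
    for n in range(1, 7):
        for k in range(n + 1):
            assert gauss(n, k, q) == gauss(n, n - k, q)
            assert gauss(n, k, q) == (gauss(n - 1, k, q)
                                      + q ** (n - k) * gauss(n - 1, k - 1, q))
            assert gauss(n + 1, k, q) == (gauss(n, k - 1, q)
                                          + q ** k * gauss(n, k, q))
```

For instance, gauss(4, 2, 2) returns 35, the number of 2-subspaces of V_4(2).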

For some purposes it is better to think of [n k]_q as a polynomial in an indeterminate q rather than as a function of a prime power q. That [n k]_q is a polynomial in q is an easy corollary of the following exercise.

Exercise: 5.9.3 Prove the following recurrence:

[n k]_q = [n−1 k]_q + q^{n−k} [n−1 k−1]_q.

Exercise: 5.9.4 Prove the following recurrence:

[n+1 k]_q = [n k−1]_q + q^k [n k]_q.

(Hint: There is a completely elementary proof just using the formulas for the symbols.)

Note that the relation of the previous exercise reduces to the binomial recurrence when q = 1. However, unlike the binomial recurrence, it is not 'symmetric'.

Exercise: 5.9.5 Show that [n k]_q = \sum_{l≥0} α_l q^l, where α_l is the number of partitions of l into at most k parts, each of which is at most n − k.
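The partition interpretation in Exercise 5.9.5 is easy to check numerically: count partitions of l into at most k parts, each at most n − k, by the standard bounded-partition recursion, and compare the resulting polynomial with [n k]_q at integer q. A sketch (all names are ours):

```python
from functools import lru_cache

def gauss(n, k, q):
    """[n k]_q as a quotient of q-factorials (assumed helper)."""
    def qfact(m):
        out = 1
        for j in range(1, m + 1):
            out *= sum(q ** i for i in range(j))
        return out
    return qfact(n) // (qfact(k) * qfact(n - k))

@lru_cache(None)
def pcount(l, parts, size):
    """Partitions of l into at most `parts` parts, each part <= `size`."""
    if l == 0:
        return 1
    if parts == 0 or size == 0:
        return 0
    total = pcount(l, parts, size - 1)              # no part of size `size`
    if l >= size:
        total += pcount(l - size, parts - 1, size)  # use one part of size `size`
    return total

# [n k]_q = sum_l alpha_l q^l with alpha_l = pcount(l, k, n - k)
for q in (2, 3):
    for n in range(1, 7):
        for k in range(n + 1):
            assert gauss(n, k, q) == sum(pcount(l, k, n - k) * q ** l
                                         for l in range(k * (n - k) + 1))
```

For example, [4 2]_q = 1 + q + 2q^2 + q^3 + q^4, whose coefficients 1, 1, 2, 1, 1 count partitions of 0, 1, 2, 3, 4 into at most 2 parts of size at most 2.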


If we regard a Gaussian coefficient [n k]_q as a function of the real variable q (where n and k are fixed integers), then we find that the limit as q goes to 1 of a Gaussian coefficient is a binomial coefficient.

Exercise: 5.9.6 Show that

lim_{q→1} [n k]_q = \binom{n}{k}.

Exercise: 5.9.7 (The q-binomial Theorem) Prove that for n ≥ 1:

(1 + x)(1 + qx) · · · (1 + q^{n−1}x) = \sum_{i=0}^n q^{\binom{i}{2}} [n i]_q x^i.

Letting q → 1, we obtain the usual Binomial Theorem.
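Before proving the q-binomial theorem it is reassuring to see it hold numerically: both sides are polynomials in x, so agreement at enough sample points for each small n and integer q is strong evidence. A sketch (helper names are ours):

```python
def gauss(n, k, q):
    """[n k]_q as a quotient of q-factorials (assumed helper)."""
    if k < 0 or k > n:
        return 0
    def qfact(m):
        out = 1
        for j in range(1, m + 1):
            out *= sum(q ** i for i in range(j))
        return out
    return qfact(n) // (qfact(k) * qfact(n - k))

def qbinom_lhs(n, q, x):
    """(1 + x)(1 + qx)...(1 + q^(n-1) x)."""
    out = 1
    for i in range(n):
        out *= 1 + q ** i * x
    return out

def qbinom_rhs(n, q, x):
    """sum_i q^C(i,2) [n i]_q x^i."""
    return sum(q ** (i * (i - 1) // 2) * gauss(n, i, q) * x ** i
               for i in range(n + 1))

for q in (2, 3):
    for n in range(1, 6):
        for x in (1, 2, 5):
            assert qbinom_lhs(n, q, x) == qbinom_rhs(n, q, x)
```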

Exercise: 5.9.8 Prove that:

[n+m k]_q = \sum_{i=0}^k [n i]_q [m k−i]_q q^{(n−i)(k−i)}.

Define the Gaussian polynomials g_n(x) ∈ R[x] as follows: g_0(x) = 1; g_n(x) = (x − 1)(x − q) · · · (x − q^{n−1}) for n > 0. Clearly the Gaussian polynomials form a basis for R[x] as a vector space over R.

Theorem 5.9.9 The Gaussian coefficients connect the usual monomials to the Gaussian polynomials, viz.:

(i) x^n = \sum_{k=0}^n \binom{n}{k} (x − 1)^k;

(ii) x^n = \sum_{k=0}^n [n k]_q g_k(x).


Proof: (i) is a special case of the binomial theorem, and (ii) becomes (i) if q = 1. To prove (ii), suppose V, W are vector spaces over F = GF(q) with dim(V) = n and |W| = r. Here r = q^t is any power of q with t ≥ n. Then |Hom_F(V, W)| = r^n.

Now classify f ∈ Hom_F(V, W) according to the kernel subspace f^{−1}(0) ⊆ V. Given a subspace U ⊆ V, let u_1, . . . , u_k be an ordered basis of U and extend it to an ordered basis u_1, . . . , u_k, u_{k+1}, . . . , u_n of V. Then f^{−1}(0) = U iff f(u_i) = 0 for 1 ≤ i ≤ k and f(u_{k+1}), . . . , f(u_n) are linearly independent vectors in W.

Now

r^n = \sum_{U⊆V} (r − 1)(r − q) · · · (r − q^{n−dim(U)−1})

= \sum_{k=0}^n [n k]_q (r − 1)(r − q) · · · (r − q^{n−k−1})

= \sum_{k=0}^n [n k]_q (r − 1)(r − q) · · · (r − q^{k−1})     (using [n k]_q = [n n−k]_q)

= \sum_{k=0}^n [n k]_q g_k(r).

As r can be any power of q with r ≥ q^n, the polynomials x^n and \sum_{k=0}^n [n k]_q g_k(x) agree on infinitely many values of x and hence must be identical.

The inverse connection can be obtained from the q-binomial theorem (cf. Ex. 5.9.7).

Exercise: 5.9.10 Prove that

g_n(x) = \sum_{i=0}^n [n i]_q q^{\binom{n−i}{2}} (−1)^{n−i} x^i.


(Hint: In Ex. 5.9.7 first replace x with −x and then replace q with q^{−1}, and simplify.)

If {a_n}_{n=0}^∞ is a given sequence of numbers, we have considered its ordinary generating function \sum_{n≥0} a_n x^n and its exponential generating function \sum_{n≥0} a_n x^n/n!. (Also considered in Chapter 4 was the Dirichlet generating function.) There is a vast theory of Eulerian generating functions, defined by \sum_{n≥0} a_n x^n/(n)!_q. (See the book by R. Stanley for an introduction to this subject with several references.) The next exercise shows that two specific Eulerian generating functions are inverses of each other.

Exercise: 5.9.11 Show that

\left( \sum_{k≥0} (−t)^k q^{\binom{k}{2}} / (k)!_q \right) \left( \sum_{k≥0} t^k / (k)!_q \right) = 1.

(Hint: Compute the coefficient of t^n separately for n = 0 and n ≥ 1. Then use the q-binomial theorem with x = −1.)

Exercise: 5.9.12 (Gauss inversion) Let {u_i}_{i=0}^∞ and {v_i}_{i=0}^∞ be two sequences of real numbers. Then

v_n = \sum_{i=0}^n [n i]_q u_i (n ≥ 0) ⇔ u_n = \sum_{i=0}^n (−1)^{n−i} q^{\binom{n−i}{2}} [n i]_q v_i (n ≥ 0).

(Hint: Use Exercise 5.9.11.)
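Gauss inversion is also pleasant to watch in action: transform an arbitrary sequence forward, invert, and recover the original. A sketch with our own helper names:

```python
def gauss(n, k, q):
    """[n k]_q as a quotient of q-factorials (assumed helper)."""
    if k < 0 or k > n:
        return 0
    def qfact(m):
        out = 1
        for j in range(1, m + 1):
            out *= sum(q ** i for i in range(j))
        return out
    return qfact(n) // (qfact(k) * qfact(n - k))

def gauss_transform(u, q):
    """v_n = sum_i [n i]_q u_i."""
    return [sum(gauss(n, i, q) * u[i] for i in range(n + 1))
            for n in range(len(u))]

def gauss_invert(v, q):
    """u_n = sum_i (-1)^(n-i) q^C(n-i,2) [n i]_q v_i."""
    return [sum((-1) ** (n - i) * q ** ((n - i) * (n - i - 1) // 2)
                * gauss(n, i, q) * v[i] for i in range(n + 1))
            for n in range(len(v))]

u = [3, 1, 4, 1, 5, 9]          # arbitrary test sequence
for q in (2, 3, 5):
    assert gauss_invert(gauss_transform(u, q), q) == u
```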

See the book by R. P. Stanley and that by Goulden and Jackson for a great deal more on the subject of q-binomial (and q-multinomial) coefficients.

We are now going to compute the Mobius function of the lattice Ln(q).

Theorem 5.9.13 The Mobius function of the lattice L_n(q) of subspaces of a vector space of dimension n over the Galois field F = GF(q) is given by

µ(U, W) = (−1)^k q^{\binom{k}{2}}, if U ⊆ W and k = dim(W) − dim(U); 0, if U ⊄ W.


Proof: The idea is to use Weisner's theorem on the interval [U, W], viewed as isomorphic to the lattice of subspaces of the quotient space W/U. This means that we need only compute µ(0̂, 1̂), where V = 1̂ is a space of dimension n and 0̂ = {0}. If V has dimension 1, then L_1(q) is a chain with two elements and µ(0̂, 1̂) = −1 = (−1)^1 q^{\binom{1}{2}}. Now suppose n = 2. Let a be a point. By Weisner's theorem,

µ(0̂, 1̂) = −\sum_{p : p∨a=1̂, p≠1̂} µ(0̂, p) = |{p : p ∨ a = 1̂ and p ≠ 1̂}| = q = (−1)^2 q^{\binom{2}{2}}.

Now suppose that our induction hypothesis is that

µ(0̂, V) = (−1)^k q^{\binom{k}{2}} if k = dim(V) < n.

Let p be a point (an atom of the lattice). By Weisner's Theorem,

µ(0̂, V) = −\sum_{U : U∨p=V, U≠V} µ(0̂, U).

The subspaces U such that U ∨ p = V and U ≠ V are those of dimension n − 1 (i.e., hyperplanes) that do not contain p. The number of hyperplanes on p is the number of points on a hyperplane, which is [n−1 1]_q, so the number of hyperplanes not containing p is

[n 1]_q − [n−1 1]_q = q^{n−1}.

So if dim(V) = n, then µ(0̂, V) = (−1) · q^{n−1} · (−1)^{n−1} q^{\binom{n−1}{2}} = (−1)^n q^{\binom{n}{2}}, after a little simplification.
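The formula can be confirmed by brute force on a toy lattice: enumerate the subspaces of F_2^3 and compute µ(0̂, U) from its defining recursion, with no reference to Theorem 5.9.13. A sketch (our own names; q = 2, n = 3 assumed):

```python
from itertools import product

def span(gens):
    """GF(2)-span of the given vectors, each encoded as a 3-bit int."""
    sub = {0}
    for v in gens:
        sub |= {x ^ v for x in sub}
    return frozenset(sub)

subspaces = sorted({span(g) for r in range(4)
                    for g in product(range(8), repeat=r)}, key=len)

# mu(0,U) via the defining recursion: sum over W <= U of mu(0,W) = [U = 0]
mu = {}
for U in subspaces:                  # sorted by size: proper subspaces first
    mu[U] = 1 if len(U) == 1 else -sum(mu[W] for W in subspaces if W < U)

for U in subspaces:
    k = len(U).bit_length() - 1      # |U| = 2^k, so k = dim(U)
    assert mu[U] == (-1) ** k * 2 ** (k * (k - 1) // 2)
```

Dimensions 0, 1, 2, 3 give µ-values 1, −1, 2, −8, exactly (−1)^k q^{\binom{k}{2}} at q = 2.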

As an application we count the number of linear transformations from an n-dimensional vector space Y onto an m-dimensional vector space V over F = GF(q). Clearly we must have n ≥ m if this number is to be nonzero; however, we do not make this assumption.


Theorem 5.9.14 If Y and V are vector spaces over F with dim(Y) = n and dim(V) = m, then

|{T ∈ Hom(Y, V) : T(Y) = V}| = \sum_{k=0}^m (−1)^{m−k} [m k]_q q^{nk + \binom{m−k}{2}}.

Proof: For each subspace U of V, let f(U) = |{T ∈ Hom(Y, V) : T(Y) = U}| and g(U) = |{T ∈ Hom(Y, V) : T(Y) ⊆ U}|. Then g(U) = q^{nr} if dim(U) = r, and clearly g(U) = \sum_{W : W⊆U} f(W). By Mobius inversion we have f(U) = \sum_{W : W⊆U} µ(W, U) q^{n·dim(W)}. If U = V, by our formula for the Mobius function (Theorem 5.9.13) we have

f(V) = \sum_W µ(W, V) q^{n·dim(W)} = \sum_{k=0}^m (−1)^{m−k} q^{\binom{m−k}{2}} [m k]_q q^{nk},

which finishes the proof.

Corollary 5.9.15 The number of n × m matrices over F = GF(q) with rank r is

[m r]_q \sum_{k=0}^r (−1)^{r−k} [r k]_q q^{nk + \binom{r−k}{2}}.

Corollary 5.9.16 The number of invertible n × n matrices over GF(q) is

\sum_{k=0}^n (−1)^{n−k} [n k]_q q^{nk + \binom{n−k}{2}}.

Remark: There are g_n(q^m) = (q^m − 1)(q^m − q) · · · (q^m − q^{n−1}) injective linear transformations from V_n to V_m. If m = n, then "injective" is equivalent to "onto." Hence

g_n(q^n) = (q^n − 1)(q^n − q) · · · (q^n − q^{n−1}) = \sum_{k=0}^n (−1)^{n−k} [n k]_q q^{nk + \binom{n−k}{2}}.
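Theorem 5.9.14 and the Remark give two independent counts of the same quantity when n = m, which makes for a good machine check; the alternating sum should also vanish when n < m, since no onto map exists. A sketch (our own names):

```python
def gauss(n, k, q):
    """[n k]_q as a quotient of q-factorials (assumed helper)."""
    if k < 0 or k > n:
        return 0
    def qfact(m):
        out = 1
        for j in range(1, m + 1):
            out *= sum(q ** i for i in range(j))
        return out
    return qfact(n) // (qfact(k) * qfact(n - k))

def gl_order(n, q):
    """g_n(q^n) = (q^n - 1)(q^n - q)...(q^n - q^(n-1)) = |GL(n, q)|."""
    out = 1
    for i in range(n):
        out *= q ** n - q ** i
    return out

def onto_count(n, m, q):
    """Right-hand side of Theorem 5.9.14: onto maps V_n -> V_m."""
    return sum((-1) ** (m - k) * gauss(m, k, q)
               * q ** (n * k + (m - k) * (m - k - 1) // 2)
               for k in range(m + 1))

for q in (2, 3):
    for n in range(1, 5):
        assert onto_count(n, n, q) == gl_order(n, q)
assert onto_count(1, 2, 2) == 0   # no onto map when n < m
```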

Exercise: 5.9.17 It is possible to define the Gaussian coefficients for anypositive integer q, not just the prime powers. Read the following article:John Konvalina, A Unified Interpretation of the Binomial Coefficients, theStirling Numbers, and the Gaussian Coefficients, Amer. Math. Monthly, 107(2000), 901 – 910.


5.10 Posets with Finite Order Ideals

Let P be a poset for which each order ideal Λ_x = {y ∈ P : y ≤ x} is finite, and let µ be the Mobius function of P. If K is any field, the (lower) Mobius algebra A_Λ(P, K) is the algebra obtained by starting with the vector space K^P and defining a (bilinear) multiplication on basis elements x, y ∈ P by

x · y = \sum_{s : s≤x and s≤y} \left( \sum_{t∈[s,x]∩[s,y]} µ(s, t) \right) s

= \sum_{(s,t) : s≤t≤x and s≤t≤y} µ(s, t) s = \sum_{t∈Λ_x∩Λ_y} \left( \sum_{s : s≤t} µ(s, t) s \right).     (5.53)

So if we put δ_t = \sum_{s : s≤t} µ(s, t) s, we have

x · y = \sum_{t∈Λ_x∩Λ_y} δ_t.     (5.54)

If we fix z ∈ P, then \sum_{t : t≤z} δ_t = \sum_{(s,t) : s≤t≤z} µ(s, t) s = \sum_{s : s≤z} \left( \sum_{t : s≤t≤z} µ(s, t) \right) s = \sum_{s : s≤z} δ_{s,z} s = z, where δ_{s,z} is the Kronecker delta (the inner sum vanishes unless s = z). Hence:

z = \sum_{t∈Λ_z} δ_t, and x · x = \sum_{t∈Λ_x} δ_t = x.     (5.55)

Moreover,

x · y = y iff y ≤ x. (5.56)

Note: In the above we are thinking of f ∈ K^P as a formal (possibly infinite) linear combination of the elements of P: f = \sum_{x∈P} f(x) x. So the element x of P is identified with the element 1x = \sum_{y∈P} δ_{x,y} y. Then the above discussion shows that {δ_t : t ∈ P} (as well as {x : x ∈ P}) is a basis for A_Λ(P, K).

Let A′_Λ(P, K) be the abstract algebra \prod_{x∈P} K_x with each K_x isomorphic to K. So A′_Λ(P, K) is K^{|P|} with direct product (coordinatewise) operations. Let δ′_x be the identity of K_x, so δ′_x · δ′_y = δ_{x,y} δ′_x. Then define a linear transformation θ : A_Λ(P, K) → A′_Λ(P, K) by θ(δ_x) = δ′_x, extended by linearity.


Theorem 5.10.1 θ is an algebra isomorphism.

Proof: For each x ∈ P, put x′ = \sum_{y≤x} δ′_y ∈ A′_Λ(P, K). As θ is clearly a vector space isomorphism with θ(x) = θ(\sum_{y≤x} δ_y) = \sum_{y≤x} δ′_y = x′, it suffices to show that θ(x · y) = θ(x)θ(y) for all x, y ∈ P. So

θ(x · y) = \sum_{t : t≤x and t≤y} δ′_t = \sum_{t≤x, s≤y} δ′_t δ′_s = \left( \sum_{t≤x} δ′_t \right)\left( \sum_{s≤y} δ′_s \right) = x′ · y′ = θ(x)θ(y).

As a simple corollary we have the following otherwise not so obvious result.

Theorem 5.10.2 If P is finite, {δ_t : t ∈ P} is a complete set of orthogonal idempotents, and \sum_{t∈P} δ_t is the (multiplicative) identity of A_Λ(P, K).

(Note: In the notation for lattices, x · y = z iff Λ_x ∩ Λ_y = Λ_z iff z = x ∧ y.)
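The idempotent machinery is concrete enough to run on a toy poset. The sketch below (all names ours) builds the δ_t of a small Boolean lattice, multiplies basis elements per Eq. 5.54, and confirms that the product of two basis elements recovers their meet, as the note above asserts:

```python
from itertools import combinations
from functools import lru_cache

# toy lattice: subsets of {0,1,2} ordered by inclusion
P = [frozenset(c) for r in range(4) for c in combinations(range(3), r)]

@lru_cache(None)
def mu(s, t):
    """Mobius function: mu(s,s)=1 and sum_{s<=u<=t} mu(s,u) = 0 for s < t."""
    if s == t:
        return 1
    return -sum(mu(s, u) for u in P if s <= u and u < t)

def delta(t):
    """delta_t = sum_{s<=t} mu(s,t) s, as a {basis element: coefficient} dict."""
    return {s: mu(s, t) for s in P if s <= t}

def basis_prod(x, y):
    """x . y per Eq. 5.54: the sum of delta_t over t in Lambda_x ∩ Lambda_y."""
    out = {}
    for t in P:
        if t <= x and t <= y:
            for s, coef in delta(t).items():
                out[s] = out.get(s, 0) + coef
    return {s: coef for s, coef in out.items() if coef}

a, b = frozenset({0, 1}), frozenset({1, 2})
assert basis_prod(a, b) == {frozenset({1}): 1}   # x . y = x ∧ y (intersection)
assert basis_prod(a, a) == {a: 1}                # x . x = x, as in Eq. 5.55
```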

Theorem 5.10.3 Let P be finite with |P| ≥ 2, and let a, x ∈ P with a ≠ x. On the one hand,

a · δ_x = a · \sum_{t≤x} µ(t, x) t = \sum_{t≤x} µ(t, x)(a · t) = \sum_{d∈P} \left( \sum_{t : t≤x and a·t=d} µ(t, x) \right) d.

On the other hand,

a · δ_x = \sum_{t : t≤a} δ_t · δ_x = \sum_{t : t≤a} δ_{t,x} δ_x = δ_x, if x ≤ a; 0, if x ≰ a.

This has the following consequences:

(i) If x ≰ a and d ∈ P, then \sum_{t : t≤x and a·t=d} µ(t, x) = 0. For example, if a ≠ 1̂ ∈ P, then \sum_{t : a·t=d} µ(t, 1̂) = 0. As a special case, \sum_{t : a·t=0̂} µ(t, 1̂) = 0 if P has 0̂ and 1̂, with a ≠ 1̂.

(ii) If x ≤ a, then \sum_d \left( \sum_{t : t≤x and a·t=d} µ(t, x) \right) d = δ_x = \sum_{d : d≤x} µ(d, x) d. So

(a) if d ≤ x ≤ a, then \sum_{t : t≤x and a·t=d} µ(t, x) = µ(d, x), and

(b) if d ≰ x ≤ a, then \sum_{t : t≤x and a·t=d} µ(t, x) = 0.

Theorem 5.10.4 Let P be a finite poset with 0̂ and 1̂, 0̂ ≠ 1̂, and let X be the set of coatoms of P (i.e., elements covered by 1̂). Then µ(0̂, 1̂) = \sum_{k=1}^{|X|} (−1)^k N_k, where N_k is the number of subsets of X of size k whose product is 0̂.

Proof: For any x ∈ X, 1 − x = (\sum_{t∈P} δ_t) − \sum_{t : t≤x} δ_t = \sum_{t : t≰x} δ_t. Hence

\prod_{x∈X} (1 − x) = \prod_{x∈X} \left( \sum_{t : t≰x} δ_t \right) = δ_{1̂},

since δ_t · δ_s = 0 if t ≠ s, and δ_{1̂} is the only idempotent appearing in every factor of the product. The coefficient of 0̂ in δ_{1̂} = \sum_{s : s≤1̂} µ(s, 1̂) s is µ(0̂, 1̂). The coefficient of 0̂ in \prod_{x∈X} (1 − x) is exactly \sum_k (−1)^k N_k.
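Theorem 5.10.4 can be exercised on the Boolean lattice B_n, where the coatoms are the (n − 1)-subsets of [n], the lattice product is intersection, and the known answer is µ(0̂, 1̂) = (−1)^n. A sketch (the function name is ours):

```python
from itertools import combinations

def mu_via_coatoms(n):
    """Theorem 5.10.4 on B_n: X = (n-1)-subsets of [n], products are
    intersections, mu(0,1) = sum_k (-1)^k N_k with
    N_k = #{k-subsets of X whose intersection is empty}."""
    coatoms = [frozenset(range(n)) - {i} for i in range(n)]
    total = 0
    for k in range(1, len(coatoms) + 1):
        Nk = sum(1 for sub in combinations(coatoms, k)
                 if frozenset.intersection(*sub) == frozenset())
        total += (-1) ** k * Nk
    return total

assert all(mu_via_coatoms(n) == (-1) ** n for n in range(1, 6))
```

Here the intersection of k distinct coatoms is empty only when k = n, so only N_n = 1 survives and the sum collapses to (−1)^n.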

If P is a poset for which each dual order ideal V_x = {y ∈ P : y ≥ x} is finite, we can dualize the construction of the (lower) Mobius algebra and define the upper Mobius algebra A_V(P, K), which has primitive idempotents of the form σ_x = \sum_{y≥x} µ(x, y) y.

Theorem 5.10.5 Let P and Q be finite posets. If φ : P → Q is any map, then φ extends to an algebra homomorphism φ : A_V(P, K) → A_V(Q, K) iff the following hold:

(i) φ is order preserving, and

(ii) for any q ∈ Q, the set {p ∈ P : φ(p) ≤ q} either has a maximum or is empty. (That is, if I is a principal order ideal of Q, then φ^{−1}(I) is principal or empty.)


Proof: First suppose that φ extends to a homomorphism. Since x ≤ y iff x · y = y (in A_V(P, K)), it must be that x ≤ y iff x · y = y iff φ(x) · φ(y) = φ(y) iff φ(x) ≤ φ(y), so φ is order preserving. Now for a fixed q ∈ Q suppose that {p ∈ P : φ(p) ≤ q} ≠ ∅ and choose a p ∈ P for which φ(p) ≤ q. Then

\sum_{y≥φ(p)} σ_y = φ(p) = φ\left( \sum_{x≥p} σ_x \right) = \sum_{x≥p} φ(σ_x).     (5.57)

Since {σ_y : y ∈ P} and {σ_y : y ∈ Q} are bases for A_V(P, K) and A_V(Q, K), respectively, we see from Eq. 5.57 that σ_q ∈ A_V(Q, K) appears as a summand in φ(σ_x) for some x ≥ p. Moreover, this x is unique, because if σ_q were a summand in both φ(σ_x) and φ(σ_y) with x ≠ y, then φ(σ_x) · φ(σ_y) ≠ 0. But φ(σ_x) · φ(σ_y) = φ(σ_x · σ_y) = φ(0) = 0. We claim that the unique x, x ≥ p, for which σ_q is a summand of φ(σ_x) is x = max{p ∈ P : φ(p) ≤ q}. The above argument at least shows that x ≥ p for each p such that φ(p) ≤ q. But as σ_q is a summand of φ(σ_x), it is also a summand of φ(x) = φ(\sum_{t≥x} σ_t) = \sum_{t≥x} φ(σ_t). But φ(x) = \sum_{t≥φ(x)} σ_t, so having σ_q as a summand implies that q ≥ φ(x). We now have: x ≥ p for each p with φ(p) ≤ q, and φ(x) ≤ q. So x = max{p ∈ P : φ(p) ≤ q}. Hence {p ∈ P : φ(p) ≤ q} = φ^{−1}(Λ_q) = Λ_x.

This completes the proof that when φ extends to an algebra homomorphism, both (i) and (ii) hold.

Conversely, suppose both (i) and (ii) hold. Let Q_0 = {q ∈ Q : q ≥ φ(p) for some p ∈ P} = {q ∈ Q : φ^{−1}(Λ_q) ≠ ∅}. If p ∈ P, then q = φ(p) is automatically in Q_0. And if q ∈ Q_0, put ψ(q) = max{p ∈ P : φ(p) ≤ q}, so Λ_{ψ(q)} = {p ∈ P : φ(p) ≤ q}.

If q = φ(x), then φ(ψ(q)) = φ(ψ(φ(x))) = φ(x) = q. On the other hand, if q is not in the image of φ, then φ(ψ(q)) = φ(max{x : φ(x) ≤ q}) ≤ q.

If φ(p_1) = φ(p) with p_1 ≠ p, put

φ*(σ_p) = \sum_{q∈Q : φ(ψ(q))=φ(p)} σ_q = \sum′ σ_q,

where the sum \sum′ is over all σ_q for which q satisfies the following: q ≥ φ(p), and the largest x with φ(x) ≤ q has φ(x) = φ(p). In this set of q's, φ(p) is the only one in the image of φ; the other q's are only "slightly" larger than φ(p). (This set of q's is the set {q : [φ(p), q] ∩ φ(P) = {φ(p)}}.)


Fix p ∈ P. Then for each q ∈ Q with q ≥ φ(p) there is a unique x ∈ P which is the largest element of P with φ(x) ≤ q; necessarily x ≥ p. Similarly, if we fix x ≥ p, there is a well-defined set of q's for which φ(x) is the largest element of φ(P) that is less than or equal to q. Hence for p ∈ P,

φ*(p) = φ*\left( \sum_{x : x≥p} σ_x \right) = \sum_{x : x≥p} φ*(σ_x)

= \sum_{x : x≥p and x=max{y : φ(y)=φ(x)}} φ*(σ_x)

= \sum_{x : x≥p and x=max{y : φ(y)=φ(x)}} \left( \sum_{q : q≥φ(x) and x=max{t∈P : φ(t)≤q}} σ_q \right)

= \sum_{q : q≥φ(p)} σ_q = φ(p).

Hence φ* is the desired extension of φ to A_V(P, K).

Corollary 5.10.6 Let (P,≤) be a finite poset, and let P0 ⊆ P . Then theinjection φ : P0 → P : p 7→ p “extends to ” an algebra homomorphism ofAV (P,K) into AV (P,K) iff the restriction of each principal order ideal of Pto P0 is either empty or principal.