
1 Fundamental Principles of Counting

Enumeration, or counting, may strike one as an obvious process that a student learns when first studying arithmetic. But then, it seems, very little attention is paid to further development in counting as the student turns to “more difficult” areas in mathematics, such as algebra, geometry, trigonometry, and calculus. Consequently, this first chapter should provide some warning about the seriousness and difficulty of “mere” counting.

Enumeration does not end with arithmetic. It also has applications in such areas as coding theory, probability and statistics, and the analysis of algorithms. Later chapters will offer some specific examples of these applications.

As we enter this fascinating field of mathematics, we shall come upon many problems that are very simple to state but somewhat “sticky” to solve. Thus, be sure to learn and understand the basic formulas, but do not rely on them too heavily. For without an analysis of each problem, a mere knowledge of formulas is next to useless. Instead, welcome the challenge to solve unusual problems or those that are different from problems you have encountered in the past. Seek solutions based on your own scrutiny, regardless of whether it reproduces what the author provides. There are often several ways to solve a given problem.

1.1 The Rules of Sum and Product

Our study of discrete and combinatorial mathematics begins with two basic principles of counting: the rules of sum and product. The statements and initial applications of these rules appear quite simple. In analyzing more complicated problems, one is often able to break down such problems into parts that can be solved using these basic principles. We want to develop the ability to “decompose” such problems and piece together our partial solutions in order to arrive at the final answer. A good way to do this is to analyze and solve many diverse enumeration problems, taking note of the principles being used. This is the approach we shall follow here.

Our first principle of counting can be stated as follows:

The Rule of Sum: If a first task can be performed in m ways, while a second task can be performed in n ways, and the two tasks cannot be performed simultaneously, then performing either task can be accomplished in any one of m + n ways.



Note that when we say that a particular occurrence, such as a first task, can come about in m ways, these m ways are assumed to be distinct, unless a statement is made to the contrary. This will be true throughout the entire text.

EXAMPLE 1.1 A college library has 40 textbooks on sociology and 50 textbooks dealing with anthropology. By the rule of sum, a student at this college can select among 40 + 50 = 90 textbooks in order to learn more about one or the other of these two subjects.

EXAMPLE 1.2 The rule can be extended beyond two tasks as long as no pair of tasks can occur simultaneously. For instance, a computer science instructor who has, say, seven different introductory books each on C++, Java, and Perl can recommend any one of these 21 books to a student who is interested in learning a first programming language.

EXAMPLE 1.3 The computer science instructor of Example 1.2 has two colleagues. One of these colleagues has three textbooks on the analysis of algorithms, and the other has five such textbooks. If n denotes the maximum number of different books on this topic that this instructor can borrow from them, then 5 ≤ n ≤ 8, for here both colleagues may own copies of the same textbook(s).
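The bounds in Example 1.3 can be checked with sets. The following is a minimal sketch; the book titles are hypothetical placeholders, and only the counts 3 and 5 come from the example. The rule of sum gives 3 + 5 exactly when the two collections share no title.

```python
# Hypothetical book titles; only the counts (3 and 5) come from Example 1.3.
colleague_a = {"Algo-1", "Algo-2", "Algo-3"}
disjoint_b = {"Algo-4", "Algo-5", "Algo-6", "Algo-7", "Algo-8"}
overlapping_b = {"Algo-1", "Algo-2", "Algo-3", "Algo-4", "Algo-5"}


def distinct_titles(first, second):
    """Number of different books available from either colleague."""
    return len(first | second)


# No shared titles: the rule of sum applies and the count is 3 + 5.
print(distinct_titles(colleague_a, disjoint_b))
# Every title of the first colleague is also owned by the second.
print(distinct_titles(colleague_a, overlapping_b))
```

The two calls give the extreme values 8 and 5; any other amount of overlap lands between them.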

The following example introduces our second principle of counting.

EXAMPLE 1.4 In trying to reach a decision on plant expansion, an administrator assigns 12 of her employees to two committees. Committee A consists of five members and is to investigate possible favorable results from such an expansion. The other seven employees, committee B, will scrutinize possible unfavorable repercussions. Should the administrator decide to speak to just one committee member before making her decision, then by the rule of sum there are 12 employees she can call upon for input. However, to be a bit more unbiased, she decides to speak with a member of committee A on Monday, and then with a member of committee B on Tuesday, before reaching a decision. Using the following principle, we find that she can select two such employees to speak with in 5 × 7 = 35 ways.

The Rule of Product: If a procedure can be broken down into first and second stages, and if there are m possible outcomes for the first stage and if, for each of these outcomes, there are n possible outcomes for the second stage, then the total procedure can be carried out, in the designated order, in mn ways.

EXAMPLE 1.5 The drama club of Central University is holding tryouts for a spring play. With six men and eight women auditioning for the leading male and female roles, by the rule of product the director can cast his leading couple in 6 × 8 = 48 ways.
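The rule of product can be illustrated by listing every two-stage outcome explicitly. This is a small sketch for Examples 1.4 and 1.5; the labels such as "M1" and "A3" are made-up stand-ins for the people involved.

```python
from itertools import product

# Hypothetical labels for the six men and eight women of Example 1.5.
men = [f"M{i}" for i in range(1, 7)]
women = [f"W{j}" for j in range(1, 9)]

# Each leading couple is one outcome of stage 1 paired with one of stage 2.
couples = list(product(men, women))
print(len(couples))  # 6 * 8

# Example 1.4: one member of committee A on Monday, one of B on Tuesday.
committee_a = [f"A{i}" for i in range(1, 6)]
committee_b = [f"B{j}" for j in range(1, 8)]
interviews = list(product(committee_a, committee_b))
print(len(interviews))  # 5 * 7
```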

EXAMPLE 1.6 Here various extensions of the rule are illustrated by considering the manufacture of license plates consisting of two letters followed by four digits.


1.2 Permutations

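If we make use of the factorial notation, the answer in Example 1.9 can be expressed in the following more compact form:

\[
10 \times 9 \times 8 \times 7 \times 6 = 10 \times 9 \times 8 \times 7 \times 6 \times \frac{5 \times 4 \times 3 \times 2 \times 1}{5 \times 4 \times 3 \times 2 \times 1} = \frac{10!}{5!}.
\]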

Definition 1.2 Given a collection of n distinct objects, any (linear) arrangement of these objects is called a permutation of the collection.

Starting with the letters a, b, c, there are six ways to arrange, or permute, all of the letters: abc, acb, bac, bca, cab, cba. If we are interested in arranging only two of the letters at a time, there are six such size-2 permutations: ab, ba, ac, ca, bc, cb.

If there are n distinct objects and r is an integer, with 1 ≤ r ≤ n, then by the rule of product, the number of permutations of size r for the n objects is

\[
\begin{aligned}
P(n, r) &= \underbrace{n}_{\text{1st position}} \times \underbrace{(n-1)}_{\text{2nd position}} \times \underbrace{(n-2)}_{\text{3rd position}} \times \cdots \times \underbrace{(n-r+1)}_{r\text{th position}} \\
&= (n)(n-1)(n-2)\cdots(n-r+1) \times \frac{(n-r)(n-r-1)\cdots(3)(2)(1)}{(n-r)(n-r-1)\cdots(3)(2)(1)} \\
&= \frac{n!}{(n-r)!}.
\end{aligned}
\]

For r = 0, P(n, 0) = 1 = n!/(n − 0)!, so P(n, r) = n!/(n − r)! holds for all 0 ≤ r ≤ n. A special case of this result is Example 1.9, where n = 10, r = 5, and P(10, 5) = 30,240. When permuting all of the n objects in the collection, we have r = n and find that P(n, n) = n!/0! = n!.

Note, for example, that if n ≥ 2, then P(n, 2) = n!/(n − 2)! = n(n − 1). When n > 3 one finds that P(n, n − 3) = n!/[n − (n − 3)]! = n!/3! = (n)(n − 1)(n − 2) · · · (5)(4).

The number of permutations of size r, where 0 ≤ r ≤ n, from a collection of n objects, is P(n, r) = n!/(n − r)!. (Remember that P(n, r) counts (linear) arrangements in which the objects cannot be repeated.) However, if repetitions are allowed, then by the rule of product there are n^r possible arrangements, with r ≥ 0.
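The formula P(n, r) = n!/(n − r)! and the count n^r for arrangements with repetition can both be compared against explicit listings. The sketch below uses the letters a, b, c from the discussion above; the helper name P is chosen here to mirror the text, and `math.perm` (Python 3.8 or later) is used only as a cross-check.

```python
from itertools import permutations, product
from math import factorial, perm


def P(n, r):
    """Permutations of size r from n distinct objects: n!/(n - r)!."""
    return factorial(n) // factorial(n - r)


letters = "abc"

# Size-2 permutations of a, b, c (no repetition allowed).
size_2 = ["".join(p) for p in permutations(letters, 2)]
print(size_2, len(size_2) == P(3, 2))

# With repetition allowed there are n**r arrangements of size r.
with_repetition = ["".join(p) for p in product(letters, repeat=2)]
print(len(with_repetition) == 3 ** 2)

# The special case quoted from Example 1.9.
print(P(10, 5), perm(10, 5))
```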

EXAMPLE 1.10 The number of permutations of the letters in the word COMPUTER is 8!. If only five of the letters are used, the number of permutations (of size 5) is P(8, 5) = 8!/(8 − 5)! = 8!/3! = 6720. If repetitions of letters are allowed, the number of possible 12-letter sequences is 8^12 ≐ 6.872 × 10^10.†

EXAMPLE 1.11 Unlike Example 1.10, the number of (linear) arrangements of the four letters in BALL is 12, not 4! (= 24). The reason is that we do not have four distinct letters to arrange. To get the 12 arrangements, we can list them as in Table 1.1(a).

†The symbol “≐” is read “is approximately equal to.”



If there are n objects with n_1 indistinguishable objects of a first type, n_2 indistinguishable objects of a second type, . . . , and n_r indistinguishable objects of an rth type, where n_1 + n_2 + · · · + n_r = n, then there are

\[
\frac{n!}{n_1!\, n_2! \cdots n_r!}
\]

(linear) arrangements of the given n objects.

EXAMPLE 1.13 The MASSASAUGA is a brown and white venomous snake indigenous to North America. Arranging all of the letters in MASSASAUGA, we find that there are

\[
\frac{10!}{4!\, 3!\, 1!\, 1!\, 1!} = 25{,}200
\]

possible arrangements. Among these are

\[
\frac{7!}{3!\, 1!\, 1!\, 1!\, 1!} = 840
\]

in which all four A’s are together. To get this last result, we considered all arrangements of the seven symbols AAAA (one symbol), S, S, S, M, U, G.
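The formula for arrangements with indistinguishable objects can be sketched as a short function and compared with a brute-force listing for a small word. The function name `arrangements` and the block symbol "X" (standing for AAAA treated as one symbol) are illustrative choices made here, not notation from the text.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def arrangements(word):
    """n!/(n_1! n_2! ... n_r!) for the letter counts n_1, ..., n_r of word."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total


# Brute force is feasible for BALL: list all 4! orderings, drop duplicates.
print(arrangements("BALL"), len(set(permutations("BALL"))))

print(arrangements("MASSASAUGA"))

# All four A's together: treat AAAA as the single symbol X.
print(arrangements("XSSSMUG"))
```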

EXAMPLE 1.14 Determine the number of (staircase) paths in the xy-plane from (2, 1) to (7, 4), where each such path is made up of individual steps going one unit to the right (R) or one unit upward (U). The blue lines in Fig. 1.1 show two of these paths.

[Figure 1.1: two staircase paths in the xy-plane from (2, 1) to (7, 4). Path (a): R, U, R, R, U, R, R, U. Path (b): U, R, R, R, U, U, R, R.]

Beneath each path in Fig. 1.1 we have listed the individual steps. For example, in part (a) the list R, U, R, R, U, R, R, U indicates that starting at the point (2, 1), we first move one unit to the right [to (3, 1)], then one unit upward [to (3, 2)], followed by two units to the right [to (5, 2)], and so on, until we reach the point (7, 4). The path consists of five R’s for moves to the right and three U’s for moves upward.

The path in part (b) of the figure is also made up of five R’s and three U’s. In general, the overall trip from (2, 1) to (7, 4) requires 7 − 2 = 5 horizontal moves to the right and 4 − 1 = 3 vertical moves upward. Consequently, each path corresponds to a list of five R’s and three U’s, and the solution for the number of paths emerges as the number of arrangements of the five R’s and three U’s, which is 8!/(5! 3!) = 56.
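The path count can be cross-checked in two ways: with the arrangement formula, and by walking the grid step by step. The sketch below assumes only the problem as stated in Example 1.14; the function names are chosen here for illustration.

```python
from functools import lru_cache
from math import factorial


def paths_by_formula(start, end):
    """Arrangements of the required R's and U's: (R + U)! / (R! U!)."""
    rights = end[0] - start[0]
    ups = end[1] - start[1]
    return factorial(rights + ups) // (factorial(rights) * factorial(ups))


def paths_by_walking(start, end):
    """Count staircase paths by trying a step R or a step U at each point."""

    @lru_cache(maxsize=None)
    def walk(x, y):
        if x > end[0] or y > end[1]:
            return 0  # overshot the target; not a valid path
        if (x, y) == end:
            return 1
        return walk(x + 1, y) + walk(x, y + 1)

    return walk(*start)


print(paths_by_formula((2, 1), (7, 4)), paths_by_walking((2, 1), (7, 4)))
```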


1.3 Combinations: The Binomial Theorem

this translates into

\[
\begin{aligned}
(3!) \times (\text{Number of selections of size 3 from a deck of 52}) &= \text{Number of permutations of size 3 for the 52 cards} \\
&= P(52, 3) = \frac{52!}{49!}.
\end{aligned}
\]

Consequently, three cards can be drawn, without replacement, from a standard deck in 52!/(3! 49!) = 22,100 ways.

If we start with n distinct objects, each selection, or combination, of r of these objects, with no reference to order, corresponds to r! permutations of size r from the n objects. Thus the number of combinations of size r from a collection of size n is

\[
C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}, \qquad 0 \le r \le n.
\]

In addition to C(n, r) the symbol $\binom{n}{r}$ is also frequently used. Both C(n, r) and $\binom{n}{r}$ are sometimes read “n choose r.” Note that for all n ≥ 0, C(n, 0) = C(n, n) = 1. Further, for all n ≥ 1, C(n, 1) = C(n, n − 1) = n. When 0 ≤ n < r, then C(n, r) = $\binom{n}{r}$ = 0.

A word to the wise! When dealing with any counting problem, we should ask ourselves about the importance of order in the problem. When order is relevant, we think in terms of permutations and arrangements and the rule of product. When order is not relevant, combinations could play a key role in solving the problem.
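The distinction between ordered and unordered selections can be made concrete on a small collection. In this sketch the five-element collection "abcde" is an arbitrary illustration; `math.comb` (Python 3.8 or later) serves as a cross-check on the formula C(n, r) = P(n, r)/r!.

```python
from itertools import combinations, permutations
from math import comb, factorial

objects = "abcde"  # an arbitrary collection of n = 5 distinct objects
r = 3

ordered = list(permutations(objects, r))    # order matters
unordered = list(combinations(objects, r))  # order does not matter

# Each selection of size r corresponds to r! permutations of size r.
print(len(ordered), len(unordered), factorial(r) * len(unordered))

# Boundary cases noted in the text, plus the card count of this section.
print(comb(5, 0), comb(5, 5), comb(5, 1), comb(5, 4), comb(3, 5))
print(comb(52, 3))
```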

EXAMPLE 1.18 A hostess is having a dinner party for some members of her charity committee. Because of the size of her home, she can invite only 11 of the 20 committee members. Order is not important, so she can invite “the lucky 11” in C(20, 11) = $\binom{20}{11}$ = 20!/(11! 9!) = 167,960 ways. However, once the 11 arrive, how she arranges them around her rectangular dining table is an arrangement problem. Unfortunately, no part of the theory of combinations and permutations can help our hostess deal with “the offended nine” who were not invited.

EXAMPLE 1.19 Lynn and Patti decide to buy a PowerBall ticket. To win the grand prize for PowerBall one must match five numbers selected from 1 to 49 inclusive and then must also match the powerball, an integer from 1 to 42 inclusive. Lynn selects the five numbers (between 1 and 49 inclusive). This she can do in $\binom{49}{5}$ ways (since matching does not involve order). Meanwhile Patti selects the powerball; here there are $\binom{42}{1}$ possibilities. Consequently, by the rule of product, Lynn and Patti can select the six numbers for their PowerBall ticket in $\binom{49}{5}\binom{42}{1}$ = 80,089,128 ways.

EXAMPLE 1.20 a) A student taking a history examination is directed to answer any seven of 10 essay questions. There is no concern about order here, so the student can answer the examination in

\[
\binom{10}{7} = \frac{10!}{7!\, 3!} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120
\]

ways.


1.4 Combinations with Repetition

column (b) of Table 1.6, we are enumerating all arrangements of 10 symbols consisting of seven x’s and three |’s, so by our correspondence the number of different purchases for column (a) is

\[
\frac{10!}{7!\, 3!} = \binom{10}{7}.
\]

In this example we note that the seven x’s (one for each freshman) correspond to the size of the selection and that the three bars are needed to separate the 3 + 1 = 4 possible food items that can be chosen.

When we wish to select, with repetition, r of n distinct objects, we find (as in Table 1.6) that we are considering all arrangements of r x’s and n − 1 |’s and that their number is

\[
\frac{(n + r - 1)!}{r!\,(n - 1)!} = \binom{n + r - 1}{r}.
\]

Consequently, the number of combinations of n objects taken r at a time, with repetition, is C(n + r − 1, r).

(In Example 1.28, n = 4, r = 7, so it is possible for r to exceed n when repetitions are allowed.)

EXAMPLE 1.29 A donut shop offers 20 kinds of donuts. Assuming that there are at least a dozen of each kind when we enter the shop, we can select a dozen donuts in C(20 + 12 − 1, 12) = C(31, 12) = 141,120,525 ways. (Here n = 20, r = 12.)
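The formula C(n + r − 1, r) can be compared with a direct enumeration for the small case n = 4, r = 7 and then applied to the donut count. This is a sketch; the function name `selections_with_repetition` is introduced here for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def selections_with_repetition(n, r):
    """Number of ways to select r of n distinct objects, repetition allowed."""
    return comb(n + r - 1, r)


# Enumerate every selection of size 7 from 4 kinds of item (n = 4, r = 7).
listed = sum(1 for _ in combinations_with_replacement(range(4), 7))
print(listed, selections_with_repetition(4, 7))

# Example 1.29: a dozen donuts from 20 kinds.
print(selections_with_repetition(20, 12))
```

Enumerating the donut case directly would also work but is far slower than the formula, which is the practical reason for having it.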

EXAMPLE 1.30 President Helen has four vice presidents: (1) Betty, (2) Goldie, (3) Mary Lou, and (4) Mona. She wishes to distribute among them $1000 in Christmas bonus checks, where each check will be written for a multiple of $100.

a) Allowing the situation in which one or more of the vice presidents get nothing, President Helen is making a selection of size 10 (one for each unit of $100) from a collection of size 4 (four vice presidents), with repetition. This can be done in C(4 + 10 − 1, 10) = C(13, 10) = 286 ways.

b) If there are to be no hard feelings, each vice president should receive at least $100. With this restriction, President Helen is now faced with making a selection of size 6 (the remaining six units of $100) from the same collection of size 4, and the choices now number C(4 + 6 − 1, 6) = C(9, 6) = 84. [For example, here the selection 2, 3, 3, 4, 4, 4 is interpreted as follows: Betty does not get anything extra, for there is no 1 in the selection. The one 2 in the selection indicates that Goldie gets an additional $100. Mary Lou receives an additional $200 ($100 for each of the two 3’s in the selection). Due to the three 4’s, Mona’s bonus check will total $100 + 3($100) = $400.]



Consequently, by the rule of product the transmitter can send such messages with the required spacing in (12!)$\binom{22}{12}$ ≐ 3.097 × 10^14 ways.

In the next example an idea is introduced that appears to have more to do with number theory than with combinations or arrangements. Nonetheless, the solution of this example will turn out to be equivalent to counting combinations with repetitions.

EXAMPLE 1.33 Determine all integer solutions to the equation

x_1 + x_2 + x_3 + x_4 = 7, where x_i ≥ 0 for all 1 ≤ i ≤ 4.

One solution of the equation is x_1 = 3, x_2 = 3, x_3 = 0, x_4 = 1. (This is different from a solution such as x_1 = 1, x_2 = 0, x_3 = 3, x_4 = 3, even though the same four integers are being used.) A possible interpretation for the solution x_1 = 3, x_2 = 3, x_3 = 0, x_4 = 1 is that we are distributing seven pennies (identical objects) among four children (distinct containers), and here we have given three pennies to each of the first two children, nothing to the third child, and the last penny to the fourth child. Continuing with this interpretation, we see that each nonnegative integer solution of the equation corresponds to a selection, with repetition, of size 7 (the identical pennies) from a collection of size 4 (the distinct children), so there are C(4 + 7 − 1, 7) = 120 solutions.

At this point it is crucial that we recognize the equivalence of the following:

a) The number of integer solutions of the equation

x_1 + x_2 + · · · + x_n = r, x_i ≥ 0, 1 ≤ i ≤ n.

b) The number of selections, with repetition, of size r from a collection of size n.

c) The number of ways r identical objects can be distributed among n distinct containers.

In terms of distributions, part (c) is valid only when the r objects being distributed are identical and the n containers are distinct. When both the r objects and the n containers are distinct, we can select any of the n containers for each one of the objects and get n^r distributions by the rule of product.

When the objects are distinct but the containers are identical, we shall solve the problem using the Stirling numbers of the second kind (Chapter 5). For the final case, in which both objects and containers are identical, the theory of partitions of integers (Chapter 9) will provide some necessary results.

EXAMPLE 1.34 In how many ways can one distribute 10 (identical) white marbles among six distinct containers?

Solving this problem is equivalent to finding the number of nonnegative integer solutions to the equation x_1 + x_2 + · · · + x_6 = 10. That number is the number of selections of size 10, with repetition, from a collection of size 6. Hence the answer is C(6 + 10 − 1, 10) = 3003.
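The equivalence between nonnegative integer solutions and selections with repetition can be tested by brute force on small cases. The sketch below counts solutions of x_1 + · · · + x_n = r directly; the function name `count_solutions` is an illustrative choice, and the brute-force search is only practical for small n and r.

```python
from itertools import product
from math import comb


def count_solutions(n, r):
    """Count nonnegative integer solutions of x_1 + ... + x_n = r by search."""
    return sum(1 for xs in product(range(r + 1), repeat=n) if sum(xs) == r)


# Example 1.33: four variables summing to 7.
print(count_solutions(4, 7), comb(4 + 7 - 1, 7))

# Example 1.34 via the formula (the search space here is larger, so the
# closed form is the practical route).
print(comb(6 + 10 - 1, 10))
```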

We now examine two other examples related to the theme of this section.


2 Fundamentals of Logic

Table 2.10

p  q  r    p ∧ (q ∨ r)    (p ∧ q) ∨ (p ∧ r)    p ∨ (q ∧ r)    (p ∨ q) ∧ (p ∨ r)
0  0  0         0                 0                  0                 0
0  0  1         0                 0                  0                 0
0  1  0         0                 0                  0                 0
0  1  1         0                 0                  1                 1
1  0  0         0                 0                  1                 1
1  0  1         1                 1                  1                 1
1  1  0         1                 1                  1                 1
1  1  1         1                 1                  1                 1

Before going any further, we note that, in general, if s1, s2 are statements and s1 ↔ s2 is a tautology, then s1, s2 must have the same corresponding truth values (that is, for each assignment of truth values to the primitive statements in s1 and s2, s1 is true if and only if s2 is true and s1 is false if and only if s2 is false) and s1 ⇐⇒ s2. When s1 and s2 are logically equivalent statements (that is, s1 ⇐⇒ s2), then the compound statement s1 ↔ s2 is a tautology. Under these circumstances it is also true that ¬s1 ⇐⇒ ¬s2, and ¬s1 ↔ ¬s2 is a tautology.

If s1, s2, and s3 are statements where s1 ⇐⇒ s2 and s2 ⇐⇒ s3 then s1 ⇐⇒ s3. When two statements s1 and s2 are not logically equivalent, we may write s1 ⇎ s2 to designate this situation.

Using the concepts of logical equivalence, tautology, and contradiction, we state the following list of laws for the algebra of propositions.

The Laws of Logic
For any primitive statements p, q, r, any tautology T0, and any contradiction F0,

 1) ¬¬p ⇐⇒ p                               Law of Double Negation

 2) ¬(p ∨ q) ⇐⇒ ¬p ∧ ¬q                    DeMorgan’s Laws
    ¬(p ∧ q) ⇐⇒ ¬p ∨ ¬q

 3) p ∨ q ⇐⇒ q ∨ p                         Commutative Laws
    p ∧ q ⇐⇒ q ∧ p

 4) p ∨ (q ∨ r) ⇐⇒ (p ∨ q) ∨ r†            Associative Laws
    p ∧ (q ∧ r) ⇐⇒ (p ∧ q) ∧ r

 5) p ∨ (q ∧ r) ⇐⇒ (p ∨ q) ∧ (p ∨ r)       Distributive Laws
    p ∧ (q ∨ r) ⇐⇒ (p ∧ q) ∨ (p ∧ r)

 6) p ∨ p ⇐⇒ p                             Idempotent Laws
    p ∧ p ⇐⇒ p

 7) p ∨ F0 ⇐⇒ p                            Identity Laws
    p ∧ T0 ⇐⇒ p

 8) p ∨ ¬p ⇐⇒ T0                           Inverse Laws
    p ∧ ¬p ⇐⇒ F0

 9) p ∨ T0 ⇐⇒ T0                           Domination Laws
    p ∧ F0 ⇐⇒ F0

10) p ∨ (p ∧ q) ⇐⇒ p                       Absorption Laws
    p ∧ (p ∨ q) ⇐⇒ p

†We note that because of the Associative Laws, there is no ambiguity in statements of the form p ∨ q ∨ r or p ∧ q ∧ r.

We now turn our attention to proving all of these properties. In so doing we realize that we could simply construct the truth tables and compare the results for the corresponding truth values in each case, as we did in Examples 2.8 and 2.9. However, before we start writing, let us take one more look at this list of 19 laws, which, aside from the Law of Double Negation, fall naturally into pairs. This pairing idea will help us after we examine the following concept.
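The truth-table approach mentioned above is mechanical enough to hand to a program. The following sketch checks a sample of the laws by comparing both sides on every assignment of truth values; representing T0 and F0 by the constants True and False, and the helper name `equivalent`, are assumptions made here for illustration.

```python
from itertools import product

T0, F0 = True, False  # stand-ins for a tautology and a contradiction


def equivalent(s1, s2, num_vars):
    """True when s1 and s2 agree on every assignment of truth values."""
    return all(s1(*row) == s2(*row)
               for row in product([False, True], repeat=num_vars))


# (left side, right side, number of primitive statements)
laws = {
    "Double Negation": (lambda p: not (not p), lambda p: p, 1),
    "DeMorgan": (lambda p, q: not (p or q),
                 lambda p, q: (not p) and (not q), 2),
    "Distributive": (lambda p, q, r: p or (q and r),
                     lambda p, q, r: (p or q) and (p or r), 3),
    "Identity": (lambda p: p or F0, lambda p: p, 1),
    "Inverse": (lambda p: p or (not p), lambda p: T0, 1),
    "Domination": (lambda p: p and F0, lambda p: F0, 1),
    "Absorption": (lambda p, q: p or (p and q), lambda p, q: p, 2),
}

for name, (left, right, num_vars) in laws.items():
    print(name, equivalent(left, right, num_vars))
```

Only one law from each pair is listed; the remaining ones can be added to the dictionary in the same way.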

Definition 2.3 Let s be a statement. If s contains no logical connectives other than ∧ and ∨, then the dual of s, denoted s^d, is the statement obtained from s by replacing each occurrence of ∧ and ∨ by ∨ and ∧, respectively, and each occurrence of T0 and F0 by F0 and T0, respectively.

If p is any primitive statement, then p^d is the same as p; that is, the dual of a primitive statement is simply the same primitive statement. And (¬p)^d is the same as ¬p. The statements p ∨ ¬p and p ∧ ¬p are duals of each other whenever p is primitive, and so are the statements p ∨ T0 and p ∧ F0.

Given the primitive statements p, q, r and the compound statement

s: (p ∧ ¬q) ∨ (r ∧ T0),

we find that the dual of s is

s^d: (p ∨ ¬q) ∧ (r ∨ F0).

(Note that ¬q is unchanged as we go from s to s^d.)

We now state and use a theorem without proving it. However, in Chapter 15 we shall justify the result that appears here.

THEOREM 2.1 The Principle of Duality. Let s and t be statements that contain no logical connectives other than ∧ and ∨. If s ⇐⇒ t, then s^d ⇐⇒ t^d.

As a result, laws 2 through 10 in our list can be established by proving one of the laws in each pair and then invoking this principle.
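Forming a dual is a purely symbolic swap, which a few lines of code can imitate. The sketch below treats a statement as a string and is only an illustration: it assumes that primitive statements are written with lowercase letters, so that the capital letters T and F occur only in T0 and F0.

```python
# Swap ∧ with ∨ and T with F; every other character, including ¬, is kept.
# Assumption: primitive statements are lowercase, so T and F appear only
# inside the symbols T0 and F0.
_SWAP = str.maketrans("∧∨TF", "∨∧FT")


def dual(statement):
    """Return the dual of a statement written with ∧, ∨, ¬, T0, F0."""
    return statement.translate(_SWAP)


s = "(p ∧ ¬q) ∨ (r ∧ T0)"
print(dual(s))

# The two Absorption Laws are duals of each other.
print(dual("p ∨ (p ∧ q)"))
```

Applying `dual` twice returns the original string, which matches the observation that the laws fall into pairs.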

We also find that it is possible to derive many other logical equivalences. For example, if q, r, s are primitive statements, the results in columns 5 and 7 of Table 2.11 show us that

(r ∧ s) → q ⇐⇒ ¬(r ∧ s) ∨ q

or that [(r ∧ s) → q] ↔ [¬(r ∧ s) ∨ q] is a tautology. However, instead of always constructing more (and, unfortunately, larger) truth tables it might be a good idea to recall from Example 2.7 that for primitive statements p, q, the compound statement

(p → q) ↔ (¬p ∨ q)


2.2 Logical Equivalence: The Laws of Logic

requires that each of the switches p, q be closed in order for current to flow from T1 to T2. Here the switches are in series; this network is represented by p ∧ q.

The switches in a network need not act independently of each other. Consider the network shown in Fig. 2.2(a). Here the switches labeled t and ¬t are not independent. We have coupled these two switches so that t is open (closed) if and only if ¬t is simultaneously closed (open). The same is true for the switches at q, ¬q. (Also, for example, the three switches labeled p are not independent.)

[Figure 2.2: switching networks (a) and (b), each connecting terminals T1 and T2 through switches labeled p, q, r, and t.]

This network is represented by the statement (p ∨ q ∨ r) ∧ (p ∨ t ∨ ¬q) ∧ (p ∨ ¬t ∨ r). Using the laws of logic, we may simplify this statement as follows.

    (p ∨ q ∨ r) ∧ (p ∨ t ∨ ¬q) ∧ (p ∨ ¬t ∨ r)               Reasons
⇐⇒ p ∨ [(q ∨ r) ∧ (t ∨ ¬q) ∧ (¬t ∨ r)]                      Distributive Law of ∨ over ∧
⇐⇒ p ∨ [(q ∨ r) ∧ (¬t ∨ r) ∧ (t ∨ ¬q)]                      Commutative Law of ∧
⇐⇒ p ∨ [((q ∧ ¬t) ∨ r) ∧ (t ∨ ¬q)]                          Distributive Law of ∨ over ∧
⇐⇒ p ∨ [((q ∧ ¬t) ∨ r) ∧ (¬¬t ∨ ¬q)]                        Law of Double Negation
⇐⇒ p ∨ [((q ∧ ¬t) ∨ r) ∧ ¬(¬t ∧ q)]                         DeMorgan’s Law
⇐⇒ p ∨ [¬(¬t ∧ q) ∧ ((¬t ∧ q) ∨ r)]                         Commutative Law of ∧ (twice)
⇐⇒ p ∨ [(¬(¬t ∧ q) ∧ (¬t ∧ q)) ∨ (¬(¬t ∧ q) ∧ r)]           Distributive Law of ∧ over ∨
⇐⇒ p ∨ [F0 ∨ (¬(¬t ∧ q) ∧ r)]                               ¬s ∧ s ⇐⇒ F0, for any statement s
⇐⇒ p ∨ [(¬(¬t ∧ q)) ∧ r]                                    F0 is the identity for ∨
⇐⇒ p ∨ [r ∧ ¬(¬t ∧ q)]                                      Commutative Law of ∧
⇐⇒ p ∨ [r ∧ (t ∨ ¬q)]                                       DeMorgan’s Law and the Law of Double Negation
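As an independent check on the chain of equivalences, the first and last statements can be compared on all 16 assignments of truth values to p, q, r, t. This is a sketch; the function names `original` and `simplified` are labels chosen here.

```python
from itertools import product


def original(p, q, r, t):
    """(p ∨ q ∨ r) ∧ (p ∨ t ∨ ¬q) ∧ (p ∨ ¬t ∨ r)"""
    return (p or q or r) and (p or t or not q) and (p or not t or r)


def simplified(p, q, r, t):
    """p ∨ [r ∧ (t ∨ ¬q)]"""
    return p or (r and (t or not q))


rows = list(product([False, True], repeat=4))
print(all(original(*row) == simplified(*row) for row in rows))
```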

Hence (p ∨ q ∨ r) ∧ (p ∨ t ∨ ¬q) ∧ (p ∨ ¬t ∨ r) ⇐⇒ p ∨ [r ∧ (t ∨ ¬q)], and the network shown in Fig. 2.2(b) is equivalent to the original network in the sense that current



This exhaustive listing is an example of a proof using the technique we call, rather appropriately, the method of exhaustion. This method is reasonable when we are dealing with a fairly small universe. If we are confronted with a situation in which the universe is larger but within the range of a computer that is available to us, then we might write a program to check all of the individual cases.

(Note that for certain cases in Table 2.24 more than one answer may be possible. For example, we could have written 18 = 9 + 9 and 26 = 16 + 9 + 1. But this is all right. We were told that each positive even integer less than or equal to 26 could be written as the sum of one, two, or three perfect squares. We were not told that each such representation had to be unique, so more than one possibility could occur. What we had to check in each case was that there was at least one possibility.)
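A program of the kind just mentioned might look like the following sketch, which applies the method of exhaustion to the even integers from 2 to 26. The function name `square_sums` is an illustrative choice, and only positive perfect squares are used as summands.

```python
from itertools import combinations_with_replacement
from math import isqrt


def square_sums(n, max_terms=3):
    """All ways to write n as a sum of one to max_terms positive squares."""
    squares = [k * k for k in range(1, isqrt(n) + 1)]
    found = []
    for terms in range(1, max_terms + 1):
        for combo in combinations_with_replacement(squares, terms):
            if sum(combo) == n:
                found.append(combo)
    return found


# Check every case: each even n needs at least one representation.
for n in range(2, 27, 2):
    print(n, square_sums(n))
```

The listing also shows where representations are not unique, in line with the remark above.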

In the previous example we mentioned the word theorem. We also found this term used in Chapter 1, for example, in results like the binomial theorem and the multinomial theorem where we were introduced to certain types of enumeration problems. Without getting overly technical, we shall consider theorems to be statements of mathematical interest, statements that are known to be true. Sometimes the term theorem is used only to describe major results that have many and varied consequences. Certain of these consequences that follow rather immediately from a theorem are termed corollaries (as in the case of Corollary 1.1 in Section 1.3). In this text, however, we shall not be so particular in our use of the word theorem.

Example 2.52 is a nice starting point to examine the proof of a quantified statement. Unfortunately, a great number of mathematical statements and theorems often deal with universes that do not lend themselves to the method of exhaustion. When faced with establishing or proving a result for all integers, for example, or for all real numbers, then we cannot use a case-by-case method like the one in Example 2.52. So what can we do?

We start by considering the following rule.

The Rule of Universal Specification: If an open statement becomes true for all replacements by the members in a given universe, then that open statement is true for each specific individual member in that universe. (A bit more symbolically: if p(x) is an open statement for a given universe, and if ∀x p(x) is true, then p(a) is true for each a in the universe.)

This rule indicates that the truth of an open statement in one particular instance follows (as a special case) from the more general (for the entire universe) truth of that universally quantified open statement. The following examples will show us how to apply this idea.

EXAMPLE 2.53 a) For the universe of all people, consider the open statements

m(x): x is a mathematics professor
c(x): x has studied calculus.

Now consider the following argument.

All mathematics professors have studied calculus.

Leona is a mathematics professor.

Therefore Leona has studied calculus.



This time the Rule of Universal Specification leads us to

p(c) → q(c)
¬p(c)
∴ ¬q(c)

where the fallacy arises because we are trying to argue by the inverse.
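Whether an argument form is valid can be settled by searching for an assignment that makes every premise true and the conclusion false. The sketch below does this for the inverse form above, writing p and q for the truth values of p(c) and q(c); the helper names `implies` and `counterexamples` are chosen here for illustration.

```python
from itertools import product


def implies(a, b):
    """Truth value of a → b."""
    return (not a) or b


def counterexamples(premises, conclusion, num_vars):
    """Assignments where all premises are true but the conclusion is false."""
    return [row for row in product([False, True], repeat=num_vars)
            if all(prem(*row) for prem in premises)
            and not conclusion(*row)]


# Arguing by the inverse: p → q, ¬p, therefore ¬q.
inverse = counterexamples(
    [lambda p, q: implies(p, q), lambda p, q: not p],
    lambda p, q: not q,
    2,
)
print(inverse)  # non-empty, so the argument form is invalid

# For comparison, the valid form p → q, p, therefore q.
direct = counterexamples(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
    2,
)
print(direct)  # empty, so no counterexample exists
```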

And now let us look back at the three parts of Example 2.53. Although the arguments presented there involved premises that were universally quantified statements, there was never any instance where a universally quantified statement appeared in the conclusion. We now want to remedy this situation, since many theorems in mathematics have the form of a universally quantified statement. To do so we need the following considerations.

Start with a given universe and the open statement p(x). To establish the truth of the statement ∀x p(x), we must establish the truth of p(c) for each member c in the given universe. But if the universe has many members or, for example, contains all the positive integers, then this exhaustive, if not exhausting, task of validating the truth of each p(c) becomes difficult, if not impossible. To get around this situation we shall prove that p(c) is true, but now we do it for the case where c denotes a specific but arbitrarily chosen member from the prescribed universe.

Should the preceding open statement p(x) have the form q(x) → r(x), for open statements q(x) and r(x), then we shall assume the truth of q(c) as an additional premise and try to deduce the truth of r(c) by using definitions, axioms, previously proven theorems, and the logical principles we have studied. For when q(c) is false, the implication q(c) → r(c) is true, regardless of the truth value of r(c).

The reason that the element c must be arbitrary (or generic) is to make sure that what we do and prove about c is applicable for all the other elements in the universe. If we are dealing with the universe of all integers, for example, we cannot choose c in an arbitrary manner by selecting c as 4, or by selecting c as an even integer. In general, we cannot make any assumptions about our choice for c unless these assumptions are valid for all the other elements of the universe. The word generic is applied to the element c here because it indicates that our choice (for c) must share all of the common characteristics of the elements for the given universe.

The principle we have described in the preceding three paragraphs is named and summarized as follows.

The Rule of Universal Generalization: If an open statement p(x) is proved to be true when x is replaced by any arbitrarily chosen element c from our universe, then the universally quantified statement ∀x p(x) is true. Furthermore, the rule extends beyond a single variable. So if, for example, we have an open statement q(x, y) that is proved to be true when x and y are replaced by arbitrarily chosen elements from the same universe, or their own respective universes, then the universally quantified statement ∀x ∀y q(x, y) [or, ∀x,y q(x, y)] is true. Similar results hold for the cases of three or more variables.

Before we demonstrate the use of this rule in any examples, we wish to look back at part (1) of Example 2.43 in Section 2.4. It turns out that the explanation given there to establish that

∀x [p(x) ∧ (q(x) ∧ r(x))] ⇐⇒ ∀x [(p(x) ∧ q(x)) ∧ r(x)]


3 Set Theory

EXAMPLE 3.7 Returning now to Example 3.6 we determine the number of subsets of the set C = {1, 2, 3, 4}. In constructing a subset of C, we have, for each member x of C, two distinct choices: Either include it in the subset or exclude it. Consequently, there are 2 × 2 × 2 × 2 choices, resulting in 2^4 = 16 subsets of C. These include the empty set ∅ and the set C itself. Should we need the number of subsets of two elements from C, the result is the number of ways two objects can be selected from a set of four objects, namely, C(4, 2) or $\binom{4}{2}$. As a result, the total number of subsets of C, 2^4, is also the sum $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4}$, where the first summand is for the empty set, the second summand for the four singleton subsets, the third summand for the six subsets of size 2, and so on. So 2^4 = $\sum_{k=0}^{4} \binom{4}{k}$.

Definition 3.4 If A is a set from universe 𝒰, the power set of A, denoted 𝒫(A),† is the collection (or set) of all subsets of A.

EXAMPLE 3.8 For the set C of Example 3.7, 𝒫(C) = {∅, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, C}.

For any finite set A with |A| = n ≥ 0, we find that A has 2^n subsets and that |𝒫(A)| = 2^n. For any 0 ≤ k ≤ n, there are $\binom{n}{k}$ subsets of size k. Counting the subsets of A according to the number, k, of elements in a subset, we have the combinatorial identity

\[
\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = \sum_{k=0}^{n} \binom{n}{k} = 2^n, \qquad \text{for } n \ge 0.
\]

This identity was established earlier in Corollary 1.1(a). The presentation here is another example of a combinatorial proof because the identity is established by counting the same collection of objects (subsets of A) in two different ways.
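The two ways of counting subsets can be reproduced for the set C = {1, 2, 3, 4} of Example 3.7. This is a small sketch; the function name `power_set` is an illustrative choice, and subsets are represented as frozensets so that they can themselves be collected in a set.

```python
from itertools import chain, combinations
from math import comb


def power_set(elements):
    """All subsets of a finite collection, smallest subsets first."""
    items = list(elements)
    by_size = (combinations(items, k) for k in range(len(items) + 1))
    return [frozenset(subset) for subset in chain.from_iterable(by_size)]


C = {1, 2, 3, 4}
subsets = power_set(C)

# First count: 2 choices per element gives 2**n subsets.
print(len(subsets), 2 ** len(C))

# Second count: group the subsets by their size k and compare with C(4, k).
sizes = [sum(1 for s in subsets if len(s) == k) for k in range(len(C) + 1)]
print(sizes, [comb(len(C), k) for k in range(len(C) + 1)])
```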

A systematic way to represent the subsets of a given nonempty set can be accomplished by using a coding scheme known as a Gray code. This is demonstrated in our next example.

EXAMPLE 3.9 Consider the binary strings (of 0’s and 1’s) in Fig. 3.1. In particular, examine the first column of the strings in part (b). How did this column come about? First we see 0, then 1, as in part (a) of the figure. Then we see 1 followed by 0, the reverse order (from bottom to top) of the two binary strings in part (a). Once we obtain the first column for the binary strings in part (b), we then list two 0’s followed by two 1’s.

Continuing with the strings in part (c) of the figure, now we concentrate on the first two columns. The first four entries (binary strings of length 2) are precisely the four strings in part (b). The last four entries (again, binary strings of length 2) are likewise the binary strings in part (b), now in reverse order (from bottom to top). For these eight strings of length 2, we append 0 to the right of the first four and 1 to the right of the last four.
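The construction described in these two paragraphs (copy the previous list, follow it by the same list in reverse order, then append 0 to the first half and 1 to the second half) can be sketched recursively. Fig. 3.1 is not reproduced here, so the starting list 0, 1 for part (a) is taken from the description above, and the function name `gray_code` is an illustrative choice.

```python
def gray_code(length):
    """Binary strings of the given length, built by reflect-and-append."""
    if length == 1:
        return ["0", "1"]  # part (a), as described in the text
    previous = gray_code(length - 1)
    first_half = [s + "0" for s in previous]             # same order, append 0
    second_half = [s + "1" for s in reversed(previous)]  # reversed, append 1
    return first_half + second_half


def bits_changed(a, b):
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


for n in (1, 2, 3):
    code = gray_code(n)
    steps = [bits_changed(code[i], code[i + 1]) for i in range(len(code) - 1)]
    print(code, steps)
```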

For each Gray code in parts (a), (b), (c) of the figure, as we go from one binary string (in a column) to the next binary string (in that column), there is exactly one bit that changes. For instance, in part (b), in going from 10 to 11, we find one change (from 0 to 1) in the second position. Furthermore, for the third and fourth strings in part (c), as we go from

†In some computer science textbooks the reader may find the notation $2^A$ used for $\mathcal{P}(A)$.


3.1 Sets and Subsets 133

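EXAMPLE 3.14 In Fig. 3.3 we find a part of the useful and interesting array of numbers called Pascal’s triangle.

(n = 0)  $\binom{0}{0}$

(n = 1)  $\binom{1}{0}$ $\binom{1}{1}$

(n = 2)  $\binom{2}{0}$ $\binom{2}{1}$ $\binom{2}{2}$

(n = 3)  $\binom{3}{0}$ $\binom{3}{1}$ $\binom{3}{2}$ $\binom{3}{3}$

(n = 4)  $\binom{4}{0}$ $\binom{4}{1}$ $\binom{4}{2}$ $\binom{4}{3}$ $\binom{4}{4}$

(n = 5)  $\binom{5}{0}$ $\binom{5}{1}$ $\binom{5}{2}$ $\binom{5}{3}$ $\binom{5}{4}$ $\binom{5}{5}$

Figure 3.3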

Note that in this partial listing the two triangles shown satisfy the condition that the binomial coefficient at the bottom of the inverted triangle is the sum of the other two terms in the triangle. This result follows from the identity in Example 3.12.

When we replace each of the binomial coefficients by its numerical value, the Pascal triangle appears as shown in Fig. 3.4.

(n = 0)  1

(n = 1)  1 1

(n = 2)  1 2 1

(n = 3)  1 3 3 1

(n = 4)  1 4 6 4 1

(n = 5)  1 5 10 10 5 1

Figure 3.4
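The rule noted above, that each entry is the sum of the two entries above it, is enough to generate the triangle row by row. The following is a minimal sketch; the function name `pascal_rows` is hypothetical.

```python
from math import comb

def pascal_rows(n_max):
    """Rows n = 0..n_max of Pascal's triangle, built by adding adjacent entries."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries above it.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_rows(5)
for n, row in enumerate(rows):
    print(f"(n = {n})", *row)

# Cross-check every entry against the binomial coefficient it represents.
print(all(rows[n][r] == comb(n, r) for n in range(6) for r in range(n + 1)))
```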

There are certain sets of numbers that appear frequently throughout the text. Consequently, we close this section by assigning them the following designations.

a) Z = the set of integers = {0, 1, −1, 2, −2, 3, −3, . . .}

b) N = the set of nonnegative integers or natural numbers = {0, 1, 2, 3, . . .}

c) $Z^+$ = the set of positive integers = {1, 2, 3, . . .} = {x ∈ Z | x > 0}

d) Q = the set of rational numbers = {a/b | a, b ∈ Z, b ≠ 0}

e) $Q^+$ = the set of positive rational numbers = {r ∈ Q | r > 0}

f) $Q^*$ = the set of nonzero rational numbers

g) R = the set of real numbers

h) $R^+$ = the set of positive real numbers

i) $R^*$ = the set of nonzero real numbers

j) C = the set of complex numbers = {x + yi | x, y ∈ R, $i^2 = -1$}

k) $C^*$ = the set of nonzero complex numbers

l) For each n ∈ $Z^+$, $Z_n$ = {0, 1, 2, . . . , n − 1}

m) For real numbers a, b with a < b, [a, b] = {x ∈ R | a ≤ x ≤ b}, (a, b) = {x ∈ R | a < x < b}, [a, b) = {x ∈ R | a ≤ x < b}, (a, b] = {x ∈ R | a < x ≤ b}. The first set is called a closed interval, the second set an open interval, and the other two sets half-open intervals.

EXERCISES 3.1

1. Which of the following sets are equal?

a) {1, 2, 3}  b) {3, 2, 1, 3}
c) {3, 1, 2, 3}  d) {1, 2, 2, 3}

2. Let A = {1, {1}, {2}}. Which of the following statements are true?

a) 1 ∈ A  b) {1} ∈ A
c) {1} ⊆ A  d) {{1}} ⊆ A
e) {2} ∈ A  f) {2} ⊆ A
g) {{2}} ⊆ A  h) {{2}} ⊂ A

3. For A = {1, 2, {2}}, which of the eight statements in Exercise 2 are true?

4. Which of the following statements are true?

a) ∅ ∈ ∅  b) ∅ ⊂ ∅  c) ∅ ⊆ ∅
d) ∅ ∈ {∅}  e) ∅ ⊂ {∅}  f) ∅ ⊆ {∅}

5. Determine all of the elements in each of the following sets.

a) {$1 + (-1)^n$ | n ∈ N}
b) {n + (1/n) | n ∈ {1, 2, 3, 5, 7}}
c) {$n^3 + n^2$ | n ∈ {0, 1, 2, 3, 4}}

6. Consider the following six subsets of Z.

A = {2m + 1 | m ∈ Z}  B = {2n + 3 | n ∈ Z}
C = {2p − 3 | p ∈ Z}  D = {3r + 1 | r ∈ Z}
E = {3s + 2 | s ∈ Z}  F = {3t − 2 | t ∈ Z}

Which of the following statements are true and which are false?

a) A = B  b) A = C  c) B = C
d) D = E  e) D = F  f) E = F

7. Let A, B be sets from a universe $\mathcal{U}$. (a) Write a quantified statement to express the proper subset relation A ⊂ B. (b) Negate the result in part (a) to determine when A ⊄ B.

8. For A = {1, 2, 3, 4, 5, 6, 7}, determine the number of

a) subsets of A

b) nonempty subsets of A

c) proper subsets of A

d) nonempty proper subsets of A

e) subsets of A containing three elements

f ) subsets of A containing 1, 2

g) subsets of A containing five elements, including 1, 2

h) subsets of A with an even number of elements

i) subsets of A with an odd number of elements

9. a) If a set A has 63 proper subsets, what is |A|?
b) If a set B has 64 subsets of odd cardinality, what is |B|?
c) Generalize the result of part (b).

10. Which of the following sets are nonempty?

a) {x | x ∈ N, 2x + 7 = 3}
b) {x ∈ Z | 3x + 5 = 9}
c) {x | x ∈ Q, $x^2 + 4 = 6$}
d) {x ∈ R | $x^2 + 4 = 6$}
e) {x ∈ R | $x^2 + 3x + 3 = 0$}
f) {x | x ∈ C, $x^2 + 3x + 3 = 0$}

11. When she is about to leave a restaurant counter, Mrs. Albanese sees that she has one penny, one nickel, one dime, one quarter, and one half-dollar. In how many ways can she leave some (at least one) of her coins for a tip if (a) there are no restrictions? (b) she wants to have some change left? (c) she wants to leave at least 10 cents?

12. Let A = {1, 2, 3, 4, 5, 7, 8, 10, 11, 14, 17, 18}.

a) How many subsets of A contain six elements?

b) How many six-element subsets of A contain four even integers and two odd integers?

c) How many subsets of A contain only odd integers?

13. Let S = {1, 2, 3, . . . , 29, 30}. How many subsets A of S satisfy (a) |A| = 5? (b) |A| = 5 and the smallest element in A is 5? (c) |A| = 5 and the smallest element in A is less than 5?

14. a) How many subsets of {1, 2, 3, . . . , 11} contain at least one even integer?



b) How many subsets of {1, 2, 3, . . . , 12} contain at least one even integer?

c) Generalize the results of parts (a) and (b).

15. Give an example of three sets W, X, Y such that W ∈ X and X ∈ Y but W ∉ Y.

16. Write the next three rows for the Pascal triangle shown in Fig. 3.4.

17. Complete the proof of Theorem 3.1.

18. For sets A, B, C ⊆ $\mathcal{U}$, prove or disprove (with a counterexample), the following: If A ⊆ B, B ⊈ C, then A ⊈ C.

19. In part (i) of Fig. 3.5 we have the first six rows of Pascal’s triangle, where a hexagon centered at 4 appears in the last three rows. If we consider the six numbers (around 4) at the vertices of this hexagon, we find that the two alternating triples (namely, 3, 1, 10 and 1, 5, 6) satisfy 3 · 1 · 10 = 30 = 1 · 5 · 6. Part (ii) of the figure contains rows 4 through 7 of Pascal’s triangle. Here we find a hexagon centered at 10, and the alternating triples at the vertices (in this case, 4, 10, 15 and 6, 20, 5) satisfy 4 · 10 · 15 = 600 = 6 · 20 · 5.

a) Conjecture the general result suggested by these two examples.

b) Verify the conjecture in part (a).

(i)

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

(ii)

1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1

Figure 3.5

20. a) Among the strictly increasing sequences of integers that start with 1 and end with 7 are:

i) 1, 7  ii) 1, 3, 4, 7  iii) 1, 2, 4, 5, 6, 7

How many such strictly increasing sequences of integers start with 1 and end with 7?

b) How many strictly increasing sequences of integers start with 3 and end with 9?

c) How many strictly increasing sequences of integers start with 1 and end with 37? How many start with 62 and end with 98?

d) Generalize the results in parts (a) through (c).

21. One quarter of the five-element subsets of {1, 2, 3, . . . , n} contain the element 7. Determine n (≥ 5).

22. For a given universe $\mathcal{U}$, let A ⊆ $\mathcal{U}$ where A is finite with $|\mathcal{P}(A)| = n$. If B ⊆ $\mathcal{U}$, how many subsets does B have, if (a) B = A ∪ {x}, where x ∈ $\mathcal{U}$ − A? (b) B = A ∪ {x, y}, where x, y ∈ $\mathcal{U}$ − A? (c) B = A ∪ {$x_1, x_2, \ldots, x_k$}, where $x_1, x_2, \ldots, x_k$ ∈ $\mathcal{U}$ − A?

23. Determine which row of Pascal’s triangle contains three consecutive entries that are in the ratio 1 : 2 : 3.

24. Use the recursive technique of Example 3.9 to develop a Gray code for the 16 binary strings of length 4. Then list each of the 16 subsets of the ordered set {w, x, y, z} next to its corresponding binary string.

25. Suppose that A contains the elements v, w, x, y, z and no others. If a given Gray code for the 32 subsets of A encodes the ordered set {v, w} as 01100 and the ordered set {x, y} as 10001, write A as the corresponding ordered set.

26. For positive integers n, r show that

$$\binom{n+r+1}{r} = \binom{n+r}{r} + \binom{n+r-1}{r-1} + \cdots + \binom{n+2}{2} + \binom{n+1}{1} + \binom{n}{0}$$

$$= \binom{n+r}{n} + \binom{n+r-1}{n} + \cdots + \binom{n+2}{n} + \binom{n+1}{n} + \binom{n}{n}.$$

27. In the original abstract set theory formulated by Georg Cantor (1845–1918), a set was defined as “any collection into a whole of definite and separate objects of our intuition or our thought.” Unfortunately, in 1901, this definition led Bertrand Russell (1872–1970) to the discovery of a contradiction, a result now known as Russell’s paradox, and this struck at the very heart of the theory of sets. (But since then several ways have been found to define the basic ideas of set theory so that this contradiction no longer comes about.)

Russell’s paradox arises when we concern ourselves with whether a set can be an element of itself. For example, the set


3.2 Set Operations and the Laws of Set Theory 139

i) (a) ⇒ (b) If A, B are any sets, then B ⊆ A ∪ B (as mentioned after Example 3.15). For the opposite inclusion, if x ∈ A ∪ B, then x ∈ A or x ∈ B, but since A ⊆ B, in either case we have x ∈ B. So A ∪ B ⊆ B and, since we now have both inclusions, it follows (once again from Definition 3.2) that A ∪ B = B.

ii) (b) ⇒ (c) Given sets A, B, we always have A ⊇ A ∩ B (as mentioned after Example 3.15). For the opposite inclusion, let y ∈ A. With A ∪ B = B, y ∈ A ⇒ y ∈ A ∪ B ⇒ y ∈ B (since A ∪ B = B) ⇒ y ∈ A ∩ B, so A ⊆ A ∩ B and we conclude that A = A ∩ B.

iii) (c) ⇒ (d) We know that $z \in \overline{B} \Rightarrow z \notin B$. Now if z ∈ A ∩ B, then z ∈ B, since A ∩ B ⊆ B. The contradiction (namely, z ∉ B ∧ z ∈ B) tells us that z ∉ A ∩ B. Therefore, z ∉ A because A ∩ B = A. But $z \notin A \Rightarrow z \in \overline{A}$, so $\overline{B} \subseteq \overline{A}$.

iv) (d) ⇒ (a) Last, $w \in A \Rightarrow w \notin \overline{A}$. If w ∉ B, then $w \in \overline{B}$. With $\overline{B} \subseteq \overline{A}$ it then follows that $w \in \overline{A}$. This time we get the contradiction $w \notin \overline{A} \wedge w \in \overline{A}$, and this tells us that w ∈ B. Hence A ⊆ B.

With a bit of theorem proving under our belts, we now introduce some of the major laws that govern set theory. These bear a marked resemblance to the laws of logic given in Section 2.2. In many instances these set theoretic laws are similar to the arithmetic properties of the real numbers, where “∪” plays the role of “+” and “∩” the role of “×.” However, there are several differences.

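The Laws of Set Theory
For any sets A, B, and C taken from a universe $\mathcal{U}$

1) $\overline{\overline{A}} = A$  Law of Double Complement

2) $\overline{A \cup B} = \overline{A} \cap \overline{B}$  DeMorgan’s Laws
   $\overline{A \cap B} = \overline{A} \cup \overline{B}$

3) $A \cup B = B \cup A$  Commutative Laws
   $A \cap B = B \cap A$

4) $A \cup (B \cup C) = (A \cup B) \cup C$  Associative Laws
   $A \cap (B \cap C) = (A \cap B) \cap C$

5) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$  Distributive Laws
   $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

6) $A \cup A = A$  Idempotent Laws
   $A \cap A = A$

7) $A \cup \emptyset = A$  Identity Laws
   $A \cap \mathcal{U} = A$

8) $A \cup \overline{A} = \mathcal{U}$  Inverse Laws
   $A \cap \overline{A} = \emptyset$

9) $A \cup \mathcal{U} = \mathcal{U}$  Domination Laws
   $A \cap \emptyset = \emptyset$

10) $A \cup (A \cap B) = A$  Absorption Laws
    $A \cap (A \cup B) = A$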



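[Figure 3.6: Venn diagrams in four panels, (a) through (d), for sets A, B in the universe $\mathcal{U}$.]

Figure 3.6

[Figure 3.7: Venn diagrams in four panels, (a) through (d), for sets A, B in the universe $\mathcal{U}$.]

Figure 3.7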

We further illustrate the use of these diagrams by showing that for any sets A, B, C ⊆ $\mathcal{U}$,

$$\overline{(A \cup B) \cap C} = (\overline{A} \cap \overline{B}) \cup \overline{C}.$$

Instead of shading regions, another approach that also uses Venn diagrams numbers the regions as shown in Fig. 3.8 where, for example, region 3 is $A \cap B \cap \overline{C}$ and region 7 is $A \cap \overline{B} \cap C$. Each region is a set of the form $S_1 \cap S_2 \cap S_3$, where $S_1$ is replaced by $A$ or $\overline{A}$, $S_2$ by $B$ or $\overline{B}$, and $S_3$ by $C$ or $\overline{C}$. Consequently, by the rule of product, there are eight possible regions.

Consulting Fig. 3.8, we see that A ∪ B comprises regions 2, 3, 5, 6, 7, 8 and that regions 4, 6, 7, 8 make up set C. Therefore (A ∪ B) ∩ C comprises the regions common to A ∪ B



3.3 Counting and Venn Diagrams

With all of the theoretical work and theorem proving we did in the last section, now is agood time to examine some additional counting problems.

For sets A, B from a finite universe $\mathcal{U}$, the following Venn diagrams will help us obtain counting formulas for $|\overline{A}|$ and $|A \cup B|$ in terms of $|\mathcal{U}|$, $|A|$, $|B|$, and $|A \cap B|$.

As Fig. 3.9 demonstrates, $A \cup \overline{A} = \mathcal{U}$ and $A \cap \overline{A} = \emptyset$, so by the rule of sum, $|A| + |\overline{A}| = |\mathcal{U}|$ or $|\overline{A}| = |\mathcal{U}| - |A|$. The sets A, B, in Fig. 3.10, have empty intersection, so here the rule of sum leads us to $|A \cup B| = |A| + |B|$ and necessitates that A, B be finite but does not require any condition on the cardinality of $\mathcal{U}$.

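[Figure 3.9: Venn diagram of a set A and its complement $\overline{A}$ in the universe $\mathcal{U}$.]

Figure 3.9

[Figure 3.10: Venn diagram of two disjoint sets A, B in the universe $\mathcal{U}$.]

Figure 3.10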

Turning to the case where A, B are not disjoint, we motivate the formula for |A ∪ B| with the following example.

EXAMPLE 3.25 In a class of 50 college freshmen, 30 are studying C++, 25 are studying Java, and 10 are studying both languages. How many freshmen are studying either computer language?

We let $\mathcal{U}$ be the class of 50 freshmen, A the subset of those students studying C++, and B the subset of those studying Java. To answer the question, we need |A ∪ B|. In Fig. 3.11 the numbers in the regions are obtained from the given information: |A| = 30, |B| = 25, |A ∩ B| = 10. Consequently, |A ∪ B| = 45 ≠ 55 = 30 + 25 = |A| + |B|, because |A| + |B| counts the students in A ∩ B twice. To remedy this overcount, we subtract |A ∩ B| from |A| + |B| to obtain the correct formula: |A ∪ B| = |A| + |B| − |A ∩ B|.

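[Figure 3.11: Venn diagram of A and B in $\mathcal{U}$, with 20 students in A only, 10 in A ∩ B, 15 in B only, and 5 outside A ∪ B.]

Figure 3.11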

If A and B are finite sets, then |A ∪ B| = |A| + |B| − |A ∩ B|. Consequently, finite sets A and B are (mutually) disjoint if and only if |A ∪ B| = |A| + |B|.

In addition, when $\mathcal{U}$ is finite, from DeMorgan’s Law we have $|\overline{A} \cap \overline{B}| = |\overline{A \cup B}| = |\mathcal{U}| - |A \cup B| = |\mathcal{U}| - |A| - |B| + |A \cap B|$.
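Both formulas can be checked against explicit sets. The sketch below assumes a hypothetical roster in which the 50 freshmen of Example 3.25 are numbered 0 through 49; the particular numbering is an illustration and not part of the example.

```python
# Hypothetical roster: the 50 freshmen are numbered 0..49.
universe = set(range(50))
A = set(range(0, 30))    # 30 students studying C++
B = set(range(20, 45))   # 25 students studying Java; students 20..29 study both

# |A ∪ B| = |A| + |B| - |A ∩ B|
union_size = len(A) + len(B) - len(A & B)
print(union_size, union_size == len(A | B))

# |complement of (A ∪ B)| = |U| - |A| - |B| + |A ∩ B|
neither = len(universe) - len(A) - len(B) + len(A & B)
print(neither, neither == len(universe - (A | B)))
```

The counts 20, 10, 15, and 5 shown in Fig. 3.11 correspond to `A - B`, `A & B`, `B - A`, and `universe - (A | B)` here.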



This situation extends to three sets, as the following example illustrates.

EXAMPLE 3.26 An AND gate in an ASIC (Application Specific Integrated Circuit) has two inputs: $I_1$, $I_2$, and one output: O. (See Fig. 3.12.) Such an AND gate can have any or all of the following defects:

$D_1$: The input $I_1$ is stuck at 0.

$D_2$: The input $I_2$ is stuck at 0.

$D_3$: The output O is stuck at 1.

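[Figure 3.12: an AND gate with inputs $I_1$, $I_2$ and output O.]

Figure 3.12

[Figure 3.13: Venn diagram of A, B, C in $\mathcal{U}$, with region counts 3 in A ∩ B ∩ C, 4 in A ∩ B only, 5 in A ∩ C only, 7 in B ∩ C only, 11 in A only, 12 in B only, 15 in C only, and 43 outside A ∪ B ∪ C.]

Figure 3.13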

For a sample of 100 such gates we let A, B, and C be the subsets (of these 100 gates) having defects $D_1$, $D_2$, and $D_3$, respectively. With |A| = 23, |B| = 26, |C| = 30, |A ∩ B| = 7, |A ∩ C| = 8, |B ∩ C| = 10, and |A ∩ B ∩ C| = 3, how many gates in the sample have at least one of the defects $D_1$, $D_2$, $D_3$?

Working backward from |A ∩ B ∩ C| = 3 to |A| = 23, we label the regions as shown in Fig. 3.13 and find that |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| = 23 + 26 + 30 − 7 − 8 − 10 + 3 = 57. Thus the sample contains 57 AND gates with at least one of the defects and 100 − 57 = 43 AND gates with no defect.

If A, B, C are finite sets, then |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.

From the formula for |A ∪ B ∪ C| and DeMorgan’s Law, we find that if the universe $\mathcal{U}$ is finite, then $|\overline{A} \cap \overline{B} \cap \overline{C}| = |\overline{A \cup B \cup C}| = |\mathcal{U}| - |A \cup B \cup C| = |\mathcal{U}| - |A| - |B| - |C| + |A \cap B| + |A \cap C| + |B \cap C| - |A \cap B \cap C|$.
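The three-set formula can be checked by building sets that realize the data of Example 3.26. This is a sketch under stated assumptions: the gate identifiers are hypothetical, and the seven region sizes are the ones obtained by working backward from |A ∩ B ∩ C| = 3.

```python
from itertools import count

ids = count()  # hypothetical gate identifiers 0, 1, 2, ...

def take(k):
    """Return a set of k fresh gate identifiers."""
    return {next(ids) for _ in range(k)}

# The seven regions inside A ∪ B ∪ C.
abc = take(3)
ab_only, ac_only, bc_only = take(4), take(5), take(7)
a_only, b_only, c_only = take(11), take(12), take(15)

A = abc | ab_only | ac_only | a_only
B = abc | ab_only | bc_only | b_only
C = abc | ac_only | bc_only | c_only

by_formula = (len(A) + len(B) + len(C)
              - len(A & B) - len(A & C) - len(B & C)
              + len(A & B & C))
print(by_formula, by_formula == len(A | B | C))
print(100 - by_formula)  # gates with none of the three defects
```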

We close this section with a problem that uses this last result.

EXAMPLE 3.27 A student visits an arcade each day after school and plays one game of either Laser Man, Millipede, or Space Conquerors. In how many ways can he play one game each day so that he plays each of the three types at least once during a given school week?

Here there is a slight twist. The set $\mathcal{U}$ consists of all arrangements of size 5 taken from the set of three games, with repetitions allowed. The set A represents the subset of all sequences of five games played during the week without playing Laser Man. The sets B and C are defined similarly, leaving out Millipede and Space Conquerors, respectively. The enumeration techniques of Chapter 1 give $|\mathcal{U}| = 3^5$, $|A| = |B| = |C| = 2^5$, $|A \cap B| =$


3.4 A First Word on Probability 151

Under the assumption of equal likelihood, let $\mathcal{S}$ be the sample space for an experiment $\mathcal{E}$. Each subset A of $\mathcal{S}$, including the empty subset, is called an event. Each element of $\mathcal{S}$ determines an outcome, so if $|\mathcal{S}| = n$ and $a \in \mathcal{S}$, $A \subseteq \mathcal{S}$, then

$Pr(\{a\})$ = The probability that {a} (or, a) occurs $= \frac{|\{a\}|}{|\mathcal{S}|} = \frac{1}{n}$, and

$Pr(A)$ = The probability that A occurs $= \frac{|A|}{|\mathcal{S}|} = \frac{|A|}{n}$.

[Note: We often write Pr(a) for Pr({a}).]

We demonstrate these ideas in the following four examples.

EXAMPLE 3.28 When Daphne tosses a fair coin, what is the probability she gets a head? Here the sample space $\mathcal{S}$ = {H, T} with A = {H} and we find that

$$Pr(A) = \frac{|A|}{|\mathcal{S}|} = \frac{1}{2}.$$

EXAMPLE 3.29 If Dillon rolls a fair die, what is the probability he gets (a) a 5 or a 6, (b) an even number? For either part the sample space $\mathcal{S}$ = {1, 2, 3, 4, 5, 6}. In part (a) we have event A = {5, 6} and $Pr(A) = \frac{|A|}{|\mathcal{S}|} = \frac{2}{6} = \frac{1}{3}$. For part (b) we consider event B = {2, 4, 6} and find that $Pr(B) = \frac{|B|}{|\mathcal{S}|} = \frac{3}{6} = \frac{1}{2}$.

Furthermore we also notice here that

i) $Pr(\mathcal{S}) = \frac{|\mathcal{S}|}{|\mathcal{S}|} = \frac{6}{6} = 1$ (after all, the occurrence of the event $\mathcal{S}$ is a certainty); and

ii) $Pr(\overline{A}) = Pr(\{1, 2, 3, 4\}) = \frac{|\overline{A}|}{|\mathcal{S}|} = \frac{4}{6} = \frac{2}{3} = 1 - \frac{1}{3} = 1 - Pr(A)$.

EXAMPLE 3.30 There are 20 students enrolled in Mrs. Arnold’s fourth-grade class. Hence, if she wants to select two of her students, at random, to take care of the class rabbit, she may make her selection in $\binom{20}{2} = 190$ ways, so $|\mathcal{S}| = 190$.

Now suppose that Kyle and Kody are two of the 20 students in the class and we let A be the event that Kyle is one of the students selected and B be the event that the selection includes Kody. Consequently, upon choosing the students, at random, the probability that Mrs. Arnold selects

a) both Kyle and Kody is $Pr(A \cap B) = \binom{2}{2} / \binom{20}{2} = 1/190$;

b) neither Kyle nor Kody is $Pr(\overline{A} \cap \overline{B}) = \binom{18}{2} / \binom{20}{2} = 153/190$;

c) Kyle but not Kody is $Pr(A \cap \overline{B}) = \binom{1}{1}\binom{18}{1} / \binom{20}{2} = 18/190 = 9/95$.
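Because all 190 selections are equally likely, each of these probabilities can also be found by listing the sample space and counting. The sketch below assumes placeholder names for the 18 students other than Kyle and Kody; the helper `pr` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Kyle, Kody, and 18 placeholder names for the rest of the class.
students = ["Kyle", "Kody"] + [f"student_{i}" for i in range(3, 21)]
sample_space = list(combinations(students, 2))   # unordered pairs of students

def pr(event):
    """Probability of an event given as a predicate on a selected pair."""
    favorable = sum(1 for pair in sample_space if event(pair))
    return Fraction(favorable, len(sample_space))

both = pr(lambda p: "Kyle" in p and "Kody" in p)
neither = pr(lambda p: "Kyle" not in p and "Kody" not in p)
kyle_not_kody = pr(lambda p: "Kyle" in p and "Kody" not in p)

print(len(sample_space), both, neither, kyle_not_kody)
```

`Fraction` keeps the results exact, so they can be compared directly with the values obtained from the binomial coefficients.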

EXAMPLE 3.31 Consider drawing five cards from a standard deck of 52 cards. This can be done in $\binom{52}{5} = 2{,}598{,}960$ ways. Now suppose that Tanya draws five cards, at random, from a standard deck. What is the probability she gets (a) three aces and two jacks; (b) three aces and a pair; (c) a full house (that is, three of one kind and a pair)?


3.5 The Axioms of Probability (Optional) 159

coins that we have weighed. In tossing this coin the sample space is $\mathcal{S}$ = {H, T}, but how can we determine Pr(H), Pr(T)? We might toss the coin n times, assuming the outcome of each toss (after the first) is not affected by any previous outcome. If H comes up m times, then we can assign Pr(H) = m/n and Pr(T) = (n − m)/n = 1 − (m/n), where the accuracy of these assigned probabilities improves as n grows larger.

Having addressed the issue of probabilities as relative frequencies, now it is time to focus on the topic of this section, namely the axioms of probability. One should find these axioms rather intuitive, especially when we look back at some of the results in Example 3.29 and part (b) of Example 3.33. The axioms were first introduced in 1933 by Andrei Kolmogorov and they apply to the case when the sample space $\mathcal{S}$ is finite.

The Axioms of Probability
Let $\mathcal{S}$ be the sample space for an experiment $\mathcal{E}$. If A, B are any events (that is, $\emptyset \subseteq A, B \subseteq \mathcal{S}$, so we now allow the empty set to be an event), then

1) Pr(A) ≥ 0

2) $Pr(\mathcal{S}) = 1$

3) if A, B are disjoint (or, mutually disjoint) then Pr(A ∪ B) = Pr(A) + Pr(B).†

Using these axioms we shall now establish a number of applicable results.

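THEOREM 3.7 The Rule of Complement. Let $\mathcal{S}$ be the sample space for an experiment $\mathcal{E}$. If A is an event (that is, $A \subseteq \mathcal{S}$), then

$$Pr(\overline{A}) = 1 - Pr(A).$$

Proof: We know that $\mathcal{S} = A \cup \overline{A}$ with $A \cap \overline{A} = \emptyset$. So from axioms (2) and (3) it follows that $1 = Pr(\mathcal{S}) = Pr(A \cup \overline{A}) = Pr(A) + Pr(\overline{A})$, and $Pr(\overline{A}) = 1 - Pr(A)$.

Note that when $A = \emptyset$ in Theorem 3.7 we have $1 = Pr(\mathcal{S}) = Pr(\overline{A}) = 1 - Pr(A)$, so $Pr(\emptyset) = Pr(A) = 0$, in agreement with our earlier assignment.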

The result of Theorem 3.7 can help cut down on our calculations in solving certainprobability problems. This is demonstrated in the next two examples.

EXAMPLE 3.40 Suppose the letters in the word PROBABILITY are arranged in a random manner. Determine Pr(A) for the event

A: The arrangement begins with one letter and ends in a different letter.

†Although our major concern in this chapter (if not the entire text) deals with $\mathcal{S}$ finite, when $\mathcal{S}$ is infinite Kolmogorov provided the fourth axiom:

4) if $A_1, A_2, A_3, \ldots$ are events (taken from $\mathcal{S}$) and $A_i \cap A_j = \emptyset$ for all $1 \le i < j$, then

$$Pr\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} Pr(A_n).$$



c) Finally, what is the probability the team wins at least one tournament? Let us not do here what we did in Example 3.40. If we let A be the given event, then $Pr(A) = \sum_{i=1}^{8} \binom{8}{i} (0.7)^i (0.3)^{8-i}$. But Pr(A) is more readily determined as $1 - Pr(\overline{A})$, where $Pr(\overline{A})$ = the probability the team loses all eight tournaments $= (0.3)^8 \doteq 0.000066$ [as in part (a)]. Consequently, $Pr(A) = 1 - (0.3)^8 \doteq 0.999934$.

Before we go on we want to examine the structure of the answer at the end of part (b) of Example 3.41. Each tournament in the example results in either a win (success) or loss (failure). Further, after the first tournament, the outcome of each later tournament is independent of the outcome of any previous tournament. Such a two-outcome occurrence is called a Bernoulli trial. If there are n such trials and each trial has probability p of success and probability q (= 1 − p) of failure, then the probability that there are (exactly) k successes among these n trials is

$$\binom{n}{k} p^k q^{n-k}, \quad 0 \le k \le n.$$

(We shall come upon this idea again in Section 16.5 when we study the application of Abelian groups in coding theory.)
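The Bernoulli trial formula, together with the Rule of Complement, can be tried numerically on the eight tournaments with p = 0.7. This is a minimal sketch in floating-point arithmetic; the function name `binomial_pmf` is an assumption for illustration.

```python
from math import comb, isclose

def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.7

# At least one win, summed term by term over k = 1..8.
direct = sum(binomial_pmf(n, k, p) for k in range(1, n + 1))
# The same probability through the complement: 1 - Pr(no wins).
via_complement = 1 - binomial_pmf(n, 0, p)

print(isclose(direct, via_complement))
print(round(via_complement, 6))
# The probabilities for k = 0..n account for every outcome.
print(isclose(sum(binomial_pmf(n, k, p) for k in range(n + 1)), 1.0))
```

The complement needs one term where the direct sum needs eight, which is the saving the text points out.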

Returning now to the axioms of probability, we know from axiom (3) that, for $A, B \subseteq \mathcal{S}$, if $A \cap B = \emptyset$ then Pr(A ∪ B) = Pr(A) + Pr(B). But what can we say if $A \cap B \ne \emptyset$?

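[Figure 3.17: Venn diagram of events A and B in the sample space $\mathcal{S}$, with the region A − B shaded.]

Figure 3.17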

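For the Venn diagram in Fig. 3.17 the interior of the rectangle represents the universe, here the sample space $\mathcal{S}$. The shaded region in the diagram denotes the event $A - B = A \cap \overline{B}$. Further,

i) the events $A \cap \overline{B}$ and $B$ are disjoint, since $(A \cap \overline{B}) \cap B = A \cap (\overline{B} \cap B) = A \cap \emptyset = \emptyset$; and

ii) $(A \cap \overline{B}) \cup B = (A \cup B) \cap (\overline{B} \cup B) = (A \cup B) \cap \mathcal{S} = A \cup B$.

From these two observations and axiom (3) it follows that

(∗) $Pr(A \cup B) = Pr((A \cap \overline{B}) \cup B) = Pr(A \cap \overline{B}) + Pr(B)$.

Next note that $A = A \cap \mathcal{S} = A \cap (B \cup \overline{B}) = (A \cap B) \cup (A \cap \overline{B})$ where $(A \cap B) \cap (A \cap \overline{B}) = (A \cap A) \cap (B \cap \overline{B}) = A \cap \emptyset = \emptyset$. So once again axiom (3) gives us

$Pr(A) = Pr(A \cap B) + Pr(A \cap \overline{B})$, or

(∗∗) $Pr(A \cap \overline{B}) = Pr(A) - Pr(A \cap B)$.

The results in Eqs. (∗) and (∗∗) now establish the following.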



3.6 Conditional Probability: Independence (Optional)

Throughout Sections 3.4 and 3.5 (especially prior to and at the end of Example 3.35, as well as in and after Example 3.41) we mentioned the idea of the independence of outcomes. There we questioned whether the occurrence of a certain outcome might somehow affect the occurrence of another outcome. In this section we extend this idea from a single outcome to an event and make it more mathematically precise. To do so we proceed with the following.

EXAMPLE 3.45 Vincent rolls a pair of fair dice. The sample space $\mathcal{S}$ for this experiment is shown in Fig. 3.18, along with the events

A: The sum (on the faces) is at least 9.

B: A double is rolled.

$\mathcal{S}$ = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
      (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
      (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
      (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
      (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
      (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }

[Figure 3.18: the 36 ordered pairs of $\mathcal{S}$, with the events A and B marked.]

Figure 3.18

We see that $Pr(A) = \frac{10}{36} = \frac{5}{18}$, $Pr(B) = \frac{6}{36} = \frac{1}{6}$, and $Pr(A \cap B) = Pr(B \cap A) = \frac{2}{36} = \frac{1}{18}$.

But now, instead of just asking about the probability of the occurrence of event B, we go one step further. Here we want to determine the probability of the occurrence of event B given the condition that event A has occurred. This conditional probability is denoted by Pr(B|A) and may be determined as follows.

The occurrence of event A reduces the sample space from the 36 equally-likely ordered pairs in $\mathcal{S}$ to the 10 equally-likely ordered pairs in A. Among the ordered pairs in A, two are also doubles, namely (5, 5) and (6, 6). Consequently, the probability of B given A $= Pr(B|A) = \frac{2}{10}$, and we notice that $\frac{2}{10} = \frac{2/36}{10/36} = \frac{Pr(B \cap A)}{Pr(A)}$.
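Both routes to Pr(B|A), counting within the reduced sample space A and dividing Pr(B ∩ A) by Pr(A), can be carried out by enumerating the 36 ordered pairs. The following is a minimal sketch; the helper `pr` is hypothetical.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs

A = {roll for roll in sample_space if sum(roll) >= 9}       # sum is at least 9
B = {roll for roll in sample_space if roll[0] == roll[1]}   # a double is rolled

def pr(event):
    """Probability of an event under equal likelihood."""
    return Fraction(len(event), len(sample_space))

by_reduced_space = Fraction(len(A & B), len(A))   # count inside the new sample space A
by_formula = pr(A & B) / pr(A)                    # Pr(B ∩ A) / Pr(A)

print(pr(A), pr(B), pr(A & B))
print(by_reduced_space, by_formula, by_reduced_space == by_formula)
```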

Before we suggest the result at the end of Example 3.45 as a general formula, let us consider a second example, one where the outcomes are not equally likely.

EXAMPLE 3.46 Lindsay has a coin that is biased with $Pr(H) = \frac{2}{3}$ and $Pr(T) = \frac{1}{3}$. She tosses this coin three times, where the result of each toss is independent of any preceding result. The eight possible outcomes in the sample space have the following probabilities:



$Pr(HHH) = \left(\frac{2}{3}\right)^3 = \frac{8}{27}$

$Pr(HHT) = Pr(HTH) = Pr(THH) = \left(\frac{2}{3}\right)^2 \left(\frac{1}{3}\right) = \frac{4}{27}$

$Pr(HTT) = Pr(THT) = Pr(TTH) = \left(\frac{2}{3}\right) \left(\frac{1}{3}\right)^2 = \frac{2}{27}$

$Pr(TTT) = \left(\frac{1}{3}\right)^3 = \frac{1}{27}$.

[Note that the sum of these probabilities is $\frac{8}{27} + 3\left(\frac{4}{27}\right) + 3\left(\frac{2}{27}\right) + \frac{1}{27} = \frac{8 + 12 + 6 + 1}{27} = 1$.]

Consider the events

A: The first toss results in a head [so A = {HTT, HTH, HHT, HHH} and $Pr(A) = \frac{2}{27} + 2\left(\frac{4}{27}\right) + \frac{8}{27} = \frac{18}{27}$].

B: The number of heads is even [so B = {TTT, HHT, HTH, THH} and $Pr(B) = \frac{1}{27} + 3\left(\frac{4}{27}\right) = \frac{13}{27}$].

Furthermore, A ∩ B = {HTH, HHT} and $Pr(B \cap A) = Pr(A \cap B) = \frac{4}{27} + \frac{4}{27} = \frac{8}{27}$.

To determine the conditional probability of B given A (that is, Pr(B|A)) we’ll make A our new sample space and redefine the probability of the four outcomes in A as follows:

$Pr'(HTT) = \frac{Pr(HTT)}{Pr(A)} = \frac{2/27}{18/27} = \frac{1}{9}$   $Pr'(HTH) = \frac{Pr(HTH)}{Pr(A)} = \frac{4/27}{18/27} = \frac{2}{9}$

$Pr'(HHT) = \frac{Pr(HHT)}{Pr(A)} = \frac{4/27}{18/27} = \frac{2}{9}$   $Pr'(HHH) = \frac{Pr(HHH)}{Pr(A)} = \frac{8/27}{18/27} = \frac{4}{9}$

[We see that $Pr'(HTT) + Pr'(HTH) + Pr'(HHT) + Pr'(HHH) = \frac{1}{9} + \frac{2}{9} + \frac{2}{9} + \frac{4}{9} = 1$.]

Among the four outcomes in A, two of them satisfy the condition given in event B, namely HTH and HHT, the outcomes in B ∩ A. Consequently, $Pr(B|A) = Pr'(HTH) + Pr'(HHT) = \frac{2}{9} + \frac{2}{9} = \frac{4}{9} = \frac{8}{18} = \frac{8/27}{18/27} = \frac{Pr(B \cap A)}{Pr(A)}$.
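The rescaling used in Example 3.46 can be reproduced with exact fractions. This is an illustrative sketch only; the dictionary `pr` and the helper `prob` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(2, 3), "T": Fraction(1, 3)}   # the biased coin

# Probability of each of the eight outcomes, using independence of the tosses.
pr = {}
for tosses in product("HT", repeat=3):
    pr["".join(tosses)] = p[tosses[0]] * p[tosses[1]] * p[tosses[2]]

A = {o for o in pr if o[0] == "H"}               # first toss is a head
B = {o for o in pr if o.count("H") % 2 == 0}     # number of heads is even

def prob(event):
    """Sum the probabilities of the outcomes in an event."""
    return sum(pr[o] for o in event)

# Pr' on the new sample space A: divide each outcome's probability by Pr(A).
rescaled = {o: pr[o] / prob(A) for o in A}
cond_by_rescaling = sum(rescaled[o] for o in A & B)
cond_by_formula = prob(A & B) / prob(A)

print(prob(A), prob(B), prob(A & B))
print(sum(rescaled.values()), cond_by_rescaling, cond_by_formula)
```

The last line shows that the rescaled probabilities sum to 1 and that both computations of Pr(B|A) agree.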

Motivated by the final result in each of the last two examples, we now summarize the underlying general procedure. We want a formula for Pr(B|A), the conditional probability of the occurrence of event B given the occurrence of event A. Further, this formula should help us avoid unnecessary calculations such as those in Example 3.46, where we recalculated the probability of each outcome in A.

Now once we know that the event A has occurred, the sample space $\mathcal{S}$ shrinks to the outcomes in A. If we divide the probability of each outcome in A by Pr(A), as in Example 3.46, these new probabilities sum to 1, so A can serve as the new sample space. Further, suppose $e_1$, $e_2$ are two outcomes in $\mathcal{S}$ with $Pr(e_2) = k\,Pr(e_1)$, where k is a constant. If $e_1, e_2 \in A$, then within the new sample space A the probability of $e_2$ is still k times that of $e_1$.

To calculate Pr(B|A) we now consider those outcomes in event A that are in event B. This gives us the outcomes in event B ∩ A and leads us to the following.

If $\mathcal{S}$ is the sample space for an experiment $\mathcal{E}$ and $A, B \subseteq \mathcal{S}$, then

the conditional probability of B given A $= Pr(B|A) = \frac{Pr(B \cap A)}{Pr(A)}$,

so long as $Pr(A) \ne 0$.

Further,

$Pr(B \cap A) = Pr(A \cap B) = Pr(A)\,Pr(B|A)$,

and upon changing the roles of A and B we have

$Pr(A \cap B) = Pr(B \cap A) = Pr(B)\,Pr(A|B)$.

The result

$Pr(A)\,Pr(B|A) = Pr(A \cap B) = Pr(B)\,Pr(A|B)$

is often called the multiplicative rule.

Without realizing it, we actually used the multiplicative rule in Example 3.39, in the case where the motors were not replaced after inspection. The first part of our next example now reinforces how we use this rule.

EXAMPLE 3.47 A cooler contains seven cans of cola and three cans of root beer. Without looking at the contents, Gustavo reaches in and withdraws one can for his friend Jody. Then he reaches in again to get a can for himself.

Let A, B denote the events

A: The first selection is a can of cola.

B: The second selection is a can of cola.

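a) Using the multiplicative rule, the probability that Gustavo chooses two cans of cola is

$$Pr(A \cap B) = Pr(A)\,Pr(B|A) = \left(\frac{7}{10}\right)\left(\frac{6}{9}\right) = \frac{7}{15}.$$

[Here Pr(B|A) = 6/9 because after the first can of cola is removed, the cooler then contains six cans of cola and three of root beer.]

b) The multiplicative rule and the additive rule (of Theorem 3.8) tell us that the probability Gustavo selects two cans of cola or two cans of root beer is

$$Pr(A \cap B) + Pr(\overline{A} \cap \overline{B}) = Pr(A)\,Pr(B|A) + Pr(\overline{A})\,Pr(\overline{B}|\overline{A}) = \left(\frac{7}{10}\right)\left(\frac{6}{9}\right) + \left(\frac{3}{10}\right)\left(\frac{2}{9}\right) = \frac{8}{15}.$$

c) Finally, let us determine Pr(B). To do so we develop a new formula with the help of the Venn diagram (for a sample space $\mathcal{S}$ and events A, B) in Fig. 3.19. From the figure (and the laws of set theory) we see that $B = B \cap \mathcal{S} = B \cap (A \cup \overline{A}) = (B \cap A) \cup (B \cap \overline{A})$, where $(B \cap A) \cap (B \cap \overline{A}) = B \cap (A \cap \overline{A}) = B \cap \emptyset = \emptyset$. So

$$Pr(B) = Pr(B \cap A) + Pr(B \cap \overline{A}) = Pr(A)\,Pr(B|A) + Pr(\overline{A})\,Pr(B|\overline{A}) = \left(\frac{7}{10}\right)\left(\frac{6}{9}\right) + \left(\frac{3}{10}\right)\left(\frac{7}{9}\right) = \frac{63}{90} = \frac{7}{10}.$$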

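[Figure 3.19: Venn diagram of events A and B in the sample space $\mathcal{S}$, showing B split into the regions $B \cap A$ and $B \cap \overline{A}$.]

Figure 3.19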
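The three answers in Example 3.47 can be confirmed by listing every ordered pair of distinct cans, each pair being equally likely. The sketch below assumes the ten cans are indexed 0 through 9; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

cans = ["cola"] * 7 + ["root beer"] * 3
draws = list(permutations(range(10), 2))   # (first can, second can), no replacement

def pr(event):
    """Probability of an event given as a predicate on an ordered draw."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

both_cola = pr(lambda d: cans[d[0]] == "cola" and cans[d[1]] == "cola")   # part (a)
same_kind = pr(lambda d: cans[d[0]] == cans[d[1]])                        # part (b)
second_cola = pr(lambda d: cans[d[1]] == "cola")                          # part (c)

# Part (c) again, as Pr(A)Pr(B|A) + Pr(not A)Pr(B|not A).
total = Fraction(7, 10) * Fraction(6, 9) + Fraction(3, 10) * Fraction(7, 9)

print(both_cola, same_kind, second_cola, total)
```

Part (c) illustrates that the second selection is a cola with the same probability, 7/10, as the first.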



The result at the end of Example 3.47 (namely, for $A, B \subseteq \mathcal{S}$,

$$Pr(B) = Pr(A)\,Pr(B|A) + Pr(\overline{A})\,Pr(B|\overline{A})$$

) is referred to as the Law of Total Probability. Our next example shows how this result can be generalized.

EXAMPLE 3.48 Emilio is a system integrator for personal computers. As such he finds himself using keyboards from three companies. Company 1 supplies 60% of the keyboards, company 2 supplies 30% of the keyboards, and the remaining 10% comes from company 3. From past experience Emilio knows that 2% of company 1’s keyboards are defective, while the percentages of defective keyboards for companies 2, 3 are 3% and 5%, respectively. If one of Emilio’s computers is selected, at random, and then tested, what is the probability it has a defective keyboard?

Let A denote the event

A: The keyboard comes from company 1.

Events B, C are defined similarly for companies 2, 3, respectively. Event D, meanwhile, is

D: The keyboard is defective.

Here we are interested in Pr(D). Guided by the Venn diagram in Fig. 3.20, we see that $D = D \cap \mathcal{S} = D \cap (A \cup B \cup C) = (D \cap A) \cup (D \cap B) \cup (D \cap C)$. But here $A \cap B = A \cap C = B \cap C = \emptyset$. So now, for example, the Laws of Set Theory show us that $(D \cap A) \cap (D \cap B) = D \cap (A \cap B) = D \cap \emptyset = \emptyset$. Likewise, $(D \cap A) \cap (D \cap C) = (D \cap B) \cap (D \cap C) = \emptyset$, and $(D \cap A) \cap (D \cap B) \cap (D \cap C) = \emptyset$. Consequently, by Theorem 3.9, we have

$$Pr(D) = Pr(D \cap A) + Pr(D \cap B) + Pr(D \cap C) = Pr(A)\,Pr(D|A) + Pr(B)\,Pr(D|B) + Pr(C)\,Pr(D|C).$$

(Here we have the Law of Total Probability for three sets; that is, the sample space $\mathcal{S}$ is the union of three sets, any two of which are disjoint.)

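[Figure 3.20: Venn diagram of the sample space $\mathcal{S}$ partitioned into A, B, C, with the event D split into the regions D ∩ A, D ∩ B, and D ∩ C.]

Figure 3.20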

From the information given at the start of this example we know that

Pr(A) = 0.6   Pr(B) = 0.3   Pr(C) = 0.1
Pr(D|A) = 0.02   Pr(D|B) = 0.03   Pr(D|C) = 0.05.

So Pr(D) = (0.6)(0.02) + (0.3)(0.03) + (0.1)(0.05) = 0.026, and this tells us that 2.6% of the personal computers integrated by Emilio will have defective keyboards.
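The same weighted sum extends to any number of suppliers. The following is a minimal sketch using the figures of Example 3.48; the dictionary layout and the name `suppliers` are assumptions made for illustration.

```python
from math import isclose

# supplier: (share of keyboards supplied, fraction of that supplier's keyboards that are defective)
suppliers = {
    "company 1": (0.60, 0.02),
    "company 2": (0.30, 0.03),
    "company 3": (0.10, 0.05),
}

# The shares must account for every keyboard for the law to apply.
shares_cover_everything = isclose(sum(share for share, _ in suppliers.values()), 1.0)

# Law of Total Probability: sum of Pr(supplier) * Pr(defective | supplier).
pr_defective = sum(share * rate for share, rate in suppliers.values())

print(shares_cover_everything, round(pr_defective, 3))
```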



Definition 3.13 For a sample space $\mathcal{S}$ and events $A, B, C \subseteq \mathcal{S}$, we say that A, B, C are independent if

1) $Pr(A \cap B) = Pr(A)\,Pr(B)$;

2) $Pr(A \cap C) = Pr(A)\,Pr(C)$;

3) $Pr(B \cap C) = Pr(B)\,Pr(C)$; and

4) $Pr(A \cap B \cap C) = Pr(A)\,Pr(B)\,Pr(C)$.

Looking back now at Example 3.51 we see that there we verified the independence of the events A, B, C. But did we do too much? In particular, do we really need condition (4) in Definition 3.13? Perhaps we may feel that the first three conditions are enough to insure the fourth condition. But, perhaps, they are not enough. The next example will help us settle this issue.

EXAMPLE 3.52 Adira tosses a fair coin four times. So in this case the sample space $\mathcal{S} = \{x_1 x_2 x_3 x_4 \mid x_i \in \{H, T\}, 1 \le i \le 4\}$.

Let $A, B, C \subseteq \mathcal{S}$ be the events:

A: Adira’s first toss is a tail (T);

B: Adira’s last toss is a tail (T); and

C: The four tosses yield two heads and two tails.

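For these events we find that $Pr(A) = \frac{8}{16} = \frac{1}{2}$, $Pr(B) = \frac{8}{16} = \frac{1}{2}$, and $Pr(C) = \frac{1}{16}\binom{4}{2} = \frac{6}{16} = \frac{3}{8}$. In addition,

$Pr(A \cap B) = \frac{4}{16} = \frac{1}{4} = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = Pr(A)\,Pr(B)$,

$Pr(A \cap C) = \frac{3}{16} = \left(\frac{1}{2}\right)\left(\frac{3}{8}\right) = Pr(A)\,Pr(C)$, and

$Pr(B \cap C) = \frac{3}{16} = \left(\frac{1}{2}\right)\left(\frac{3}{8}\right) = Pr(B)\,Pr(C)$.

However, A ∩ B ∩ C = {THHT} and $Pr(A \cap B \cap C) = \frac{1}{16} = \frac{2}{32} \ne \frac{3}{32} = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{3}{8}\right) = Pr(A)\,Pr(B)\,Pr(C)$. So while the three events in Example 3.51 are independent, the three events in this example are (mutually) independent in pairs, but not independent.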
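The four conditions of Definition 3.13 can be tested directly on the 16 outcomes of Example 3.52. This is an illustrative sketch; the helper `pr` is hypothetical.

```python
from fractions import Fraction
from itertools import product

space = ["".join(t) for t in product("HT", repeat=4)]   # 16 equally likely outcomes

def pr(event):
    """Probability of an event under equal likelihood."""
    return Fraction(len(event), len(space))

A = {s for s in space if s[0] == "T"}           # first toss is a tail
B = {s for s in space if s[-1] == "T"}          # last toss is a tail
C = {s for s in space if s.count("H") == 2}     # two heads and two tails

# Conditions (1) to (3): independence in pairs.
pairwise = (pr(A & B) == pr(A) * pr(B)
            and pr(A & C) == pr(A) * pr(C)
            and pr(B & C) == pr(B) * pr(C))
# Condition (4): the triple product.
triple = pr(A & B & C) == pr(A) * pr(B) * pr(C)

print(pairwise, triple)
print(pr(A & B & C), pr(A) * pr(B) * pr(C))
```

The first three conditions hold while the fourth fails, so condition (4) cannot be dropped from the definition.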

In closing this section we provide a summary of the probability rules and laws we have learned in this and the preceding section.

Summary of Probability Rules and Laws

1) The Rule of Complement: $\Pr(\overline{A}) = 1 - \Pr(A)$

2) The Additive Rule: $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$. When A, B are disjoint, $\Pr(A \cup B) = \Pr(A) + \Pr(B)$.

3) Conditional Probability: $\Pr(A|B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}$, $\Pr(B) \neq 0$

4) Multiplicative Rule: $\Pr(A)\Pr(B|A) = \Pr(A \cap B) = \Pr(B)\Pr(A|B)$. When A, B are independent, $\Pr(A \cap B) = \Pr(A)\Pr(B)$.

5) The Law of Total Probability: $\Pr(B) = \Pr(A)\Pr(B|A) + \Pr(\overline{A})\Pr(B|\overline{A})$

6) The Law of Total Probability (Extended Version): If $A_1, A_2, \ldots, A_n \subseteq \mathcal{S}$, where $n \geq 3$, $A_i \cap A_j = \emptyset$ for all $1 \leq i < j \leq n$, and $\mathcal{S} = \bigcup_{i=1}^{n} A_i$, then for any event B,

\[ \Pr(B) = \Pr(A_1)\Pr(B|A_1) + \cdots + \Pr(A_n)\Pr(B|A_n) = \sum_{i=1}^{n} \Pr(A_i)\Pr(B|A_i). \]

7) Bayes' Theorem:

\[ \Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\Pr(A)\Pr(B|A)}{\Pr(A)\Pr(B|A) + \Pr(\overline{A})\Pr(B|\overline{A})} \]

8) Bayes' Theorem (Extended Version): If $A_1, A_2, \ldots, A_n \subseteq \mathcal{S}$, where $n \geq 3$, $A_i \cap A_j = \emptyset$ for all $1 \leq i < j \leq n$, and $\mathcal{S} = \bigcup_{i=1}^{n} A_i$, then for any event B, and each $1 \leq k \leq n$,

\[ \Pr(A_k|B) = \frac{\Pr(A_k \cap B)}{\Pr(B)} = \frac{\Pr(A_k)\Pr(B|A_k)}{\Pr(A_1)\Pr(B|A_1) + \cdots + \Pr(A_n)\Pr(B|A_n)} = \frac{\Pr(A_k)\Pr(B|A_k)}{\sum_{i=1}^{n} \Pr(A_i)\Pr(B|A_i)}. \]
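Rules (6) and (8) translate directly into a few lines of code. The sketch below reuses the values Pr(A) = 0.6, Pr(B) = 0.3, Pr(C) = 0.1 and Pr(D|A) = 0.02, Pr(D|B) = 0.03, Pr(D|C) = 0.05 quoted earlier in this section; the function names are illustrative, and the code assumes the events in `prior` partition the sample space.

```python
from fractions import Fraction

# Pr(A_i) for a partition A, B, C, and Pr(D | A_i) for an event D.
prior = {"A": Fraction(6, 10), "B": Fraction(3, 10), "C": Fraction(1, 10)}
likelihood = {"A": Fraction(2, 100), "B": Fraction(3, 100), "C": Fraction(5, 100)}

def total_probability(prior, likelihood):
    """Law of Total Probability (Extended Version)."""
    return sum(prior[k] * likelihood[k] for k in prior)

def bayes(prior, likelihood):
    """Bayes' Theorem (Extended Version): Pr(A_k | D) for every k."""
    total = total_probability(prior, likelihood)
    return {k: prior[k] * likelihood[k] / total for k in prior}

pr_d = total_probability(prior, likelihood)
posterior = bayes(prior, likelihood)

print(float(pr_d))     # 0.026
print(posterior["A"])  # 6/13
```

The posterior values sum to 1, which gives a quick consistency check on any such computation.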

EXERCISES 3.6

1. Recall that in a standard deck of 52 cards there are 12 picture cards: four each of jacks, queens, and kings. Kevin draws one card from the deck. Find the probability his card is a king if we know that the card drawn is an ace or a picture card.

2. Let A, B be events taken from a sample space 𝒮. If Pr(A) = 0.6, Pr(B) = 0.4, and Pr(A ∪ B) = 0.7, find Pr(A|B) and Pr(A|B).

3. If Coach Mollet works his football team throughout August, then the probability the team will be the division champion is 0.75. The probability the coach will work his team throughout August is 0.80. What is the probability Coach Mollet works his team throughout August and the team finishes as the division champion?

4. The 420 freshmen at an engineering college take either calculus or discrete mathematics (but not both). Further, both courses are offered providing either an introduction to a CAS (computer algebra system) or using such a system extensively throughout the course. The results in Table 3.6 summarize how the 420 freshmen are distributed.

Table 3.6

                        CAS               CAS
                        (Introduction)    (Extensive Coverage)
Calculus                170               120
Discrete Mathematics     80                50

a) If Sandrine is taking calculus, what is the probability her class is only being introduced to the use of a CAS?

b) Derek’s class is making extensive use of the CAS. What is the probability Derek is taking discrete mathematics?

5. Let 𝒮 be the sample space for an experiment ℰ and let A, B be events from 𝒮. If A, B are independent, prove that

\[ \Pr(A \cup B) = \Pr(A) + \Pr(\overline{A})\Pr(B) = \Pr(B) + \Pr(\overline{B})\Pr(A). \]

6. Ceilia tosses a fair coin five times. What is the probability she gets three heads, if the first toss results in (a) a head; (b) a tail?

7. One bag contains 15 identical (in shape) coins: nine of silver and six of gold. A second bag contains 16 more of these coins: six silver and 10 gold. Bruno reaches in and selects one coin from the first bag and then places it in the second bag. Then Madeleine selects one coin from this second bag.

a) What is the probability Madeleine selected a gold coin?

b) If Madeleine’s coin is gold, what is the probability Bruno had selected a gold coin?

8. A coin is loaded so that Pr(H) = 2/3 and Pr(T) = 1/3. Todd tosses this coin twice.

Let A, B be the events

A: The first toss is a tail. B: Both tosses are the same.

Are A, B independent?

9. Suppose that A, B are independent with Pr(A ∪ B) = 0.6 and Pr(A) = 0.3. Find Pr(B).

10. Alice tosses a fair coin seven times. Find the probability she gets four heads given that (a) her first toss is a head; (b) her first and last tosses are heads.


194 Chapter 4 Properties of the Integers: Mathematical Induction

When we try to do likewise for the rational and real numbers, however, we find that

Q+ = {x ∈ Q | x > 0} and R+ = {x ∈ R | x > 0},

but we cannot represent Q+ or R+ using ≥ as we did for Z+.

The set Z+ is different from the sets Q+ and R+ in that every nonempty subset X of Z+ contains an integer a such that a ≤ x, for all x ∈ X; that is, X contains a least (or smallest) element. This is not so for either Q+ or R+. The sets themselves do not contain least elements. There is no smallest positive rational number or smallest positive real number. If q is a positive rational number, then since 0 < q/2 < q, we would have the smaller positive rational number q/2.

These observations lead us to the following property of the set Z+ ⊂ Z.

The Well-Ordering Principle: Every nonempty subset of Z+ contains a smallest element. (We often express this by saying that Z+ is well ordered.)

This principle serves to distinguish Z+ from Q+ and R+. But does it lead anywhere that is mathematically interesting or useful? The answer is a resounding “Yes!” It is the basis of a proof technique known as mathematical induction. This technique will often help us to prove a general mathematical statement involving positive integers when certain instances of that statement suggest a general pattern.

We now establish the basis for this induction technique.

THEOREM 4.1 The Principle of Mathematical Induction. Let S(n) denote an open mathematical statement (or set of such open statements) that involves one or more occurrences of the variable n, which represents a positive integer.

a) If S(1) is true; and

b) If whenever S(k) is true (for some particular, but arbitrarily chosen, k ∈ Z+), then S(k + 1) is true;

then S(n) is true for all n ∈ Z+.

Proof: Let S(n) be such an open statement satisfying conditions (a) and (b), and let F = {t ∈ Z+ | S(t) is false}. We wish to prove that F = ∅, so to obtain a contradiction we assume that F ≠ ∅. Then by the Well-Ordering Principle, F has a least element m. Since S(1) is true, it follows that m ≠ 1, so m > 1, and consequently m − 1 ∈ Z+. With m − 1 ∉ F, we have S(m − 1) true. So by condition (b) it follows that S((m − 1) + 1) = S(m) is true, contradicting m ∈ F. This contradiction arose from the assumption that F ≠ ∅. Consequently, F = ∅.

We have now seen how the Well-Ordering Principle is used in the proof of the Principle of Mathematical Induction. It is also true that the Principle of Mathematical Induction is useful if one wants to prove the Well-Ordering Principle. However, we shall not concern ourselves with that fact right now. In this section our major goal will center on understanding and using the Principle of Mathematical Induction. (But in the exercises for Section 4.2 we shall examine how the Principle of Mathematical Induction is used to prove the Well-Ordering Principle.)
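To see how conditions (a) and (b) of Theorem 4.1 are checked in practice, consider a standard illustration (chosen here as an assumption, not taken from the passage above): the formula for the sum of the first n positive integers.

```latex
\begin{align*}
S(n)&:\quad \sum_{i=1}^{n} i = \frac{n(n+1)}{2}. \\[4pt]
\text{(a) } S(1)&:\quad \sum_{i=1}^{1} i = 1 = \frac{1(1+1)}{2}, \text{ so } S(1) \text{ is true.} \\[4pt]
\text{(b) Assume } S(k)&:\quad \sum_{i=1}^{k} i = \frac{k(k+1)}{2}. \text{ Then} \\
\sum_{i=1}^{k+1} i &= \left(\sum_{i=1}^{k} i\right) + (k+1)
  = \frac{k(k+1)}{2} + (k+1)
  = \frac{(k+1)(k+2)}{2},
\end{align*}
```

The last expression is the right-hand side of S(k + 1), so S(k) true implies S(k + 1) true, and by Theorem 4.1 the formula holds for all n ∈ Z+.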


250 Chapter 5 Relations and Functions

Figure 5.2 (tree diagram with branches labeled N and E; levels labeled First set, Second set, and Third set (when needed); marks (*) and (**) appear on the diagram)

Tree diagrams are examples of a general structure called a tree. Trees and graphs are important structures that arise in computer science and optimization theory. These will be investigated in later chapters.

For the cross product of two sets, we find the subsets of this structure of great interest.

Definition 5.2 For sets A, B, any subset of A × B is called a (binary) relation from A to B. Any subset of A × A is called a (binary) relation on A.

Since we will primarily deal with binary relations, for us the word “relation” will mean binary relation, unless otherwise specified.

EXAMPLE 5.5 With A, B as in Example 5.1, the following are some of the relations from A to B.

a) ∅
b) {(2, 4)}
c) {(2, 4), (2, 5)}
d) {(2, 4), (3, 4), (4, 4)}
e) {(2, 4), (3, 4), (4, 5)}
f) A × B

Since |A × B| = 6, it follows from Definition 5.2 that there are $2^6$ possible relations from A to B (for there are $2^6$ possible subsets of A × B).

For finite sets A, B with |A| = m and |B| = n, there are $2^{mn}$ relations from A to B, including the empty relation as well as the relation A × B itself.

There are also $2^{nm}$ (= $2^{mn}$) relations from B to A, one of which is also ∅ and another of which is B × A. The reason we get the same number of relations from B to A as we have from A to B is that any relation ℛ1 from B to A can be obtained from a unique relation ℛ2 from A to B by simply reversing the components of each ordered pair in ℛ2 (and vice versa).
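The count $2^{mn}$ and the pair-reversing correspondence can be confirmed by brute force for small sets. This is a sketch under an assumption: Example 5.1 is not reproduced here, so the sets A = {2, 3, 4} and B = {4, 5} are inferred from the pairs listed in Example 5.5. The function names are illustrative.

```python
from itertools import chain, combinations, product

def relations(A, B):
    """Yield every relation from A to B, that is, every subset of A x B."""
    pairs = list(product(A, B))
    subsets = chain.from_iterable(
        combinations(pairs, r) for r in range(len(pairs) + 1)
    )
    return (frozenset(s) for s in subsets)

def converse(R):
    """Reverse each ordered pair of a relation from A to B."""
    return frozenset((b, a) for a, b in R)

# Assumed sets, consistent with the pairs listed in Example 5.5.
A, B = {2, 3, 4}, {4, 5}
from_A_to_B = set(relations(A, B))
from_B_to_A = set(relations(B, A))

print(len(from_A_to_B))                                   # 64
print({converse(R) for R in from_A_to_B} == from_B_to_A)  # True
```

Enumeration of this kind is practical only for very small sets, since the number of relations doubles with every additional ordered pair.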

EXAMPLE 5.6 For B = {1, 2}, let A = 𝒫(B) = {∅, {1}, {2}, {1, 2}}. The following is an example of a relation on A: ℛ = {(∅, ∅), (∅, {1}), (∅, {2}), (∅, {1, 2}), ({1}, {1}), ({1}, {1, 2}), ({2}, {2}), ({2}, {1, 2}), ({1, 2}, {1, 2})}. We can say that the relation ℛ is the subset relation where (C, D) ∈ ℛ if and only if C, D ⊆ B and C ⊆ D.


5.2 Functions: Plain and One-to-One 253

We often write f(a) = b when (a, b) is an ordered pair in the function f. For (a, b) ∈ f, b is called the image of a under f, whereas a is a preimage of b. In addition, the definition suggests that f is a method for associating with each a ∈ A the unique element f(a) = b ∈ B. Consequently, (a, b), (a, c) ∈ f implies b = c.

EXAMPLE 5.9 For A = {1, 2, 3} and B = {w, x, y, z}, f = {(1, w), (2, x), (3, x)} is a function, and consequently a relation, from A to B. ℛ1 = {(1, w), (2, x)} and ℛ2 = {(1, w), (2, w), (2, x), (3, z)} are relations, but not functions, from A to B. (Why?)

Definition 5.4 For the function f: A → B, A is called the domain of f and B the codomain of f. The subset of B consisting of those elements that appear as second components in the ordered pairs of f is called the range of f and is also denoted by f(A) because it is the set of images (of the elements of A) under f.

In Example 5.9, the domain of f = {1, 2, 3}, the codomain of f = {w, x, y, z}, and the range of f = f(A) = {w, x}.
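The distinction between a relation and a function, and the notion of range, can be tested mechanically on the sets of Example 5.9. The sketch below is one possible encoding (relations as Python sets of pairs); the names `is_function` and `range_of` are illustrative.

```python
def is_function(relation, A, B):
    """True when every element of A is the first component of exactly one pair."""
    if any(a not in A or b not in B for a, b in relation):
        return False
    firsts = [a for a, _ in relation]
    return all(firsts.count(a) == 1 for a in A)

def range_of(f):
    """Set of second components of f, written f(A) in the text."""
    return {b for _, b in f}

A = {1, 2, 3}
B = {"w", "x", "y", "z"}
f = {(1, "w"), (2, "x"), (3, "x")}
R1 = {(1, "w"), (2, "x")}                      # 3 has no image
R2 = {(1, "w"), (2, "w"), (2, "x"), (3, "z")}  # 2 has two images

print(is_function(f, A, B), is_function(R1, A, B), is_function(R2, A, B))
# True False False
print(sorted(range_of(f)))  # ['w', 'x']
```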

A pictorial representation of these ideas appears in Fig. 5.4. This diagram suggests that a may be regarded as an input that is transformed by f into the corresponding output, f(a). In this context, a C++ compiler can be thought of as a function that transforms a source program (the input) into its corresponding object program (the output).

Figure 5.4 (diagram of the sets A and B, with f carrying a ∈ A to f(a) = b ∈ B, and the subset f(A) of B marked)

EXAMPLE 5.10 Many interesting functions arise in computer science.

a) A common function encountered is the greatest integer function, or floor function. This function f: R → Z is given by

f(x) = ⌊x⌋ = the greatest integer less than or equal to x.

Consequently, f(x) = x, if x ∈ Z; and, when x ∈ R − Z, f(x) is the integer to the immediate left of x on the real number line.

For this function we find that

1) ⌊3.8⌋ = 3, ⌊3⌋ = 3, ⌊−3.8⌋ = −4, ⌊−3⌋ = −3;

2) ⌊7.1 + 8.2⌋ = ⌊15.3⌋ = 15 = 7 + 8 = ⌊7.1⌋ + ⌊8.2⌋; and

3) ⌊7.7 + 8.4⌋ = ⌊16.1⌋ = 16 ≠ 15 = 7 + 8 = ⌊7.7⌋ + ⌊8.4⌋.
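The three observations above can be reproduced with the standard library function `math.floor`. In this sketch the decimal values are held as exact fractions, an implementation choice made here so that the sums 15.3 and 16.1 are not disturbed by binary floating-point rounding.

```python
import math
from fractions import Fraction

def floor(x):
    """Greatest integer less than or equal to x."""
    return math.floor(x)

# Exact decimals avoid binary floating-point rounding in the sums below.
a, b = Fraction("7.1"), Fraction("8.2")
c, d = Fraction("7.7"), Fraction("8.4")

print(floor(3.8), floor(3), floor(-3.8), floor(-3))  # 3 3 -4 -3
print(floor(a + b) == floor(a) + floor(b))           # True  (15 = 7 + 8)
print(floor(c + d) == floor(c) + floor(d))           # False (16 versus 15)
print(int(-3.8))                                     # -3: int() truncates toward zero
```

The last line is a reminder that truncation and the floor function disagree for negative non-integers.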



EXAMPLE 5.12 In Sections 4.1 and 4.2 we were introduced to the concept of a sequence in conjunction with our study of recursive definitions. We should now realize that a sequence of real numbers r_1, r_2, r_3, . . . can be thought of as a function f: Z+ → R where f(n) = r_n, for all n ∈ Z+. Likewise, an integer sequence a_0, a_1, a_2, . . . can be defined by means of a function g: N → Z where g(n) = a_n, for all n ∈ N.

In Example 5.9 there are $2^{12} = 4096$ relations from A to B. We have examined one function among these relations, and now we wish to count the total number of functions from A to B.

For the general case, let A, B be nonempty sets with |A| = m, |B| = n. Consequently, if A = {a_1, a_2, a_3, . . . , a_m} and B = {b_1, b_2, b_3, . . . , b_n}, then a typical function f: A → B can be described by {(a_1, x_1), (a_2, x_2), (a_3, x_3), . . . , (a_m, x_m)}. We can select any of the n elements of B for x_1 and then do the same for x_2. (We can select any element of B for x_2 so that the same element of B may be selected for both x_1 and x_2.) We continue this selection process until one of the n elements of B is finally selected for x_m. In this way, using the rule of product, there are $n^m = |B|^{|A|}$ functions from A to B.

Therefore, for A, B in Example 5.9, there are $4^3 = |B|^{|A|} = 64$ functions from A to B, and $3^4 = |A|^{|B|} = 81$ functions from B to A. In general, we do not expect $|A|^{|B|}$ to equal $|B|^{|A|}$. Unlike the situation for relations, we cannot always obtain a function from B to A by simply interchanging the components in the ordered pairs of a function from A to B (or vice versa).
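The rule-of-product argument corresponds to choosing one image for each element of the domain, which is what `itertools.product` enumerates. The following sketch counts the functions for the sets of Example 5.9; representing a function as a dictionary is a choice made here for illustration.

```python
from itertools import product

def functions(A, B):
    """Yield each function from A to B as a dict: one image per element of A."""
    domain = sorted(A)
    for images in product(sorted(B), repeat=len(domain)):
        yield dict(zip(domain, images))

A = {1, 2, 3}
B = {"w", "x", "y", "z"}

print(sum(1 for _ in functions(A, B)))  # 64 = 4**3
print(sum(1 for _ in functions(B, A)))  # 81 = 3**4
```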

Now that we have the concept of a function as a special type of relation, we turn our attention to a special type of function.

Definition 5.5 A function f: A → B is called one-to-one, or injective, if each element of B appears at most once as the image of an element of A.

If f: A → B is one-to-one, with A, B finite, we must have |A| ≤ |B|. For arbitrary sets A, B, f: A → B is one-to-one if and only if for all a_1, a_2 ∈ A, f(a_1) = f(a_2) ⇒ a_1 = a_2.

EXAMPLE 5.13 Consider the function f: R → R where f(x) = 3x + 7 for all x ∈ R. Then for all x_1, x_2 ∈ R, we find that

f(x_1) = f(x_2) ⇒ 3x_1 + 7 = 3x_2 + 7 ⇒ 3x_1 = 3x_2 ⇒ x_1 = x_2,

so the given function f is one-to-one.

On the other hand, suppose that g: R → R is the function defined by $g(x) = x^4 - x$ for each real number x. Then

$g(0) = (0)^4 - 0 = 0$ and $g(1) = (1)^4 - (1) = 1 - 1 = 0$.

Consequently, g is not one-to-one, since g(0) = g(1) but 0 ≠ 1; that is, g is not one-to-one because there exist real numbers x_1, x_2 where g(x_1) = g(x_2) but x_1 ≠ x_2.



EXAMPLE 5.14 Let A = {1, 2, 3} and B = {1, 2, 3, 4, 5}. The function

f = {(1, 1), (2, 3), (3, 4)}

is a one-to-one function from A to B;

g = {(1, 1), (2, 3), (3, 3)}

is a function from A to B, but it fails to be one-to-one because g(2) = g(3) but 2 ≠ 3.

For A, B in Example 5.14 there are $2^{15}$ relations from A to B and $5^3$ of these are functions from A to B. The next question we want to answer is how many functions f: A → B are one-to-one. Again we argue for general finite sets.

With A = {a_1, a_2, a_3, . . . , a_m}, B = {b_1, b_2, b_3, . . . , b_n}, and m ≤ n, a one-to-one function f: A → B has the form {(a_1, x_1), (a_2, x_2), (a_3, x_3), . . . , (a_m, x_m)}, where there are n choices for x_1 (that is, any element of B), n − 1 choices for x_2 (that is, any element of B except the one chosen for x_1), n − 2 choices for x_3, and so on, finishing with n − (m − 1) = n − m + 1 choices for x_m. By the rule of product, the number of one-to-one functions from A to B is

\[ n(n-1)(n-2)\cdots(n-m+1) = \frac{n!}{(n-m)!} = P(n, m) = P(|B|, |A|). \]

Consequently, for A, B in Example 5.14, there are 5 · 4 · 3 = 60 one-to-one functions f: A → B.
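The count P(|B|, |A|) can be cross-checked by listing the one-to-one functions directly, since an ordered selection of m distinct elements of B is exactly what `itertools.permutations` produces. This is a small sketch for the sets of Example 5.14; `math.perm` requires Python 3.8 or later.

```python
import math
from itertools import permutations

def injections(A, B):
    """Yield each one-to-one function from A to B as a dict."""
    domain = sorted(A)
    # permutations(B, m) lists the ordered selections of m distinct elements of B
    for images in permutations(sorted(B), len(domain)):
        yield dict(zip(domain, images))

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
count = sum(1 for _ in injections(A, B))

print(count, math.perm(len(B), len(A)))  # 60 60
```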

Definition 5.6 If f: A → B and A_1 ⊆ A, then

f(A_1) = {b ∈ B | b = f(a), for some a ∈ A_1},

and f(A_1) is called the image of A_1 under f.
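Definition 5.6 amounts to a one-line set comprehension. The sketch below applies it to the function f of Example 5.15, stored as a dictionary (an encoding chosen here for illustration).

```python
def image(f, subset):
    """Image f(A1) = {f(a) : a in A1} of a subset A1 of the domain."""
    return {f[a] for a in subset}

# The function f of Example 5.15, from {1, 2, 3, 4, 5} to {w, x, y, z}.
f = {1: "w", 2: "x", 3: "x", 4: "y", 5: "y"}

print(sorted(image(f, {1})))                    # ['w']
print(sorted(image(f, {1, 2, 3})))              # ['w', 'x']
print(sorted(image(f, {2, 3, 4, 5})))           # ['x', 'y']
print(image(f, {1, 2, 3}) == image(f, {1, 2}))  # True
```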

EXAMPLE 5.15 For A = {1, 2, 3, 4, 5} and B = {w, x, y, z}, let f: A → B be given by f = {(1, w), (2, x), (3, x), (4, y), (5, y)}. Then for A_1 = {1}, A_2 = {1, 2}, A_3 = {1, 2, 3}, A_4 = {2, 3}, and A_5 = {2, 3, 4, 5}, we find the following corresponding images under f:

f(A_1) = {f(a) | a ∈ A_1} = {f(a) | a ∈ {1}} = {f(a) | a = 1} = {f(1)} = {w};
f(A_2) = {f(a) | a ∈ A_2} = {f(a) | a ∈ {1, 2}} = {f(a) | a = 1 or 2} = {f(1), f(2)} = {w, x};
f(A_3) = {f(1), f(2), f(3)} = {w, x}, and f(A_3) = f(A_2) because f(2) = x = f(3);
f(A_4) = {x}; and
f(A_5) = {x, y}.

EXAMPLE 5.16 a) Let g: R → R be given by $g(x) = x^2$. Then g(R) = the range of g = [0, +∞). The image of Z under g is g(Z) = {0, 1, 4, 9, 16, . . .}, and for A_1 = [−2, 1] we get g(A_1) = [0, 4].


5.3 Onto Functions: Stirling Numbers of the Second Kind 263

b) If Teresa does more than just the most expensive account, the assignments can be made in $\sum_{k=0}^{4} (-1)^k \binom{4}{4-k} (4-k)^6 = 1560$ ways. (1560 = the number of onto functions g: C → D with |C| = 6, |D| = 4.)

Consequently, the assignments can be given under the prescribed conditions in 540 + 1560 = 2100 ways. [We mentioned earlier that the answer would not be 8400, but it is (1/4)(8400) = (1/|B|)(8400), where 8400 is the number of onto functions f: A → B, with |A| = 7 and |B| = 4. This is no coincidence, as we shall learn when we discuss Theorem 5.3.]

We now continue our discussion with the distribution of distinct objects into containers with none left empty, but now the containers become identical.

EXAMPLE 5.26 If A = {a, b, c, d} and B = {1, 2, 3}, then there are 36 onto functions from A to B or, equivalently, 36 ways to distribute four distinct objects into three distinguishable containers, with no container empty (and no regard for the location of objects in a given container). Among these 36 distributions we find the following collection of six (one of six such possible collections of six):

1) {a, b}_1 {c}_2 {d}_3        2) {a, b}_1 {d}_2 {c}_3

3) {c}_1 {a, b}_2 {d}_3        4) {c}_1 {d}_2 {a, b}_3

5) {d}_1 {a, b}_2 {c}_3        6) {d}_1 {c}_2 {a, b}_3,

where, for example, the notation {c}_2 means that c is in the second container. Now if we no longer distinguish the containers, these 6 = 3! distributions become identical, so there are 36/(3!) = 6 ways to distribute the distinct objects a, b, c, d among three identical containers, leaving no container empty.

For m ≥ n there are $\sum_{k=0}^{n} (-1)^k \binom{n}{n-k} (n-k)^m$ ways to distribute m distinct objects into n numbered (but otherwise identical) containers with no container left empty. Removing the numbers on the containers, so that they are now identical in appearance, we find that one distribution into these n (nonempty) identical containers corresponds with n! such distributions into the numbered containers. So the number of ways in which it is possible to distribute the m distinct objects into n identical containers, with no container left empty, is

\[ \frac{1}{n!} \sum_{k=0}^{n} (-1)^k \binom{n}{n-k} (n-k)^m. \]

This will be denoted by S(m, n) and is called a Stirling number of the second kind. We note that for |A| = m ≥ n = |B|, there are n! · S(m, n) onto functions from A to B.
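Both formulas are short enough to evaluate directly. The following sketch (function names chosen here for illustration) computes the number of onto functions and S(m, n) from the sums above, using `math.comb` for the binomial coefficient.

```python
from math import comb, factorial

def onto_count(m, n):
    """Number of onto functions from an m-element set to an n-element set."""
    return sum((-1) ** k * comb(n, n - k) * (n - k) ** m for k in range(n + 1))

def stirling2(m, n):
    """S(m, n): ways to place m distinct objects into n identical, nonempty containers."""
    return onto_count(m, n) // factorial(n)

print(onto_count(4, 3), stirling2(4, 3))  # 36 6
print(onto_count(6, 4))                   # 1560
print(onto_count(7, 4), stirling2(7, 4))  # 8400 350
```

Integer division is safe here because the number of onto functions is always a multiple of n!.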

Table 5.1 lists some Stirling numbers of the second kind.

EXAMPLE 5.27 For m ≥ n, $\sum_{i=1}^{n} S(m, i)$ is the number of possible ways to distribute m distinct objects into n identical containers with empty containers allowed. From the fourth row of Table 5.1 we see that there are 1 + 7 + 6 = 14 ways to distribute the objects a, b, c, d among three identical containers, with some container(s) possibly empty.

Table 5.1  S(m, n)

m \ n      1      2      3      4      5      6      7      8
1          1
2          1      1
3          1      3      1
4          1      7      6      1
5          1     15     25     10      1
6          1     31     90     65     15      1
7          1     63    301    350    140     21      1
8          1    127    966   1701   1050    266     28      1

We continue now with the derivation of an identity involving Stirling numbers of the second kind. The proof is combinatorial in nature.

THEOREM 5.3 Let m, n be positive integers with 1 < n ≤ m. Then

S(m + 1, n) = S(m, n − 1) + nS(m, n).

Proof: Let A = {a_1, a_2, . . . , a_m, a_{m+1}}. Then S(m + 1, n) counts the number of ways in which the objects of A can be distributed among n identical containers, with no container left empty.

There are S(m, n − 1) ways of distributing a_1, a_2, . . . , a_m among n − 1 identical containers, with none left empty. Then, placing a_{m+1} in the remaining empty container results in S(m, n − 1) of the distributions counted in S(m + 1, n), namely, those distributions where a_{m+1} is in a container by itself. Alternatively, distributing a_1, a_2, . . . , a_m among the n identical containers with none left empty, we have S(m, n) distributions. Now, however, for each of these S(m, n) distributions the n containers become distinguished by their contents. Selecting one of the n distinct containers for a_{m+1}, we have nS(m, n) distributions of the total S(m + 1, n), namely, those where a_{m+1} is in the same container as another object from A. The result then follows by the rule of sum.

To illustrate Theorem 5.3 consider the triangle shown in Table 5.1. Here the largest number corresponds with S(m + 1, n), for m = 7 and n = 3, and we see that S(7 + 1, 3) = 966 = 63 + 3(301) = S(7, 2) + 3S(7, 3). The identity in Theorem 5.3 can be used to extend Table 5.1 if necessary.
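Extending the table with the identity of Theorem 5.3 can be sketched as follows. The boundary values S(m, 1) = 1 and S(m, m) = 1, visible along the edges of Table 5.1, are supplied separately because the theorem covers only 1 < n ≤ m; the function name is illustrative.

```python
def stirling_table(max_m):
    """Rows m = 1..max_m of S(m, n), built with S(m+1, n) = S(m, n-1) + n*S(m, n)."""
    S = {(1, 1): 1}
    for m in range(1, max_m):
        # Boundary values: one container, or one object per container.
        S[m + 1, 1] = 1
        S[m + 1, m + 1] = 1
        for n in range(2, m + 1):
            S[m + 1, n] = S[m, n - 1] + n * S[m, n]
    return S

S = stirling_table(9)
print([S[8, n] for n in range(1, 9)])   # [1, 127, 966, 1701, 1050, 266, 28, 1]
print([S[9, n] for n in range(1, 10)])  # [1, 255, 3025, 7770, 6951, 2646, 462, 36, 1]
```

The first printed row reproduces the last row of Table 5.1, and the second is the next row obtained from it.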

If we multiply the result in Theorem 5.3 by (n − 1)! we have

\[ \left(\frac{1}{n}\right)\left[n!\,S(m+1, n)\right] = \left[(n-1)!\,S(m, n-1)\right] + \left[n!\,S(m, n)\right]. \]


5.5 The Pigeonhole Principle 273

Table 5.7

Code Name    Grams of Sugar      % of RDA^a of Vitamin A    % of RDA of Vitamin C    % of RDA of Protein
of Cereal    per 1-oz Serving    per 1-oz Serving           per 1-oz Serving         per 1-oz Serving
U             1                  25                         25                        6
V             7                  25                          2                        4
W            12                  25                          2                        4
X             0                  60                         40                       20
Y             3                  25                         40                       10
Z             2                  25                         40                       10

^a RDA = recommended daily allowance

Table 5.8

Vitamin    Vitamin Present    Amount of Vitamin       Dosage:           No. of Capsules
Capsule    in Capsule         in Capsule in IU^a      Capsules / Day    per Bottle
1          A                  10,000                  1                 100
1          D                     400                  1                 100
1          E                      30                  1                 100
2          A                   4,000                  1                 250
2          D                     400                  1                 250
2          E                      15                  1                 250

^a IU = international units

5.5 The Pigeonhole Principle

A change of pace is in order as we introduce an interesting distribution principle. This principle may seem to have nothing in common with what we have been doing so far, but it will prove to be helpful nonetheless.

In mathematics one sometimes finds that an almost obvious idea, when applied in a rather subtle manner, is the key needed to solve a troublesome problem. On the list of such obvious ideas many would undoubtedly place the following rule, known as the pigeonhole principle.

The Pigeonhole Principle: If m pigeons occupy n pigeonholes and m > n, then at least one pigeonhole has two or more pigeons roosting in it.

One situation for 6 (= m) pigeons and 4 (= n) pigeonholes (actually birdhouses) is shown in Fig. 5.7. The general result readily follows by the method of proof by contradiction. If the result is not true, then each pigeonhole has at most one pigeon roosting in it, for a total of at most n (< m) pigeons. (Somewhere we have lost at least m − n pigeons!)
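For small values of m and n the principle can be confirmed exhaustively. The sketch below checks every one of the $n^m$ possible placements; it is an illustration of the statement, not a substitute for the proof by contradiction, and the function name is chosen here.

```python
from collections import Counter
from itertools import product

def every_assignment_collides(m, n):
    """Check all n**m ways to place m pigeons in n holes for a hole holding 2 or more."""
    return all(
        max(Counter(assignment).values()) >= 2
        for assignment in product(range(n), repeat=m)
    )

print(every_assignment_collides(6, 4))  # True: m = 6 > n = 4
print(every_assignment_collides(4, 4))  # False: one pigeon per hole is possible
```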

But now what can pigeons roosting in pigeonholes have to do with mathematics (discrete, combinatorial, or otherwise)? Actually, this principle can be applied in various problems in which we seek to establish whether a certain situation can actually occur. We


7.2 Computer Recognition: Zero-One Matrices and Directed Graphs 347

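EXAMPLE 7.22 Let A = {1, 2, 3, 4} and ℛ = {(1, 2), (1, 3), (2, 4), (3, 2)}, as in Example 7.19. Keeping the order of the elements in A fixed, we define the relation matrix for ℛ as follows: M(ℛ) is the 4 × 4 (0, 1)-matrix whose entries $m_{ij}$, for 1 ≤ i, j ≤ 4, are given by

\[ m_{ij} = \begin{cases} 1, & \text{if } (i, j) \in \mathcal{R}, \\ 0, & \text{otherwise.} \end{cases} \]

In this case we find that

\[ M(\mathcal{R}) = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]

Now how can this be of any use? If we compute $(M(\mathcal{R}))^2$ using the convention that 1 + 1 = 1, then we find that

\[ (M(\mathcal{R}))^2 = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \]

which happens to be the relation matrix for $\mathcal{R} \circ \mathcal{R} = \mathcal{R}^2$. (Check Example 7.19.) Furthermore,

\[ (M(\mathcal{R}))^4 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \]

which is also the relation matrix for the relation $\mathcal{R}^4$; that is, $(M(\mathcal{R}))^4 = M(\mathcal{R}^4)$. Also, recall that $\mathcal{R}^4 = \emptyset$, as we learned in Example 7.19.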

What has happened here carries over to the general situation. We now state some results about relation matrices and their use in studying relations.

Let A be a set with |A| = n and ℛ a relation on A. If M(ℛ) is the relation matrix for ℛ, then

a) M(ℛ) = 0 (the matrix of all 0’s) if and only if ℛ = ∅

b) M(ℛ) = 1 (the matrix of all 1’s) if and only if ℛ = A × A

c) $M(\mathcal{R}^m) = [M(\mathcal{R})]^m$, for m ∈ Z+
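Result (c) can be checked for the relation of Example 7.22 with a Boolean matrix product. In this sketch the helper names are illustrative, and `compose` assumes the convention that (x, z) belongs to the composite when (x, y) is in the first relation and (y, z) is in the second for some y; for a relation composed with itself the convention does not matter.

```python
def relation_matrix(R, elements):
    """(0, 1)-matrix M(R) for a relation R on the ordered list `elements`."""
    return [[1 if (a, b) in R else 0 for b in elements] for a in elements]

def bool_product(E, F):
    """Matrix product with Boolean addition, so that 1 + 1 = 1."""
    n = len(E)
    return [
        [1 if any(E[i][t] and F[t][j] for t in range(n)) else 0 for j in range(n)]
        for i in range(n)
    ]

def compose(R1, R2):
    """Pairs (x, z) with (x, y) in R1 and (y, z) in R2 for some y."""
    return {(x, z) for x, y in R1 for w, z in R2 if y == w}

A = [1, 2, 3, 4]
R = {(1, 2), (1, 3), (2, 4), (3, 2)}
M = relation_matrix(R, A)
M2 = bool_product(M, M)
M4 = bool_product(M2, M2)

print(M2)  # [[0, 1, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(M2 == relation_matrix(compose(R, R), A))  # True
print(M4)  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```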

Using the (0, 1)-matrix for a relation, we now turn to the recognition of the reflexive, symmetric, antisymmetric, and transitive properties. To accomplish this we need the concepts introduced in the following three definitions.

Definition 7.11 Let $E = (e_{ij})_{m \times n}$, $F = (f_{ij})_{m \times n}$ be two m × n (0, 1)-matrices. We say that E precedes, or is less than, F, and we write E ≤ F, if $e_{ij} \leq f_{ij}$, for all 1 ≤ i ≤ m, 1 ≤ j ≤ n.


358 Chapter 7 Relations: The Second Time Around

ii) If ℛ is transitive and G contains a directed cycle (of length greater than or equal to three),

then the relation ℛ cannot be antisymmetric, so (A, ℛ) fails to be a partial order.

EXAMPLE 7.37 Consider the directed graph for the partial order in Example 7.35. Figure 7.17(a) is the graphical representation of ℛ. In part (b) of the figure, we have a somewhat simpler diagram, which is called the Hasse diagram for ℛ.

Figure 7.17 (two panels on the vertices 1, 2, 3, 4: (a) the directed graph of the partial order; (b) its Hasse diagram)

When we know that a relation ℛ is a partial order on a set A, we can eliminate the loops at the vertices of its directed graph. Since ℛ is also transitive, having the edges (1, 2) and (2, 4) is enough to ensure the existence of edge (1, 4), so we need not include that edge. In this way we obtain the diagram in Fig. 7.17(b), where we have not lost the directions on the edges; the directions are assumed to go from the bottom to the top.

In general, if ℛ is a partial order on a finite set A, we construct a Hasse diagram for ℛ on A by drawing a line segment from x up to y, if x, y ∈ A with x ℛ y and, most important, if there is no other element z ∈ A such that x ℛ z and z ℛ y. (So there is nothing “in between” x and y.) If we adopt the convention of reading the diagram from bottom to top, then it is not necessary to direct any edges.

EXAMPLE 7.38 In Fig. 7.18 we have the Hasse diagrams for the following four posets. (a) With 𝒰 = {1, 2, 3} and A = 𝒫(𝒰), ℛ is the subset relation on A. (b) Here ℛ is the “(exactly) divides” relation

Figure 7.18 (Hasse diagrams in four panels, (a) through (d); panel (a) shows the subsets of {1, 2, 3}, from ∅ at the bottom to {1, 2, 3} at the top)



Topological Sorting Algorithm
(for a partial order ℛ on a set A with |A| = n)

Step 1: Set k = 1. Let H_1 be the Hasse diagram of the partial order.

Step 2: Select a vertex v_k in H_k such that no (implicitly directed) edge in H_k starts at v_k.

Step 3: If k = n, the process is completed and we have a total order

𝒯: v_n < v_{n−1} < · · · < v_2 < v_1

that contains ℛ.

If k < n, then remove from H_k the vertex v_k and all (implicitly directed) edges of H_k that terminate at v_k. Call the result H_{k+1}. Increase k by 1 and return to step (2).

Here we have presented our algorithm as a precise list of instructions, with no concern about the particulars of the pseudocode used in earlier chapters and with no reference to its implementation in a particular computer language.

Before we apply this algorithm† to the problem at hand, we should observe the deliberate use of “a” before the word “vertex” in step (2). This implies that the selection need not be unique and that we can get several different total orders 𝒯 containing ℛ. Also, in step (3), for vertices v_{i−1} where 2 ≤ i ≤ n, the notation v_i < v_{i−1} is used because it is more suggestive of “v_i before v_{i−1}” than is the notation v_i 𝒯 v_{i−1}.
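One way the three steps might look in code is sketched below. The poset used (the divisors of 12 under “divides”) is a hypothetical example chosen here, not the toy-assembly poset of Fig. 7.20, and the rule for breaking ties in step (2), taking the largest available label, is an arbitrary choice: any vertex with no outgoing edge would do.

```python
def topological_sort(vertices, hasse_edges):
    """Return [v_n, ..., v_2, v_1], a total order containing the partial order.

    hasse_edges holds pairs (x, y) meaning x lies directly below y.
    """
    remaining = set(vertices)
    edges = set(hasse_edges)
    chosen = []  # v_1, v_2, ..., v_n in order of selection
    while remaining:
        # Step 2: a vertex at which no remaining edge starts.
        v = max(u for u in remaining if all(x != u for x, _ in edges))
        chosen.append(v)
        # Step 3: remove v and the edges that terminate at v.
        remaining.remove(v)
        edges = {(x, y) for x, y in edges if y != v}
    return chosen[::-1]

# Hypothetical poset: the divisors of 12 ordered by "divides".
V = [1, 2, 3, 4, 6, 12]
H = [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]
order = topological_sort(V, H)
print(order)  # [1, 2, 3, 4, 6, 12]
```

Replacing `max` by `min` gives a different, equally valid total order, which illustrates the non-uniqueness noted above.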

In Fig. 7.21, we show the Hasse diagrams that evolve as we apply the topological sorting algorithm to the partial order in Fig. 7.20. Below each diagram, the total order is listed as it evolves.

Figure 7.21 (Hasse diagrams H_1 through H_7, with the total order listed below each diagram as it evolves)

(k = 1) H_1: D
(k = 2) H_2: F < D
(k = 3) H_3: G < F < D
(k = 4) H_4: C < G < F < D
(k = 5) H_5: A < C < G < F < D
(k = 6) H_6: B < A < C < G < F < D
(k = 7) H_7: E < B < A < C < G < F < D

If the toy manufacturer writes the instructions in a list as 1-E, 2-B, 3-A, 4-C, 5-G, 6-F, 7-D, he or she will have a total order that preserves the partial order needed for correct assembly. This total order is one of 12 possible answers.

†Here we are only concerned with applying this algorithm. Hence we are assuming that it works and we shall not present a proof of that fact. Furthermore, we may operate similarly with other algorithms we encounter.



To accomplish this, let us start with the following observations.

a) If two states in a machine are not 2-equivalent, could they possibly be 3-equivalent (or k-equivalent, for k ≥ 4)?

The answer is no. If s1, s2 ∈ S and s1 E̸2 s2 (that is, s1 and s2 are not 2-equivalent), then there is at least one string xy ∈ ℐ^2 such that ω(s1, xy) = v1v2 ≠ w1w2 = ω(s2, xy), where v1, v2, w1, w2 ∈ 𝒪. So with regard to E3, we find that s1 E̸3 s2 because for any z ∈ ℐ, ω(s1, xyz) = v1v2v3 ≠ w1w2w3 = ω(s2, xyz).

In general, to find states that are (k + 1)-equivalent, we look at states that are k-equivalent.

b) Now suppose that s1, s2 ∈ S and s1 E2 s2. We wish to determine whether s1 E3 s2. That is, does ω(s1, x1x2x3) = ω(s2, x1x2x3) for all strings x1x2x3 ∈ ℐ^3? Consider what happens. First we get ω(s1, x1) = ω(s2, x1), because s1 E2 s2 ⇒ s1 E1 s2. Then there is a transition to the states ν(s1, x1) and ν(s2, x1). Consequently, ω(s1, x1x2x3) = ω(s2, x1x2x3) if ω(ν(s1, x1), x2x3) = ω(ν(s2, x1), x2x3) [that is, if ν(s1, x1) E2 ν(s2, x1)].

In general, for s1, s2 ∈ S, where s1 Ek s2, we find that s1 Ek+1 s2 if (and only if) ν(s1, x) Ek ν(s2, x) for all x ∈ ℐ.

With these observations to guide us, we now present an algorithm for the minimization of a finite state machine M.

Step 1: Set k = 1. We determine the states that are 1-equivalent by examining the rows in the state table for M. For s1, s2 ∈ S it follows that s1 E1 s2 when s1, s2 have the same output rows.

Let P1 be the partition of S induced by E1.

Step 2: Having determined Pk, we obtain Pk+1 by noting that if s1 Ek s2, then s1 Ek+1 s2 when ν(s1, x) Ek ν(s2, x) for all x ∈ ℐ. We have s1 Ek s2 if s1, s2 are in the same cell of the partition Pk. Likewise, ν(s1, x) Ek ν(s2, x) for each x ∈ ℐ, if ν(s1, x) and ν(s2, x) are in the same cell of the partition Pk. In this way Pk+1 is obtained from Pk.

Step 3: If Pk+1 = Pk, the process is complete. We select one state from each equivalence class and these states yield a minimal realization of M.

If Pk+1 ≠ Pk, we increase k by 1 and return to step (2).
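The three steps can be sketched as a partition-refinement loop. The machine below is hypothetical (it is not the machine of Table 7.1), and the dictionary encoding of ν and ω as `nu[s, x]` and `omega[s, x]` is an assumption made here for illustration.

```python
def minimize(states, inputs, nu, omega):
    """Return the final partition of `states` into classes of equivalent states.

    nu[s, x] is the next state and omega[s, x] the output for state s, input x.
    """
    def partition_of(label):
        cells = {}
        for s in states:
            cells.setdefault(label[s], set()).add(s)
        return {frozenset(c) for c in cells.values()}

    # Step 1: states are 1-equivalent when their output rows agree.
    label = {s: tuple(omega[s, x] for x in inputs) for s in states}
    current = partition_of(label)
    while True:
        # Step 2: refine by the cells that the next states nu(s, x) fall in.
        refined_label = {
            s: (label[s], tuple(label[nu[s, x]] for x in inputs)) for s in states
        }
        refined = partition_of(refined_label)
        # Step 3: stop when the partition no longer changes.
        if refined == current:
            return current
        label, current = refined_label, refined

# Hypothetical machine: s1, s2, s4 share an output row, but s4 moves differently.
states = ["s0", "s1", "s2", "s3", "s4"]
inputs = ["0", "1"]
nu = {("s0", "0"): "s1", ("s0", "1"): "s2",
      ("s1", "0"): "s0", ("s1", "1"): "s3",
      ("s2", "0"): "s0", ("s2", "1"): "s3",
      ("s3", "0"): "s4", ("s3", "1"): "s3",
      ("s4", "0"): "s3", ("s4", "1"): "s3"}
omega = {("s0", "0"): 0, ("s0", "1"): 0,
         ("s1", "0"): 0, ("s1", "1"): 1,
         ("s2", "0"): 0, ("s2", "1"): 1,
         ("s3", "0"): 1, ("s3", "1"): 1,
         ("s4", "0"): 0, ("s4", "1"): 1}

classes = minimize(states, inputs, nu, omega)
print(sorted(sorted(cell) for cell in classes))
# [['s0'], ['s1', 's2'], ['s3'], ['s4']]
```

Here P1 has the cell {s1, s2, s4}, which splits at the next step because ν(s4, 0) lies in a different cell from ν(s1, 0); choosing one state from each final class would give a four-state realization of this hypothetical machine.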

We illustrate the algorithm in the following example.

EXAMPLE 7.60 With ℐ = 𝒪 = {0, 1}, let M be given by the state table shown in Table 7.1. Looking at the output rows, we see that s3 and s4 are 1-equivalent, as are s2, s5, and s6. Here E1 partitions S as follows:

P1: {s1}, {s2, s5, s6}, {s3, s4}.

For each s ∈ S and each k ∈ Z+, s Ek s, so as we continue this process to determine P2, we shall not concern ourselves with equivalence classes of only one state.

Since s3 E1 s4, there is a chance that we could have s3 E2 s4. Here ν(s3, 0) = s2, ν(s4, 0) = s5 with s2 E1 s5, and ν(s3, 1) = s4, ν(s4, 1) = s3 with s4 E1 s3. Hence ν(s3, x) E1 ν(s4, x), for all x ∈ ℐ, and s3 E2 s4. Similarly, ν(s2, 0) = s5, ν(s5, 0) = s2 with s5 E1 s2, and ν(s2, 1) = s2, ν(s5, 1) = s5 with s2 E1 s5. Thus s2 E2 s5. Finally, ν(s5, 0) = s2 and


A-14 Appendix 2 Matrices, Matrix Operations, and Determinants

In general, if $a = (a_i)_{1 \leq i \leq n}$ is a 1 × n row vector and $b = (b_i)_{1 \leq i \leq n}$ is an n × 1 column vector, then $ab = \sum_{i=1}^{n} a_i b_i$. This result, which is a real number, is called the scalar product of the vectors (or matrices) a and b. This idea is the key we need for the following definition.

Definition A2.5 Given the matrices $A = (a_{ij})_{m \times n}$ and $B = (b_{jk})_{n \times p}$, the (matrix) product AB is the matrix $C = (c_{ik})_{m \times p}$, where

\[ c_{ik} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk} = \sum_{t=1}^{n} a_{it}b_{tk}, \quad \text{for all } 1 \leq i \leq m,\ 1 \leq k \leq p. \]

Hence the entry $c_{ik}$ in the ith row and kth column of the m × p matrix C is obtained from the scalar product of the ith row (vector) of A and the kth column (vector) of B.

The following demonstrates the result given by Definition A2.5.

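\[
AB =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\cdots & \cdots & \cdots & \cdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\cdots & \cdots & \cdots & \cdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1k} & \cdots & b_{1p} \\
b_{21} & b_{22} & \cdots & b_{2k} & \cdots & b_{2p} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
b_{n1} & b_{n2} & \cdots & b_{nk} & \cdots & b_{np}
\end{bmatrix}
= C =
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1k} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2k} & \cdots & c_{2p} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
c_{i1} & c_{i2} & \cdots & c_{ik} & \cdots & c_{ip} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
c_{m1} & c_{m2} & \cdots & c_{mk} & \cdots & c_{mp}
\end{bmatrix}
\]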
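Definition A2.5 can be turned into code almost symbol for symbol. The sketch below (with an illustrative function name and matrices stored as lists of rows) forms each entry as the scalar product of a row of A with a column of B, and applies it to the matrices A and B of Example A2.6.

```python
def mat_mul(A, B):
    """Product C = AB, where c_ik is the scalar product of row i of A and column k of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    p = len(B[0])
    return [
        [sum(A[i][t] * B[t][k] for t in range(n)) for k in range(p)]
        for i in range(len(A))
    ]

# The matrices A (2 x 3) and B (3 x 3) of Example A2.6.
A = [[1, 2, 1],
     [3, 0, 4]]
B = [[1, 2, 7],
     [1, 3, 3],
     [0, 1, 1]]

print(mat_mul(A, B))  # [[3, 9, 14], [3, 10, 25]]
```

The size check reflects the requirement in the definition that A be m × n and B be n × p.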

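EXAMPLE A2.6 a) Consider the matrices

\[ A = (a_{ij})_{2 \times 3} = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 0 & 4 \end{bmatrix} \quad \text{and} \quad B = (b_{jk})_{3 \times 3} = \begin{bmatrix} 1 & 2 & 7 \\ 1 & 3 & 3 \\ 0 & 1 & 1 \end{bmatrix}. \]

Then $AB = C = (c_{ik})_{2 \times 3} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}$, where

$c_{11} = 1 \cdot 1 + 2 \cdot 1 + 1 \cdot 0 = 3$ (row 1 of A with column 1 of B),

$c_{12} = 1 \cdot 2 + 2 \cdot 3 + 1 \cdot 1 = 9$ (row 1 of A with column 2 of B),

$c_{13} = 1 \cdot 7 + 2 \cdot 3 + 1 \cdot 1 = 14$ (row 1 of A with column 3 of B),

$c_{21} = 3 \cdot 1 + 0 \cdot 1 + 4 \cdot 0 = 3$ (row 2 of A with column 1 of B),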
