Runs of Identical Outcomes in a Sequence of Bernoulli Trials

Western Kentucky UniversityTopSCHOLAR®

Masters Theses & Specialist Projects Graduate School

Spring 2018

Runs of Identical Outcomes in a Sequence ofBernoulli TrialsMatthew RiggleWestern Kentucky University, [email protected]

Follow this and additional works at: https://digitalcommons.wku.edu/theses

Part of the Applied Mathematics Commons, and the Discrete Mathematics and CombinatoricsCommons

This Thesis is brought to you for free and open access by TopSCHOLAR®. It has been accepted for inclusion in Masters Theses & Specialist Projects byan authorized administrator of TopSCHOLAR®. For more information, please contact [email protected].

Recommended CitationRiggle, Matthew, "Runs of Identical Outcomes in a Sequence of Bernoulli Trials" (2018). Masters Theses & Specialist Projects. Paper2451.https://digitalcommons.wku.edu/theses/2451

https://digitalcommons.wku.edu?utm_source=digitalcommons.wku.edu%2Ftheses%2F2451&utm_medium=PDF&utm_campaign=PDFCoverPages

https://digitalcommons.wku.edu/theses?utm_source=digitalcommons.wku.edu%2Ftheses%2F2451&utm_medium=PDF&utm_campaign=PDFCoverPages

https://digitalcommons.wku.edu/Graduate?utm_source=digitalcommons.wku.edu%2Ftheses%2F2451&utm_medium=PDF&utm_campaign=PDFCoverPages

https://digitalcommons.wku.edu/theses?utm_source=digitalcommons.wku.edu%2Ftheses%2F2451&utm_medium=PDF&utm_campaign=PDFCoverPages

http://network.bepress.com/hgg/discipline/115?utm_source=digitalcommons.wku.edu%2Ftheses%2F2451&utm_medium=PDF&utm_campaign=PDFCoverPages



RUNS OF IDENTICAL OUTCOMES IN A SEQUENCE OF BERNOULLITRIALS

A ThesisPresented to

The Faculty of the Department of MathematicsWestern Kentucky University

Bowling Green, Kentucky

In Partial FulfillmentOf the Requirements for the Degree

Master of Science

ByMatthew Riggle

May 2018

Dedicated to Mom and Dad and Michael

ACKNOWLEDGMENTS

There are many individuals who contributed to the success of this project.

First and foremost, I would like to thank my advisor, Dr. David Neal. He has

provided a wealth of guidance and insight throughout the course of the project,

and it is an honor to have written the last master’s thesis that he will advise.

(Happy retirement!)

Next, I would like to thank the other readers of my committee, Dr. Melanie

Autin and Dr. Ngoc Nguyen. Both of them have been supportive during the

project, and were great professors for the many statistics classes I was able to

take during my time at WKU.

In addition to those named above, I would also like to thank the many

other mathematics professors who have taught and advised me during my under-

graduate and graduate career. I have enjoyed working with all of you, and I am

grateful for all of the challenges and opportunities that helped me develop as a

mathematician.

Lastly, I would like to thank my friends and family who have supported

me during my academic journey. In particular, Kathleen Bell deserves special

recognition for doing countless homework assignments together, sharing in the

experience of being calculus TAs, being a great office mate, and finally for joining

me for several “Thesis Thursday” sessions. Also, I would like to thank Emily

Keith, who came all the way from Illinois to attend my defense.

iv

CONTENTS

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 2. Expectation for Obtaining a Run of Length 2 . . . . . . . . . . . . . . . . . . 3

2.1. The Direct Counting Approach for E[T2,2] . . . . . . . . . . . . . . . . . . . . . . . . . . .3

2.2. The Direct Counting Approach for E[T2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.3. Conditional Average for E[T2,2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.4. Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Chapter 3. Markov Chains and Recursive Sequences . . . . . . . . . . . . . . . . . . . . . . 18

3.1. Markov Chain Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2. A Larger Markov Chain Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.3. Pseudo-Fibonacci Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Chapter 4. Expectation for Obtaining a Run of Length n . . . . . . . . . . . . . . . . . 30

4.1. Deriving E[Tn] Recursively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30

4.2. Deriving E[Tn,m] Recursively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32

4.3. Relationships Between E[Tn] and E[Tn,m] . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.4. Conditional Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.5. Applications of Conditional Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Chapter 5. Simulations with Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Chapter 6. Generalizations and Further Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6.1. Obtaining a Run with a Non-Constant Probability . . . . . . . . . . . . . . . . . 49

6.2. Obtaining a Run in a Finite Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

v

6.3. Obtaining a Run with a Categorical Distribution . . . . . . . . . . . . . . . . . . .53

6.4. Obtaining a Run from Multiple Sequences . . . . . . . . . . . . . . . . . . . . . . . . . 55

Appendix A Code for Theorem 4.1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Appendix B Code for Theorem 4.2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58

Appendix C Code for Theorems 4.4.1/4.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Appendix D Code for Conjecture 6.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

vi

LIST OF TABLES

Table 3.1 Selected Pseudo-Fibonacci Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Table 4.1 Probabilities for a Fair Run Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Table 5.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48

vii

RUNS OF IDENTICAL OUTCOMES IN A SEQUENCE OF BERNOULLITRIALS

Matthew Riggle May 2018 61 Pages

Directed by: David Neal, Melanie Autin, and Ngoc Nguyen

Department of Mathematics Western Kentucky University

The Bernoulli distribution is a basic, well-studied distribution in proba-

bility. In this thesis, we will consider repeated Bernoulli trials in order to study

runs of identical outcomes. More formally, for t ∈ N, we let Xt ∼ Bernoulli(p),

where p is the probability of success, q = 1 − p is the probability of failure, and

all Xt are independent. Then Xt gives the outcome of the tth trial, which is 1 for

success or 0 for failure. For n,m ∈ N, we define Tn to be the number of trials

needed to first observe n consecutive successes (where the nth success occurs on

trial XTn). Likewise, we define Tn,m to be the number of trials needed to first

observe either n consecutive successes or m consecutive failures.

We shall primarily focus our attention on calculating E[Tn] and E[Tn,m].

Starting with the simple cases of E[T2] and E[T2,2], we will use a variety of

techniques, such as counting arguments and Markov chains, in order to derive

the expectations. When possible, we shall also provide closed-form expressions

for the probability mass function, cumulative distribution function, variance, and

other values of interest. Eventually we will work our way to general formulas

for E[Tn] and E[Tn,m]. We will also derive formulas for conditional averages,

and discuss how famous results from probability such as Wald’s Identity apply to

our problem. Numerical examples will also be given in order to supplement the

discussion and clarify the results.

viii

CHAPTER 1

INTRODUCTION

Throughout, we letXt ∼ Bernoulli(p), where p is the probability of success,

q = 1 − p is the probability of failure, and all Xt are independent. Xt gives the

outcome of the tth trial, which is 1 for success and 0 for failure. For integers n,m ≥

2, we define Tn to be the number of trials needed to first observe n consecutive

successes (where the nth success occurs on trial XTn). Likewise, we define the

stopping time Tn,m to be the number of trials needed to first observe either n

consecutive successes or m consecutive failures. What are E[Tn] and E[Tn,m]?

Motivation for the main problem can be found in a variety of applications.

In particular, this problem is closely related to the concept of the “hot-hand

fallacy”, which was discussed in a famous psychology paper by Gilovich, Vallone,

and Tversky [2]. The idea is that basketball fans have a mental picture of a player

who makes several consecutive shots as being “on fire”, when in reality most such

streaks can be ascribed to random chance. The hot-hand fallacy is closely related

to the gambler’s fallacy [8], which falsely assumes that even for independent

events, long streaks of identical outcomes are less likely than probability would

dictate.

While we are largely interested in the average number of trials needed to

observe a run of a given length, other work has been done on related problems.

Several papers by Schilling discuss the average length of the longest run in a given

finite number of Bernoulli trials [9], [10], [11]. These papers provide examples of

some of the techniques we will use for our problem, including counting arguments

and recursive methods.

Another application of studying runs is related to the concept of “optimal

stopping,” which seeks to identify when to take a certain action so as to maximize

1

(or minimize) some value. An example of a gambling application will be presented

in this project, in which the player will quit after some number of consecutive

wins or losses. We will investigate the probability of coming out ahead and the

associated conditional winnings. Other research related to optimal stopping and

the Bernoulli distribution can be found in [6] and [12].

Finally, while not the main focus of this project, it is possible to closely

examine the probability mass function for the case T2 in order to develop recursive

“Fibonacci-like” sequences. It turns out that using p = 1/2 returns precisely the

Fibonacci sequence, and varying the probability parameter will provide a different

recursive sequence. Using a modified version of Binet’s Formula for the Fibonacci

numbers allows us to define such sequences as a side result. Similar recursive

sequences are examined in [1]. The two trivial cases, where p = 0 or p = 1, can

be handled with little effort. If p = 1, we will obtain a success on every trial, so

it takes n trials to obtain a run of n successes. Thus E[Tn] = n and E[Tn,m] = n.

If instead p = 0, we will obtain a failure on every trial, so it takes m trials to

obtain a run of m failures and E[Tn,m] = m. However, we will never obtain a run

of n successes, so E[Tn] =∞. We will thus focus on the case where 0 < p < 1.

We begin in Chapter 2 by studying the case where we wish to obtain a run

of length 2, in order to illustrate the techniques used to derive formulas for the

regular and conditional expectations. In Chapter 3, we use Markov chains as an

alternate approach to the problem, which allows use to highlight some nice results

related to the Fibonacci sequence. In Chapter 4, we generalize the problem for

runs of length n or m where n,m ≥ 2 and compare the results with those obtained

in Chapter 2. Chapter 5 describes simulation programs used to verify the results

obtained in previous chapters. Finally, we present further generalizations and

avenues for future study in Chapter 6.

2

CHAPTER 2

EXPECTATION FOR OBTAINING A RUN OF LENGTH 2

In this chapter, we utilize a variety of approaches to calculate E[T2] and

E[T2,2], beginning with a direct counting approach. Surprisingly, this approach

turns out to be more straightforward for the calculation of E[T2,2], so we begin

there.

2.1 - The Direct Counting Approach for E[T2,2]

We begin by deriving the probability mass function f2,2(t) = P (T2,2 = t),

defined for t ≥ 2. We note that we must end either by obtaining a run of 2

successes (which occurs with probability p2) or by obtaining a run of 2 failures

(which occurs with probability q2). Also, since we cannot obtain a run of length

2 before the observed run, all previous trials must alternate between success and

failure. This means that for any given value of t, there are two valid options,

depending on the parity of t. If t is even, then the only valid options are 1 0 1 0

1 0 ... 0 1 1 or 0 1 0 1 0 1 ... 1 0 0, where 1 indicates a success and 0 indicates a

failure. If t is odd, then the two valid options are 1 0 1 0 1 0 ... 1 0 0 or 0 1 0 1

0 1 ... 0 1 1. Thus, the outcome of the starting trial will depend on whether t is

even or odd.

We may then find the probability for any particular value of t, depend-

ing on whether it is even or odd. If t is even, then there will be either t2

+ 1

successes and t2− 1 failures, or vice versa. Also, each valid sequence of trials

has only one possible ordering. The corresponding probabilities are pt2+1q

t2−1 and

pt2−1q

t2+1, so the total probability is p

t2+1q

t2−1 +p

t2−1q

t2+1, which can be rewritten

as (pq)t2−1(p2 + q2).

3

If instead t is odd, then there will be either t+12

successes and t−12

failures,

or vice versa. The corresponding probabilities are pt+12 q

t−12 and p

t−12 q

t+12 , so the

total probability is pt+12 q

t−12 + p

t−12 q

t+12 , which can be simplified to (pq)

t−12 .

Thus the probability mass function f2,2(t) for T2,2 is as follows:

f2,2(t) =

p

t2−1q

t2−1(p2 + q2), if t is even

(pq)t−12 , if t is odd.

(2.1)

Summing f2,2(t) over all possible values of t allows us to prove the following result.

Theorem 2.1.1. For 0 ≤ p ≤ 1, we will obtain a run of length 2 almost surely.

Proof. For p = 0 and p = 1, it is guaranteed that the run will occur in the first

two trials. For 0 < p < 1, we let even t = 2k and odd t = 2k + 1, and consider

the piecewise sum of f2,2(t):

P (T2,2 <∞) =∞∑k=1

(pq)k−1(p2 + q2) +∞∑k=1

(pq)k

=∞∑k=0

(pq)k(p2 + q2) +∞∑k=0

(pq)k+1

=p2 + q2

1− pq+

pq

1− pq

=p2 + q2 + pq

1− pq

=p2 + q(q + p)

1− p(1− p)

=p2 + q

1− p+ p2

=p2 + q

q + p2

= 1.

Since the sum of f2,2(t) over all values of t is equal to 1, we will obtain a run of

4

two successes or two failures almost surely.

Theorem 2.1.1 holds even though there are two sequences which never give

a run of 2; i.e., those which forever alternate between success and failure. It is

not difficult to show that these two sequences occur with probability 0. However,

just because T2,2 is finite-valued, it is not obvious that E[T2,2] is finite.

As a corollary to our work thus far, it is now convenient to utilize the

probability mass function to develop a formula for the cumulative distribution

function F2,2(t) = P (T2,2 ≤ t). We note that for even t = 2k, the even sum will

contain an extra term. For odd t = 2k + 1, we have

F2,2(t) =k∑

i=1

(pq)i−1(p2 + q2) +k∑

i=1

(pq)i

=(p2 + q2)(1− (pq)k)

1− pq+pq − (pq)k+1

1− pq

=(p2 + q2)(1− (pq)k)

1− pq+pq − (pq)k+1

1− pq

=(p2 + q2)(1− (pq)k) + pq − (pq)k+1

1− pq

=(p2 + q2)(1− (pq)k) + pq(1− (pq)k)

1− pq

=(p2 + q2 + pq)(1− (pq)k)

1− (1− q)q

=(p(p+ q) + q2)(1− (pq)k)

1− q + q2

=(p+ q2)(1− (pq)k)

p+ q2

= 1− (pq)k,

and for even t = 2k, we have

F2,2(t) =k∑

i=1

(pq)i−1(p2 + q2) +k−1∑i=1

(pq)i

5

=(p2 + q2)(1− (pq)k)

1− pq+pq − (pq)k

1− pq

=(p2 + q2)(1− (pq)k) + pq − (pq)k

1− pq

=(p2 + q2)(1− (pq)k) + pq − 1 + (1− (pq)k)

1− pq

=(p2 + q2 + 1)(1− (pq)k) + pq − 1

1− pq

=(p2 + q2 + 1)(1− (pq)k)

1− pq− 1

=(p2 + (1− p)2 + 1)(1− (pq)k)

1− p(1− p)− 1

=(p2 + 1− 2p+ p2 + 1)(1− (pq)k)

1− p+ p2− 1

=(2p2 − 2p+ 2)(1− (pq)k)

1− p+ p2− 1

= 2(1− (pq)k)− 1

= 1− 2(pq)k.

Then we reach the following corollary.

Corollary 2.1.2. The cumulative mass function F2,2(t), which gives the proba-

bility of obtaining a run of two identical trials within t trials, is given by

F2,2(t) =

1− 2(pq)

t2 , if t is even

1− (pq)t−12 , if t is odd.

Returning to the probability mass function allows us to calculate E[T2,2].

Theorem 2.1.3. Let T2,2 be the number of trials required to obtain either a run

of two successes or a run of two failures. Then E[T2,2] =2 + pq

1− pq.

Proof. We use two sums, one for even t and one for odd t. For even t = 2k, we

6

have

∞∑k=1

(2k)(pq)k−1(p2 + q2),

and for odd t = 2k + 1, we have

∞∑k=1

(2k + 1)(pq)k.

Combining the even and odd sums, we obtain:

E[T2,2] =∞∑k=1

(2k)(pq)k−1(p2 + q2) +∞∑k=1

(2k + 1)(pq)k.

Applying the formulas∑∞

k=0 rk =

1

1− rand

∑∞k=1 kr

k−1 =1

(1− r)2and the fact

that p+ q = 1 allows us to simplify the sum:

E[T2,2] =∞∑k=1

(2k)(pq)k−1(p2 + q2) +∞∑k=1

(2k + 1)(pq)k

= 2(p2 + q2)∞∑k=1

k(pq)k−1 + 2pq∞∑k=1

k(pq)k−1 + pq∞∑k=0

(pq)k

=2(p2 + q2)

(1− pq)2+

2pq

(1− pq)2+

pq

(1− pq)

=2p2 + 2q2 + 2pq + pq(1− pq)

(1− pq)2

=2p2 + 2q2 + 3pq − p2q2

(1− pq)2

=(2p2 + 2pq) + (2q2 + 2pq)− pq − p2q2

(1− pq)2

=(2p2 + 2pq) + (2q2 + 2pq)− pq − p2q2

(1− pq)2

=2p(p+ q) + 2q(p+ q)− pq − p2q2

(1− pq)2

7

=2− pq − p2q2

(1− pq)2

=(2 + pq)(1− pq)

(1− pq)2

=2 + pq

1− pq.

We note that the formula is also valid for p = 0 or p = 1; in either case, we obtain

E[T2,2] = 2.

2.2 - The Direct Counting Approach for E[T2]

We now consider the probability mass function f2(t) = P (T = t), defined

for t ≥ 2, for first obtaining a run of two successes ending on the tth trial. The

first few values of f2(t) are straightforward to calculate: t = 2 corresponds to a

run of two successes, so f2(2) = p2. Likewise, for t = 3, we must obtain successes

on trials 2 and 3, and a failure on trial 1 (or else we would have achieved the run

at t = 2). Thus f2(3) = p2q. It is clear that for t > 3, the last three trials must

be failure, success, and success, in that order, but there are more possibilities for

the first t− 3 trials (hereafter referred to as the “pre-run”). The only restriction

is that we cannot obtain two consecutive successes. Note that the number of

successes in the pre-run cannot exceed⌊t−22

⌋, since otherwise we would have a

run of two successes earlier than the tth trial.

We now consider the choice of successes within the pre-run. If the pre-

run contains no successes, this can only occur in one way (we will write this as(t−20

)for reasons that will soon be apparent). If the pre-run contains exactly

1 success, there are(t−31

)ways this may occur. If the pre-run contains more

than 1 success, then each success we place prior to the last will necessarily be

preceded by a failure, which removes 1 position from consideration. Thus a pre-

run that contains exactly 2 successes can occur in(t−42

)ways, and a pre-run that

8

contains exactly k successes can occur in(t−2−k

k

)ways. Finally, note that a pre-

run of length t−3 with k successes occurs with probability pkqt−3−k, so the entire

observed sequence occurs with probability pk+2qt−2−k. Thus, the probability mass

function can be written as

f2(t) =

b t−22 c∑

k=0

(t− 2− k

k

)pk+2qt−2−k, defined for integers t ≥ 2, (2.2)

which leads to the following expected value

E[T2] =∞∑t=2

t

b t−22 c∑

k=0

(t− 2− k

k

)pk+2qt−2−k. (2.3)

As can be seen, the formula in (2.3) is quite unwieldy, so we now pursue

a different method of enumeration. Instead of developing a formula based on the

number of trials t, we will consider the number of failures. We know that the

sequence must end with two successes, but before that, there may be any number

of failures k ∈ {0, 1, 2, ...}. Between any two failures or before the first failure,

there can be at most one success. Thus there are i ∈ {0, 1, 2, ..., k} successes, in

addition to the two successes in the run. Thus the probability of our sequence

containing exactly k failures is

k∑i=0

(k

i

)p2+iqk = p2qk

k∑i=0

(k

i

)pi = p2qk(p+ 1)k. (2.4)

Using this probability allows us to prove the following result.

Theorem 2.2.1. For 0 < p ≤ 1, we will obtain a run of two successes almost

surely.

Proof. For p = 1, it is guaranteed that the run will occur in the first two trials.

9

For 0 < p < 1, we have

P (T2 <∞) =∞∑k=0

p2qk(p+ 1)k

= p2∞∑k=0

[q(p+ 1)]k

=p2

1− q(p+ 1)

=p2

1− pq − q

=p2

p− pq

=p2

p(1− q)

= 1.

As the total probability is equal to 1, we will obtain the run almost surely.

In this case, there are infinitely many sequences which never give a run of

2 successes; however, the union of all such sequences has probability 0.

We can also use the probability function in (2.4) to obtain a formula for

E[T2].

Theorem 2.2.2. Let T2 be the number of trials required to obtain a run of 2

successes. Then E[T2] =1 + p

p2.

Proof. We sum on the number of failures k, for k ≥ 0. Since we might have a

success before each failure, there are i successes before the run, for some 0 ≤ i ≤ k.

Together with the run of 2 successes, there is a total of 2 + k + i trials. Then

E[T2] =∞∑k=0

k∑i=0

(k

i

)p2+iqk(2 + k + i)

10

= p2∞∑k=0

qkk∑

i=0

(k

i

)pi(2 + k + i)

= p2∞∑k=0

qk

((2 + k)

k∑i=0

(k

i

)pi +

k∑i=0

(k

i

)pii

)

= p2∞∑k=0

qk((2 + k)(p+ 1)k + kp(p+ 1)k−1

)= p2

(∞∑k=0

qk(2 + k)(p+ 1)k +∞∑k=0

qkkp(p+ 1)k−1

)

= p2

(2∞∑k=0

qk(p+ 1)k +∞∑k=0

qk(p+ 1)kk + pq∞∑k=0

kqk−1(p+ 1)k−1

)

= p2(

2

1− q(p+ 1)+

(p+ 1)q

(1− q(p+ 1))2+

pq

(1− q(p+ 1))2

)=p2(2− 2q(p+ 1) + (p+ 1)q + pq)

(1− q(p+ 1))2

=p2(2− q(p+ 1) + pq)

(1− qp− q)2

=p2(2− qp− q + pq)

(p− pq)2

=p2(2− q)

(p(1− q))2

=2− qp2

=1 + p

p2.

The formula also works for p = 1, as E[T2] = 2.

2.3 - Conditional Average for E[T2,2]

We may also be interested in obtaining a run of n successes before we

obtain a run of m failures. In order to derive the conditional expectation of the

number of trials needed in this case, we will need the probability of first obtaining

a run of successes. As before, we first seek a derivation for the case where n = 2

and m = 2.

11

Theorem 2.3.1. Let P (S2) be the probability of obtaining a run of 2 successes

before obtaining a run of 2 failures. Then P (S2) =p(1− q2)

1− pq.

Proof. Suppose we obtain the run of 2 successes first and that it takes t trials.

Then the sequence of trials must end with two successes and be preceded by al-

ternating successes and failures. If t is even, we will have two successes (occurring

with probability p2), plus an equal number of successes and failures in the pre-

ceding t− 2 trials. If instead t is odd, we will have two successes (occurring with

probability p2), and there will be one additional success in the preceding t − 2

trials. Again, by summing on the number of failures k, we obtain

P (S2) = p2∞∑k=0

(pq)k + qp2∞∑k=0

(pq)k

=p2

1− pq+

qp2

1− pq

=p2 + qp2

1− pq

=p2(1 + q)

1− pq

=p(1− q)(1 + q)

1− pq

=p(1− q2)

1− pq.

By symmetry, we also obtain the analogous result.

Corollary 2.3.2. Let P (F2) be the probability of obtaining a run of 2 failures

before obtaining a run of 2 successes. Then P (F2) =q(1− p2)

1− pq.

Now, we shall derive a formula for E[T2,2|S2].

12

Theorem 2.3.3. Let S2 be the event of obtaining a run of 2 successes before

obtaining a run of 2 failures. Then E[T2,2|S2] =2 + 3q − pq2

(1 + q)(1− pq).

Proof. We begin with two sums, corresponding to an even number of trials and

an odd number of trials, respectively. We multiply by the number of trials in

order to calculate the conditional expectation, and we divide by the probability

of the given event P (S2):

E[T2,2|S2] =1− pqp(1− q2)

(p2

∞∑k=0

(pq)k(2k + 2) + qp2∞∑k=0

(pq)k(2k + 3)

)

=1− pqp2(1 + q)

(2p2

∞∑k=0

(pq)kk + 2p2∞∑k=0

(pq)k + 2qp2∞∑k=0

(pq)kk + 3qp2∞∑k=0

(pq)k

)

=1− pq1 + q

(2pq

(1− pq)2+

2

1− pq+

2q2p

(1− pq)2+

3q

1− pq

)=

1− pq1 + q

(2pq + 2(1− pq) + 2q2p+ 3q(1− pq)

(1− pq)2

)=

1− pq1 + q

(2 + 3q − q2p

(1− pq)2

)=

2 + 3q − pq2

(1 + q)(1− pq).

Again, we obtain the analogous result by symmetry.

Corollary 2.3.4. Let F2 be the event of obtaining a run of 2 failures before

obtaining a run of 2 successes. Then E[T2,2|F2] =2 + 3p− qp2

(1 + p)(1− pq).

2.4 - Numerical Examples

We now present an example utilizing the formulas derived in the previous

sections.

Example 2.4.1. Let p = 0.4 be the probability of success on each trial.

13

By Theorem 2.2.2, the average number of trials needed to obtain a run of

two successes is

E[T2] =1 + p

p2=

1 + 0.4

0.42= 8.75.

By Theorem 2.1.3, the average number of trials needed to obtain a run of two

successes or a run of two failures is

E[T2,2] =2 + pq

1− pq=

2 + (0.4)(0.6)

1− (.4)(.6)=

56

19≈ 2.9474.

By Theorem 2.3.1, the probability of obtaining a run of two successes before a

run of two failures is

P (S2) =p(1− q2)

1− pq=

(0.4)(1− 0.62)

1− (.4)(.6)≈ 0.3368,

and, similarly, by Corollary 2.3.2, the probability of obtaining a run of two failures

before a run of two successes is

P (F2) =q(1− p2)

1− pq=

(0.6)(1− 0.42)

1− (.4)(.6)≈ 0.6632.

Finally, by Theorem 2.3.3, the average number of trials needed to obtain a run of

two successes, given that we obtain such a run before a run of two failures, is

E[T2,2|S2] =2 + 3q − pq2

(1 + q)(1− pq)

=2 + 3(.6)− (.4)(.6)2

(1 + .6)(1− (.4)(.6))

≈ 3.0066.

14

Likewise, by Corollary 2.3.4, the average number of trials needed to obtain a run

of two failures, given that we obtain such a run before a run of two successes, is

E[T2,2|F2] =2 + 3p− qp2

(1 + p)(1− pq)

=2 + 3(.4)− (.6)(.4)2

(1 + .4)(1− (.4)(.6))

≈ 2.9173.

The previous two results demonstrate that T2,2 depends on which run occurs first.

The next example will treat the repeated Bernoulli trials as corresponding

to individual steps in a one-dimensional random walk. A success corresponds to

an upward step, and a failure corresponds to a downward step. More formally, if

upward steps are a units each, and downward steps are b units each, then we can

define the height Ht as

Ht =t∑

i=1

(a1{Xi=1} − b1{Xi=0})

where 1{Xt=1} is an indicator variable that returns 1 for a success and 0 for a

failure, and 1{Xt=0} is an indicator variable that returns 0 for a success and 1 for

a failure. Then Ht represents the height at time t (i.e., after t trials).

A very useful result for studying random walks is Wald’s Identity. Next,

we will present a proof of Wald’s Identity from Harper and Ross [3].

Theorem 2.4.2. Wald’s Identity: Suppose that Ht is the height of a random walk

defined by Ht =∑t

i=1(a1{Xi=1}− b1{Xi=0}). If Xt are i.i.d., T is a stopping time,

and E[X1] and E[T ] are both finite, then E[HT ] = E[X1] · E[T ].

15

Proof. As Xt and the indicator 1{T≥n} are independent, we have

E[HT ] =∞∑n=1

E[Xn · 1{T≥n}]

=∞∑n=1

E[Xn] · E[1{T≥n}]

=∞∑n=1

E[X1] · P (T ≥ n)

= E[X1]∞∑n=1

n · P (T = n)

= E[X1] · E[T ].

We will use Wald’s Identity in the next example.

Example 2.4.3. Let p = 0.4 be the probability of success on each trial, and let

each upward step be a units and each downward step be b units. We would like

to compute E[HT2 ] and E[HT2,2 ]. First, we observe that

E[X1] = ap− bq.

Then, by applying Wald’s Identity, we find

E[HT2 ] = E[X1] · E[T2]

= (ap− bq)(

1 + p

p2

)= (0.4a− 0.6b)(8.75)

= 3.5a− 5.25b,

16

and

E[HT2,2 ] = E[X1] · E[T2,2]

= (ap− bq)(

2 + pq

1− pq

)= (0.4a− 0.6b)

(56

19

)=

112a

95− 168b

95.

17

CHAPTER 3

MARKOV CHAINS AND RECURSIVE SEQUENCES

3.1 - Markov Chain Solutions

The formula for the probability mass function f2(t) derived with the direct

counting approach, (2.2), is difficult to use directly. For this reason, we attempt

to develop a simpler formula by using a Markov chain. In this situation, we have

three different states, corresponding to having a current run of either 0, 1, or 2

successes. Thus we have a 1 × 3 initial state matrix B and a 3 × 3 transition

matrix A:

B =

(1 0 0

)

A =

1− p p 0

1− p 0 p

0 0 1

.

The entries in B correspond to the initial probabilities; since we always

begin with a run of 0 successes, there is a 1 in the first entry and 0 elsewhere.

Each entry in A corresponds to the probability of moving from its row state to its

column state. Note that the third row represents the absorbing state, since once

we obtain a run of 2 successes, we will remain in that state indefinitely. Then

the 1 × 3 matrix B × At represents the probabilities of being in each state after

t trials. Because computing At is quite difficult, we seek a diagonalization of A

using standard techniques (see [7]).

The eigenvalues of A can be found by solving its characteristic equation:

(1− p− λ)(−λ)(1− λ)− (1− p)(p)(1− λ) = 0

18

(1− λ)[λ2 − (1− p)λ− p(1− p)] = 0

Solving for λ, we obtain:

λ = 1 or λ =(1− p)±

√(1− p)(1 + 3p)

2.

Then the three eigenvalues of A are

λ1 = 1, λ2 =(1− p) +

√(1− p)(1 + 3p)

2, and λ3 =

(1− p)−√

(1− p)(1 + 3p)

2.

We can also find the corresponding matrix of eigenvectors P , which is

P =

1

(1−p)+√

(1−p)(1+3p)

2(1−p)

(1−p)−√

(1−p)(1+3p)

2(1−p)

1 1 1

1 0 0

,

where the ith column of P represents the eigenvector corresponding to λi. Note

that P is invertible, and

P−1 =1− p√

(1− p)(1 + 3p)

0 0

√(1−p)(1+3p)

1−p

1−(1−p)+

√(1−p)(1+3p)

2(1−p)

−(1−p)−√

(1−p)(1+3p)

2(1−p)

−1 (1−p)+√

(1−p)(1+3p)

2(1−p)

(1−p)−√

(1−p)(1+3p)

2(1−p)

.

Then we have the diagonalization A = PDP−1, where D is simply the diagonal

matrix containing the eigenvalues on the main diagonal:

D =

1 0 0

0(1−p)+

√(1−p)(1+3p)

20

0 0(1−p)−

√(1−p)(1+3p)

2

.

19

This allows us to easily calculate At = PDtP−1.

Note that the matrix

B ×At =

(P (0) P (1) P (2)

)

represents the final probabilities of being in each state after t trials. In particular,

P (2) is the probability of obtaining a run of 2 successes in t or fewer trials,

which is the cumulative mass function F2(t). Thus, when we perform the matrix

multiplication At = PDtP−1, we can focus on this entry of the final state matrix,

which is as follows:

P (2) =1− p√

(1− p)(1 + 3p)

[√(1− p)(1 + 3p)

1− p

−

((1− p) +

√(1− p)(1 + 3p)

2(1− p)

)2((1− p) +

√(1− p)(1 + 3p)

2

)t

+

((1− p)−

√(1− p)(1 + 3p)

2(1− p)

)2((1− p)−

√(1− p)(1 + 3p)

2

)t .

The above result simplifies to yield

F2(t) = 1−

[(1− p) +

√(1− p)(1 + 3p)

]t+2

−[(1− p)−

√(1− p)(1 + 3p)

]t+2

2t+2(1− p)√

(1− p)(1 + 3p),

(3.1)

from which we can derive f2(t) = F2(t)− F2(t− 1).

A remarkable simplification is possible in the case where p = 1/2; in this

case, we obtain

F2(t) = 1−

[(1− 1

2) +

√(1− 1

2)(1 + 3

2)]t+2

−[(1− 1

2)−

√(1− 1

2)(1 + 3

2)]t+2

2t+2(1− 12)√

(1− 12)(1 + 3

2)

20

= 1−

(12

+√

54

)t+2

−(

12−√

54

)t+2

2t+2(12

)√54

= 1−

(1+√5

2

)t+2

−(

1−√5

2

)t+2

2t√

5.

We note that1 +√

5

2= ϕ, where ϕ is the golden ratio, and

1−√

5

2= 1 − ϕ.

This allows us to apply Binet’s Formula, which states Fn =ϕn − (1− ϕ)n

ϕ− (1− ϕ), where

Fn is the nth Fibonacci number. Thus we obtain the following theorem.

Theorem 3.1.1. The cumulative distribution function F2(t) for the probability of

obtaining a run of 2 successes within t trials is given by F2(t) = 1− Ft+2

2t, where

Ft is the tth Fibonacci number.

Using Theorem 3.1.1, we can also obtain the pdf f2(t):

f2(t) = F2(t)− F2(t− 1)

= 1− Ft+2

2t−(

1− Ft+1

2t−1

)=

2Ft+1 − Ft+2

2t

=2Ft+1 − (Ft+1 + Ft)

2t

=Ft+1 − Ft

2t.

We thus obtain the following result.

Theorem 3.1.2. The probability mass function f2(t) for the probability of ob-

taining a run of 2 successes ending on the tth trial is given by f2(t) =Ft−1

2t, where

Ft is the tth Fibonacci number.

We can use the probability distribution function in Theorem 3.1.2 to pro-

vide an alternate proof that a run of 2 successes must have finite expectation in

21

the case where p = 1/2.

Theorem 3.1.3. Suppose p = 1/2. Then T2 is finite almost surely, and E[T2] is

finite.

Proof. To prove the first part of the theorem, we will show that the following sum

is equal to 1:

∞∑t=2

f2(t) =∞∑t=2

Ft−1

2t

=∞∑t=2

ϕt−1 − (1− ϕ)t−1

2t[ϕ− (1− ϕ)]

=1

2[ϕ− (1− ϕ)]

∞∑t=2

(ϕt−1

2t−1 −(1− ϕ)t−1

2t−1

)

=1

2[ϕ− (1− ϕ)]

∞∑t=1

[(ϕ2

)t−(

(1− ϕ)

2

)t]

=1

4ϕ− 2

[∞∑t=1

(ϕ2

)t−∞∑t=1

((1− ϕ)

2

)t]

=1

4ϕ− 2

(1

1− ϕ2

− 1− 1

1− 1−ϕ2

+ 1

)

=1

4ϕ− 2

(1

2−ϕ2

− 11+ϕ2

)

=1

4ϕ− 2

(2

2− ϕ− 2

1 + ϕ

)=

1

4ϕ− 2

(2 + 2ϕ− 4 + 2ϕ

(2− ϕ)(1 + ϕ)

)=

1

2 + ϕ− ϕ2

=1

2 + 1+√5

2− (1+

√5

2)2

=1

8+2+2√5−1−2

√5−5

4

= 1.

22

For the second part of the theorem, we will use the ratio test on the series

E[T2] =∞∑t=2

Ft−1

2tt,

obtaining the following limit:

limt→∞

(Ft

Ft−1

)(2t

2t+1

)(t+ 1

t

)= lim

t→∞

(Ft

Ft−1

)(1

2

)=ϕ

2< 1.

Hence E[T2] is finite.

3.2 - A Larger Markov Chain Example

We may apply the same Markov chain technique for longer runs where

n > 2. In these cases, however, it will not be as straightforward to solve the

characteristic equation and obtain a diagonalization of the transition matrix. We

can, however, obtain numerical approximations for given values of p and t.

Example 3.2.1. Let p = 0.637. What is the probability of obtaining a run of

4 successes within 20 trials? Within 50 trials? What is the probability of first

obtaining a run of 4 successes on the 10th trial?

To set up the Markov chain, we note that our initial state matrix will be

a 1 × 5 matrix (allowing for runs of length 0, 1, 2, 3, and 4), and the transition

matrix will be a 5 × 5 matrix. As before, the initial state matrix will contain a

single 1 followed by all zeros. All but the last row of the transition matrix will

contain 1 − p = 0.363 in the first entry, p = 0.637 on row i, column i + 1, and

zeros elsewhere. The last row, representing the absorbing state, will contain all

zeros followed by a single 1 in the last entry. Thus,

B =

(1 0 0 0 0

)23

and

A =

0.363 0.637 0 0 0

0.363 0 0.637 0 0

0.363 0 0 0.637 0

0.363 0 0 0 0.637

0 0 0 0 1

.

Then, to find the probability of obtaining a run of 4 successes within 20

trials, we simply need to compute

B ×A20 =

(P (0) P (1) P (2) P (3) P (4)

),

which yields the following (the values do not sum to 1 due to rounding):

B ×A20 =

(0.0812 0.0566 0.0394 0.0274 0.7953

).

Therefore, from the last entry in the matrix, there is about a 79.53% chance of

obtaining a run of 4 successes within 20 trials. Similarly, we can compute

B ×A50 =

(0.0056 0.0039 0.0027 0.0019 0.9860

)

to find that there is about a 98.60% chance of obtaining a run of 4 successes

within 50 trials. Essentially, we are computing specific values of the cumulative

distribution function. Thus, in order to answer the last question, we need to

compute two of these cdf values and subtract them.

f4(10) = F4(10)− F4(9)

= (B ×A10)5 − (B ×A9)5

24

≈ 0.5000− 0.4536 = .0464.

Thus, there is about a 4.64% chance of first completing a run of 4 successes on

the tenth trial.

3.3 - Pseudo-Fibonacci sequences

We can derive Binet’s Formula for the nth Fibonacci number by using a

generating function. We let G(x) = F0 +F1x+F2x2 +F3x

3 + ... be the generating

function, with coefficient Fn representing the nth Fibonacci number. We also have

the initial conditions F0 = 0, F1 = 1, and Fn = Fn−1 + Fn−2. Then we have

G(x) = F0 + F1x+ F2x2 + F3x

3 + ...

xG(x) = F0x+ F1x2 + F2x

3 + F3x4 + ...

x2G(x) = F0x2 + F1x

3 + F2x4 + F3x

5 + ... .

Subtracting the last two lines from the first line gives

(1− x− x2)G(x) = F0 + (F1 − F0)x, or

G(x) =x

1− x− x2.

Thus we have solved for the generating function G(x). Note that the denominator

can be factored (using1 +√

5

2= ϕ and

1−√

5

2= 1− ϕ); hence,

G(x) =x

(1− ϕx)(1− (1− ϕ)x).

25

The right side of the equation can be rewritten using the partial fraction expansion

G(x) =A

1− ϕx+

B

1− (1− ϕ)x.

Through some straightforward algebra, we obtain A =1√5

and B = − 1√5

. Note

that√

5 = ϕ− (1− ϕ). Thus we have

G(x) =1

ϕ− (1− ϕ)

(1

1− ϕx+

1

1− (1− ϕ)x

).

Note that we can rewrite the above using power series as follows:

G(x) =1

ϕ− (1− ϕ)

(∞∑n=0

(ϕx)n −∞∑n=0

[(1− ϕ)x]n

).

Then the coefficient Fn for the xn term of G(x) isϕn − (1− ϕ)n

ϕ− (1− ϕ), which is exactly

Binet’s Formula.

We now consider the more general formula for F2(t) from (3.1):

F2(t) = 1−

[(1− p) +

√(1− p)(1 + 3p)

]t+2

−[(1− p)−

√(1− p)(1 + 3p)

]t+2

2t+2(1− p)√

(1− p)(1 + 3p).

We let α = (1− p) +√

(1− p)(1 + 3p) and β = (1− p)−√

(1− p)(1 + 3p), and

note that α− β = 2√

(1− p)(1 + 3p), which is present in the denominator of

[(1− p) +

√(1− p)(1 + 3p)

]n−[(1− p)−

√(1− p)(1 + 3p)

]n2√

(1− p)(1 + 3p).

Then for n = 0 we obtain 0, and for n = 1 we obtain 1, matching the initial values

of the Fibonacci sequence. However, we will need a different recursive relation

than the one for the Fibonacci sequence.

26

For the Fibonacci generating function, we used the polynomial 1− x− x2

in order to eliminate all but a finite number of terms to give

(1− x− x2)G(x) = F0 + (F1 − F0)x.

Now, the Fibonacci recursive relation may be written as Fn − Fn−1 − Fn−2 =

0, which reveals the key insight. The coefficients of the polynomial and the

recursion match (constant with Fn, x with Fn−1, and x2 with Fn−2). Thus, we

seek a polynomial that factors into (1 − αx)(1 − βx). This turns out to be

4p(1− p)− 2(1− p)x− x2. Then we can define a new recursive sequence, Pn, of

“pseudo-Fibonacci” numbers with conditions

P0 = 0, P1 = 1

Pn = 2(1− p)Pn−1 + 4p(1− p)Pn−2.

Then by following the same argument as in the derivation of Binet’s Formula, we

obtain

Pn =

[(1− p) +

√(1− p)(1 + 3p)

]n−[(1− p)−

√(1− p)(1 + 3p)

]n2√

(1− p)(1 + 3p). (3.2)

For p = 1/2, (3.2) gives Binet’s Formula for Fn. Thus we have defined recursive

sequences {Pn} corresponding to probability values, which we can use to rewrite

F2(t) = P (T2 ≤ t), as follows:

F2(t) = 1− Pt+2

2t+1(1− p)

= 1− 2(1− p)Pt+1 + 4p(1− p)Pt

2t+1(1− p)

27

= 1− Pt+1 + 2pPt

2t.

It may be of interest to examine the sequences {Pn}. Table 3.1 presents

the beginning of the sequences for selected values of p.

Table 3.1: Selected Pseudo-Fibonacci Sequences

p First 7 terms of {Pn}

0 0, 1, 2, 4, 8, 16, 32, ...

0.2 0, 1, 1.6, 3.2, 6.144, 11.8784, 22.9376, ...

0.3 0, 1, 1.4, 2.8, 5.096, 9.4864, 17.5616, ...

0.5 0, 1, 1, 2, 3, 5, 8, ...

0.75 0, 1, 0.5, 1, 0.875, 1.1875, 1.25, ...

ϕ/2 0, 1, 0.3820, 0.7639, 0.5279, 0.6738, 0.5836, ...

0.9 0, 1, 0.2, 0.4, 0.152, 0.1744, 0.0896, ...

1 0, 1, 0, 0, 0, 0, 0, ...

Note that some sequences are strictly increasing, while others oscillate.

Some properties of the family {Pn} may be observed. For instance, the

only values which yield sequences of only integers are p = 0, p = 1/2, and p = 1.

Using p = 0 gives powers of 2, and of course p = 1/2 returns the familiar Fibonacci

sequence.

Examining the end behavior of the sequences also provides some interesting

properties. For 0 ≤ p ≤ 1/2, {Pn} is a monotone increasing sequence. For 1/2 <

p < ϕ/2, {Pn} oscillates a finite number of times before eventually increasing to

infinity. For p = ϕ/2, {Pn} oscillates infinitely often, but the oscillations become

smaller and smaller, and the sequence converges to the limit ϕ− 1. For p > ϕ/2,

{Pn} eventually becomes decreasing and converges to 0.

28

Variations on the Fibonacci sequence are not a new concept. For instance,

the generalized Fibonacci sequence modifies the starting values while preserv-

ing the coefficients in the recursive relation. In contrast, the family of pseudo-

Fibonacci sequences defined here modified the coefficients in the recursive relation

while preserving the starting values of 0 and 1. More results and applications re-

lated to generalized Fibonacci numbers can be found in Koshy [5].

29

CHAPTER 4

EXPECTATION FOR OBTAINING A RUN OF LENGTH N

Again, we define the stopping time Tn to be the number of trials needed

to first observe n consecutive successes, and we define the stopping time Tn,m to

be the number of trials needed to first observe either n consecutive successes or

m consecutive failures. Because the counting arguments presented in Chapter

2 would seemingly be quite complicated for n,m > 2, we seek a more general

method to calculate E[Tn] and E[Tn,m].

4.1 - Deriving E[Tn] Recursively

Before we begin our procedure to derive a formula for E[Tn], we must first

show that it is finite for any non-zero choice of p and any choice of n.

Theorem 4.1.1. For 0 < p ≤ 1 and fixed n, E[Tn] is finite.

Proof. We have already proven the result for the trivial case of p = 1, so suppose

0 < p < 1. Consider the trials in blocks of n; i.e., trials X1 through Xn, trials Xn+1

through X2n, and so on. Each block contains the run we desire with probability

pn. The blocks correspond to a geometric distribution with parameter pn, so the

expected number of blocks required to observe a run is 1/pn, which corresponds

to n/pn trials. This expectation, which is finite, in fact provides an upper bound

for E[Tn], since the first run might occur across two different blocks. Thus we

can safely conclude that E[Tn] must be finite for any choice of n.

As an aside, note that the case n = 1 is equivalent to the geometric

distribution with parameter p, which has expected value 1/p. We have already

derived a formula for n = 2 in Theorem 2.2.2, but we would like a general formula.

We now develop such a formula by recursively calculating the weighted average.

30

Theorem 4.1.2. Let Tn be the number of trials required to obtain a run of n

consecutive successes. Then for 0 < p < 1, E[Tn] =1− pn

pn(1− p).

Proof. Suppose we obtain a run of n successes immediately. This situation occurs

with probability pn and requires n trials. If instead we do not obtain a run of

n successes, then we must have first obtained a run of k successes followed by

a failure, for some k ∈ {0, 1, ..., n − 1}. This situation occurs with probability

pk(1− p). We have already observed k + 1 trials, and since the failure resets the

run, we will require E[Tn] additional trials, for a total of k+1+E[Tn] trials. Thus

we obtain the following:

E[Tn] = pnn+n−1∑k=0

pk(1− p)(k + 1 + E[Tn])

= pnn+n−1∑k=0

pk(1− p)E[Tn] +n−1∑k=0

pk(1− p)(k + 1)

= pnn+ (1− pn)E[Tn] +n−1∑k=0

pk(k + 1)−n−1∑k=0

pk+1(k + 1).

Because E[Tn] is finite, we can subtract it from the right-hand side. Solving for

E[Tn], we obtain

pnE[Tn] = pnn+n−1∑k=0

pk(k + 1)−n∑

k=1

pkk

= pnn+ 1 +n−1∑k=1

pk(k + 1)−n−1∑k=1

pkk − pnn

= 1 +n−1∑k=1

pk

=n−1∑k=0

pk

=1− pn

1− p,

31

which gives the result.

Note that if we use n = 1 in Theorem 4.1.2, we obtain the expectation

E[T1] = 1p, which is consistent with the geometric distribution. Furthermore, for

n = 2, we obtain E[T1] =1− p2

p2(1− p)=

1 + p

p2, which agrees with Theorem 2.2.2.

Thus we have solved for E[Tn] in all cases:

E[Tn] =

∞, p = 0

1− pn

pn(1− p), 0 < p < 1

n, p = 1.

(4.1)

We now turn our attention to E[Tn,m].

4.2 - Deriving E[Tn,m] Recursively

For 0 < p < 1, it is not difficult to see that E[Tn,m] must be finite by using

Theorem 4.1.1, as it is clear that E[Tn,m,] ≤ E[Tn].

Corollary 4.2.1. For 0 ≤ p ≤ 1 and fixed n and m, E[Tn,m] is finite.

Since Corollary 4.2.1 guarantees E[Tn,m] must be finite for any parameter

p, we now seek its derivation. The resulting formula is given as the following

theorem.

Theorem 4.2.2. Let Tn,m be the number of trials required to obtain either a run

of n successes or a run of m failures. Then E[Tn,m] is given by

E[Tn,m] =

m, p = 0

(1− pn)(1− qm)

pnq + pqm − pnqm, 0 < p < 1

n, p = 1.

32

Proof. In order to facilitate the creation of formulas in this proof, we define two

additional variables. We let A1 be the expected number of additional trials needed

when starting with a single success, and we let A2 be the expected number of

additional trials needed when starting with a single failure. Both A1 and A2 must

be finite, as both values are less than or equal to Tn,m. Now, consider the first

trial, which results in success with probability p and failure with probability q.

We may then write

E[Tn,m] = 1 + pA1 + qA2. (4.2)

Suppose we have a current run of one success. If we obtain successes in

the next n−1 trials, we are finished. This situation occurs with probability pn−1.

If instead we do not obtain a run of n − 1 successes, then we must have first

obtained a run of k successes followed by a failure, for some k ∈ {0, 1, ..., n− 2}.

This situation occurs with probability pkq. We have then observed k + 1 trials,

and since we now have a current run of one failure, we will require A2 additional

trials, for a total of k + 1 + A2 trials. Thus we obtain the following:

A1 = pn−1(n− 1) +n−2∑k=0

pkq(k + 1 + A2)

= pn−1(n− 1) +n−2∑k=0

pk(1− p)A2 +n−2∑k=0

pk(1− p)(k + 1)

= pn−1(n− 1) + (1− pn−1)A2 +n−2∑k=0

pk(k + 1)−n−2∑k=0

pk+1(k + 1).

We proceed by moving the A2 term to the left side:

A1 − A2(1− pn−1) = pn−1(n− 1) +n−2∑k=0

pk(k + 1)−n−1∑k=1

pkk

= pn−1(n− 1) + 1 +n−2∑k=1

pk(k + 1)−n−2∑k=1

pkk − pn−1(n− 1)

33

= 1 +n−2∑k=1

pk

=n−2∑k=0

pk.

The result is as follows:

A1 − A2(1− pn−1) =1− pn−1

1− p. (4.3)

If we apply the same steps with success and failure reversed, we also obtain

A2 − A1(1− qm−1) =1− qm−1

1− q. (4.4)

Thus, (4.3) and (4.4) give a system of two equations in the two variables

A1 and A2. Solving the system gives the following:

A1 =(1− pn−1)(1− qm)

pnq + pqm − pnqm, A2 =

(1− qm−1)(1− pn)

pnq + pqm − pnqm. (4.5)

For (4.2), we need pA1 and pA2. First, we have

pA1 =p(1− pn−1)(1− qm)

pnq + pqm − pnqm

=p− pn − pqm + pnqm

pnq + pqm − pnqm

=p− pn + (pnq − pnq)− pqm + pnqm

pnq + pqm − pnqm

=p− pn + pnq

pnq + pqm − pnqm− 1

=p− pn+1

pnq + pqm − pnqm− 1.

34

Similarly, we have

pA2 =q − qm+1

pnq + pqm − pnqm− 1.

Substituting these expressions into (4.2) and simplifying, we obtain

E[Tn,m] = 1 + pA1 + qA2

=p− pn+1

pnq + pqm − pnqm+

q − qm+1

pnq + pqm − pnqm− 1

=p− pn+1 + q − qm+1 − pnq − pqm + pnqm

pnq + pqm − pnqm

=1− pn(p+ q)− qm(q + p) + pnqm

pnq + pqm − pnqm

=1− pn − qm + pnqm

pnq + pqm − pnqm

=(1− pn)(1− qm)

pnq + pqm − pnqm,

which is valid for 0 < p < 1.

It should be noted that the formula in Theorem 4.2.2 confirms the formula

for n,m = 2 derived in Chapter 2. Making the substitutions n = 2 and m = 2

and simplifying yields

E[T2,2] =(1− p2)(1− q2)p2q + pq2 − p2q2

=1− p2 − q2 + p2q2

pq(p+ q)− p2q2

=p+ q − p2 − q2 + p2q2

pq − p2q2

=p(1− p) + q(1− q) + p2q2

pq(1− pq)

=2pq + p2q2

pq(1− pq)

35

=2 + pq

1− pq,

which agrees with Theorem 2.1.3.

4.3 - Relationships Between E[Tn] and E[Tn,m]

For 0 < p < 1, we have obtained the following expectations

E[Tn] =1− pn

pn(1− p), E[Tn,m] =

(1− pn)(1− qm)

pnq + pqm − pnqm. (4.6)

Also, suppose that we wish to obtain a run of m consecutive failures, where the

probability of failure is q. We denote the number of trials required by T ∗m. Then,

by symmetry, E[T ∗m] is given by

E[T ∗m] =1− qm

qm(1− q).

We then have the following identity.

Theorem 4.3.1. For 0 < p < 1,1

E[Tn]+

1

E[T ∗m]=

1

E[Tn,m].

Proof. Simplifying the left side of the identity gives

1

E[Tn]+

1

E[T ∗m]=pn(1− p)

1− pn+qm(1− q)

1− qm

=pnq(1− qm) + qmp(1− pn)

(1− pn)(1− qm)

=pnq − pnqm+1 + qmp− qmpn+1

(1− pn)(1− qm)

=pnq + pqm − pnqm

(1− pn)(1− qm)

=1

E[Tn,m].

36

In a sense, we can think of E[Tn] and E[T ∗m] as two “components” of

E[Tn,m]. Theorem 4.3.1 then says that the reciprocal of E[Tn,m] is equal to the

sum of the reciprocals of its two components.

In the special case where p = q = 1/2, we can relate E[Tn] and E[Tn,n].

Theorem 4.3.2. For p = q = 1/2, E[Tn] = 2E[Tn,n].

Proof. For p = 1/2, we have the following:

E[Tn] =1−

(12

)n(12

)n(1− 1

2)

=1−

(12

)n(12

)n+1

= 2

(1−

(12

)n(12

)n)

= 2(2n − 1);

E[Tn,n] =

[1−

(12

)n] [1−

(12

)n](12

)n+1+(12

)n+1 −(12

)2n=

[1−

(12

)n]2(12

)n − (12

)2n=

[1−

(12

)n]2(12

)n [1−

(12

)n]=

1−(12

)n(12

)n= 2n − 1

=E[Tn]

2.

37

4.4 - Conditional Expectations

Suppose we are now interested in the event that we obtain a run of n

successes before we obtain a run of m failures, which we denote Rnbm. We shall

derive P (Rnbm) by considering cases of how smaller runs may occur before a run

of n successes.

Theorem 4.4.1. Let Rnbm be the event that we obtain a run of n successes before

a run of m failures. Then P (Rnbm) =pn−1(1− qm)

pn−1 + qm−1 − pn−1qm−1.

Analogously, if RCnbm is the event that we obtain a run of m failures before a run

of n successes, then P (RCnbm) =

qm−1(1− pn)

pn−1 + qm−1 − pn−1qm−1

Proof. For Rnbm, we have two cases. In Case I, we start with a success, which

occurs with probability p. We might immediately obtain all remaining n − 1

successes, or our potential run of successes might be interrupted. An interruption

will have two parts: some number of successes followed by a failure (part 1),

then some number of failures followed by a success (part 2). Note that in part

1, we must obtain a failure within n − 1 trials, and in part 2, we must obtain a

success within m − 1 trials (or else we will obtain a run of m failures, which is

not allowed). After the interruption, we have one success, which returns to the

original Case I scenario.

To summarize, we have one success, followed by some number of interrup-

tions (possibly 0), followed by the remaining n − 1 successes. Also note that we

can use the geometric distribution for each interruption; i.e., the probability of

obtaining a failure within n−1 trials is 1−pn−1, and the probability of obtaining

a success within m − 1 trials is 1 − qm−1. Thus, by summing on the number of

38

possible interruptions k, the probability of Case I is

P1 = pn∞∑k=0

[(1− pn−1)(1− qm−1)]k.

In Case II, we start with a failure. We must then obtain some number

of failures followed by a success, which returns us to the beginning of case I. We

then obtain some number of interruptions followed by the final n − 1 successes.

In this case, we will have the probability of the failure, the probability of the

n− 1 successes, and a single geometric probability outside the summation for the

interruptions. The probability in this case is

P2 = qpn−1(1− qm−1)∞∑k=0

[(1− pn−1)(1− qm−1)]k.

The total probability, P (Rnbm), is simply the sum of P1 and P2:

P (Rnbm) = P1 + P2

= pn∞∑k=0

[(1− pn−1)(1− qm−1)]k + qpn−1(1− qm−1)∞∑k=0

[(1− pn−1)(1− qm−1)]k

= [pn + qpn−1(1− qm−1)]∞∑k=0

[(1− pn−1)(1− qm−1)]k

=pn + qpn−1(1− qm−1)

1− (1− pn−1)(1− qm−1)

=pn−1(p+ q(1− qm−1))pn−1 + qm−1 − pn−1qm−1

=pn−1(1− qm)

pn−1 + qm−1 − pn−1qm−1.

We can define RCnbm analogously, and we can find P (RC

nbm) similarly. Thus,

39

we have

P (Rnbm) =pn−1(1− qm)

pn−1 + qm−1 − pn−1qm−1,

and

P (RCnbm) =

qm−1(1− pn)

pn−1 + qm−1 − pn−1qm−1.

The derivation of the conditional probabilities leads to the following result.

Theorem 4.4.2. Let Tn,m|Rnbm be the number of trials required to obtain a run

of n successes given that we obtain such a run before a run of m failures. Then

E[Tn,m|Rnbm] = n+ (1−pn)(q−qm(m+q−mq))+(1−qm)(1−qm−1)(p−pn(n+p−np))(1−qm)(pnq+pqm−pnqm)

.

Also, if Tn,m|RCnbm is the number of trials required to obtain a run of m failures

given that we obtain such a run before a run of n successes, then E[Tn,m|RCnbm] =

m+ (1−qm)(p−pn(n+p−np))+(1−pn)(1−pn−1)(q−qm(m+q−mq))(1−pn)(pnq+pqm−pnqm)

.

Proof. As before, we consider the average in each of the two cases. To facilitate

this, let us define conditional averages for the geometrically distributed interrup-

tions. For X ∼ Geometric(p), we define e1 = E[X|X ≤ m− 1], and likewise, for

Y ∼ Geometric(q), we define e2 = E[Y |Y ≤ n− 1]. Then for Case I we have

A1 = pn∞∑k=0

[n+ k(e1 + e2)][(1− pn−1)(1− qm−1)]k, (4.7)

40

and for Case II we have

A2 = qpn−1(1− qm−1)∞∑k=0

[n+ e1 + k(e1 + e2)][(1− pn−1)(1− qm−1)]k. (4.8)

We then obtain

E[Tn,m|Rnbm] =A1 + A2

P (Rnbm). (4.9)

We will now derive e1 and e2.

Lemma 4.4.3. Suppose X ∼ Geometric(p). Then

E[X|X ≤ m− 1] = 1−qm−1(m+q−mq)p(1−qm−1)

.

Proof. We have E[X|X ≤ m−1] =

∑m−1k=1 kP (k)

P (X ≤ m− 1), and sinceX ∼ Geometric(p),

P (k) = pqk−1 and P (X ≤ m− 1) = 1− qm−1. Thus we obtain

E[X|X ≤ m− 1] =

∑m−1k=1 kpq

k−1

1− qm−1

=p[1− qm−1(m+ q −mq)]

p2(1− qm−1)

=1− qm−1(m+ q −mq)

p(1− qm−1)

Then, by Lemma 4.4.3, we have e1 =1− qm−1(m+ q −mq)

p(1− qm−1)and e2 =

1− pn−1(n+ p− np)q(1− pn−1)

. Now consider A1 from (4.7):

A1 = pn∞∑k=0

[n+ k(e1 + e2)][(1− pn−1)(1− qm−1)]k

= pn

[∞∑k=0

n[(1− pn−1)(1− qm−1)]k

+∞∑k=0

k

(1− qm−1(m+ q −mq)

p(1− qm−1)

)[(1− pn−1)(1− qm−1)]k

41

+∞∑k=0

k

(1− pn−1(n+ p− np)

q(1− pn−1)

)[(1− pn−1)(1− qm−1)]k

]

= npn∞∑k=0

[(1− pn−1)(1− qm−1)]k

+

[pn−1(1− pn−1)(1− qm−1(m+ q −mq))

+pn

q(1− qm−1)(1− pn−1(n+ p− np))

] ∞∑k=0

k[(1− pn−1)(1− qm−1)]k−1

=npn

1− (1− pn−1)(1− qm−1)+pn−1q(1− pn−1)(1− qm−1(m+ q −mq))

q[1− (1− pn−1)(1− qm−1)]2

+pn(1− qm−1(1− pn−1(n+ p− np))

q[1− (1− pn−1)(1− qm−1)]2

=

(1

q[pn−1 + qm−1 − pn−1qm−1]2

)(npnq(pn−1 + qm−1 − pn−1qm−1)

+pn−1q(1− pn−1)(1− qm−1(m+ q −mq))

+pn(1− qm−1)(1− pn−1(n+ p− np))).

Consider A2 from (4.8).

A2 = qpn−1(1− qm−1)∞∑k=0

[n+ e1 + k(e1 + e2)][(1− pn−1)(1− qm−1)]k

= qpn−1(1− qm−1)

[∞∑k=0

n[(1− pn−1)(1− qm−1)]k

+∞∑k=0

(k + 1)

(1− qm−1(m+ q −mq)

p(1− qm−1)

)[(1− pn−1)(1− qm−1)]k

+∞∑k=0

k

(1− pn−1(n+ p− np)

q(1− pn−1)

)[(1− pn−1)(1− qm−1)]k

]

=nqpn−1(1− qm−1)

1− (1− pn−1)(1− qm−1)

+ qpn−2(1− qm−1(m+ q −mq))∞∑k=0

(k + 1)[(1− pn−1)(1− qm−1)]k

+pn−1(1− qm−1)(1− pn−1(n+ p− np))

1− pn−1∞∑k=0

k[(1− pn−1)(1− qm−1)]k

42

=nqpn−1(1− qm−1) + qpn−2(1− qm−1(m+ q −mq))

pn−1 + qm−1 − pn−1qm−1

+[qpn−2(1− qm−1(m+ q −mq))](1− pn−1)(1− qm−1)

[1− (1− pn−1)(1− qm−1)]2

+[pn−1(1− qm−1)(1− pn−1(n+ p− np))](1− qm−1)

[1− (1− pn−1)(1− qm−1)]2

=

(1

(pn−1 + qm−1 − pn−1qm−1)2

)([nqpn−1(1− qm−1) + qpn−2(1− qm−1(m+ q −mq))](pn−1 + qm−1 − pn−1qm−1)

+qpn−2(1− qm−1(m+ q −mq))(1− pn−1 − qm−1 + pn−1qm−1)

+pn−1(1− qm−1)(1− pn−1(n+ p− np))(1− qm−1))

=

(1

(pn−1 + qm−1 − pn−1qm−1)2

)(nqpn−1(1− qm−1)(pn−1 + qm−1 − pn−1qm−1)

+qpn−2(1− qm−1(m+ q −mq)) + pn−1(1− qm−1)2(1− pn−1(n+ p− np))).

Then, from (4.9):

E[Tn,m|Rnbm] =A1 + A2

P (Rnbm)

=

(pn−1 + qm−1 − pn−1qm−1

pn−1q(1− qm)[pn−1 + qm−1 − pn−1qm−1]2

)(npnq(pn−1 + qm−1 − pn−1qm−1)

+ pn−1q(1− pn−1)(1− qm−1(m+ q −mq)) + pn(1− qm−1)(1− pn−1(n+ p− np))

+ nq2pn−1(1− qm−1)(pn−1 + qm−1 − pn−1qm−1)

+ q2pn−2(1− qm−1(m+ q −mq)) + qpn−1(1− qm−1)2(1− pn−1(n+ p− np)))

=

(1

pn−1q(1− qm)(pn−1 + qm−1 − pn−1qm−1)

)(npn−1q(pn−1 + qm−1 − pn−1qm−1)(p+ q(1− qm−1)

+ pn−2q(1− qm−1(m+ q −mq))(p(1− pm−1) + q)

+pn−1(1− qm−1)(1− pn−1(n+ p− np))(p+ q(1− qm−1)))

43

=

(1

pq(1− qm)(pn−1 + qm−1 − pn−1qm−1)

)(npq(pn−1 + qm−1 − pn−1qm−1)(1− qm)

+ q(1− qm−1(m+ q −mq))(1− pm) + q)

+p(1− qm−1)(1− pn−1(n+ p− np))(1− qm)))

= n+(1− qm−1(m+ q −mq))(1− pn)

p(1− qm)(pn−1 + qm−1 − pn−1qm−1)+

(1− pn−1(n+ p− np))(1− qm−1)q(pn−1 + qm−1 − pn−1qm−1)

= n+ q(1−pn)(1−qm−1(m+q−mq))+p(1−qm)(1−qm−1)(1−pn−1(n+p−np))pq(1−qm)(pn−1+qm−1−pn−1qm−1)

= n+ (1−pn)(q−qm(m+q−mq))+(1−qm)(1−qm−1)(p−pn(n+p−np))(1−qm)(pnq+pqm−pnqm)

.

We can also use Theorem 4.4.2 in conjunction with other results to obtain

the expected number of trials needed to obtain both a run of n successes and a

run of m failures.

Corollary 4.4.4. The expected number of trials needed to obtain both a run of n

successes and a run of m failures is E[Tn&m] = (E[Tn,m|Rnbm]+E[T ∗m])P (Rnbm)+

(E[Tn,m|RCnbm] + E[Tn])P (RC

nbm).

4.5 - Applications of Conditional Runs

There are several interesting applications for the conditional probabilities

and expectations that were derived in Section 4.4. For instance, we might devise

a game in which the player wins by obtaining a run of n successes and loses by

obtaining a run of m failures. Given fixed values for n and m, what probability

p should be used in order to create a fair game? The answer to that question is

identical to asking what value of p gives P (Rnbm) = 12, or, equivalently, P (Rnbm) =

P (RCnbm).

44

Setting the two conditional probabilities equal gives:

P (Rnbm) = P (RCnbm)

pn−1(1− qm)

pn−1 + qm−1 − pn−1qm−1=

qm−1(1− pn)

pn−1 + qm−1 − pn−1qm−1

pn−1(1− qm) = qm−1(1− pn)

We replace q with 1 − p in order to create a polynomial equation in a single

variable:

pn−1[1− (1− p)m] = (1− p)m−1(1− pn) (4.10)

Clearly, if we let n = m, then the solution to (4.10) must be p =1

2. For

small values of n and m, it is possible to solve (4.10) exactly (e.g., the solution for

n = 2,m = 1 is p =1√2

), but for large values of n and m, we will only be able to

provide approximate solutions. Approximate solutions to (4.10) for several values

of n and m are presented in Table 4.1.

Certainly, we can also use previous results to calculate the average length

of such games, as well as the average length of a winning game or the average

length of a losing game.

45

Table 4.1: Probabilities for a Fair Run Game

m = 1 m = 2 m = 3 m = 4 m = 5 m = 6 m = 7 m = 8

n = 1 0.500 0.293 0.206 0.159 0.129 0.109 0.094 0.083

n = 2 0.707 0.500 0.396 0.332 0.288 0.256 0.232 0.212

n = 3 0.794 0.604 0.500 0.433 0.385 0.349 0.320 0.297

n = 4 0.841 0.668 0.567 0.500 0.451 0.413 0.383 0.358

n = 5 0.871 0.712 0.615 0.549 0.500 0.462 0.430 0.404

n = 6 0.891 0.744 0.651 0.587 0.538 0.500 0.469 0.442

n = 7 0.906 0.768 0.680 0.617 0.570 0.531 0.500 0.473

n = 8 0.917 0.788 0.703 0.642 0.596 0.558 0.527 0.500

If one player wins on a run of n successes and the other player wins on a run of

m failures, using the corresponding probability in the chart will result in a fair

game.

46

CHAPTER 5

SIMULATIONS WITH MATHEMATICA

One way to verify the results we have obtained is through simulation.

To do this, we created several short programs using the Wolfram Mathematica

programming environment [13]. Each program relies on Monte Carlo methods

in order to sample the stopping times using various inputs for the probability

and run length values. We then calculate the observed expectation based on the

sample and compare this with the theoretical expectation based on the formulas

from previous chapters. In this chapter, we discuss the general approach used to

simulate each scenario. The actual code for each program is presented in various

appendices.

The first part of each program defines the user input variables, which in-

clude the probability value(s), run length(s), and number of simulations. We then

initialize variables to keep track of the progress of each simulation. To generate

a trial, we use the RandomReal function, which generates a pseudorandom real

number between 0 and 1. This allows us to easily categorize each trial as either

a success or failure. Depending on which result we are simulating, we program

a stopping condition for the main loop. Finally, we calculate both the simulated

and theoretical values for the result we are testing.

Three simulations were created, testing Theorem 4.1.2, Theorem 4.2.2,

and Theorems 4.4.1 and 4.4.2, respectively. In Table 5.1, we present the results

of several simulations of each scenario using the input values from each program’s

code, which can be found in Appendices A, B, and C, respectively.

47

Table 5.1: Simulation Results

Theorem Theoretical Value Simulated Values

Theorem 4.1.2 9.07407 9.05043, 9.0642, 9.08414, 9.08956, 9.06104

Theorem 4.2.2 6.09683 6.10786, 6.08681, 6.08848, 6.09193, 6.0931

Theorem 4.4.1 0.548863 0.5469, 0.54925, 0.5492, 0.54824, 0.54923

Theorem 4.4.2 9.63839 9.64152, 9.70015, 9.62028, 9.62533, 9.62025

See the appendices for the input values of each simulation.

48

CHAPTER 6

GENERALIZATIONS AND FURTHER STUDY

In this chapter, we will examine several related problems, providing a

number of avenues for further study.

6.1 - Obtaining a Run with a Non-Constant Probability

Previous results relied on the assumption that the probability of success

remains constant for each trial. Suppose that we relax this requirement as follows.

As before, we let n be the length of a run of successes that we seek. Now, we

let pi be the probability of success, given that the current run of successes is of

length i − 1, for i ∈ {1, 2, ..., n}. In other words, the probability of success is p1

while seeking the first success in a row, p2 while seeking the second success in a

row, and so on. The probabilities of failures are defined as qi = 1− pi. Then Tn

is the number of trials needed to obtain a run of n consecutive successes under

these conditions.

It is not difficult to show that E[Tn] is finite so long as all pi are nonzero.

We can modify the argument used to prove Theorem 4.1.1 by using the product

of all pi instead of simply pn. Thus we have

Theorem 6.1.1. For 0 < pi ≤ 1 and fixed n, E[Tn] is finite.

The recursive procedure used to prove the expectation can be modified in

order to obtain a formula in this new scenario.

Theorem 6.1.2. Let Tn be the number of trials required to obtain a run of n

consecutive successes, given that the probability pi is the probability of obtaining

the ith success. Then

E[Tn] =1 +

∑n−1j=1

(∏ji=1 pi

)∏n

i=1 pi.

49

Proof. Suppose we obtain a run of n successes immediately. This requires n trials

and occurs with probability p1 · ... · pn. If instead we obtain a run of k successes

followed by a failure, for some k ∈ {0, 1, ..., n−1}, we have observed k+1 trials and

will require an additional E[Tn] trials. This occurs with probability p1 ·...·pk ·qk+1.

(The product p1 · ... · pk may be empty, if k = 0.) Thus we obtain the following.

E[Tn] = q1(1 + E[Tn]) + p1q2(2 + E[Tn]) + ...+ p1...pn−1qn(n+ E[Tn]) + p1...pnn.

Since E[Tn] is finite, we may move all those terms to the left-hand side.

E[Tn]−E[Tn](q1+p1q2+...+p1...pn−1qn) = q1+2p1q2+...+np1...pn−1qn+np1...pn.

(6.1)

Simplifying the left-hand side yields

E[Tn]− E[Tn](q1 + p1q2 + ...+ p1...pn−1qn)

=E[Tn]− E[Tn](1− p1 + p1(1− p2) + ...+ p1...pn−1(1− pn))

=E[Tn]− E[Tn](1− p1...pn)

=E[Tn](p1...pn),

and simplifying the right-hand side yields

q1 + 2p1q2 + ...+ np1...pn−1qn + np1...pn

=1− p1 + 2p1(1− p2) + ...+ np1...pn−1(1− pn) + np1...pn

=1 + p1 + p1p2 + ...+ p1...pn−1.

50

Substituting into (6.1) gives

E[Tn](p1...pn) = 1 + p1 + p1p2 + ...+ p1...pn−1,

so we finally reach

E[Tn] =1 + p1 + p1p2 + ...+ p1...pn−1

(p1...pn)

=1 +

∑n−1j=1

(∏ji=1 pi

)∏n

i=1 pi.

6.2 - Obtaining a Run in a Finite Set

When considering events that come from the Bernoulli distribution, it is

possible to imagine that we are sampling from an infinite set of outcomes, where

p is the proportion of outcomes that are successes. It is also possible to study

the situation in which we have a finite number of outcomes, of which a certain

proportion p are successes. If we sample from this set with replacement, then

this changes nothing. Sampling without replacement, however, leads to a new

scenario.

Suppose we have a set of a successes, b failures (with a+ b = c outcomes),

and we wish to obtain a run of n consecutive successes by sampling from the set

without replacement. An example would be a deck of playing cards, where we

might consider the 13 hearts to be successes and each of the other 39 cards to

be failures. First we note that we are no longer guaranteed to obtain a run at

all, because we might run out of cards before a run is obtained. Thus, before

51

considering the expected number of trials needed to obtain a run, we should

calculate the probability that a run actually occurs.

Calculation of this probability turns out to be reasonably straightforward,

and a formula is presented as a theorem. However, we will first present a result

from Heubach and Mansour as a lemma [4]. This involves finding the number of S-

restricted compositions of a number. A composition of an integer is analogous to a

partition in which the order of the pieces matters, and an S-restricted composition

is a composition in which the size of each piece belongs to the set S.

Lemma 6.2.1. The number of S-restricted compositions of an integer n into k

parts is the coefficient of xn of the polynomial

(∑s∈S

xs

)k

.

The number of S-restricted partitions will be used to derive the probability

of obtaining a run of n consecutive successes in the next theorem.

Theorem 6.2.2. Suppose we are drawing without replacement from a finite deck

of c = a + b cards, where a is the number of successes and b is the number of

failures. Let Dn be the event that we obtain a run of n consecutive successes

before running out of cards. Then

P (Dn) =

(c

a

)− Z(c

a

)

where Z is the coefficient of xa in (1 + x+ x2 + ...+ xn−1)b+1.

Proof. First, note that there are

(c

a

)distinct arrangements of the deck. Let

Z be the number of arrangements which do not contain a run of n consecutive

52

successes. Then

(c

a

)− Z is the number of arrangements which do contain such

a run. We claim that Z is the coefficient of xa in (1 + x+ x2 + ...+ xn−1)b+1.

Suppose that a particular arrangement of cards does not contain a run of

n consecutive successes. Then there are at most n − 1 successes before the first

failure, after the last failure, and between any two failures. In effect, we have

b + 1 “gaps” in which we must place our successes, but we must place no more

than n−1 successes in each gap. Thus Z is the number of {0, ..., n−1}-restricted

compositions of a with b+ 1 parts, and the value of Z can be found using Lemma

6.2.1.

Thus Z is the number of arrangements which do not contain a run of n

successes, and

P (Dn) =

(c

a

)− Z(c

a

) .

Although Theorem 6.2.2 is not very complicated, we do not reach a closed-

form expression for the probability P (Dn). Furthermore, calculating the expected

number of draws needed to obtain a run of n successes (given that we actually

obtain one) likely requires another technique. We thus leave the derivation of

such a formula as an avenue for future study.

6.3 - Obtaining a Run with a Categorical Distribution

The Bernoulli distribution allows for only two possible outcomes (1 for

success, 0 for failure). We might also be interested in obtaining a run of identical

outcomes when there are k possible outcomes (for instance, k = 6 for a standard

53

six-sided die). In this case, a categorical distribution would be appropriate. The

categorical distribution, which is also known as a generalized Bernoulli distri-

bution or a multinoulli distribution, assigns probabilities p1, ..., pk to each of k

possible outcomes, where the sum of all pi is equal to 1.

If we wish to obtain a run of n of a particular outcome, then this is equiv-

alent to the Bernoulli distribution. Suppose, then, that we wish to obtain a run

of ni consecutive outcome i and that it does not matter which outcome produces

the run. We can define Tn to be the number of trials required. If k = 2, then the

average number of trials, E[Tn1,n2 ], can be found using Theorem 4.2.2. If k > 2,

however, the problem becomes more difficult. Attempting to provide a recursive

argument for certain values of k is possible, though likely very tedious. At any

rate, such an approach would not solve the general problem for all values of k.

For a general solution, we return to Theorem 4.3.1, which states

1

E[Tn]+

1

E[T ∗m]=

1

E[Tn,m].

In terms of a categorical distribution with k = 2, we may write

1

E[Tn1 ]+

1

E[Tn2 ]=

1

E[Tn1,n2 ].

It seems possible that this relationship might hold for larger values of k.

Indeed, observation of simulations suggests that Theorem 4.3.1 does extend to a

categorical distribution in the natural way. Although we do not have a proof of

the result, a likely approach would use induction on k, in which the base case of

k = 2 is merely Theorem 4.2.2. Formalizing our hypothesis, we have the following.

Conjecture 6.3.1. For a categorical distribution with k outcomes, each occurring

with respective probability p1, ..., pk, and respective desired run lengths n1, ..., nk,

54

the average number of trials to obtain a desired run of any outcome is

E[Tn1,...,nk] =

1∑ki=1

1E[Tni ]

.

We have also created a simulation to test Conjecture 6.3.1, using the meth-

ods described in Chapter 5. For the code of this simulation, refer to Appendix

D.

6.4 - Obtaining a Run from Multiple Sequences

Previously, we have considered a stopping time that depends only on a

single sequence of outcomes, though there might be multiple conditions on which

we stop. For instance, we have previously examined the stopping condition in

which we want either a run of n consecutive successes or a run of m consecutive

failures.

We may also consider the scenario in which we have several independent

sequences of outcomes, each with its own stopping condition and corresponding

stopping time Si. We then define the stopping time S as the minimum of the

stopping times Si. This is different from the scenario in which we have multiple

stopping conditions on a single sequence because now we allow for the possibility

that two sequences stop after the same number of trials. With a single sequence,

however, it is impossible to obtain both a run of n successes and a run of m

failures, for instance, on the same trial.

Suppose that we have k sequences each with a stopping condition and

stopping time Si for i = 1, 2, ...k. We would like to obtain a formula for the

expectation E[S], but this is very difficult without knowing something about the

55

individual stopping conditions for each of the k sequences. However, we will

present a result about the minimum of several exponential variables as a basis to

guide future research.

Theorem 6.4.1. If Xi ∼ Exponential(λi) for i = 1, 2, ..., n, and all Xi are inde-

pendent, then min{Xi} ∼ Exponential(∑n

i=1 λi).

For an exponential distribution with parameter λ, the expectation for the

distribution is1

λ. Thus, to find the expectation for the minimum of several

exponentially distributed variables, which is itself exponentially distributed, we

must take the reciprocal of its parameter.

Though the variables for the sequences we have studied do not follow the

exponential distribution, and are in fact discrete rather than continuous, it is

possible that a similar approach could be used to find E[S]. We could then

combine previously derived expectations for each sequence (as well as possibly

Conjecture 6.3.1 about the categorical distribution) in order to describe the most

general stopping time, which depends on obtaining a run of identical outcomes in

at least one sequence of trials.

56

APPENDIX A - CODE FOR THEOREM 4.1.2

57

APPENDIX B - CODE FOR THEOREM 4.2.2

58

APPENDIX C - CODE FOR THEOREMS 4.4.1/4.4.2

59

APPENDIX D - CODE FOR CONJECTURE 6.3.1

60

REFERENCES

[1] S. Ahdout, S. Rothman, H. Strassberg, Streaks and Generalized Fibonacci

Sequences, The College Mathematics Journal 37, No. 3 (2006) 221-223.

[2] T. Gilovich, R. Vallone, A. Tversky, The Hot Hand in Basketball: On the

Misperception of Random Sequences, Cognitive Psychology 17, 295-314 (1985).

[3] J.D. Harper, K.A. Ross, Stopping Strategies and Gambler’s Ruin, Mathemat-

ics Magazine 78, No. 4 (2005) 255-268.

[4] S. Heubach, T. Mansour, Compositions of n with parts in a set, Congressus

Numerantium 164 (2004) 127-143.

[5] T. Koshy, Fibonacci and Lucas Numbers with Applications Wiley, 2001.

[6] E.F. Krause, Bernoulli Trials and Family Planning, Mathematics Magazine

73, No. 3 (2000) 212-216.

[7] R. Larson, Elementary Linear Algebra, Seventh Edition Cengage Learning,

2013.

[8] M. Rabin, D. Vayanos, The Gambler’s and Hot-Hand Fallacies: Theory and

Applications, The Review of Economic Studies 77, No. 2 (2010) 730-778.

[9] M.F. Schilling, The Longest Run of Heads, The College Mathematics Journal

21, No. 3 (1990) 196-207.

[10] M.F. Schilling, Long Run Predictions, Math Horizons 1, No. 2 (1994) 10-12.

[11] M.F. Schilling, The Surprising Predictability of Long Runs, Mathematics

Magazine 85, No. 2 (2012) 141-149.

[12] W. Stadje, The Maximum Average Gain in a Sequence of Bernoulli Games,

The American Mathematics Monthly 115, No. 10 (2008) 902-910.

[13] Wolfram Research, Inc., Mathematica, Version 10.1, Champaign, IL (2015).

61

Documents

Runs of Identical Outcomes in a Sequence of Bernoulli Trials