Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Western Kentucky UniversityTopSCHOLAR®
Masters Theses & Specialist Projects Graduate School
Spring 2018
Runs of Identical Outcomes in a Sequence ofBernoulli TrialsMatthew RiggleWestern Kentucky University, [email protected]
Follow this and additional works at: https://digitalcommons.wku.edu/theses
Part of the Applied Mathematics Commons, and the Discrete Mathematics and CombinatoricsCommons
This Thesis is brought to you for free and open access by TopSCHOLAR®. It has been accepted for inclusion in Masters Theses & Specialist Projects byan authorized administrator of TopSCHOLAR®. For more information, please contact [email protected].
Recommended CitationRiggle, Matthew, "Runs of Identical Outcomes in a Sequence of Bernoulli Trials" (2018). Masters Theses & Specialist Projects. Paper2451.https://digitalcommons.wku.edu/theses/2451
RUNS OF IDENTICAL OUTCOMES IN A SEQUENCE OF BERNOULLITRIALS
A ThesisPresented to
The Faculty of the Department of MathematicsWestern Kentucky University
Bowling Green, Kentucky
In Partial FulfillmentOf the Requirements for the Degree
Master of Science
ByMatthew Riggle
May 2018
Dedicated to Mom and Dad and Michael
ACKNOWLEDGMENTS
There are many individuals who contributed to the success of this project.
First and foremost, I would like to thank my advisor, Dr. David Neal. He has
provided a wealth of guidance and insight throughout the course of the project,
and it is an honor to have written the last master’s thesis that he will advise.
(Happy retirement!)
Next, I would like to thank the other readers of my committee, Dr. Melanie
Autin and Dr. Ngoc Nguyen. Both of them have been supportive during the
project, and were great professors for the many statistics classes I was able to
take during my time at WKU.
In addition to those named above, I would also like to thank the many
other mathematics professors who have taught and advised me during my under-
graduate and graduate career. I have enjoyed working with all of you, and I am
grateful for all of the challenges and opportunities that helped me develop as a
mathematician.
Lastly, I would like to thank my friends and family who have supported
me during my academic journey. In particular, Kathleen Bell deserves special
recognition for doing countless homework assignments together, sharing in the
experience of being calculus TAs, being a great office mate, and finally for joining
me for several “Thesis Thursday” sessions. Also, I would like to thank Emily
Keith, who came all the way from Illinois to attend my defense.
iv
CONTENTS
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2. Expectation for Obtaining a Run of Length 2 . . . . . . . . . . . . . . . . . . 3
2.1. The Direct Counting Approach for E[T2,2] . . . . . . . . . . . . . . . . . . . . . . . . . . .3
2.2. The Direct Counting Approach for E[T2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3. Conditional Average for E[T2,2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4. Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 3. Markov Chains and Recursive Sequences . . . . . . . . . . . . . . . . . . . . . . 18
3.1. Markov Chain Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2. A Larger Markov Chain Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3. Pseudo-Fibonacci Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 4. Expectation for Obtaining a Run of Length n . . . . . . . . . . . . . . . . . 30
4.1. Deriving E[Tn] Recursively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
4.2. Deriving E[Tn,m] Recursively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32
4.3. Relationships Between E[Tn] and E[Tn,m] . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4. Conditional Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5. Applications of Conditional Runs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Chapter 5. Simulations with Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Chapter 6. Generalizations and Further Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.1. Obtaining a Run with a Non-Constant Probability . . . . . . . . . . . . . . . . . 49
6.2. Obtaining a Run in a Finite Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
v
6.3. Obtaining a Run with a Categorical Distribution . . . . . . . . . . . . . . . . . . .53
6.4. Obtaining a Run from Multiple Sequences . . . . . . . . . . . . . . . . . . . . . . . . . 55
Appendix A Code for Theorem 4.1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Appendix B Code for Theorem 4.2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Appendix C Code for Theorems 4.4.1/4.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Appendix D Code for Conjecture 6.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
vi
LIST OF TABLES
Table 3.1 Selected Pseudo-Fibonacci Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Table 4.1 Probabilities for a Fair Run Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Table 5.1 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48
vii
RUNS OF IDENTICAL OUTCOMES IN A SEQUENCE OF BERNOULLITRIALS
Matthew Riggle May 2018 61 Pages
Directed by: David Neal, Melanie Autin, and Ngoc Nguyen
Department of Mathematics Western Kentucky University
The Bernoulli distribution is a basic, well-studied distribution in proba-
bility. In this thesis, we will consider repeated Bernoulli trials in order to study
runs of identical outcomes. More formally, for t ∈ N, we let Xt ∼ Bernoulli(p),
where p is the probability of success, q = 1 − p is the probability of failure, and
all Xt are independent. Then Xt gives the outcome of the tth trial, which is 1 for
success or 0 for failure. For n,m ∈ N, we define Tn to be the number of trials
needed to first observe n consecutive successes (where the nth success occurs on
trial XTn). Likewise, we define Tn,m to be the number of trials needed to first
observe either n consecutive successes or m consecutive failures.
We shall primarily focus our attention on calculating E[Tn] and E[Tn,m].
Starting with the simple cases of E[T2] and E[T2,2], we will use a variety of
techniques, such as counting arguments and Markov chains, in order to derive
the expectations. When possible, we shall also provide closed-form expressions
for the probability mass function, cumulative distribution function, variance, and
other values of interest. Eventually we will work our way to general formulas
for E[Tn] and E[Tn,m]. We will also derive formulas for conditional averages,
and discuss how famous results from probability such as Wald’s Identity apply to
our problem. Numerical examples will also be given in order to supplement the
discussion and clarify the results.
viii
CHAPTER 1
INTRODUCTION
Throughout, we letXt ∼ Bernoulli(p), where p is the probability of success,
q = 1 − p is the probability of failure, and all Xt are independent. Xt gives the
outcome of the tth trial, which is 1 for success and 0 for failure. For integers n,m ≥
2, we define Tn to be the number of trials needed to first observe n consecutive
successes (where the nth success occurs on trial XTn). Likewise, we define the
stopping time Tn,m to be the number of trials needed to first observe either n
consecutive successes or m consecutive failures. What are E[Tn] and E[Tn,m]?
Motivation for the main problem can be found in a variety of applications.
In particular, this problem is closely related to the concept of the “hot-hand
fallacy”, which was discussed in a famous psychology paper by Gilovich, Vallone,
and Tversky [2]. The idea is that basketball fans have a mental picture of a player
who makes several consecutive shots as being “on fire”, when in reality most such
streaks can be ascribed to random chance. The hot-hand fallacy is closely related
to the gambler’s fallacy [8], which falsely assumes that even for independent
events, long streaks of identical outcomes are less likely than probability would
dictate.
While we are largely interested in the average number of trials needed to
observe a run of a given length, other work has been done on related problems.
Several papers by Schilling discuss the average length of the longest run in a given
finite number of Bernoulli trials [9], [10], [11]. These papers provide examples of
some of the techniques we will use for our problem, including counting arguments
and recursive methods.
Another application of studying runs is related to the concept of “optimal
stopping,” which seeks to identify when to take a certain action so as to maximize
1
(or minimize) some value. An example of a gambling application will be presented
in this project, in which the player will quit after some number of consecutive
wins or losses. We will investigate the probability of coming out ahead and the
associated conditional winnings. Other research related to optimal stopping and
the Bernoulli distribution can be found in [6] and [12].
Finally, while not the main focus of this project, it is possible to closely
examine the probability mass function for the case T2 in order to develop recursive
“Fibonacci-like” sequences. It turns out that using p = 1/2 returns precisely the
Fibonacci sequence, and varying the probability parameter will provide a different
recursive sequence. Using a modified version of Binet’s Formula for the Fibonacci
numbers allows us to define such sequences as a side result. Similar recursive
sequences are examined in [1]. The two trivial cases, where p = 0 or p = 1, can
be handled with little effort. If p = 1, we will obtain a success on every trial, so
it takes n trials to obtain a run of n successes. Thus E[Tn] = n and E[Tn,m] = n.
If instead p = 0, we will obtain a failure on every trial, so it takes m trials to
obtain a run of m failures and E[Tn,m] = m. However, we will never obtain a run
of n successes, so E[Tn] =∞. We will thus focus on the case where 0 < p < 1.
We begin in Chapter 2 by studying the case where we wish to obtain a run
of length 2, in order to illustrate the techniques used to derive formulas for the
regular and conditional expectations. In Chapter 3, we use Markov chains as an
alternate approach to the problem, which allows use to highlight some nice results
related to the Fibonacci sequence. In Chapter 4, we generalize the problem for
runs of length n or m where n,m ≥ 2 and compare the results with those obtained
in Chapter 2. Chapter 5 describes simulation programs used to verify the results
obtained in previous chapters. Finally, we present further generalizations and
avenues for future study in Chapter 6.
2
CHAPTER 2
EXPECTATION FOR OBTAINING A RUN OF LENGTH 2
In this chapter, we utilize a variety of approaches to calculate E[T2] and
E[T2,2], beginning with a direct counting approach. Surprisingly, this approach
turns out to be more straightforward for the calculation of E[T2,2], so we begin
there.
2.1 - The Direct Counting Approach for E[T2,2]
We begin by deriving the probability mass function f2,2(t) = P (T2,2 = t),
defined for t ≥ 2. We note that we must end either by obtaining a run of 2
successes (which occurs with probability p2) or by obtaining a run of 2 failures
(which occurs with probability q2). Also, since we cannot obtain a run of length
2 before the observed run, all previous trials must alternate between success and
failure. This means that for any given value of t, there are two valid options,
depending on the parity of t. If t is even, then the only valid options are 1 0 1 0
1 0 ... 0 1 1 or 0 1 0 1 0 1 ... 1 0 0, where 1 indicates a success and 0 indicates a
failure. If t is odd, then the two valid options are 1 0 1 0 1 0 ... 1 0 0 or 0 1 0 1
0 1 ... 0 1 1. Thus, the outcome of the starting trial will depend on whether t is
even or odd.
We may then find the probability for any particular value of t, depend-
ing on whether it is even or odd. If t is even, then there will be either t2
+ 1
successes and t2− 1 failures, or vice versa. Also, each valid sequence of trials
has only one possible ordering. The corresponding probabilities are pt2+1q
t2−1 and
pt2−1q
t2+1, so the total probability is p
t2+1q
t2−1 +p
t2−1q
t2+1, which can be rewritten
as (pq)t2−1(p2 + q2).
3
If instead t is odd, then there will be either t+12
successes and t−12
failures,
or vice versa. The corresponding probabilities are pt+12 q
t−12 and p
t−12 q
t+12 , so the
total probability is pt+12 q
t−12 + p
t−12 q
t+12 , which can be simplified to (pq)
t−12 .
Thus the probability mass function f2,2(t) for T2,2 is as follows:
f2,2(t) =
p
t2−1q
t2−1(p2 + q2), if t is even
(pq)t−12 , if t is odd.
(2.1)
Summing f2,2(t) over all possible values of t allows us to prove the following result.
Theorem 2.1.1. For 0 ≤ p ≤ 1, we will obtain a run of length 2 almost surely.
Proof. For p = 0 and p = 1, it is guaranteed that the run will occur in the first
two trials. For 0 < p < 1, we let even t = 2k and odd t = 2k + 1, and consider
the piecewise sum of f2,2(t):
P (T2,2 <∞) =∞∑k=1
(pq)k−1(p2 + q2) +∞∑k=1
(pq)k
=∞∑k=0
(pq)k(p2 + q2) +∞∑k=0
(pq)k+1
=p2 + q2
1− pq+
pq
1− pq
=p2 + q2 + pq
1− pq
=p2 + q(q + p)
1− p(1− p)
=p2 + q
1− p+ p2
=p2 + q
q + p2
= 1.
Since the sum of f2,2(t) over all values of t is equal to 1, we will obtain a run of
4
two successes or two failures almost surely.
Theorem 2.1.1 holds even though there are two sequences which never give
a run of 2; i.e., those which forever alternate between success and failure. It is
not difficult to show that these two sequences occur with probability 0. However,
just because T2,2 is finite-valued, it is not obvious that E[T2,2] is finite.
As a corollary to our work thus far, it is now convenient to utilize the
probability mass function to develop a formula for the cumulative distribution
function F2,2(t) = P (T2,2 ≤ t). We note that for even t = 2k, the even sum will
contain an extra term. For odd t = 2k + 1, we have
F2,2(t) =k∑
i=1
(pq)i−1(p2 + q2) +k∑
i=1
(pq)i
=(p2 + q2)(1− (pq)k)
1− pq+pq − (pq)k+1
1− pq
=(p2 + q2)(1− (pq)k)
1− pq+pq − (pq)k+1
1− pq
=(p2 + q2)(1− (pq)k) + pq − (pq)k+1
1− pq
=(p2 + q2)(1− (pq)k) + pq(1− (pq)k)
1− pq
=(p2 + q2 + pq)(1− (pq)k)
1− (1− q)q
=(p(p+ q) + q2)(1− (pq)k)
1− q + q2
=(p+ q2)(1− (pq)k)
p+ q2
= 1− (pq)k,
and for even t = 2k, we have
F2,2(t) =k∑
i=1
(pq)i−1(p2 + q2) +k−1∑i=1
(pq)i
5
=(p2 + q2)(1− (pq)k)
1− pq+pq − (pq)k
1− pq
=(p2 + q2)(1− (pq)k) + pq − (pq)k
1− pq
=(p2 + q2)(1− (pq)k) + pq − 1 + (1− (pq)k)
1− pq
=(p2 + q2 + 1)(1− (pq)k) + pq − 1
1− pq
=(p2 + q2 + 1)(1− (pq)k)
1− pq− 1
=(p2 + (1− p)2 + 1)(1− (pq)k)
1− p(1− p)− 1
=(p2 + 1− 2p+ p2 + 1)(1− (pq)k)
1− p+ p2− 1
=(2p2 − 2p+ 2)(1− (pq)k)
1− p+ p2− 1
= 2(1− (pq)k)− 1
= 1− 2(pq)k.
Then we reach the following corollary.
Corollary 2.1.2. The cumulative mass function F2,2(t), which gives the proba-
bility of obtaining a run of two identical trials within t trials, is given by
F2,2(t) =
1− 2(pq)
t2 , if t is even
1− (pq)t−12 , if t is odd.
Returning to the probability mass function allows us to calculate E[T2,2].
Theorem 2.1.3. Let T2,2 be the number of trials required to obtain either a run
of two successes or a run of two failures. Then E[T2,2] =2 + pq
1− pq.
Proof. We use two sums, one for even t and one for odd t. For even t = 2k, we
6
have
∞∑k=1
(2k)(pq)k−1(p2 + q2),
and for odd t = 2k + 1, we have
∞∑k=1
(2k + 1)(pq)k.
Combining the even and odd sums, we obtain:
E[T2,2] =∞∑k=1
(2k)(pq)k−1(p2 + q2) +∞∑k=1
(2k + 1)(pq)k.
Applying the formulas∑∞
k=0 rk =
1
1− rand
∑∞k=1 kr
k−1 =1
(1− r)2and the fact
that p+ q = 1 allows us to simplify the sum:
E[T2,2] =∞∑k=1
(2k)(pq)k−1(p2 + q2) +∞∑k=1
(2k + 1)(pq)k
= 2(p2 + q2)∞∑k=1
k(pq)k−1 + 2pq∞∑k=1
k(pq)k−1 + pq∞∑k=0
(pq)k
=2(p2 + q2)
(1− pq)2+
2pq
(1− pq)2+
pq
(1− pq)
=2p2 + 2q2 + 2pq + pq(1− pq)
(1− pq)2
=2p2 + 2q2 + 3pq − p2q2
(1− pq)2
=(2p2 + 2pq) + (2q2 + 2pq)− pq − p2q2
(1− pq)2
=(2p2 + 2pq) + (2q2 + 2pq)− pq − p2q2
(1− pq)2
=2p(p+ q) + 2q(p+ q)− pq − p2q2
(1− pq)2
7
=2− pq − p2q2
(1− pq)2
=(2 + pq)(1− pq)
(1− pq)2
=2 + pq
1− pq.
We note that the formula is also valid for p = 0 or p = 1; in either case, we obtain
E[T2,2] = 2.
2.2 - The Direct Counting Approach for E[T2]
We now consider the probability mass function f2(t) = P (T = t), defined
for t ≥ 2, for first obtaining a run of two successes ending on the tth trial. The
first few values of f2(t) are straightforward to calculate: t = 2 corresponds to a
run of two successes, so f2(2) = p2. Likewise, for t = 3, we must obtain successes
on trials 2 and 3, and a failure on trial 1 (or else we would have achieved the run
at t = 2). Thus f2(3) = p2q. It is clear that for t > 3, the last three trials must
be failure, success, and success, in that order, but there are more possibilities for
the first t− 3 trials (hereafter referred to as the “pre-run”). The only restriction
is that we cannot obtain two consecutive successes. Note that the number of
successes in the pre-run cannot exceed⌊t−22
⌋, since otherwise we would have a
run of two successes earlier than the tth trial.
We now consider the choice of successes within the pre-run. If the pre-
run contains no successes, this can only occur in one way (we will write this as(t−20
)for reasons that will soon be apparent). If the pre-run contains exactly
1 success, there are(t−31
)ways this may occur. If the pre-run contains more
than 1 success, then each success we place prior to the last will necessarily be
preceded by a failure, which removes 1 position from consideration. Thus a pre-
run that contains exactly 2 successes can occur in(t−42
)ways, and a pre-run that
8
contains exactly k successes can occur in(t−2−k
k
)ways. Finally, note that a pre-
run of length t−3 with k successes occurs with probability pkqt−3−k, so the entire
observed sequence occurs with probability pk+2qt−2−k. Thus, the probability mass
function can be written as
f2(t) =
b t−22 c∑
k=0
(t− 2− k
k
)pk+2qt−2−k, defined for integers t ≥ 2, (2.2)
which leads to the following expected value
E[T2] =∞∑t=2
t
b t−22 c∑
k=0
(t− 2− k
k
)pk+2qt−2−k. (2.3)
As can be seen, the formula in (2.3) is quite unwieldy, so we now pursue
a different method of enumeration. Instead of developing a formula based on the
number of trials t, we will consider the number of failures. We know that the
sequence must end with two successes, but before that, there may be any number
of failures k ∈ {0, 1, 2, ...}. Between any two failures or before the first failure,
there can be at most one success. Thus there are i ∈ {0, 1, 2, ..., k} successes, in
addition to the two successes in the run. Thus the probability of our sequence
containing exactly k failures is
k∑i=0
(k
i
)p2+iqk = p2qk
k∑i=0
(k
i
)pi = p2qk(p+ 1)k. (2.4)
Using this probability allows us to prove the following result.
Theorem 2.2.1. For 0 < p ≤ 1, we will obtain a run of two successes almost
surely.
Proof. For p = 1, it is guaranteed that the run will occur in the first two trials.
9
For 0 < p < 1, we have
P (T2 <∞) =∞∑k=0
p2qk(p+ 1)k
= p2∞∑k=0
[q(p+ 1)]k
=p2
1− q(p+ 1)
=p2
1− pq − q
=p2
p− pq
=p2
p(1− q)
= 1.
As the total probability is equal to 1, we will obtain the run almost surely.
In this case, there are infinitely many sequences which never give a run of
2 successes; however, the union of all such sequences has probability 0.
We can also use the probability function in (2.4) to obtain a formula for
E[T2].
Theorem 2.2.2. Let T2 be the number of trials required to obtain a run of 2
successes. Then E[T2] =1 + p
p2.
Proof. We sum on the number of failures k, for k ≥ 0. Since we might have a
success before each failure, there are i successes before the run, for some 0 ≤ i ≤ k.
Together with the run of 2 successes, there is a total of 2 + k + i trials. Then
E[T2] =∞∑k=0
k∑i=0
(k
i
)p2+iqk(2 + k + i)
10
= p2∞∑k=0
qkk∑
i=0
(k
i
)pi(2 + k + i)
= p2∞∑k=0
qk
((2 + k)
k∑i=0
(k
i
)pi +
k∑i=0
(k
i
)pii
)
= p2∞∑k=0
qk((2 + k)(p+ 1)k + kp(p+ 1)k−1
)= p2
(∞∑k=0
qk(2 + k)(p+ 1)k +∞∑k=0
qkkp(p+ 1)k−1
)
= p2
(2∞∑k=0
qk(p+ 1)k +∞∑k=0
qk(p+ 1)kk + pq∞∑k=0
kqk−1(p+ 1)k−1
)
= p2(
2
1− q(p+ 1)+
(p+ 1)q
(1− q(p+ 1))2+
pq
(1− q(p+ 1))2
)=p2(2− 2q(p+ 1) + (p+ 1)q + pq)
(1− q(p+ 1))2
=p2(2− q(p+ 1) + pq)
(1− qp− q)2
=p2(2− qp− q + pq)
(p− pq)2
=p2(2− q)
(p(1− q))2
=2− qp2
=1 + p
p2.
The formula also works for p = 1, as E[T2] = 2.
2.3 - Conditional Average for E[T2,2]
We may also be interested in obtaining a run of n successes before we
obtain a run of m failures. In order to derive the conditional expectation of the
number of trials needed in this case, we will need the probability of first obtaining
a run of successes. As before, we first seek a derivation for the case where n = 2
and m = 2.
11
Theorem 2.3.1. Let P (S2) be the probability of obtaining a run of 2 successes
before obtaining a run of 2 failures. Then P (S2) =p(1− q2)
1− pq.
Proof. Suppose we obtain the run of 2 successes first and that it takes t trials.
Then the sequence of trials must end with two successes and be preceded by al-
ternating successes and failures. If t is even, we will have two successes (occurring
with probability p2), plus an equal number of successes and failures in the pre-
ceding t− 2 trials. If instead t is odd, we will have two successes (occurring with
probability p2), and there will be one additional success in the preceding t − 2
trials. Again, by summing on the number of failures k, we obtain
P (S2) = p2∞∑k=0
(pq)k + qp2∞∑k=0
(pq)k
=p2
1− pq+
qp2
1− pq
=p2 + qp2
1− pq
=p2(1 + q)
1− pq
=p(1− q)(1 + q)
1− pq
=p(1− q2)
1− pq.
By symmetry, we also obtain the analogous result.
Corollary 2.3.2. Let P (F2) be the probability of obtaining a run of 2 failures
before obtaining a run of 2 successes. Then P (F2) =q(1− p2)
1− pq.
Now, we shall derive a formula for E[T2,2|S2].
12
Theorem 2.3.3. Let S2 be the event of obtaining a run of 2 successes before
obtaining a run of 2 failures. Then E[T2,2|S2] =2 + 3q − pq2
(1 + q)(1− pq).
Proof. We begin with two sums, corresponding to an even number of trials and
an odd number of trials, respectively. We multiply by the number of trials in
order to calculate the conditional expectation, and we divide by the probability
of the given event P (S2):
E[T2,2|S2] =1− pqp(1− q2)
(p2
∞∑k=0
(pq)k(2k + 2) + qp2∞∑k=0
(pq)k(2k + 3)
)
=1− pqp2(1 + q)
(2p2
∞∑k=0
(pq)kk + 2p2∞∑k=0
(pq)k + 2qp2∞∑k=0
(pq)kk + 3qp2∞∑k=0
(pq)k
)
=1− pq1 + q
(2pq
(1− pq)2+
2
1− pq+
2q2p
(1− pq)2+
3q
1− pq
)=
1− pq1 + q
(2pq + 2(1− pq) + 2q2p+ 3q(1− pq)
(1− pq)2
)=
1− pq1 + q
(2 + 3q − q2p
(1− pq)2
)=
2 + 3q − pq2
(1 + q)(1− pq).
Again, we obtain the analogous result by symmetry.
Corollary 2.3.4. Let F2 be the event of obtaining a run of 2 failures before
obtaining a run of 2 successes. Then E[T2,2|F2] =2 + 3p− qp2
(1 + p)(1− pq).
2.4 - Numerical Examples
We now present an example utilizing the formulas derived in the previous
sections.
Example 2.4.1. Let p = 0.4 be the probability of success on each trial.
13
By Theorem 2.2.2, the average number of trials needed to obtain a run of
two successes is
E[T2] =1 + p
p2=
1 + 0.4
0.42= 8.75.
By Theorem 2.1.3, the average number of trials needed to obtain a run of two
successes or a run of two failures is
E[T2,2] =2 + pq
1− pq=
2 + (0.4)(0.6)
1− (.4)(.6)=
56
19≈ 2.9474.
By Theorem 2.3.1, the probability of obtaining a run of two successes before a
run of two failures is
P (S2) =p(1− q2)
1− pq=
(0.4)(1− 0.62)
1− (.4)(.6)≈ 0.3368,
and, similarly, by Corollary 2.3.2, the probability of obtaining a run of two failures
before a run of two successes is
P (F2) =q(1− p2)
1− pq=
(0.6)(1− 0.42)
1− (.4)(.6)≈ 0.6632.
Finally, by Theorem 2.3.3, the average number of trials needed to obtain a run of
two successes, given that we obtain such a run before a run of two failures, is
E[T2,2|S2] =2 + 3q − pq2
(1 + q)(1− pq)
=2 + 3(.6)− (.4)(.6)2
(1 + .6)(1− (.4)(.6))
≈ 3.0066.
14
Likewise, by Corollary 2.3.4, the average number of trials needed to obtain a run
of two failures, given that we obtain such a run before a run of two successes, is
E[T2,2|F2] =2 + 3p− qp2
(1 + p)(1− pq)
=2 + 3(.4)− (.6)(.4)2
(1 + .4)(1− (.4)(.6))
≈ 2.9173.
The previous two results demonstrate that T2,2 depends on which run occurs first.
The next example will treat the repeated Bernoulli trials as corresponding
to individual steps in a one-dimensional random walk. A success corresponds to
an upward step, and a failure corresponds to a downward step. More formally, if
upward steps are a units each, and downward steps are b units each, then we can
define the height Ht as
Ht =t∑
i=1
(a1{Xi=1} − b1{Xi=0})
where 1{Xt=1} is an indicator variable that returns 1 for a success and 0 for a
failure, and 1{Xt=0} is an indicator variable that returns 0 for a success and 1 for
a failure. Then Ht represents the height at time t (i.e., after t trials).
A very useful result for studying random walks is Wald’s Identity. Next,
we will present a proof of Wald’s Identity from Harper and Ross [3].
Theorem 2.4.2. Wald’s Identity: Suppose that Ht is the height of a random walk
defined by Ht =∑t
i=1(a1{Xi=1}− b1{Xi=0}). If Xt are i.i.d., T is a stopping time,
and E[X1] and E[T ] are both finite, then E[HT ] = E[X1] · E[T ].
15
Proof. As Xt and the indicator 1{T≥n} are independent, we have
E[HT ] =∞∑n=1
E[Xn · 1{T≥n}]
=∞∑n=1
E[Xn] · E[1{T≥n}]
=∞∑n=1
E[X1] · P (T ≥ n)
= E[X1]∞∑n=1
n · P (T = n)
= E[X1] · E[T ].
We will use Wald’s Identity in the next example.
Example 2.4.3. Let p = 0.4 be the probability of success on each trial, and let
each upward step be a units and each downward step be b units. We would like
to compute E[HT2 ] and E[HT2,2 ]. First, we observe that
E[X1] = ap− bq.
Then, by applying Wald’s Identity, we find
E[HT2 ] = E[X1] · E[T2]
= (ap− bq)(
1 + p
p2
)= (0.4a− 0.6b)(8.75)
= 3.5a− 5.25b,
16
and
E[HT2,2 ] = E[X1] · E[T2,2]
= (ap− bq)(
2 + pq
1− pq
)= (0.4a− 0.6b)
(56
19
)=
112a
95− 168b
95.
17
CHAPTER 3
MARKOV CHAINS AND RECURSIVE SEQUENCES
3.1 - Markov Chain Solutions
The formula for the probability mass function f2(t) derived with the direct
counting approach, (2.2), is difficult to use directly. For this reason, we attempt
to develop a simpler formula by using a Markov chain. In this situation, we have
three different states, corresponding to having a current run of either 0, 1, or 2
successes. Thus we have a 1 × 3 initial state matrix B and a 3 × 3 transition
matrix A:
B =
(1 0 0
)
A =
1− p p 0
1− p 0 p
0 0 1
.
The entries in B correspond to the initial probabilities; since we always
begin with a run of 0 successes, there is a 1 in the first entry and 0 elsewhere.
Each entry in A corresponds to the probability of moving from its row state to its
column state. Note that the third row represents the absorbing state, since once
we obtain a run of 2 successes, we will remain in that state indefinitely. Then
the 1 × 3 matrix B × At represents the probabilities of being in each state after
t trials. Because computing At is quite difficult, we seek a diagonalization of A
using standard techniques (see [7]).
The eigenvalues of A can be found by solving its characteristic equation:
(1− p− λ)(−λ)(1− λ)− (1− p)(p)(1− λ) = 0
18
(1− λ)[λ2 − (1− p)λ− p(1− p)] = 0
Solving for λ, we obtain:
λ = 1 or λ =(1− p)±
√(1− p)(1 + 3p)
2.
Then the three eigenvalues of A are
λ1 = 1, λ2 =(1− p) +
√(1− p)(1 + 3p)
2, and λ3 =
(1− p)−√
(1− p)(1 + 3p)
2.
We can also find the corresponding matrix of eigenvectors P , which is
P =
1
(1−p)+√
(1−p)(1+3p)
2(1−p)
(1−p)−√
(1−p)(1+3p)
2(1−p)
1 1 1
1 0 0
,
where the ith column of P represents the eigenvector corresponding to λi. Note
that P is invertible, and
P−1 =1− p√
(1− p)(1 + 3p)
0 0
√(1−p)(1+3p)
1−p
1−(1−p)+
√(1−p)(1+3p)
2(1−p)
−(1−p)−√
(1−p)(1+3p)
2(1−p)
−1 (1−p)+√
(1−p)(1+3p)
2(1−p)
(1−p)−√
(1−p)(1+3p)
2(1−p)
.
Then we have the diagonalization A = PDP−1, where D is simply the diagonal
matrix containing the eigenvalues on the main diagonal:
D =
1 0 0
0(1−p)+
√(1−p)(1+3p)
20
0 0(1−p)−
√(1−p)(1+3p)
2
.
19
This allows us to easily calculate At = PDtP−1.
Note that the matrix
B ×At =
(P (0) P (1) P (2)
)
represents the final probabilities of being in each state after t trials. In particular,
P (2) is the probability of obtaining a run of 2 successes in t or fewer trials,
which is the cumulative mass function F2(t). Thus, when we perform the matrix
multiplication At = PDtP−1, we can focus on this entry of the final state matrix,
which is as follows:
P (2) =1− p√
(1− p)(1 + 3p)
[√(1− p)(1 + 3p)
1− p
−
((1− p) +
√(1− p)(1 + 3p)
2(1− p)
)2((1− p) +
√(1− p)(1 + 3p)
2
)t
+
((1− p)−
√(1− p)(1 + 3p)
2(1− p)
)2((1− p)−
√(1− p)(1 + 3p)
2
)t .
The above result simplifies to yield
F2(t) = 1−
[(1− p) +
√(1− p)(1 + 3p)
]t+2
−[(1− p)−
√(1− p)(1 + 3p)
]t+2
2t+2(1− p)√
(1− p)(1 + 3p),
(3.1)
from which we can derive f2(t) = F2(t)− F2(t− 1).
A remarkable simplification is possible in the case where p = 1/2; in this
case, we obtain
F2(t) = 1−
[(1− 1
2) +
√(1− 1
2)(1 + 3
2)]t+2
−[(1− 1
2)−
√(1− 1
2)(1 + 3
2)]t+2
2t+2(1− 12)√
(1− 12)(1 + 3
2)
20
= 1−
(12
+√
54
)t+2
−(
12−√
54
)t+2
2t+2(12
)√54
= 1−
(1+√5
2
)t+2
−(
1−√5
2
)t+2
2t√
5.
We note that1 +√
5
2= ϕ, where ϕ is the golden ratio, and
1−√
5
2= 1 − ϕ.
This allows us to apply Binet’s Formula, which states Fn =ϕn − (1− ϕ)n
ϕ− (1− ϕ), where
Fn is the nth Fibonacci number. Thus we obtain the following theorem.
Theorem 3.1.1. The cumulative distribution function F2(t) for the probability of
obtaining a run of 2 successes within t trials is given by F2(t) = 1− Ft+2
2t, where
Ft is the tth Fibonacci number.
Using Theorem 3.1.1, we can also obtain the pdf f2(t):
f2(t) = F2(t)− F2(t− 1)
= 1− Ft+2
2t−(
1− Ft+1
2t−1
)=
2Ft+1 − Ft+2
2t
=2Ft+1 − (Ft+1 + Ft)
2t
=Ft+1 − Ft
2t.
We thus obtain the following result.
Theorem 3.1.2. The probability mass function f2(t) for the probability of ob-
taining a run of 2 successes ending on the tth trial is given by f2(t) =Ft−1
2t, where
Ft is the tth Fibonacci number.
We can use the probability distribution function in Theorem 3.1.2 to pro-
vide an alternate proof that a run of 2 successes must have finite expectation in
21
the case where p = 1/2.
Theorem 3.1.3. Suppose p = 1/2. Then T2 is finite almost surely, and E[T2] is
finite.
Proof. To prove the first part of the theorem, we will show that the following sum
is equal to 1:
∞∑t=2
f2(t) =∞∑t=2
Ft−1
2t
=∞∑t=2
ϕt−1 − (1− ϕ)t−1
2t[ϕ− (1− ϕ)]
=1
2[ϕ− (1− ϕ)]
∞∑t=2
(ϕt−1
2t−1 −(1− ϕ)t−1
2t−1
)
=1
2[ϕ− (1− ϕ)]
∞∑t=1
[(ϕ2
)t−(
(1− ϕ)
2
)t]
=1
4ϕ− 2
[∞∑t=1
(ϕ2
)t−∞∑t=1
((1− ϕ)
2
)t]
=1
4ϕ− 2
(1
1− ϕ2
− 1− 1
1− 1−ϕ2
+ 1
)
=1
4ϕ− 2
(1
2−ϕ2
− 11+ϕ2
)
=1
4ϕ− 2
(2
2− ϕ− 2
1 + ϕ
)=
1
4ϕ− 2
(2 + 2ϕ− 4 + 2ϕ
(2− ϕ)(1 + ϕ)
)=
1
2 + ϕ− ϕ2
=1
2 + 1+√5
2− (1+
√5
2)2
=1
8+2+2√5−1−2
√5−5
4
= 1.
22
For the second part of the theorem, we will use the ratio test on the series
E[T2] =∞∑t=2
Ft−1
2tt,
obtaining the following limit:
limt→∞
(Ft
Ft−1
)(2t
2t+1
)(t+ 1
t
)= lim
t→∞
(Ft
Ft−1
)(1
2
)=ϕ
2< 1.
Hence E[T2] is finite.
3.2 - A Larger Markov Chain Example
We may apply the same Markov chain technique for longer runs where
n > 2. In these cases, however, it will not be as straightforward to solve the
characteristic equation and obtain a diagonalization of the transition matrix. We
can, however, obtain numerical approximations for given values of p and t.
Example 3.2.1. Let p = 0.637. What is the probability of obtaining a run of
4 successes within 20 trials? Within 50 trials? What is the probability of first
obtaining a run of 4 successes on the 10th trial?
To set up the Markov chain, we note that our initial state matrix will be
a 1 × 5 matrix (allowing for runs of length 0, 1, 2, 3, and 4), and the transition
matrix will be a 5 × 5 matrix. As before, the initial state matrix will contain a
single 1 followed by all zeros. All but the last row of the transition matrix will
contain 1 − p = 0.363 in the first entry, p = 0.637 on row i, column i + 1, and
zeros elsewhere. The last row, representing the absorbing state, will contain all
zeros followed by a single 1 in the last entry. Thus,
B =
(1 0 0 0 0
)23
and
A =
0.363 0.637 0 0 0
0.363 0 0.637 0 0
0.363 0 0 0.637 0
0.363 0 0 0 0.637
0 0 0 0 1
.
Then, to find the probability of obtaining a run of 4 successes within 20
trials, we simply need to compute
B ×A20 =
(P (0) P (1) P (2) P (3) P (4)
),
which yields the following (the values do not sum to 1 due to rounding):
B ×A20 =
(0.0812 0.0566 0.0394 0.0274 0.7953
).
Therefore, from the last entry in the matrix, there is about a 79.53% chance of
obtaining a run of 4 successes within 20 trials. Similarly, we can compute
B ×A50 =
(0.0056 0.0039 0.0027 0.0019 0.9860
)
to find that there is about a 98.60% chance of obtaining a run of 4 successes
within 50 trials. Essentially, we are computing specific values of the cumulative
distribution function. Thus, in order to answer the last question, we need to
compute two of these cdf values and subtract them.
f4(10) = F4(10)− F4(9)
= (B ×A10)5 − (B ×A9)5
24
≈ 0.5000− 0.4536 = .0464.
Thus, there is about a 4.64% chance of first completing a run of 4 successes on
the tenth trial.
3.3 - Pseudo-Fibonacci sequences
We can derive Binet’s Formula for the nth Fibonacci number by using a
generating function. We let G(x) = F0 +F1x+F2x2 +F3x
3 + ... be the generating
function, with coefficient Fn representing the nth Fibonacci number. We also have
the initial conditions F0 = 0, F1 = 1, and Fn = Fn−1 + Fn−2. Then we have
G(x) = F0 + F1x+ F2x2 + F3x
3 + ...
xG(x) = F0x+ F1x2 + F2x
3 + F3x4 + ...
x2G(x) = F0x2 + F1x
3 + F2x4 + F3x
5 + ... .
Subtracting the last two lines from the first line gives
(1− x− x2)G(x) = F0 + (F1 − F0)x, or
G(x) =x
1− x− x2.
Thus we have solved for the generating function G(x). Note that the denominator
can be factored (using1 +√
5
2= ϕ and
1−√
5
2= 1− ϕ); hence,
G(x) =x
(1− ϕx)(1− (1− ϕ)x).
25
The right side of the equation can be rewritten using the partial fraction expansion
G(x) =A
1− ϕx+
B
1− (1− ϕ)x.
Through some straightforward algebra, we obtain A =1√5
and B = − 1√5
. Note
that√
5 = ϕ− (1− ϕ). Thus we have
G(x) =1
ϕ− (1− ϕ)
(1
1− ϕx+
1
1− (1− ϕ)x
).
Note that we can rewrite the above using power series as follows:
G(x) =1
ϕ− (1− ϕ)
(∞∑n=0
(ϕx)n −∞∑n=0
[(1− ϕ)x]n
).
Then the coefficient Fn for the xn term of G(x) isϕn − (1− ϕ)n
ϕ− (1− ϕ), which is exactly
Binet’s Formula.
We now consider the more general formula for F2(t) from (3.1):
F2(t) = 1−
[(1− p) +
√(1− p)(1 + 3p)
]t+2
−[(1− p)−
√(1− p)(1 + 3p)
]t+2
2t+2(1− p)√
(1− p)(1 + 3p).
We let α = (1− p) +√
(1− p)(1 + 3p) and β = (1− p)−√
(1− p)(1 + 3p), and
note that α− β = 2√
(1− p)(1 + 3p), which is present in the denominator of
[(1− p) +
√(1− p)(1 + 3p)
]n−[(1− p)−
√(1− p)(1 + 3p)
]n2√
(1− p)(1 + 3p).
Then for n = 0 we obtain 0, and for n = 1 we obtain 1, matching the initial values
of the Fibonacci sequence. However, we will need a different recursive relation
than the one for the Fibonacci sequence.
26
For the Fibonacci generating function, we used the polynomial 1− x− x2
in order to eliminate all but a finite number of terms to give
(1− x− x2)G(x) = F0 + (F1 − F0)x.
Now, the Fibonacci recursive relation may be written as Fn − Fn−1 − Fn−2 =
0, which reveals the key insight. The coefficients of the polynomial and the
recursion match (constant with Fn, x with Fn−1, and x2 with Fn−2). Thus, we
seek a polynomial that factors into (1 − αx)(1 − βx). This turns out to be
4p(1− p)− 2(1− p)x− x2. Then we can define a new recursive sequence, Pn, of
“pseudo-Fibonacci” numbers with conditions
P0 = 0, P1 = 1
Pn = 2(1− p)Pn−1 + 4p(1− p)Pn−2.
Then by following the same argument as in the derivation of Binet’s Formula, we
obtain
Pn =
[(1− p) +
√(1− p)(1 + 3p)
]n−[(1− p)−
√(1− p)(1 + 3p)
]n2√
(1− p)(1 + 3p). (3.2)
For p = 1/2, (3.2) gives Binet’s Formula for Fn. Thus we have defined recursive
sequences {Pn} corresponding to probability values, which we can use to rewrite
F2(t) = P (T2 ≤ t), as follows:
F2(t) = 1− Pt+2
2t+1(1− p)
= 1− 2(1− p)Pt+1 + 4p(1− p)Pt
2t+1(1− p)
27
= 1− Pt+1 + 2pPt
2t.
It may be of interest to examine the sequences {Pn}. Table 3.1 presents
the beginning of the sequences for selected values of p.
Table 3.1: Selected Pseudo-Fibonacci Sequences
p First 7 terms of {Pn}
0 0, 1, 2, 4, 8, 16, 32, ...
0.2 0, 1, 1.6, 3.2, 6.144, 11.8784, 22.9376, ...
0.3 0, 1, 1.4, 2.8, 5.096, 9.4864, 17.5616, ...
0.5 0, 1, 1, 2, 3, 5, 8, ...
0.75 0, 1, 0.5, 1, 0.875, 1.1875, 1.25, ...
ϕ/2 0, 1, 0.3820, 0.7639, 0.5279, 0.6738, 0.5836, ...
0.9 0, 1, 0.2, 0.4, 0.152, 0.1744, 0.0896, ...
1 0, 1, 0, 0, 0, 0, 0, ...
Note that some sequences are strictly increasing, while others oscillate.
Some properties of the family {Pn} may be observed. For instance, the
only values which yield sequences of only integers are p = 0, p = 1/2, and p = 1.
Using p = 0 gives powers of 2, and of course p = 1/2 returns the familiar Fibonacci
sequence.
Examining the end behavior of the sequences also provides some interesting
properties. For 0 ≤ p ≤ 1/2, {Pn} is a monotone increasing sequence. For 1/2 <
p < ϕ/2, {Pn} oscillates a finite number of times before eventually increasing to
infinity. For p = ϕ/2, {Pn} oscillates infinitely often, but the oscillations become
smaller and smaller, and the sequence converges to the limit ϕ− 1. For p > ϕ/2,
{Pn} eventually becomes decreasing and converges to 0.
28
Variations on the Fibonacci sequence are not a new concept. For instance,
the generalized Fibonacci sequence modifies the starting values while preserv-
ing the coefficients in the recursive relation. In contrast, the family of pseudo-
Fibonacci sequences defined here modified the coefficients in the recursive relation
while preserving the starting values of 0 and 1. More results and applications re-
lated to generalized Fibonacci numbers can be found in Koshy [5].
29
CHAPTER 4
EXPECTATION FOR OBTAINING A RUN OF LENGTH N
Again, we define the stopping time Tn to be the number of trials needed
to first observe n consecutive successes, and we define the stopping time Tn,m to
be the number of trials needed to first observe either n consecutive successes or
m consecutive failures. Because the counting arguments presented in Chapter
2 would seemingly be quite complicated for n,m > 2, we seek a more general
method to calculate E[Tn] and E[Tn,m].
4.1 - Deriving E[Tn] Recursively
Before we begin our procedure to derive a formula for E[Tn], we must first
show that it is finite for any non-zero choice of p and any choice of n.
Theorem 4.1.1. For 0 < p ≤ 1 and fixed n, E[Tn] is finite.
Proof. We have already proven the result for the trivial case of p = 1, so suppose
0 < p < 1. Consider the trials in blocks of n; i.e., trials X1 through Xn, trials Xn+1
through X2n, and so on. Each block contains the run we desire with probability
pn. The blocks correspond to a geometric distribution with parameter pn, so the
expected number of blocks required to observe a run is 1/pn, which corresponds
to n/pn trials. This expectation, which is finite, in fact provides an upper bound
for E[Tn], since the first run might occur across two different blocks. Thus we
can safely conclude that E[Tn] must be finite for any choice of n.
As an aside, note that the case n = 1 is equivalent to the geometric
distribution with parameter p, which has expected value 1/p. We have already
derived a formula for n = 2 in Theorem 2.2.2, but we would like a general formula.
We now develop such a formula by recursively calculating the weighted average.
30
Theorem 4.1.2. Let Tn be the number of trials required to obtain a run of n
consecutive successes. Then for 0 < p < 1, E[Tn] =1− pn
pn(1− p).
Proof. Suppose we obtain a run of n successes immediately. This situation occurs
with probability pn and requires n trials. If instead we do not obtain a run of
n successes, then we must have first obtained a run of k successes followed by
a failure, for some k ∈ {0, 1, ..., n − 1}. This situation occurs with probability
pk(1− p). We have already observed k + 1 trials, and since the failure resets the
run, we will require E[Tn] additional trials, for a total of k+1+E[Tn] trials. Thus
we obtain the following:
E[Tn] = pnn+n−1∑k=0
pk(1− p)(k + 1 + E[Tn])
= pnn+n−1∑k=0
pk(1− p)E[Tn] +n−1∑k=0
pk(1− p)(k + 1)
= pnn+ (1− pn)E[Tn] +n−1∑k=0
pk(k + 1)−n−1∑k=0
pk+1(k + 1).
Because E[Tn] is finite, we can subtract it from the right-hand side. Solving for
E[Tn], we obtain
pnE[Tn] = pnn+n−1∑k=0
pk(k + 1)−n∑
k=1
pkk
= pnn+ 1 +n−1∑k=1
pk(k + 1)−n−1∑k=1
pkk − pnn
= 1 +n−1∑k=1
pk
=n−1∑k=0
pk
=1− pn
1− p,
31
which gives the result.
Note that if we use n = 1 in Theorem 4.1.2, we obtain the expectation
E[T1] = 1p, which is consistent with the geometric distribution. Furthermore, for
n = 2, we obtain E[T1] =1− p2
p2(1− p)=
1 + p
p2, which agrees with Theorem 2.2.2.
Thus we have solved for E[Tn] in all cases:
E[Tn] =
∞, p = 0
1− pn
pn(1− p), 0 < p < 1
n, p = 1.
(4.1)
We now turn our attention to E[Tn,m].
4.2 - Deriving E[Tn,m] Recursively
For 0 < p < 1, it is not difficult to see that E[Tn,m] must be finite by using
Theorem 4.1.1, as it is clear that E[Tn,m,] ≤ E[Tn].
Corollary 4.2.1. For 0 ≤ p ≤ 1 and fixed n and m, E[Tn,m] is finite.
Since Corollary 4.2.1 guarantees E[Tn,m] must be finite for any parameter
p, we now seek its derivation. The resulting formula is given as the following
theorem.
Theorem 4.2.2. Let Tn,m be the number of trials required to obtain either a run
of n successes or a run of m failures. Then E[Tn,m] is given by
E[Tn,m] =
m, p = 0
(1− pn)(1− qm)
pnq + pqm − pnqm, 0 < p < 1
n, p = 1.
32
Proof. In order to facilitate the creation of formulas in this proof, we define two
additional variables. We let A1 be the expected number of additional trials needed
when starting with a single success, and we let A2 be the expected number of
additional trials needed when starting with a single failure. Both A1 and A2 must
be finite, as both values are less than or equal to Tn,m. Now, consider the first
trial, which results in success with probability p and failure with probability q.
We may then write
E[Tn,m] = 1 + pA1 + qA2. (4.2)
Suppose we have a current run of one success. If we obtain successes in
the next n−1 trials, we are finished. This situation occurs with probability pn−1.
If instead we do not obtain a run of n − 1 successes, then we must have first
obtained a run of k successes followed by a failure, for some k ∈ {0, 1, ..., n− 2}.
This situation occurs with probability pkq. We have then observed k + 1 trials,
and since we now have a current run of one failure, we will require A2 additional
trials, for a total of k + 1 + A2 trials. Thus we obtain the following:
A1 = pn−1(n− 1) +n−2∑k=0
pkq(k + 1 + A2)
= pn−1(n− 1) +n−2∑k=0
pk(1− p)A2 +n−2∑k=0
pk(1− p)(k + 1)
= pn−1(n− 1) + (1− pn−1)A2 +n−2∑k=0
pk(k + 1)−n−2∑k=0
pk+1(k + 1).
We proceed by moving the A2 term to the left side:
A1 − A2(1− pn−1) = pn−1(n− 1) +n−2∑k=0
pk(k + 1)−n−1∑k=1
pkk
= pn−1(n− 1) + 1 +n−2∑k=1
pk(k + 1)−n−2∑k=1
pkk − pn−1(n− 1)
33
= 1 +n−2∑k=1
pk
=n−2∑k=0
pk.
The result is as follows:
A1 − A2(1− pn−1) =1− pn−1
1− p. (4.3)
If we apply the same steps with success and failure reversed, we also obtain
A2 − A1(1− qm−1) =1− qm−1
1− q. (4.4)
Thus, (4.3) and (4.4) give a system of two equations in the two variables
A1 and A2. Solving the system gives the following:
A1 =(1− pn−1)(1− qm)
pnq + pqm − pnqm, A2 =
(1− qm−1)(1− pn)
pnq + pqm − pnqm. (4.5)
For (4.2), we need pA1 and pA2. First, we have
pA1 =p(1− pn−1)(1− qm)
pnq + pqm − pnqm
=p− pn − pqm + pnqm
pnq + pqm − pnqm
=p− pn + (pnq − pnq)− pqm + pnqm
pnq + pqm − pnqm
=p− pn + pnq
pnq + pqm − pnqm− 1
=p− pn+1
pnq + pqm − pnqm− 1.
34
Similarly, we have
pA2 =q − qm+1
pnq + pqm − pnqm− 1.
Substituting these expressions into (4.2) and simplifying, we obtain
E[Tn,m] = 1 + pA1 + qA2
=p− pn+1
pnq + pqm − pnqm+
q − qm+1
pnq + pqm − pnqm− 1
=p− pn+1 + q − qm+1 − pnq − pqm + pnqm
pnq + pqm − pnqm
=1− pn(p+ q)− qm(q + p) + pnqm
pnq + pqm − pnqm
=1− pn − qm + pnqm
pnq + pqm − pnqm
=(1− pn)(1− qm)
pnq + pqm − pnqm,
which is valid for 0 < p < 1.
It should be noted that the formula in Theorem 4.2.2 confirms the formula
for n,m = 2 derived in Chapter 2. Making the substitutions n = 2 and m = 2
and simplifying yields
E[T2,2] =(1− p2)(1− q2)p2q + pq2 − p2q2
=1− p2 − q2 + p2q2
pq(p+ q)− p2q2
=p+ q − p2 − q2 + p2q2
pq − p2q2
=p(1− p) + q(1− q) + p2q2
pq(1− pq)
=2pq + p2q2
pq(1− pq)
35
=2 + pq
1− pq,
which agrees with Theorem 2.1.3.
4.3 - Relationships Between E[Tn] and E[Tn,m]
For 0 < p < 1, we have obtained the following expectations
E[Tn] =1− pn
pn(1− p), E[Tn,m] =
(1− pn)(1− qm)
pnq + pqm − pnqm. (4.6)
Also, suppose that we wish to obtain a run of m consecutive failures, where the
probability of failure is q. We denote the number of trials required by T ∗m. Then,
by symmetry, E[T ∗m] is given by
E[T ∗m] =1− qm
qm(1− q).
We then have the following identity.
Theorem 4.3.1. For 0 < p < 1,1
E[Tn]+
1
E[T ∗m]=
1
E[Tn,m].
Proof. Simplifying the left side of the identity gives
1
E[Tn]+
1
E[T ∗m]=pn(1− p)
1− pn+qm(1− q)
1− qm
=pnq(1− qm) + qmp(1− pn)
(1− pn)(1− qm)
=pnq − pnqm+1 + qmp− qmpn+1
(1− pn)(1− qm)
=pnq + pqm − pnqm
(1− pn)(1− qm)
=1
E[Tn,m].
36
In a sense, we can think of E[Tn] and E[T ∗m] as two “components” of
E[Tn,m]. Theorem 4.3.1 then says that the reciprocal of E[Tn,m] is equal to the
sum of the reciprocals of its two components.
In the special case where p = q = 1/2, we can relate E[Tn] and E[Tn,n].
Theorem 4.3.2. For p = q = 1/2, E[Tn] = 2E[Tn,n].
Proof. For p = 1/2, we have the following:
E[Tn] =1−
(12
)n(12
)n(1− 1
2)
=1−
(12
)n(12
)n+1
= 2
(1−
(12
)n(12
)n)
= 2(2n − 1);
E[Tn,n] =
[1−
(12
)n] [1−
(12
)n](12
)n+1+(12
)n+1 −(12
)2n=
[1−
(12
)n]2(12
)n − (12
)2n=
[1−
(12
)n]2(12
)n [1−
(12
)n]=
1−(12
)n(12
)n= 2n − 1
=E[Tn]
2.
37
4.4 - Conditional Expectations
Suppose we are now interested in the event that we obtain a run of n
successes before we obtain a run of m failures, which we denote Rnbm. We shall
derive P (Rnbm) by considering cases of how smaller runs may occur before a run
of n successes.
Theorem 4.4.1. Let Rnbm be the event that we obtain a run of n successes before
a run of m failures. Then P (Rnbm) =pn−1(1− qm)
pn−1 + qm−1 − pn−1qm−1.
Analogously, if RCnbm is the event that we obtain a run of m failures before a run
of n successes, then P (RCnbm) =
qm−1(1− pn)
pn−1 + qm−1 − pn−1qm−1
Proof. For Rnbm, we have two cases. In Case I, we start with a success, which
occurs with probability p. We might immediately obtain all remaining n − 1
successes, or our potential run of successes might be interrupted. An interruption
will have two parts: some number of successes followed by a failure (part 1),
then some number of failures followed by a success (part 2). Note that in part
1, we must obtain a failure within n − 1 trials, and in part 2, we must obtain a
success within m − 1 trials (or else we will obtain a run of m failures, which is
not allowed). After the interruption, we have one success, which returns to the
original Case I scenario.
To summarize, we have one success, followed by some number of interrup-
tions (possibly 0), followed by the remaining n − 1 successes. Also note that we
can use the geometric distribution for each interruption; i.e., the probability of
obtaining a failure within n−1 trials is 1−pn−1, and the probability of obtaining
a success within m − 1 trials is 1 − qm−1. Thus, by summing on the number of
38
possible interruptions k, the probability of Case I is
P1 = pn∞∑k=0
[(1− pn−1)(1− qm−1)]k.
In Case II, we start with a failure. We must then obtain some number
of failures followed by a success, which returns us to the beginning of case I. We
then obtain some number of interruptions followed by the final n − 1 successes.
In this case, we will have the probability of the failure, the probability of the
n− 1 successes, and a single geometric probability outside the summation for the
interruptions. The probability in this case is
P2 = qpn−1(1− qm−1)∞∑k=0
[(1− pn−1)(1− qm−1)]k.
The total probability, P (Rnbm), is simply the sum of P1 and P2:
P (Rnbm) = P1 + P2
= pn∞∑k=0
[(1− pn−1)(1− qm−1)]k + qpn−1(1− qm−1)∞∑k=0
[(1− pn−1)(1− qm−1)]k
= [pn + qpn−1(1− qm−1)]∞∑k=0
[(1− pn−1)(1− qm−1)]k
=pn + qpn−1(1− qm−1)
1− (1− pn−1)(1− qm−1)
=pn−1(p+ q(1− qm−1))pn−1 + qm−1 − pn−1qm−1
=pn−1(1− qm)
pn−1 + qm−1 − pn−1qm−1.
We can define RCnbm analogously, and we can find P (RC
nbm) similarly. Thus,
39
we have
P (Rnbm) =pn−1(1− qm)
pn−1 + qm−1 − pn−1qm−1,
and
P (RCnbm) =
qm−1(1− pn)
pn−1 + qm−1 − pn−1qm−1.
The derivation of the conditional probabilities leads to the following result.
Theorem 4.4.2. Let Tn,m|Rnbm be the number of trials required to obtain a run
of n successes given that we obtain such a run before a run of m failures. Then
E[Tn,m|Rnbm] = n+ (1−pn)(q−qm(m+q−mq))+(1−qm)(1−qm−1)(p−pn(n+p−np))(1−qm)(pnq+pqm−pnqm)
.
Also, if Tn,m|RCnbm is the number of trials required to obtain a run of m failures
given that we obtain such a run before a run of n successes, then E[Tn,m|RCnbm] =
m+ (1−qm)(p−pn(n+p−np))+(1−pn)(1−pn−1)(q−qm(m+q−mq))(1−pn)(pnq+pqm−pnqm)
.
Proof. As before, we consider the average in each of the two cases. To facilitate
this, let us define conditional averages for the geometrically distributed interrup-
tions. For X ∼ Geometric(p), we define e1 = E[X|X ≤ m− 1], and likewise, for
Y ∼ Geometric(q), we define e2 = E[Y |Y ≤ n− 1]. Then for Case I we have
A1 = pn∞∑k=0
[n+ k(e1 + e2)][(1− pn−1)(1− qm−1)]k, (4.7)
40
and for Case II we have
A2 = qpn−1(1− qm−1)∞∑k=0
[n+ e1 + k(e1 + e2)][(1− pn−1)(1− qm−1)]k. (4.8)
We then obtain
E[Tn,m|Rnbm] =A1 + A2
P (Rnbm). (4.9)
We will now derive e1 and e2.
Lemma 4.4.3. Suppose X ∼ Geometric(p). Then
E[X|X ≤ m− 1] = 1−qm−1(m+q−mq)p(1−qm−1)
.
Proof. We have E[X|X ≤ m−1] =
∑m−1k=1 kP (k)
P (X ≤ m− 1), and sinceX ∼ Geometric(p),
P (k) = pqk−1 and P (X ≤ m− 1) = 1− qm−1. Thus we obtain
E[X|X ≤ m− 1] =
∑m−1k=1 kpq
k−1
1− qm−1
=p[1− qm−1(m+ q −mq)]
p2(1− qm−1)
=1− qm−1(m+ q −mq)
p(1− qm−1)
Then, by Lemma 4.4.3, we have e1 =1− qm−1(m+ q −mq)
p(1− qm−1)and e2 =
1− pn−1(n+ p− np)q(1− pn−1)
. Now consider A1 from (4.7):
A1 = pn∞∑k=0
[n+ k(e1 + e2)][(1− pn−1)(1− qm−1)]k
= pn
[∞∑k=0
n[(1− pn−1)(1− qm−1)]k
+∞∑k=0
k
(1− qm−1(m+ q −mq)
p(1− qm−1)
)[(1− pn−1)(1− qm−1)]k
41
+∞∑k=0
k
(1− pn−1(n+ p− np)
q(1− pn−1)
)[(1− pn−1)(1− qm−1)]k
]
= npn∞∑k=0
[(1− pn−1)(1− qm−1)]k
+
[pn−1(1− pn−1)(1− qm−1(m+ q −mq))
+pn
q(1− qm−1)(1− pn−1(n+ p− np))
] ∞∑k=0
k[(1− pn−1)(1− qm−1)]k−1
=npn
1− (1− pn−1)(1− qm−1)+pn−1q(1− pn−1)(1− qm−1(m+ q −mq))
q[1− (1− pn−1)(1− qm−1)]2
+pn(1− qm−1(1− pn−1(n+ p− np))
q[1− (1− pn−1)(1− qm−1)]2
=
(1
q[pn−1 + qm−1 − pn−1qm−1]2
)(npnq(pn−1 + qm−1 − pn−1qm−1)
+pn−1q(1− pn−1)(1− qm−1(m+ q −mq))
+pn(1− qm−1)(1− pn−1(n+ p− np))).
Consider A2 from (4.8).
A2 = qpn−1(1− qm−1)∞∑k=0
[n+ e1 + k(e1 + e2)][(1− pn−1)(1− qm−1)]k
= qpn−1(1− qm−1)
[∞∑k=0
n[(1− pn−1)(1− qm−1)]k
+∞∑k=0
(k + 1)
(1− qm−1(m+ q −mq)
p(1− qm−1)
)[(1− pn−1)(1− qm−1)]k
+∞∑k=0
k
(1− pn−1(n+ p− np)
q(1− pn−1)
)[(1− pn−1)(1− qm−1)]k
]
=nqpn−1(1− qm−1)
1− (1− pn−1)(1− qm−1)
+ qpn−2(1− qm−1(m+ q −mq))∞∑k=0
(k + 1)[(1− pn−1)(1− qm−1)]k
+pn−1(1− qm−1)(1− pn−1(n+ p− np))
1− pn−1∞∑k=0
k[(1− pn−1)(1− qm−1)]k
42
=nqpn−1(1− qm−1) + qpn−2(1− qm−1(m+ q −mq))
pn−1 + qm−1 − pn−1qm−1
+[qpn−2(1− qm−1(m+ q −mq))](1− pn−1)(1− qm−1)
[1− (1− pn−1)(1− qm−1)]2
+[pn−1(1− qm−1)(1− pn−1(n+ p− np))](1− qm−1)
[1− (1− pn−1)(1− qm−1)]2
=
(1
(pn−1 + qm−1 − pn−1qm−1)2
)([nqpn−1(1− qm−1) + qpn−2(1− qm−1(m+ q −mq))](pn−1 + qm−1 − pn−1qm−1)
+qpn−2(1− qm−1(m+ q −mq))(1− pn−1 − qm−1 + pn−1qm−1)
+pn−1(1− qm−1)(1− pn−1(n+ p− np))(1− qm−1))
=
(1
(pn−1 + qm−1 − pn−1qm−1)2
)(nqpn−1(1− qm−1)(pn−1 + qm−1 − pn−1qm−1)
+qpn−2(1− qm−1(m+ q −mq)) + pn−1(1− qm−1)2(1− pn−1(n+ p− np))).
Then, from (4.9):
E[Tn,m|Rnbm] =A1 + A2
P (Rnbm)
=
(pn−1 + qm−1 − pn−1qm−1
pn−1q(1− qm)[pn−1 + qm−1 − pn−1qm−1]2
)(npnq(pn−1 + qm−1 − pn−1qm−1)
+ pn−1q(1− pn−1)(1− qm−1(m+ q −mq)) + pn(1− qm−1)(1− pn−1(n+ p− np))
+ nq2pn−1(1− qm−1)(pn−1 + qm−1 − pn−1qm−1)
+ q2pn−2(1− qm−1(m+ q −mq)) + qpn−1(1− qm−1)2(1− pn−1(n+ p− np)))
=
(1
pn−1q(1− qm)(pn−1 + qm−1 − pn−1qm−1)
)(npn−1q(pn−1 + qm−1 − pn−1qm−1)(p+ q(1− qm−1)
+ pn−2q(1− qm−1(m+ q −mq))(p(1− pm−1) + q)
+pn−1(1− qm−1)(1− pn−1(n+ p− np))(p+ q(1− qm−1)))
43
=
(1
pq(1− qm)(pn−1 + qm−1 − pn−1qm−1)
)(npq(pn−1 + qm−1 − pn−1qm−1)(1− qm)
+ q(1− qm−1(m+ q −mq))(1− pm) + q)
+p(1− qm−1)(1− pn−1(n+ p− np))(1− qm)))
= n+(1− qm−1(m+ q −mq))(1− pn)
p(1− qm)(pn−1 + qm−1 − pn−1qm−1)+
(1− pn−1(n+ p− np))(1− qm−1)q(pn−1 + qm−1 − pn−1qm−1)
= n+ q(1−pn)(1−qm−1(m+q−mq))+p(1−qm)(1−qm−1)(1−pn−1(n+p−np))pq(1−qm)(pn−1+qm−1−pn−1qm−1)
= n+ (1−pn)(q−qm(m+q−mq))+(1−qm)(1−qm−1)(p−pn(n+p−np))(1−qm)(pnq+pqm−pnqm)
.
We can also use Theorem 4.4.2 in conjunction with other results to obtain
the expected number of trials needed to obtain both a run of n successes and a
run of m failures.
Corollary 4.4.4. The expected number of trials needed to obtain both a run of n
successes and a run of m failures is E[Tn&m] = (E[Tn,m|Rnbm]+E[T ∗m])P (Rnbm)+
(E[Tn,m|RCnbm] + E[Tn])P (RC
nbm).
4.5 - Applications of Conditional Runs
There are several interesting applications for the conditional probabilities
and expectations that were derived in Section 4.4. For instance, we might devise
a game in which the player wins by obtaining a run of n successes and loses by
obtaining a run of m failures. Given fixed values for n and m, what probability
p should be used in order to create a fair game? The answer to that question is
identical to asking what value of p gives P (Rnbm) = 12, or, equivalently, P (Rnbm) =
P (RCnbm).
44
Setting the two conditional probabilities equal gives:
P (Rnbm) = P (RCnbm)
pn−1(1− qm)
pn−1 + qm−1 − pn−1qm−1=
qm−1(1− pn)
pn−1 + qm−1 − pn−1qm−1
pn−1(1− qm) = qm−1(1− pn)
We replace q with 1 − p in order to create a polynomial equation in a single
variable:
pn−1[1− (1− p)m] = (1− p)m−1(1− pn) (4.10)
Clearly, if we let n = m, then the solution to (4.10) must be p =1
2. For
small values of n and m, it is possible to solve (4.10) exactly (e.g., the solution for
n = 2,m = 1 is p =1√2
), but for large values of n and m, we will only be able to
provide approximate solutions. Approximate solutions to (4.10) for several values
of n and m are presented in Table 4.1.
Certainly, we can also use previous results to calculate the average length
of such games, as well as the average length of a winning game or the average
length of a losing game.
45
Table 4.1: Probabilities for a Fair Run Game
m = 1 m = 2 m = 3 m = 4 m = 5 m = 6 m = 7 m = 8
n = 1 0.500 0.293 0.206 0.159 0.129 0.109 0.094 0.083
n = 2 0.707 0.500 0.396 0.332 0.288 0.256 0.232 0.212
n = 3 0.794 0.604 0.500 0.433 0.385 0.349 0.320 0.297
n = 4 0.841 0.668 0.567 0.500 0.451 0.413 0.383 0.358
n = 5 0.871 0.712 0.615 0.549 0.500 0.462 0.430 0.404
n = 6 0.891 0.744 0.651 0.587 0.538 0.500 0.469 0.442
n = 7 0.906 0.768 0.680 0.617 0.570 0.531 0.500 0.473
n = 8 0.917 0.788 0.703 0.642 0.596 0.558 0.527 0.500
If one player wins on a run of n successes and the other player wins on a run of
m failures, using the corresponding probability in the chart will result in a fair
game.
46
CHAPTER 5
SIMULATIONS WITH MATHEMATICA
One way to verify the results we have obtained is through simulation.
To do this, we created several short programs using the Wolfram Mathematica
programming environment [13]. Each program relies on Monte Carlo methods
in order to sample the stopping times using various inputs for the probability
and run length values. We then calculate the observed expectation based on the
sample and compare this with the theoretical expectation based on the formulas
from previous chapters. In this chapter, we discuss the general approach used to
simulate each scenario. The actual code for each program is presented in various
appendices.
The first part of each program defines the user input variables, which in-
clude the probability value(s), run length(s), and number of simulations. We then
initialize variables to keep track of the progress of each simulation. To generate
a trial, we use the RandomReal function, which generates a pseudorandom real
number between 0 and 1. This allows us to easily categorize each trial as either
a success or failure. Depending on which result we are simulating, we program
a stopping condition for the main loop. Finally, we calculate both the simulated
and theoretical values for the result we are testing.
Three simulations were created, testing Theorem 4.1.2, Theorem 4.2.2,
and Theorems 4.4.1 and 4.4.2, respectively. In Table 5.1, we present the results
of several simulations of each scenario using the input values from each program’s
code, which can be found in Appendices A, B, and C, respectively.
47
Table 5.1: Simulation Results
Theorem Theoretical Value Simulated Values
Theorem 4.1.2 9.07407 9.05043, 9.0642, 9.08414, 9.08956, 9.06104
Theorem 4.2.2 6.09683 6.10786, 6.08681, 6.08848, 6.09193, 6.0931
Theorem 4.4.1 0.548863 0.5469, 0.54925, 0.5492, 0.54824, 0.54923
Theorem 4.4.2 9.63839 9.64152, 9.70015, 9.62028, 9.62533, 9.62025
See the appendices for the input values of each simulation.
48
CHAPTER 6
GENERALIZATIONS AND FURTHER STUDY
In this chapter, we will examine several related problems, providing a
number of avenues for further study.
6.1 - Obtaining a Run with a Non-Constant Probability
Previous results relied on the assumption that the probability of success
remains constant for each trial. Suppose that we relax this requirement as follows.
As before, we let n be the length of a run of successes that we seek. Now, we
let pi be the probability of success, given that the current run of successes is of
length i − 1, for i ∈ {1, 2, ..., n}. In other words, the probability of success is p1
while seeking the first success in a row, p2 while seeking the second success in a
row, and so on. The probabilities of failures are defined as qi = 1− pi. Then Tn
is the number of trials needed to obtain a run of n consecutive successes under
these conditions.
It is not difficult to show that E[Tn] is finite so long as all pi are nonzero.
We can modify the argument used to prove Theorem 4.1.1 by using the product
of all pi instead of simply pn. Thus we have
Theorem 6.1.1. For 0 < pi ≤ 1 and fixed n, E[Tn] is finite.
The recursive procedure used to prove the expectation can be modified in
order to obtain a formula in this new scenario.
Theorem 6.1.2. Let Tn be the number of trials required to obtain a run of n
consecutive successes, given that the probability pi is the probability of obtaining
the ith success. Then
E[Tn] =1 +
∑n−1j=1
(∏ji=1 pi
)∏n
i=1 pi.
49
Proof. Suppose we obtain a run of n successes immediately. This requires n trials
and occurs with probability p1 · ... · pn. If instead we obtain a run of k successes
followed by a failure, for some k ∈ {0, 1, ..., n−1}, we have observed k+1 trials and
will require an additional E[Tn] trials. This occurs with probability p1 ·...·pk ·qk+1.
(The product p1 · ... · pk may be empty, if k = 0.) Thus we obtain the following.
E[Tn] = q1(1 + E[Tn]) + p1q2(2 + E[Tn]) + ...+ p1...pn−1qn(n+ E[Tn]) + p1...pnn.
Since E[Tn] is finite, we may move all those terms to the left-hand side.
E[Tn]−E[Tn](q1+p1q2+...+p1...pn−1qn) = q1+2p1q2+...+np1...pn−1qn+np1...pn.
(6.1)
Simplifying the left-hand side yields
E[Tn]− E[Tn](q1 + p1q2 + ...+ p1...pn−1qn)
=E[Tn]− E[Tn](1− p1 + p1(1− p2) + ...+ p1...pn−1(1− pn))
=E[Tn]− E[Tn](1− p1...pn)
=E[Tn](p1...pn),
and simplifying the right-hand side yields
q1 + 2p1q2 + ...+ np1...pn−1qn + np1...pn
=1− p1 + 2p1(1− p2) + ...+ np1...pn−1(1− pn) + np1...pn
=1 + p1 + p1p2 + ...+ p1...pn−1.
50
Substituting into (6.1) gives
E[Tn](p1...pn) = 1 + p1 + p1p2 + ...+ p1...pn−1,
so we finally reach
E[Tn] =1 + p1 + p1p2 + ...+ p1...pn−1
(p1...pn)
=1 +
∑n−1j=1
(∏ji=1 pi
)∏n
i=1 pi.
6.2 - Obtaining a Run in a Finite Set
When considering events that come from the Bernoulli distribution, it is
possible to imagine that we are sampling from an infinite set of outcomes, where
p is the proportion of outcomes that are successes. It is also possible to study
the situation in which we have a finite number of outcomes, of which a certain
proportion p are successes. If we sample from this set with replacement, then
this changes nothing. Sampling without replacement, however, leads to a new
scenario.
Suppose we have a set of a successes, b failures (with a+ b = c outcomes),
and we wish to obtain a run of n consecutive successes by sampling from the set
without replacement. An example would be a deck of playing cards, where we
might consider the 13 hearts to be successes and each of the other 39 cards to
be failures. First we note that we are no longer guaranteed to obtain a run at
all, because we might run out of cards before a run is obtained. Thus, before
51
considering the expected number of trials needed to obtain a run, we should
calculate the probability that a run actually occurs.
Calculation of this probability turns out to be reasonably straightforward,
and a formula is presented as a theorem. However, we will first present a result
from Heubach and Mansour as a lemma [4]. This involves finding the number of S-
restricted compositions of a number. A composition of an integer is analogous to a
partition in which the order of the pieces matters, and an S-restricted composition
is a composition in which the size of each piece belongs to the set S.
Lemma 6.2.1. The number of S-restricted compositions of an integer n into k
parts is the coefficient of xn of the polynomial
(∑s∈S
xs
)k
.
The number of S-restricted partitions will be used to derive the probability
of obtaining a run of n consecutive successes in the next theorem.
Theorem 6.2.2. Suppose we are drawing without replacement from a finite deck
of c = a + b cards, where a is the number of successes and b is the number of
failures. Let Dn be the event that we obtain a run of n consecutive successes
before running out of cards. Then
P (Dn) =
(c
a
)− Z(c
a
)
where Z is the coefficient of xa in (1 + x+ x2 + ...+ xn−1)b+1.
Proof. First, note that there are
(c
a
)distinct arrangements of the deck. Let
Z be the number of arrangements which do not contain a run of n consecutive
52
successes. Then
(c
a
)− Z is the number of arrangements which do contain such
a run. We claim that Z is the coefficient of xa in (1 + x+ x2 + ...+ xn−1)b+1.
Suppose that a particular arrangement of cards does not contain a run of
n consecutive successes. Then there are at most n − 1 successes before the first
failure, after the last failure, and between any two failures. In effect, we have
b + 1 “gaps” in which we must place our successes, but we must place no more
than n−1 successes in each gap. Thus Z is the number of {0, ..., n−1}-restricted
compositions of a with b+ 1 parts, and the value of Z can be found using Lemma
6.2.1.
Thus Z is the number of arrangements which do not contain a run of n
successes, and
P (Dn) =
(c
a
)− Z(c
a
) .
Although Theorem 6.2.2 is not very complicated, we do not reach a closed-
form expression for the probability P (Dn). Furthermore, calculating the expected
number of draws needed to obtain a run of n successes (given that we actually
obtain one) likely requires another technique. We thus leave the derivation of
such a formula as an avenue for future study.
6.3 - Obtaining a Run with a Categorical Distribution
The Bernoulli distribution allows for only two possible outcomes (1 for
success, 0 for failure). We might also be interested in obtaining a run of identical
outcomes when there are k possible outcomes (for instance, k = 6 for a standard
53
six-sided die). In this case, a categorical distribution would be appropriate. The
categorical distribution, which is also known as a generalized Bernoulli distri-
bution or a multinoulli distribution, assigns probabilities p1, ..., pk to each of k
possible outcomes, where the sum of all pi is equal to 1.
If we wish to obtain a run of n of a particular outcome, then this is equiv-
alent to the Bernoulli distribution. Suppose, then, that we wish to obtain a run
of ni consecutive outcome i and that it does not matter which outcome produces
the run. We can define Tn to be the number of trials required. If k = 2, then the
average number of trials, E[Tn1,n2 ], can be found using Theorem 4.2.2. If k > 2,
however, the problem becomes more difficult. Attempting to provide a recursive
argument for certain values of k is possible, though likely very tedious. At any
rate, such an approach would not solve the general problem for all values of k.
For a general solution, we return to Theorem 4.3.1, which states
1
E[Tn]+
1
E[T ∗m]=
1
E[Tn,m].
In terms of a categorical distribution with k = 2, we may write
1
E[Tn1 ]+
1
E[Tn2 ]=
1
E[Tn1,n2 ].
It seems possible that this relationship might hold for larger values of k.
Indeed, observation of simulations suggests that Theorem 4.3.1 does extend to a
categorical distribution in the natural way. Although we do not have a proof of
the result, a likely approach would use induction on k, in which the base case of
k = 2 is merely Theorem 4.2.2. Formalizing our hypothesis, we have the following.
Conjecture 6.3.1. For a categorical distribution with k outcomes, each occurring
with respective probability p1, ..., pk, and respective desired run lengths n1, ..., nk,
54
the average number of trials to obtain a desired run of any outcome is
E[Tn1,...,nk] =
1∑ki=1
1E[Tni ]
.
We have also created a simulation to test Conjecture 6.3.1, using the meth-
ods described in Chapter 5. For the code of this simulation, refer to Appendix
D.
6.4 - Obtaining a Run from Multiple Sequences
Previously, we have considered a stopping time that depends only on a
single sequence of outcomes, though there might be multiple conditions on which
we stop. For instance, we have previously examined the stopping condition in
which we want either a run of n consecutive successes or a run of m consecutive
failures.
We may also consider the scenario in which we have several independent
sequences of outcomes, each with its own stopping condition and corresponding
stopping time Si. We then define the stopping time S as the minimum of the
stopping times Si. This is different from the scenario in which we have multiple
stopping conditions on a single sequence because now we allow for the possibility
that two sequences stop after the same number of trials. With a single sequence,
however, it is impossible to obtain both a run of n successes and a run of m
failures, for instance, on the same trial.
Suppose that we have k sequences each with a stopping condition and
stopping time Si for i = 1, 2, ...k. We would like to obtain a formula for the
expectation E[S], but this is very difficult without knowing something about the
55
individual stopping conditions for each of the k sequences. However, we will
present a result about the minimum of several exponential variables as a basis to
guide future research.
Theorem 6.4.1. If Xi ∼ Exponential(λi) for i = 1, 2, ..., n, and all Xi are inde-
pendent, then min{Xi} ∼ Exponential(∑n
i=1 λi).
For an exponential distribution with parameter λ, the expectation for the
distribution is1
λ. Thus, to find the expectation for the minimum of several
exponentially distributed variables, which is itself exponentially distributed, we
must take the reciprocal of its parameter.
Though the variables for the sequences we have studied do not follow the
exponential distribution, and are in fact discrete rather than continuous, it is
possible that a similar approach could be used to find E[S]. We could then
combine previously derived expectations for each sequence (as well as possibly
Conjecture 6.3.1 about the categorical distribution) in order to describe the most
general stopping time, which depends on obtaining a run of identical outcomes in
at least one sequence of trials.
56
APPENDIX A - CODE FOR THEOREM 4.1.2
57
APPENDIX B - CODE FOR THEOREM 4.2.2
58
APPENDIX C - CODE FOR THEOREMS 4.4.1/4.4.2
59
APPENDIX D - CODE FOR CONJECTURE 6.3.1
60
REFERENCES
[1] S. Ahdout, S. Rothman, H. Strassberg, Streaks and Generalized Fibonacci
Sequences, The College Mathematics Journal 37, No. 3 (2006) 221-223.
[2] T. Gilovich, R. Vallone, A. Tversky, The Hot Hand in Basketball: On the
Misperception of Random Sequences, Cognitive Psychology 17, 295-314 (1985).
[3] J.D. Harper, K.A. Ross, Stopping Strategies and Gambler’s Ruin, Mathemat-
ics Magazine 78, No. 4 (2005) 255-268.
[4] S. Heubach, T. Mansour, Compositions of n with parts in a set, Congressus
Numerantium 164 (2004) 127-143.
[5] T. Koshy, Fibonacci and Lucas Numbers with Applications Wiley, 2001.
[6] E.F. Krause, Bernoulli Trials and Family Planning, Mathematics Magazine
73, No. 3 (2000) 212-216.
[7] R. Larson, Elementary Linear Algebra, Seventh Edition Cengage Learning,
2013.
[8] M. Rabin, D. Vayanos, The Gambler’s and Hot-Hand Fallacies: Theory and
Applications, The Review of Economic Studies 77, No. 2 (2010) 730-778.
[9] M.F. Schilling, The Longest Run of Heads, The College Mathematics Journal
21, No. 3 (1990) 196-207.
[10] M.F. Schilling, Long Run Predictions, Math Horizons 1, No. 2 (1994) 10-12.
[11] M.F. Schilling, The Surprising Predictability of Long Runs, Mathematics
Magazine 85, No. 2 (2012) 141-149.
[12] W. Stadje, The Maximum Average Gain in a Sequence of Bernoulli Games,
The American Mathematics Monthly 115, No. 10 (2008) 902-910.
[13] Wolfram Research, Inc., Mathematica, Version 10.1, Champaign, IL (2015).
61