A Sequential Decision Problem with a Finite Memory

Author(s): Herbert Robbins
Source: Proceedings of the National Academy of Sciences of the United States of America, Vol. 42, No. 12 (Dec. 15, 1956), pp. 920-923
Published by: National Academy of Sciences
Stable URL: http://www.jstor.org/stable/89641
Accessed: 05/05/2014 17:31
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp
National Academy of Sciences is collaborating with JSTOR to digitize, preserve and extend access to Proceedings of the National Academy of Sciences of the United States of America. http://www.jstor.org
This content downloaded from 194.29.185.176 on Mon, 5 May 2014 17:31:47 PM. All use subject to JSTOR Terms and Conditions.



A SEQUENTIAL DECISION PROBLEM WITH A FINITE MEMORY*

BY HERBERT ROBBINS

COLUMBIA UNIVERSITY

Communicated by Paul A. Smith, October 1, 1956

1. Summary.-We consider the problem of successively choosing one of two ways of action, each of which may lead to success or failure, in such a way as to maximize the long-run proportion of successes obtained, the choice each time being based on the results of a fixed number of the previous trials.


2. Introduction.-An experimenter has two coins, coin 1 and coin 2, with respective probabilities of coming up heads equal to $p_1 = 1 - q_1$ and $p_2 = 1 - q_2$, the values of which are unknown to him. He wishes to carry out an infinite sequence of tosses, at each toss using either coin 1 or coin 2, in such a way as to maximize the long-run proportion of heads obtained. His problem is to find a rule for deciding at each stage, on the basis of the results of the previous tosses, whether to use coin 1 or coin 2 for the next toss.

If he knew at the outset which coin had the larger p-value, he would of course use it exclusively, irrespective of the results of the tosses, and by doing so would have with probability 1

$$\lim_{n \to \infty} \frac{\text{number of heads in first } n \text{ tosses}}{n} = \max(p_1, p_2). \tag{1}$$

We have shown elsewhere¹ that even without a knowledge of the p-values there exist rules for carrying out the sequence of tosses such that equation (1) holds for all $p_1$, $p_2$. However, any rule with this property must clearly be such that the decision as to which coin to use for the $n$th toss depends on the results of all the first $n - 1$ tosses; in other words, the rule requires a "memory" of unlimited length. We shall be interested here in seeing what can be done with rules requiring only a finite memory: a rule will be said to be of type $r$ if the decision as to which coin to use for the $n$th toss depends only on the results of tosses $n - r, n - r + 1, \ldots, n - 1$. (By the "result" of a toss we mean both which coin was used and which face came up.) We shall exhibit a rule $R_r$, of type $r$, for which with probability 1

$$\lim_{n \to \infty} \frac{\text{number of heads in first } n \text{ tosses}}{n} = \frac{p_1 q_2^r + p_2 q_1^r}{q_1^r + q_2^r}. \tag{2}$$

Note that as $r \to \infty$ the right-hand side of equation (2) steadily increases and tends to the right-hand side of equation (1) as limit.

It would be interesting to know whether our rule $R_r$ is "best" in the sense that the right-hand side of equation (2) is at least as great, for all $p_1$, $p_2$, as the corresponding function for any other rule of type $r$ for which the left-hand side of equation (2) exists and is a symmetric function of $p_1$, $p_2$. We do not know whether this is true.

3. Definition of the Rule $R_r$ and Proof of the Theorem.-We recall² that if a single coin, with probability $q = 1 - p$ of obtaining tails on each toss, is tossed repeatedly until the first run of $r$ consecutive tails occurs, then the expected number of tosses required is

$$\frac{1 - q^r}{p\,q^r}.$$

Now suppose the coin is tossed repeatedly with the following stopping rule: stop if the first toss is tails, otherwise continue tossing until the first run of $r$ consecutive tails occurs. Then the expected number of tosses required will be

$$q \cdot 1 + p\left(1 + \frac{1 - q^r}{p\,q^r}\right) = \frac{1}{q^r}. \tag{3}$$
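The two expectations above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original paper; the function names `wait_for_tail_run`, `block_length`, and `mean`, and the parameter values, are illustrative assumptions. It assumes $0 < q < 1$.

```python
import random

def wait_for_tail_run(q, r, rng):
    """Number of tosses until the first run of r consecutive tails."""
    tosses = 0
    run = 0
    while run < r:
        tosses += 1
        if rng.random() < q:   # tails occurs with probability q
            run += 1
        else:                  # a head resets the current run of tails
            run = 0
    return tosses

def block_length(q, r, rng):
    """Length of one block under the modified stopping rule:
    stop at once if the first toss is tails, otherwise keep tossing
    until the first run of r consecutive tails."""
    if rng.random() < q:
        return 1
    return 1 + wait_for_tail_run(q, r, rng)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

if __name__ == "__main__":
    rng = random.Random(0)     # fixed seed, chosen arbitrarily
    q, r, trials = 0.5, 3, 100_000
    p = 1 - q
    est_run = mean(wait_for_tail_run(q, r, rng) for _ in range(trials))
    est_block = mean(block_length(q, r, rng) for _ in range(trials))
    print("run of r tails:", est_run, "theory:", (1 - q**r) / (p * q**r))
    print("block length:  ", est_block, "theory:", 1 / q**r)
```

With $q = 1/2$ and $r = 3$ the theoretical values are 14 and 8, and the simulated means should land close to them, up to sampling noise.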


We now prove the following:

THEOREM. Define the rule $R_r$ as follows: start tossing with coin 1. Stop if the first toss is tails, otherwise continue tossing until the first run of $r$ successive tails occurs and then stop. This defines the first block of tosses with coin 1. Now start tossing with coin 2 and apply the same rule, obtaining the first block of tosses with coin 2. Then start again with coin 1 and apply the same rule, obtaining the second block of tosses with coin 1, and so on indefinitely, thus generating an infinite sequence of tosses consisting of alternate blocks of tosses with coins 1 and 2.

With rule $R_r$ so defined, we assert that equation (2) holds with probability 1.
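To see that the rule can be run with a memory of only the last $r$ results, and that the long-run proportion of heads behaves as in equation (2), one can simulate it. The sketch below is one possible encoding, not the author's; the names `next_coin`, `simulate`, `limit_formula`, and the `"H"`/`"T"` labels are assumptions made for illustration. It relies on the observation that consecutive tosses with the same coin always belong to the same block, and it assumes $q_1$ and $q_2$ are not both zero.

```python
import random
from collections import deque

HEADS, TAILS = "H", "T"

def next_coin(memory, r):
    """Pick the coin for the next toss from at most the last r results.

    `memory` holds (coin, face) pairs, oldest first, with len(memory) <= r.
    """
    if not memory:
        return 1                       # the rule starts with coin 1
    coin, face = memory[-1]
    other = 3 - coin                   # maps 1 -> 2 and 2 -> 1
    if face == HEADS:
        return coin                    # a head never ends a block
    # From here on the last toss was tails.
    if len(memory) == 1:
        return other                   # very first toss, or r == 1
    if memory[-2][0] != coin:
        return other                   # tails on the first toss of a block
    if len(memory) == r and all(m == (coin, TAILS) for m in memory):
        return other                   # run of r tails completed
    return coin

def simulate(p1, p2, r, n_tosses, seed=0):
    """Proportion of heads in n_tosses tosses under the rule."""
    rng = random.Random(seed)
    p = {1: p1, 2: p2}
    memory = deque(maxlen=r)           # only the last r results are kept
    heads = 0
    for _ in range(n_tosses):
        coin = next_coin(memory, r)
        face = HEADS if rng.random() < p[coin] else TAILS
        heads += face == HEADS
        memory.append((coin, face))
    return heads / n_tosses

def limit_formula(p1, p2, r):
    """Right-hand side of equation (2)."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q2**r + p2 * q1**r) / (q1**r + q2**r)

if __name__ == "__main__":
    p1, p2, r = 0.7, 0.4, 2            # illustrative values only
    print("simulated:", simulate(p1, p2, r, 200_000))
    print("formula:  ", limit_formula(p1, p2, r))
```

The `deque(maxlen=r)` makes the finite-memory constraint explicit: `next_coin` never sees anything older than the last $r$ results.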

Proof: Let $x_i$ ($y_i$) denote the length of the $i$th block of tosses with coin 1 (2), $i = 1, 2, \ldots$. The process of tossing generates with probability 1 an infinite sequence of independent random variables

$$x_1, y_1, x_2, y_2, \ldots, x_n, y_n, \ldots. \tag{4}$$

The proportion of times that coin 1 is used during the first $2n$ blocks of tosses is then

$$\frac{x_1 + \cdots + x_n}{x_1 + \cdots + x_n + y_1 + \cdots + y_n} = \frac{(x_1 + \cdots + x_n)/n}{(x_1 + \cdots + x_n)/n + (y_1 + \cdots + y_n)/n}, \tag{5}$$

and, by the strong law of large numbers and equation (3), this tends with probability 1 to the limit

$$\frac{1/q_1^r}{1/q_1^r + 1/q_2^r} = \frac{q_2^r}{q_1^r + q_2^r}. \tag{6}$$

Now let $u_n$ denote the proportion of heads obtained in the first $2n$ blocks of tosses; then

$u_n$ = (proportion of times coin 1 is used in first $2n$ blocks of tosses) · (proportion of times heads occurs among these tosses with coin 1) + a similar product for coin 2, (7)

so that with probability 1

$$\lim_{n \to \infty} u_n = \frac{q_2^r}{q_1^r + q_2^r}\, p_1 + \frac{q_1^r}{q_1^r + q_2^r}\, p_2 = \frac{p_1 q_2^r + p_2 q_1^r}{q_1^r + q_2^r}. \tag{8}$$
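The passage from equation (6) to equation (8) is simple arithmetic and can be verified exactly with rational numbers. The snippet below is an illustrative check only; the function names and the sample values $p_1 = 7/10$, $p_2 = 2/5$, $r = 2$ are assumptions, and it requires $q_1, q_2 > 0$.

```python
from fractions import Fraction

def coin1_usage_limit(q1, q2, r):
    """Equation (6): limiting fraction of tosses made with coin 1."""
    m1 = 1 / q1**r         # expected block length with coin 1, from (3)
    m2 = 1 / q2**r         # expected block length with coin 2, from (3)
    return m1 / (m1 + m2)

def heads_limit(p1, p2, r):
    """Equation (8): usage-weighted average of the two head probabilities."""
    q1, q2 = 1 - p1, 1 - p2
    f1 = coin1_usage_limit(q1, q2, r)
    return f1 * p1 + (1 - f1) * p2

if __name__ == "__main__":
    p1, p2, r = Fraction(7, 10), Fraction(2, 5), 2
    q1, q2 = 1 - p1, 1 - p2
    print(coin1_usage_limit(q1, q2, r))   # 4/5
    print(heads_limit(p1, p2, r))         # 16/25
```

Here the better coin is used four-fifths of the time in the limit, giving a limiting proportion of heads of 16/25 = 0.64, against $\max(p_1, p_2) = 0.7$.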

Now let $n$ be any positive integer, and define the random integer $N = N(n)$ so that

$$x_1 + y_1 + \cdots + x_N + y_N \le n < x_1 + y_1 + \cdots + x_{N+1} + y_{N+1}. \tag{9}$$

Denoting by $w_n$ the number of heads obtained among the first $n$ tosses, we have

$$\text{number of heads in first } 2N \text{ blocks} \le w_n \le \text{number of heads in first } 2N + 2 \text{ blocks}, \tag{10}$$

and from (9) and (10) it follows that

$$\frac{x_1 + y_1 + \cdots + x_N + y_N}{x_1 + y_1 + \cdots + x_{N+1} + y_{N+1}}\, u_N \le \frac{w_n}{n} \le \frac{x_1 + y_1 + \cdots + x_{N+1} + y_{N+1}}{x_1 + y_1 + \cdots + x_N + y_N}\, u_{N+1}. \tag{11}$$


But since $(x_{N+1} + y_{N+1})/N$ tends to 0 with probability 1 as $n$, and therefore $N$, becomes infinite, it follows that with probability 1

$$\lim_{n \to \infty} \frac{x_1 + y_1 + \cdots + x_{N+1} + y_{N+1}}{x_1 + y_1 + \cdots + x_N + y_N} = 1,$$

and hence with probability 1, from relations (8) and (11),

$$\lim_{n \to \infty} \frac{w_n}{n} = \lim_{N \to \infty} u_N = \frac{p_1 q_2^r + p_2 q_1^r}{q_1^r + q_2^r}, \tag{12}$$

which was to be proved.
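As a numerical illustration of how the limit in (12) depends on the memory length $r$, the sketch below tabulates it for a sample pair of p-values. It also uses a rearrangement that is not stated in the paper but follows directly from (2): the shortfall from $\max(p_1, p_2)$ equals $|p_1 - p_2|\,\rho^r / (1 + \rho^r)$, where $\rho \le 1$ is the ratio of the smaller $q$ to the larger. The function names and sample values are assumptions, and at least one p-value must be less than 1.

```python
def limit_formula(p1, p2, r):
    """Right-hand side of equations (2) and (12)."""
    q1, q2 = 1 - p1, 1 - p2
    return (p1 * q2**r + p2 * q1**r) / (q1**r + q2**r)

def shortfall(p1, p2, r):
    """max(p1, p2) minus the limit, in closed form (derived from (2))."""
    gap = abs(p1 - p2)
    q_small, q_large = sorted((1 - p1, 1 - p2))
    rho = q_small / q_large            # at most 1; below 1 when p1 != p2
    return gap * rho**r / (1 + rho**r)

if __name__ == "__main__":
    p1, p2 = 0.7, 0.4                  # illustrative values only
    for r in range(1, 11):
        print(r, round(limit_formula(p1, p2, r), 6),
              round(shortfall(p1, p2, r), 6))
```

Because $\rho^r / (1 + \rho^r)$ decreases geometrically in $r$ when $p_1 \ne p_2$, the limit increases with $r$ and approaches $\max(p_1, p_2)$, consistent with the remark following equation (2).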

The author is indebted to John W. Tukey for a helpful remark which simplified a preceding proof.

* Work sponsored by the Office of Scientific Research of the Air Force, Contract No. AF18(600)- , Project No. R-345-20-7.
¹ H. Robbins, "Some Aspects of the Sequential Design of Experiments," Bull. Am. Math. Soc., 529-532, 1952.
² W. Feller, An Introduction to Probability Theory and Its Applications, 1 (New York, 1950), 266.

