On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

  • Upload
    esernik

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    1/41

    , iTII

    Annalsof OperationsResearch 9 (7997)477-512 477

    ON THE COMPUTATION OF THE OPTIMAL COST FUNCTIONFOR DISCRETE TIME MARKOV MODELS WITH PARTIALOBSERVATIONS *EnriqueL. SERNIK and Steven. MARCUSDepartment of Electrical and Computer Engineering, The Uniuersity of Texas at Austin, Austin,Texas 78712-1084,USA

    Wc consider several applications of two state, finite action, infinite horizon, discrete-timeMarkov decision processeswith partial observations, for two special cases of observationquality, and show that in each of these cases he optimal cost function is piecewise inear.This in turn allows us to obtain either explicit formulas or simplified algorithms to computethe optimal cost functi on and t he associated optimal control policy. Several examples arepresented.

    Keywords: Markov chains (finite state and action spaces, partial observations), dynamicprogramming (infinite horizon, value iteration algorithm).

    l. IntroductionFinding structural characteristicsof the optimal cost and optimal policiesassociatedwith stochastic ontrol systems,whereonly partial observations f the

    states are available , has been a problem that has interested researchers n thedifferent disciplines where these models occur. This is clear since such knowledgegreatly facilitates the design of controls to improve the performance of thesystem. The determination of structural properties is important, both because tdrastically reducescomputation, and because ften th e discretizations ssociatedwith the numerical procedure make it difficult to obtain certain information fromthe system,such as sensitivity of the optimal policy with respect o small changesin the parametersof the model-

    For the kind of problems in which we are interested, namely control of finitestate, finite action, infinite horizon, partially observed Markov chains, severalimportant structural results have been obtained, and for the sake of brevity we* Research supported in part by the Air Force Office of Scientific Research under Grant

    AFOSR-86-0029, in part by the National ScienceFoundation under Grant ECS-8617860, n partby the Advanced Technology Program of the State of Texas, and in part by the DoD JointServices Electronics Program through the Air Force Office of Scientific Research (AFSC)Contract F49620-86-C-0045.

    o J,C. Baltzer A.G. Scientific Publishing Company

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    2/41

    472 E.1,. Sernik, S.I. Marcus / Discrete time Markou modelscannot list them all here. As examples though, we mention some of thesestructural results, and refer the reader to the other papers cited in this work, andto the references therein. White [32] in his work on production-replacementmodels, and for the discounted cost case, showed that among the stationaryoptimal policies there is a smaller class of optimal policies, called structuredpolicies,such that one needonly look among thesewhen searching or an optimalreplacement policy. The extension of these results to thc case in which theperformance index is the averagecost, has recently been obtained by Fernandez,Arapostathis and Marcus in [8,9]. Albright [1 ] showed for a two dimensionalproblem that both the optimal cost and the optimal poiicy are monotonefunctions of the current information state. Kumar and Seidman [16], in theirwork on the one-armedbandit problem, showed the existenceof a function whosegraph, called the " boundary betweendecision regions", divides the state spaceinto regions in such a way that in each region only on e decision is optimal.Sondik [26,27), showed for the strictly partially observed (PO) case that theinfinite horizon, optimal cost function is piecewise inear whenever the associatedoptimal policy is finitely transient, an d provided an algorithm to com pute theoptimal cost and policy. Recently, Sernik and Marcus 124,251 howed for aparticular replacementmodel, that the infinite horizon, optimal discounted cost ispiecewisc inear even when the associated ptimal policy is not finitely transient,and provided explicit formulas for the computation of the optimal cost an dpolicy.

    As is mentioned in [16], for the type of problems in which we are interested(and more generally or Bayesianadaptivecontrol problems,see 16]),writing theDynamic Programming (DP) equation for the optimal expected eturn is rela-tively simple, bu t obtaining its explicit solution is extremely difficult. In thispaper, our objective is to show that the sameprocedure applied by the authors in[24,25] to a two dimensional replacement modcl, can be used to obtain similarresults to those of [24,251 or other applications. There are several interestingdecision problems that are naturally modeled using two states. For example, see[1 ] (advertising model), [12] (internal audit timing), U6l (one-armed banditproblem), [18] (optimal stopping times), [19] (equipment checking and targetsearch), 28 ] (inspection of standby units) an d [13,20,25,29,371(quality ontrol,replacementmodels). The results to be presented below provide exact solutionsfor most of these problems. In addition, these results can be used to obtainadditional insight into the structure of each particular application (cf. [25,examples 1-4]), and to develop theoretical insights in more complex models.

    This work is organized as follows. In section 2 we analyze the replacementmodel studied, among others, by Ross [20], White [31], Hughes [13] and Wang[29]. Section 3 is devoted to the analysis of the inspection model of Thomas,Jacobs and Gaver [28]. In section 4 we present other applications for which theapproach employed in sections2 and 3 can be used.In section 5 we provide someexamples.Section6 consistsof conclusions.

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    3/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 4732. Markovian replacement models

    Consider a machine that produces tems and which can be in one of two states,"good" or "bad". Accordingly, f { x,, t :0, 1, . . . } representshe stateof themachine,we have x, X: {good, bad} = {0 , 1} with X the state space.Supposethat the machine deteriorateswith use and that there are three actions availableto the decision maker: keep the machine producing, inspect it , or replace it. Wedenoteby {u, , / :0, 1, . . . } the control process, i th z, e U = {produce, nspect,replace) {0 , 1,2}. Here "produce" stands for "produce without inspection",and "inspect" refers o "produce with inspection". The observationprocess,{1 ,t :7,2, . . . ) , takesvalues n Y= {0, 1}.Since he state of the machine s only partially observed PO), this problem isconverted into an equivalent completely observed (CO) Markov Decision (MD)problem (see, e.g., [5 , chapter 3], and [8,9,31,32]), n which the conditionalprobability distribution (also referred to as the information vector) n(t): (1 -p(t), p(l)) provides the necessary nformation to select the control at time /.Here p(l) is the probability that the machine s in the bad state given previousobservations up to time l) and actions up to time I - 1); see,e.g., 5, p. 100].Wewill often denote n(l) and p(t) by n and p respectively,omitting explicitdependence n /.

    We will consider two di fferent casesof observation quality of this PO problem:the completely unobserved CU) case,where only two actions are available to thedecision maker, namely U: {0,2}; and the closed loop (CL) case, with U:{0 , 1,2}, in which there are no observations uring production, but costly perfectobservationsare available during inspection. Thesecasesare also of interest sincethey provide upper and lower bounds for the optimal cost function in the PO case(see 3,33]) .Of particular importance for the contents of this section are the resultsconcerning the structure of the optimal policy associatedwith the models consid-ered here.Ross [20, heorem 3.3]) and White ([31, heorem6.1], andl32, theo remp.2a0l gave sufficient conditions for the optimal policy to be characterized,as afunction of p, by threenumbers pi, i : 7, 2, 3, 0 < pr ( pz ( pr ( 1, such that " itis opt imal to produce or 0

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    4/41

    474 E.L. Sernik, S.I. Marcus / Discrete time Markou modelsmachine is replaced at the beginning of period /, then it is assumed that themachine will be in the good stateat the end of that period (see[20, p. 587]),andtherefore the state of the machine at the beginning of period I * 1 is determinedby the transition probabilitics that govern the evolution of the process. n White's[31]model, replacement f the machineat period I places he production processin the good state at the next decision epoch (see [31, p. 236]). Hughes' modelrequires that thc replacementdecision be made following an inspection, andmade strictly dependentupon the outcome of the inspection (so that only twoactions {produce, inspect-replace} are being considerecl);also, the replacementaction at period I implies that the process s in the good stateat time / * l. Wang[29] studied severalvariations of the two-action, CU problem, including that ofRoss and that of Hughes but with the replace action placing the machine in thegood stateat thc end of the sameperiod in which the action is taken.Th e resultsto be presentedapply with minor modifications to all thesemodels and hence,without lossof generality,we will work with Ross' model in [20].We now complete he descriptionof the model. Each control action ha s a costassociatedwith it, as follows: the cost associatedwith replacing the machine isgiven by R independentlyof the underlying state; if the machine s in the goodstatc, he cost of production s 0, and it is C if the machine s in the bad state.Weassume < C < R (see,e.g., 20,31]).Also, the processevolvesaccording o transition probabilit ies p,r(2,) definedby p, i(u) : Pr{xt t r : j lx , : i , u, : D).Thetransit ionprobabi l i tymatr ices (u,) ,u, U, with entr iespir(u,) are:

    p(o): [ r -e o] p(2\: l t -o olt o l l ' r tzt: j -"e "o] ' I :o ' l '2"" ' ( l )where d is the probability of machine failure in one time step. To avoidtrivialities we assume0 < p < 1.Assume that the in i t ia l probabi l i t ies z'(0) (Pr{xo:0}, Pr{xu:1)) aregiven.We are nterestedn the inf ini te hor izoncase.Let D*=(Xx U)* be thespace of infinite sequenceswith elements n x x U, and let D be the Borelo-algebraobtained by endowing D* with the discrete opology.Then, for x, e X,u,cU, /Nu{0}, {xs. us,x1, 111,. . . }e D- represents real izat ionof thestate and action sequences. he problem is to find an optimal admissible controlpolicy that minimizes he expecteddiscountedcost .{(z), given by :

    (2)where Ef [' ] is the expectationwith respect o the unique probability measureonD induced by the initial probability distribution n and the strategyg (see[6, pp.140-1a4l) ;B is the discount actor,0 < B

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    5/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 475quence,since its computation does not involve any observations, and thus it canbe computed off line. In the two-action, CU case,{ g, }2, can also be written as asequence f Borel measurablemaps g, : [0 , 1l X [0 , 11 - U, such that u, : g,(n(t)),u , e U, for / : 0, 7,2, . . . .For the remainder of section 2 we take advantage of the fact that theconditional probability vector zr is characterizedby the scalar conditional prob-ability p, so that the optimization problem(2) is reduced to a scalarone (see,e.g.,120, p. 5901' or [31, p- 846])- Le t vp(p) : infuJr(il. Then (see, ".g.,15,8,13,20,25,29,371)p(p) is the expected ost accrued nihen n optimal policy isselected,given that the machine starts in state l with probability p, and futurecosts are discounredat rate B. Also, it is well known (see[20]) that vB(p) is theunique solution of the functional equation:

    vu( ): min{ p + Bvp(r( )) , R + pvr)@)}, (3)where 7(p) is the updatedprobability that the machine s in the bad state.and isgivenby T( p): p( l - 0) + 0.Remark 2. 1

    In White's [31] model, Va (D is the unique solution of :uu(t):- i"{ cp+ BvB(l:(p)), + pv.@)}. (4)

    In Hughes' [13] model, VB (il satisfies:vo(n): min{cr(p)+ BUB(r(p)),(p)(c+n)+r + FvB@)}, (5)with 1 the cost of inspection.For one of wang's models (see [29]), vi l p)satisfies:vo(r) :mintcp*pv.e( i l ) ,p(c+n)+r+ Bvp(o)) (6)

    Hence, the results to be presentednext for Ross' model are obtained for the otherreplacementmodels describedabove in a straightforward manner.Remark 2.2

    ( p) has the same orm for al l the replacementmodels mentioned above, sincein each of them the state of the machine is not observed during production.

    As mentioned in the introduction, the two-action, cU problem was presentedin [25]. To make this work self contained, we list below the properties of themodel that result in the piecewise inearity of the optimal cost function:(P1) Z(p) sat isf ies (p)>p for,al l pe[0, 1).Similar ly, T- '(p) : (p-0)/ (1- d), 0 < 0

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    6/41

    476 E.L. Sernik. S.I. Marcus / Discrete time Markou models(P2) For 0 < a < b < 1 and 0 (b-a).(P3) As mentioned above, t follows from [20, theorem 3.3], and [31, theorem6.1], that when only two actions are considered, the structured optimalpolicies associated with the model of this section are o'produce for allpeI} ,1]", and "produce for pe[0, p*] and replace or pe(p",1]" ( the

    latter is referred to a lso as a control-limit policy). Also, Ross ([20, theorem3.4a])gave necessary nd sufficient conditions for the policy "produce foral l p e [0 , 1]" to b e optimal (this s the casewheneverR >- C (1 - P( 1 - 0))).Thus, as far as the optimal policy is concerned, he problem reduces ofinding the control-limit p* .(P4) Following Sawaki'snotation ([23, p. 116]),define th e subintervalsof [0 , 1] :E: {pe [0, p*] : T'(p)e(p*,1] ] , lN,where r ' (p)=T(Ti- ' (p)) , i>-7, To(p)=p, and 71( l : f ( i l :pQ-0) + 0. Observe that the subi ntervals E, satisfy the recursion E,: { p e[0, ) : T(p)eE,_r) , i>-2. In addit ion, f rom (P1) and the cont inuity ofZ(.), the Ei, i> 1, are convex disjoint intervals,and in Sawaki's erminol-ogy, they const i tute part i t ion,cal l i t E, of [0,7)( [23, p. 113]) .Now, from (3 ) it is cl ear that VB(p) l1o*.,1,he restriction of the optimal costfunction to pe (p*, 1] is constant. Based on properties (P1) through (P4), thepiecewise inearity of the infinite horizon, optimal cost function Vn( ) is provedin [25] by showing that the restriction of VB(p) to each nterval E, is an affinefunction of p, and that there is a lower bound, greater than zero, for the length of

    the E,'s. These observations mply an upper bound on the number of linesegments escribingVilp) lro,o.l,giving he desired esult-We refer the reader o[25] for details. n the sequelwe will make use of properties P1 ) through (P4).Once piecewise inearity is established, ormulas to compute the optimal costand the control-limit p* can relatively easily be derived by following (3), andusing inductive arguments. These formulas are given in [25], and for the sake ofbrevity will not be repeatedhere.Remark 2.3

    Wang's work [29] was aimed at showing that control-limit policies wereoptimal for the two-action, CU model. In addition, Wang also provided analyti-cal expressions or computing the optimal cost and the optimal policy for thisproblem. Although his results could be used to show the piecewise inearity of theoptimal cost function, Wang did not do so. Wang studied a more generalmodelfor the two-action, CU problem than the one treated here, but he did not considerthe three-action, CL model (cf. next section). In later work [30] Wang extendedhis results to higher dimensional (greater than two) models. These results could

    (1)

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    7/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 477be taken as the starting point for extending the results stated below for thethree-action, CL model to higher dimensions.Remark 2.4

    Lel J * 1 be the number of line segments escribingVp(p) lru,o.r.Wheneverp*does not belong to the set of points {0, T(0), T2(0), . . . ,Tt{0; i . the opt imalpolicies "produce fo r pe [0 , p"] and replace or pe (p*,1]" ar c finitcly tran-sient, as defined by Sondik in [26] and [27] (seealso [22]), and consequently theassociatedoptimal cost function is piecewise inear- In those cases, he expres-sions for the optimal cost and policy derived in [25] provide the same results asthose that can be obtained by using Sondik's algorithm. When p* {0, T(0), T'(0) , . . . ,T1(0)} , the opt imal pol icy is not f in i tely t ransient .Thereason, n Sondik's terminology, is that the intersection between the set of pointswhere the optimal policy is discontinuous,and the set of "next values" fo r theinformation vector, s neverempty (cf-125,example3] , or example7 in section5below). However, the optimal cost function is still piecewise inear; we refer thereader to [25] for details. The optimal cost function can still be computed usingthe equations derived in [25]. If Sondik's algorithm is used in this cas e, theoptimal cost and policy will be found if the initial guessfor the degree of theapproximatton ([27, p. 2921)s smaller than the actual number of line segments nthe optimal cost function. The expressionsderived in [25] to compute the optimalcost and the control-limit p* are particularly attractive for sensitivity analyseswith respect to the parameters of the model, since they do not involve anyiterative procedure.Remark 2.5

    The piecewise inearity of Vp(p) can also be obtained by analyzing the valueiteration algorithm ([20, eq. (4)]) used to solve eq. (3), namely:

    v|( r) : - i"{ Cp,R} ,,h t \ . t ^ nrrn- l tm/ \ \ (8 )vi(p): - i " t cp+ Bv,{- ' ( r ( ) ) ,R+ Bv;-1(0)},sincc from the theory of contraction mappings it is guaranteed hat algorithm (8)convergesuniformly to the unique solution VB( ) of (3) as t --) oo (see,e.9., 77,theorem 3.4.1]).Let p'denote the control-limit at iteration n, arrd or each n forwhich pn

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    8/41

    478 E.L. Sernik, S.L Marcus / Discrete time Markou modelsRemark 2.6

    Note that in Hughes' model, and in Wang's model (cf. eqs. (5) and (6)respectively), VB( ) |

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    9/41

    E.L. Sernik, S.I. Marcus / Discrete time Mqrkou models 479differencebeing the sequencen which the eventsoccur,as explainedabove.ForWhite's hree-action, L model the optimal cost functionsatisfies he functionalequation:

    vu(n): - in{ Cp BvB(r(p)), + Br(p)vp(l) B(l - r(p))vu(o),n + BzB(o)). (11)Hence, the results to be stated below can be obtained in a straiehtforwardmanner for White's [31] model.

    The analysisof (9) gives, n this casealso, the piecewise inearity of the infinitehorizon, optimal cost functionVB(l)- The problem is more complex now than inthe two-action, CU case since there are more admissible actions available, thusmore structured policies to consider, and p now depends on the observations.However, from equation (9), and the results of Ross [20] and white [31,32] onstructured policies mentioned at the beginning of section 2, it is clear than ananalogous set of properties (P1)-(P4) can be associatedwith the model of thissection as well.Unfortunately, necessaryand sufficient conditions (expressedonly in terms ofthe basic data of the problem) for a policy structure with two, three or fourregions to be optimal do not exist (necessary nd sufficient conditions to have theoptimal policy " produce for al l p e lO, 1]" in this caseare exactly as those for thetwo-action, cU problem; cf. (P3)). Hence, each of thesepolicy structures has tobe analyzed ndividually.

    2.2.1. Three-region tationary optimalpolicy structureIn this section we assume that the optimal policy structure is of the form"produce fo r p e [0 , p,], inspect fo r p e (pv pzl and replace f.or p (pr, 1]".Also, following the notation of the previous section, define the intervals of [0, 1] :

    4: { p elo, prl :T'(p) e ( pr ,pr7}, i N. (12)Note that the {'s satisfy properties analogous to those satisfied by the E,'s in(P4).

    Observe rom eq. (9) that VB() l(o, ,p, l S af f ine and VB(.)l1o, , r1: ZB(1)) sconstant. Once the optimal policy is assumed to have three regions, properties(P1) through (P4) are enough to characteize Vp(p) lro,o,t.We state this in thefollowing proposition.PROPOSITION 2.1Assume that there is a stationary optimal policy structure with three regionsfor the CL model described above (three actions, i.e., U: {produce, inspect,replace), with the state of the system CU during production but CO duringinspection).Then, Vp(p) lro,o,ts piecewise inear.

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    10/41

    480 E.L. Sernik, S.I. Marcus / Discrete time Markou modelsProof

    From [20], lemma 3.2, it is known that every optimal policy "produces forp eI0,01". Therefore,we need to consideronly two cases:(i) The case where I < h can be treated as in the two-action, CU model in [25],because once it is assumed that the stationary optimal policy structure has aninspect ion egion,we have that for pe F1,vr(p):Cp+ BvBQ(p)) since t isoptimal to produce.But for p a F1, T( p) e (pr, p.r),and VB(f( .) ) is affine. Sincethe properties of the {'s are the same as those of the {'s in (P ), the remainderof the proof proceedsexactly as that for the two-action, CU problem (see[25] fordetails), to conclude that there is a uniform lower bound, greater than zero, forthe length of the line segments.This in turn implies that there exists an upperbound in the number of line segments escribing Vp (p) | p,n,1.(ii) When pt:0, th e proof differs slightly from that in [25] fo r the tw o action,CU problcm, since now there are two different cases o consider, depending onthe relationship between p' and pz, &S ollows:(a ) If pr:0 and Z(pr) 1pz,we have hat the optimal co st associatedwith thestationary optimal policy "produce f.or p = [0 , pr], inspect fo r p e (p,, pr], andrcplace or p e (pr, 7f", is given by :

    To see his, note that the situation s similar to that in (i ) since or p e [0 , p,]it isoptimal to produceand z( p) e (pt, pzl,hencevBQ-()) is affine.Thus, vp(-) | ro.u,tis affine, and Vp(') is piecewise inear.(b ) If pr: 0 bu t 7n ( r) >-pz , then the optimal cost associated ith the optimalpolicy described n (a) above, s given by:( r , * Bvu() | (o,,, r p c [0,y]

    -. , lcp+pvp(l) pe(r,p1),vo(p):( - . , \ , _, r (14)F'r ' l lvp(p) l(o, ,o,r pe(h,pz7,I tzu(t) p e (pz, l ,

    where y = T- t(pr) < pr . In this case, or p e. (y, pr], it is optimal to produce andT(p)e(pr, 11,so that VBQ(.)) is constant .Thus, V^p) l(y,p, t . :Cp+ BVp(\) .The result follows now by observing that for p[0, y] we have the situationdescribed in (a). Finally, note that since 7(0):0 and (. ) is continuous,Vp (p) | 10.o,1annot be of the form Cp + BVBQ) .o rp e [0 , pr] because hat wouldimply that there is no inspection region in the policy structure, contradicting ourhypothesis. tr

    (ro * Btzu() | o,, ,r p c [0,pr]t tu( i l : \ ruf p)|10,.r ,1 pe(pr, pr,fI nu(r) pe (pr, l .

    (13)

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    11/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 481

    Remark 2.8As in the two-action, CU case, the piecewise linearity of the optimal costfunction for the model of this section could have also been obtained using thevalue iteration algorithm (10). However, the analysis becomescumbersome be-

    causenow there are severalpolicy structures to consideroand sincenecessaryandsufficient conditions for their existence are unknown, policy structures that arenot optimal for some finite horizon cannot be eliminated as suboptimal for theinfinite horizon problem. We will return to this point in section 5, example 3.Before considering the case n which there are four regions in the stationaryoptimal policy structure, let us show how the piecewise inearity of VB(p) can be

    used to find analytic expressions depending explicitly and only on the basic dataof the problem) to compute the optimal cost and policy.From proposition 2.1 there is a finite number, J + 7, of line segmentsdescrib-ing vB(P) lro.o, l . rom (9), Vp(p) (o, ,o, t an be writ ten as VB(p)| t10, ,0,1:+BpvBQ) + B(1 - p)vBQ).Also from eq. (9), vp(I) : n+ BvuQ). Thus, we ob-tain:vu( i l (o, ,o,rp[(p 1)vB0)R]+ lr + vu\) Rl. (1s)

    From eq. (9), and for p e 4, we obtain from (15) that:vu(n) : Cp + Bvp(T( p)) |

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    12/41

    482obtains:

    P'(p): p

    E.L. Sernik, S.I. Marcus / Disuete time Markou models

    R-1

    - o) i + B'(1 q'[(p - l)vB(Ll n]){''i,u'u{'i u'('(1 o),)+ B,[r+ vo\) a]

    +B'(1 (1 a) ' ) [ tBpeF,, i :1, . . . ,J+7, (18)

    where f , : ( ay pr), Ft*t : [0, a" r ] , F, : (a, , a,_r] , i : 2, . . . , J, and a,, i :1, . . . , J , g ives the intersect ion between adjacent l ine segments P' ( . ) and P'*t(. ).It can be easily shown, using (18), that:c(1 (1- o) ') (B r)[ / + vB\) R]

    Qi : (1 0) '{ -c+ [(F - 1)vp(1,)+] [1 B(1 d)]]- [ (B- )ru(r) n] [ (B- )+ (r B) ' ( r B(r -a)) ]( r - 0) ' { -c+ [(B- 1)vB\)+n][r -B(1 -d)] ] '

    i :7, . . . ,J. (1e)Note that we also have a, :T- '(p.) , i :1, . . . , , I . Now, from eq. (9) and thedef init ion of the { 's, ZB(1) s computed by VBQ):R+ BPr@), since the(" / + 1)st line segment s speci fied n [0 , ar], and qt < 0.Using (18), we obtain:

    I t- rvu1): ln + Bce p'0 - 0) '+ p'* ' ( r- ( r - 0) '* )^ * B'* ' (1- R)I i :oJ- l+cI Bth-nj : r

    -Ptt t f , pe(pr,1] . \20)Observehat the only unknownn (20) s J. SinceUu{n) (o,,o,ts givenby (15), tis only left to find "/, p, and pr. Thecontrol-limitsp', pzare ound by comparingvu{d l(o,.o,l ith Pt(p) and ZB(1),espectively,o obtain:

    (2r)

    t )v. \ ) . ^ l ) ,

    I- 0) ' ) l /17p'* ' ( r (r d) '* ' ) (Br)

    andPz: (B-1)r/p( l)+R' (22)

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    13/41

    E.L. sernik, s-1.Marcus / Discrete time Markou moders 4g3In order to obtain the number of line segmentsdescribing vB(il|10,p,1,notethat from the definition of the {'s, and the propertiesof the map (p)'a;f. (p1)),J+ 1 is the smallest nteger k such that T-k(p,)

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    14/41

    484 E.L. sernik, s-1.Marcus / Discrete time Markou moderswe will refer to the interval [0, pr ] as the "first produce region,,, and to theinterval (pr, pl as the "second produce region',.Since the analysisof this case s similar to that of the previous section, it willnot be included here for the sake of brevity. We note thai the formulas obtainedfor the three-region, stationary policy structure case are not affected by theexistence of a second produce region (note, e.g., that to compute vBe) andYo^(-o). ,o, .o,rl l that isneededistoevaluatevB(r)at p:g,and,f rom[25, lemma3.2l,it is shown that 0 alwaysbelongs o the'?irstproduce region). Furthermore,by following arguments similar to those n the pr"uiom section,we can prove thatthe restriction of the optimal cost function to p e (pr, prl, ,/o(pfh.- ^-,, ispiecewise linear, and that the line segments that descriue- trrJ opii'#ii"".or,function in this interval are computed in the same way as those describing theoptimal cost function in the two-action,CU model. We summarize hese esultsin thc fol lowingproposit ion.PROPOSITION.3Assume that there is a stationary optimal policy with four regions for thethree-action, L problem under study (that is , u: {produce, insp-ect, eplace),no observationsduring production, perfect observationsduring inspectionj. Thenthe infinite horizon, optimal cost function VB( ) is piecewiseinear. In addition,vp(p) and the control- l imitsp, , i :7,2,3, which character izehe opt imal pol icystructure can be computed following the procedure: (1) Find the number of linesegmentsn the first produce egion, ,r + 1, with eq . (23); (2 ) compute zr(1) usineeq. (20); (3) compute p, with eq. 27); (4) f ind VB(p) l(p, ,o, t s ing eql ' i fSl; tSlnext , compute he l ine segments escr ibinE B(p) lro,o,r"si tg "qs. (1g) and (19);(6 ) compute pr : rhis is given by (see[25, eq . tZl} '- '- ' 'p:: (1 B)voQ)/c; e4)(7) find K, the number of line segments n the second produce region, as thesmallest nteger k for which the folrowing inequality is nit satisfied:r + ( t - Bk)vo\) n- czj : i7 i ( t - ( t - o) i )cL j : ]Bi( t 0) ' - (B 1)vB0) R1+

    P'-1, t(7 - 0). \2s)(note that e

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    15/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 485i :1, . . . , K, with the intersect ions etween adjacent ine segmentsgiven by:do= pt, dx= pz, and ([25,eq. (15)])

    vp(1)(1- ) r : 1,. . , K - 1. (28)c\- i l t 'Observe that we have formulas for thc infinite horizon, optimal cost functionand the stationary optimal policy structure for all the structured optimal policiesof the type consideredin this work, and associatedwith the replacement modelunder study (as mentioned above; see 20,31,32]).However, we have not foundnecessary nd sufficient conditions to state,given the data of the model, which isthe optimal policy structure.This does not representa problem though, sincewe

    can compute the opt imal costs and control- l imits p;, i :7,2,3, for al l thestructuredpolicies,and select he optimal one as the one that gives he smallestcost. This comparison makes sensesince for each (fixed) set of paramcter valuesonly one of the costs computed is the optimal one (recall that the functionalequation has a unique solution), while the others are costs associatedwithparticular (possiblynonstationary)policies,computed assuminga policy structuredifferent from the optimal one. In addition, computationally the com parison ofthe differen t costs calculatedhere s reasonable, ince obtaining VB(p) with theequations presented above representsa minimal computational effort comparedto that required for solving the problem via, say, the value iteration algorithm (insimulations performed, the computer time has been approximately four orders ofmagnitude smaller, or the samecomputer and computer load).Finally, note that the formulas presentedhere give the exact solution to theproblem, an d no t just an approximation, and so they representan easy way toobtain insightabout the behavior of the systemwith respect o (say)uncertaintiesof the parameters of the model.

    3. An inspection model for standby unitsIn this section we analyze the model for inspection of standby units describedin detail in [28]. The idea here s that a standby unit (maybe more than one) isinstalled to improve the reliability of the system.Bu t the standby unit has to be

    inspected, and repaired if necessary,since it can go down even when not inoperation, and this will cause t to fail to operate the next time it is needed.Thus,if inspection reveals the unit to be in an unsatisfactory state, repairs are made.The time when there is a need for the standby unit are called initiating events(see[28]), and if the unit is in a failed state when an initiating event occurs, then acatastrophic event is said to occur.

    Suppose hat the standby unit can either be "up" or "down" when it is not inoperation i.e., he statespaceX= {.rp, down}). Le t snbe the probability that th e

    a,:1----1-*' (1 -0) '

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    16/41

    486 E.L. Sernik. S.I. Marcus / Discrete time Markou modelsunit will be up at the next time period given that it is up in this period, that ntime periods haveelapsedsince he unit wa s installed,and that the unit is not inoperation. For the remainder of this section we assume that s,: s for all n,,s> 0, meaning that the unit has a constant failure rate (see[28, p.265)).In each time period the decision maker can inspect the unit, repair it, or donothing (i.c., the availableset of actions s U: {do nothing, inspect, epair}). Ifthe inspection finds the unit up, then no repairs are made. However, there is aprobability (1 - i ) that the inspection is damaging given that the unit is up, andso with probability (1 - r) the unit is down immediately after inspection. f theunit is found down, a repair is attempted, which has probability r of returning theunit to the up state,and a probability (1 - r) of leaving t in the down state.Aninspectionwhich finds the unit up takes M periodsof time, while inspectionplusrepair takes N (N > M) periods. During any of theseperiods the unit cannotrespond to an initiating event. If the decision maker decides o repair withoutinspecting irst, the unit has probability r of being up immediatelyafter repair,irrespectively of its state before repair, and again it is out of action for N periodsof t ime.Also, an initiating event (i.e., one that requires he standb y unit to come intoaction) occurs each period with probability b (that is, times between initiatingeventsare independent andom variableshaving common geometricdistribution;see 28]). Finally, there is a probability (1 - c) that the use of the machine willcause t to go down (cs is the probability that the unit will be up in the next timeperiod given that it was used in the present period). As mentioned above, themodel s the one considered n [28].Note that it allows for nonzero nspection-re-pair and repair times, and that it takes into account possible mistakes duringinspection or repair. For a complete description of the model we refer the readerto [28].1'he objective considered s that of maximizing the expectednumber of periodsuntil a catastrophicevent occurs. The problem can be modeled (see [28]) as aPOMD process,with state spaceX={p: pe [0,71J, p being the probabi l i tythat the unit is up in the next period given all the past observations nd actions(observe that in this model there are no observations available to the decisionmaker when the action taken is to do nothing, but perfect observations areobtained during inspection and repair; hence, the model considered is a closedloop model).

    We follow the notation of [28]. Let V( p) be the maximum expectednumber ofperiods until a catastrophic event occurs, given that p is the present probability,basedon the history of the system, ha t the unit is up. Then, Z(p) satisfies hefunctional equation (see 28]);v( p) : max{W,( p), W,( p), Wr(p)} , (29)

    where Wr(p) , Wr ( p) and Wr(p) respectively correspond to the actions do

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    17/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou modelsnothing, inspect and repair, and they are given by:

    ,vr( ) : 1+ (1 b)v( 'p) + bpv(c),

    487

    (30)

    (33)

    (34){1s)(36)

    w,(p):p[(1 (1 u) ' ) / r+ (1 u1'v1i1]+1 r) [( r - e - n)N)7tt+(1 t) 'v1,71. (31)

    w,(p): (r - ( r - b) ') / t * (1 b)*v(,) , (32)where (1 - (1 - b)')/b is the expectednumber of periods to pass during aninspectionof M periodsbefore there s an initiating event.

    In order to simplify the comparison between th e models of the previoussect ions nd the one descr ibed cre, et q:7 - p, i .e. , q is the present robabi l i tythat the unit is down given all the past observationsand actions.Then T(q) = t- sp:1-s( l - q) : sq+(1 -s) . Clear ly,T-t (q) : (q- (1 -s)) /s, s> 0. Also,define V(q) as the maximum expectednumber of periods until a catastrophicevent occurs given that q is the present probability, based on all the previousobservationsand actions, that the unit is down. V(q) satisfiesa functionalequationsimilar to (29),with p replacedby q, tp replacedby T(q), and i, r andc replacedby 1 - i,7 - r and 1 - c respectively.n addition, we make the changeof var iablesuggestedn [28], namely i(q) : V(q)-7/b, that is, t@1 is theexpectedextra time until a catastrophicevent occurs because here s a standbyunit ((1/b) is the expectednumber of periods until the first initiating event).Then, Z(q) sat isf ies:/ (q): mux{t i t r(q),wr(q), f rr(q)},

    withw,(q): (r - q)(1 + bv(l - c)) + (r - b) i(r(q)),wJil : (r - q)(t DMf0 - i ) + q( t - n) ' t ( ' t - r t ,w,(q): (1 u)*t(t - ,) .

    We point out the similarity betweeneq. (33) and the functional equation 9) fo rthe three-action,CL replacementproblem:- rtr(q) dependson T(q) and q just as VB(p) lro,o,l -oes n 7( p) and p in thereplacement model; Wlil is affine in 4 since V( l - l) and V(l - r) ar econstants or given i, r; W3(q) = Wz is constant.T(q) and ?"-r(4) sat isfy T(q).q and f t (q)

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    18/41

    488 E.L. Sernik, S.I. Marcus / Discrete time Markou models

    * It is possible to write a value iteration algorithm in order to compute I7(q;since the expression n (33) satisfies all the standard results of DP (see [28]):/ ' (q): max{7 q,0,0},tn+1(n;: '"u^((r- qXi + bt(t - ' ) ) + (1 b)v(r(q)),

    (r - q)(r - t)* v( t - i ) + q(I - u)* v(t - r ) ,(1 b) v1t - 11),

    = maxwf t i l , f r ; (q),w{ (q)} \37whcrewe haveusedZo(q; :0 as suggestedn [28, p.263].

    - The optimal policy for this problem is a structured policy, and is given (see 28,theorem 2.11)bV a control-limit q* , such that: one does nothing fo r q< q*;inspect or q>q* i t t1l , - i )>( I -67n-u/0-r) ; andrepair for q>q* ifv( t - i) q* ", 0 ( 4* < 1, then it is alwaysoptimal to "do nothing for all q ( min{l -r ,7 - i) ".(b ) Similarly, if the stationary optimal policy is "do nothing fo r 4 ( q* an drepair fo r q> q*", O< q* < 1, then it is always optimal to "d o nothing for allq

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    19/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 489< 0 since the quantity in parentheses s less than 1 (i t is equal to 1 only for b: 0,but in that case the standby unit is never used). On the other hand, from (33):

    v(1) Q+ nvQ-.)X1 r)+ (1 b)v(1) ( t - u)v( t) .That is, Ut\) ) 0, which implies(again for b * 0) that t1t1r,- 0. Putting theseresults together,we have V(7)>- V(q), wlnch is a contradiction since V(q) isnonincreasingn q. The proof of (b) is similar, and is omitted- nLEMMA3.2

    It is never optimal to "d o nothing" for q :1 .Proof

    Let rl.,Q)be the cost associated ith the policy "do nothing for all qe.10,7f".Then, using (37) one readily obtains that:l "-1 n- l . r - l l , . r*(q):( t -q) t im lf bkck+ ro(t -b)o I l ' . lbrt -* tctL-*t1. (3e)n-al* :o k:t j :k \Kl )

    We recall that the limit in (39) exists since the expression n eq. (33) satisfies heaxioms of th e contraction operator approach to DP (see[28, p. 262]), and thevalue iteration algorithm (37) convergesuniformly to the unique solution of thefunctional eq . (33); see,e.9., 17, theorem3.4.11. ow, assume hat it is optimal todo nothing for q:1. Then, f rom (33) we have:o>(1 u) ' , t ( r - ' ) ,

    a contradiction since ,1.,@):0 only for q:1(this can be easily verified: taki ngthe limit in (39) for the first term in brackets gives1/(1 - bc) > 0, while the limitof the second term in brackets is greater than or equal to zero since t is the sumof nonnegative terms). Hence, one either inspects or replaces for q: 1 (weexclude the case b :7, since in that case there is an initiating event every timeperiod, i.e., the standby unit is used every period, and the problem becomes areplacementproblem since there is no standby unit as such). n

    It is now clear that following the approach of the previous section, we have thefollowing result.PROPOSITION.1

    For the inspection model for standby units described in this section (i.e.,U: {do nothing, nspect, eplace},an d the unit has a constant ailure rate s > 0) ,the performance index V(q), the expected extra time until a catastrophic eventoccurs, s a piecewise inear function of q, the present probability that the unit isdown, given all the past observations and actions.

    Next, in order to deve lop equations for computingViil and qx, we point outthe followins:

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    20/41

    490 E.L. sernik, s.L Marcus / Discrete time Markou modelsRemark 3.1

    As mentioned above, from [28, theorem ].11. we know that the optimal policyinspects or q > q* it t1t - i) > (7 - 611- i , then:w,(q): (1 q)(t - b)Mt(1- i) + q(t - t) , t( t - r)

    ,- (7 q)(1- b)Mv(t r) + q(r - u),t( t _ ry>- 1- i lQ - b) ' t(r - r) + q(t - u),t( t _ r): (1 o)Nt(t - r ):n,

    which implies that whenever 1 - r > 7 - i, one knows from the data of theproblem that the optimal policy is "do nothing for q { q* and inspect forq> q* ", 0( q*

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    21/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 497more, there is no explicit relationship between c, i, r and s; this means thatseveral cases see below remark 3.3) will have to be considered in order to findformulas tor V1q\ and q*. In addition, since tlqS is piecewise linear and isevaluated at q: c, Q: i an d Q: r in the functional equation (33), bl t T(q)dependsonly on s (and not on 7 - c, 1 - I or 7 - r) , there s no way to know apriori which interval (and hence which line segment) has to be used to evaluateV(q) at Q: C, q: i and q: r (compare his to the replacementproblem, whereit is known that 0 belongs to the interval where the "/th line segment s specified).Therefore, some extra work (describedbelow) will be required to find the numberof line segments describing the optimal cost function (as opposed to whathappens in the replacement problem of section 2.2.7, where only inequality (23)has to be considered).For the remainderof this scctionwe assume as n [28,example2.71) hat c: t,meaning that the occurrence of an initiating event that finds the unit up isequivalent to an inspection that takes zero time.Remark 3. 3

    When i:c and 7 - r> 1- i , the opt imal pol icy is to "do nothing for q(q*and inspect for q > 4* ". In this case, here are sevenpossiblesituations that haveto be studied for each assumpt ionon s, e.9. , r / > i and that the optimal

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    22/41

    492 E.I-. Sernik. S.I. Marcus / Discrete time Markou modelspolicy is "do nothing for q < q* and replace fo r q ) q* " . Let T'(q) =T(T'- tQD. Then, T'(q) : snq+ (1 - s') . Assuming that {1 - t } e [0, q*] ,{r , 1 - i , i } e(q*, 1l and that (1 - r) fal ls n the / th interval the ntervalwherethc /th line segment s specified), one finds the following equationsby followingexactly the sameprocedureused n section2.2.1:

    k- 1Ao(q): r * b%)L 0 - r i1n110u)'+ (1 b)r fr , ,,7:0k:7,.. . , L+7, (42)where Q-@) is the kth, out of L* 1, l ine segments escr ibingWr(q); also, etA1t

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    23/41

    E.L. Sernik, S.I. Marcw / Discrete time Markou models 493Remark 3. 5

    I f one wants o f ind the formulaswhen s > r> i, {1- r ,7 - i } c [0, q*] , and{i , r ) e(q*,11, then eqs. 42), 43) and (44) are eplaced y:k- lQ-(q): r* bv40 ,)) I 0 - r ' (q))( t u),+ r b). f r , ,J: \ )

    k:1, . . . , L+7 (46\li/. :

    (1 b) ' ,A(t)r- hiA(m) b(r- b1**^rA(/) (1 b) '*^ + bi( l * b) ' '^.4(*) '

    Q":\- b(l - b iA(m))f i ,i+r(1 -b)^frr 'respectively,where we have assumed hat 1 - r falls in the /th interval and 1 - rfalls in the zth interval. We make two observations;(i) The procedure to find Z + 1 in this case s:(1) Assume : k - 1, m: k - 1.(2 ) Find 1- * I using (45),with q* given by (48); if there s no (positive) ntegersatisfying {5), go to step (4) (i.e., eject this case);otherwise, ontinue.(3 ) Compute W3 (in ( 7)) with the value found for L + 1; store this case;continue.(4) Decrease or m by I, keeping in mind that from the assumptions we have7-r< 1- l ,or l>m; gotostep(2).(ii) In this case,with I the upper bound for Z computed as before, there are /2iterat ions ( i .e. , ( / : k-7, m:k-I ) , . . . , ( l :k- I , m:k - t+ 1), ( l :k-2,m: k-2), . . . , ( l : k- t , m: k - l) ) . The reader s aware hat this is the worstcasescenario,and if t such that tz is absurdly large, then the procedure suggestedabove may not be an alternative to the value iteration algorithm (37) in terms ofthe computer time required to obtain the optimal cost and the optimal policy. Asmentioned in remark 3.4, for typical values of the parameters of the model, thesimple procedure suggestedhere is several orders of magnitude faster than thevalue iteration algorithm. In addition, some analysis reduces the number ofiterations in the procedure considerably: in this case or example, since 1 - r < I- i , and {7-r ,1-r} e [0,4*] . i f f ' (1 -r)>7- i , i

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    24/41

    494 E.L. Sernik, S.I. Marcrc / Discrete time Markou modelsRemark 3.6

    The iterative procedures suggestedabove for the two cases considered (andsimilar ones for the casesmentioned in remark 3.3) give the (optimal) number ofline segmentsdescribingrTrG) by comparing all the valuesobtained for lit: thenumber of line segmentsassociatedwith the largest \il, equals L + 7. Recall thatV(q) is the unique solution of the functional equation (33) (see [5]), and so thenumber of line segmentsdescribing WJq> is also unique. This means that if as aresult of the computation one obtains two (or more) times the same value of \il,for different valuesof L'l l, then one has to compute q* and Q" , k: 1, . . , L +1, for each of these cases:only one would give consistent results (the others willhave e.g., negative ength intervals, or line segmentsspecified for q < 0, etc.).Remark 3.7

    Obtaining formulas for the cases specified in remark 3.3 is not complicated.Noteforexample, that i f ,s

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    25/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 4954.1. MAINTENANCE MODEL UNDER MARKOVIAN DETERIORATION

    This problem is treated in [11] by Hopp and Wu. The idea here is to studymodels in which maintenance actions may not return the system to a "good asnew" state (as opposed to a replacement action), and where the underlying stateof the system is not directly observable. The authors in [11] study two cases:state-dependentmaintenance (the action taken at each time period depends onthe underlying state, which becomesknown after maintenanceis performed; see[11, p.448]), and state-independentmaintenance,and for both cases he authorspresent structural results concerning the optimal maintenancepolicy.We brief ly introducesome notat ion used n [11]. ^S: {1, . . . ,n} denotes hestate space,where 1 represents he " best" state, and n represents he " worst"state. When the system is in state i and no maintenance is performed, itdeteriorates o state , j> - i, with probability pil.The Markovian deteriorationmatr ix P: Ip, i l is assumedo have pi i17, i :1, . . . ,n- 1, and to be such hatLj: rp,1is nondecreasingin for al l k:7, . . . ,n - 1. Since he under ly ing tate snot CO, the probability distribution vector rr will denote the information state,i.e., the information available to the decision maker at each time period. A :{0 , 1,..., rz } is the set of actions,where 0 represents do nothing", and mainte-nance action a > 0 movesthe system o state c with probability one. In addition,action 4 costs c(a), and for eachperiod the systemspends n state i, it producesa return r(i). The one period discount factor is F, F e [0 , 1). We refer the readerto [11] for more details on this model.First take n : 2 (below we refer to the case ?> 2), and consider the state-inde-pendent maintenance case. In this case, f /(z) is the optimal return over theinfinite horizon given knowledge state tr at time 0, then /(n) satisfies thefunctional equation ([11, p. 458]) given by:

    f (" ) : max/( r , a) , (4e)ae Awhere

    r , lLi : ,"( i )r( i ) + Bf tP), a:0,I \n. a) : \\ -c(a)+f(", ) , a>0,and eo is the ith row of the identity matrix. The similarity between eqs.(49) and(3) (or (a)) is apparent if n : (1 - p, p), p being the probability that the systemis in state 2, then the expressioncorresponding to a :0 is affine in p while theexpressions or a > 0 are all constants).

    Furthermore, since the authors in [11] prove that the optimal policy is struc-tured ([11, emma 7]), we need only to ch eck the propertiesof nP,the updatedinformat ion state.Since t is assumed hat pi i*1, i :0, . . . ,n-7,when n:2(and prr:1) we get that the updated probability, also denotedhere by ?(p), isgiven by T( p) : (l - p) prr* p, which satisfiesT( p) > p and f-t( p) < p as inthe models considered in previous sections. Therefore, that f ( p): f (tr) is a

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    26/41

    496 E.L. Sernik, S.I. Marcus / Discrete time Markou modelspiecewise linear function of p (when pzz: 1) can be proved following theapproach of the previous sections.

    In the state-dependentmaintenance case he model is more elaborate (see[11]),and we do not intend to get into the details of it. Let us ust observe hat: (i) forn :2, again one can obtain the piecewise linearity of the cost function byfollowing the approach of sections2 and 3; (ii) for general n, and assuming thatit is optimal to perform maintenance when the system s known to be in state n(the worst state), the authors in [11] prove that the functional equation satisfiedby the cost function represents a finite system of equations ([11, theorem 3]).Whether this could be used to find formulas to compute the cost and the optimalpolicy remains to be investigated. In this regard, we recall that for the replace-ment problem of section2.1, Wang [30] provides formulas to compute the costsand the optimal policy (a structured policy) for an /,-state (n > 2) model.Although in [30] the two actions considered can be applied in any o[ the states,these esultscould be combined with those n [11, theorem 3], to obtain specificformulas in the maintenance model described above.4.2.OPTIMALSTOPPINGN A BINARY-VALUEDMARKOVCHAIN

    The optimal stopping problem for a PO binary valued Markov chain withcostly perfect information was considered by Monahan in [18]. At each time I thedecision maker can either stop and accept the current reward, or he can reject thisreward, pay a fixed fee and move to the next time period, when another rewardwill be considered. Since t is assumed hat the true state of the Markov chain isnot known, the information regarding the current state is sumrnarized by theprobability distribution (7 - p, p),0

    0.5. This simply implies that with higherprobability the processwill remain in (or make a transition to) the good state.

    The problem is to find a rule which will indicate the action to take based onthe information available,so as to maximize the expected nfinite horizon rewardvu(il:rr[:,u(,.,")f (sr

    over all nonrandomized, stationary strategies 6 (i.e., V('), the optimal cost

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    27/41

    E.L. Sernik, S.I. Marcus / Discrete time Markou models 497function, is defined as V(p) = supuZu(p), p e [0 , 1]), with p the initial state.Note that %(.) is not discounted; rom [18, p. 74], the existence f V(.) and anoptimal strategy ollow from standard arguments.U(s,, an), Sn[0, 1], ane l,are the single per iod rewards,given by U(p, R): -cR, U(p,T): -c- . andU(p, A):p.We refer the reader to [18] fo r further details n the description ofthe model.

    It can be shown (see [5,18]) that the infinite horizon optimal value functionV( p), associatedwith the POMD problem just described,satisfies the followingfunctional equation:v(p): max{v(t(p)) - cR, - cr+ pV(7)+ (1 - p)V(o), p}. \s2)

    Here /(p) : tro + (trr - ),n)p is the probability of bei ng in the good state n thcnext period, provided that action R is taken (we follow the notation in [18]).The similarity between this problem and the replacement problem in section2.2 is apparent. Furthermore, Monahan [18] proves that with this stoppingproblem there are associated tructured optimal policies,which can have 7, 2, 3 or4 regions, just as in the replacement problem. He also shows the piecewiselinearity of the optimal cost function (the proof is, however, not clear), andprovides an algorithm to compute both the cost and the optimal policy. Althoughthe transition probability matrix and the criterion for the stopping problem differfrom those n the replacement problem, the approach of sections 2 and 3 can beused here to prove the piecewise inearity of the optimal cost function, and todevelop formulas to compute the optimal cost and the optimal policy. Theseformulas (which we state below only for the three region optimal policy), providethe same results as those that can be obtained by using the algorithm developedby Monahan n [18] .Rennrk 4.1

    It is important to note that, contrary to T( p) in the replacementproblem, /( p)has a fixed point in [0 , 1) , givenby pr:Xo/[7 - (tr, - ],0)1.However, t is easy oshow that it always falls in the accept region whenever the optimal policy has 2 or4 regions (reject-accept and reject-test-reject-accept respectively),and so, theapproach of sections 2 and 3 applies here. For the case in which the optimalpolicy has 3 regions (namely, reject-test-accept), whether the fixed point pn.belongs to the accept region depends on the parameters of the model and onZ(0). However, we have not found an example in which p" does not fall in theaccept region, and the formulas shown below were developed under the assump-tion that p. is in the accept region. Example 6 in section 5 illustrates the case ofan optimal policy with three regions.The same observations made regarding the selection of the optimal policy inthe replacement problem apply here. That is, for a given data set, one finds thecosts associatedwith the policies having I,2, 3 and 4 regions(this is computa-

  • 8/2/2019 On the Computation of the Optimal Cost Function for Discrete Markov Models With Partial Observations

    28/41

    498 E.L. Sernik,S.I. Marcus / Discrete ime Markoumodelstionally inexpensive ince t is doneusing formulasand not an iterativeproce-dure), and compares hem to select he optimal one. Also, as mentionedbefore,the savings n computation time plus the accuracygained,allow the performanceof sensitivityanalyses f the costand thepolicy with respecto theparameters fthe model.For the case n which the infinite horizon,optimal policy structurehas threeregions namely eject-test-accept), nd following he sameprocedurellustratedin sections and 3, one obtains hat the formulas o compute he optimal costand the optimal policy are given by:

V(p) = W_1(p),   p ∈ [0, p_T],
       W_2(p),   p ∈ (p_T, p_A],                                   (53)
       W_3(p),   p ∈ (p_A, 1],

where W_1(p) is described by the line segments given by:

R^i(p) = −i·c_R − c_T + R^N(0) + (1 − R^N(0))·t^i(p),   i = 1, ..., N,   (54)

with

R^N(0) = 1 − (N·c_R + c_T) / (λ_0 Σ_{j=0}^{N−1} (λ_1 − λ_0)^j),   (55)

and

t^i(p) = λ_0 Σ_{j=0}^{i−1} (λ_1 − λ_0)^j + (λ_1 − λ_0)^i p,   i = 1, ..., N.   (56)

The intersections between adjacent line segments are given by:

a_i = [λ_0(1 − R^N(0))(λ_1 − λ_0)^i − c_R] / [(1 − R^N(0))(λ_1 − λ_0)^i (1 − (λ_1 − λ_0))],   i = 1, ..., N − 1.   (57)

W_2(p) and W_3(p) are given by:

W_2(p) = −c_T + R^N(0) + (1 − R^N(0))p   (58)

and

W_3(p) = p,   (59)

respectively. The control limits between the reject and test regions, and between the test and the accept regions, are given by:

p_T = [λ_0(1 − R^N(0)) − c_R] / [(1 − R^N(0))(1 − (λ_1 − λ_0))]   (60)

and

p_A = [R^N(0) − c_T] / R^N(0),   (61)
respectively. Finally, N, the number of line segments in the reject region, is the smallest integer n that satisfies:

(λ_0·z − c_R) / [(1 − (λ_1 − λ_0))z] − t^n(0) < 0,   (62)

where z = (n·c_R + c_T)/t^n(0). Example 6 in section 5 illustrates the problem described in this section.
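As an illustration of how inexpensive this computation is, the sketch below evaluates eqs. (53)-(62) directly. The parameter values are hypothetical and the function names are ours, not the paper's; whether a given data set actually has the three region structure must still be checked as described above.

    # Evaluation of eqs. (53)-(62) for the reject-test-accept structure.
    # Hypothetical parameter values (illustration only).
    lam0, lam1, cR, cT = 0.1, 0.7, 0.02, 0.10
    d = lam1 - lam0                          # contraction factor of t(p)

    def t_i(p, i):
        """t^i(p), the i-fold composition of t(p), eq. (56)."""
        return lam0 * sum(d ** j for j in range(i)) + d ** i * p

    # N: smallest n satisfying eq. (62), i.e. the reject/test limit is
    # crossed after n applications of t(.) starting from p = 0.
    N = 1
    while True:
        z = (N * cR + cT) / t_i(0.0, N)
        if (lam0 * z - cR) / ((1.0 - d) * z) - t_i(0.0, N) < 0.0:
            break
        N += 1

    RN0 = 1.0 - (N * cR + cT) / t_i(0.0, N)                    # eq. (55)
    pT = (lam0 * (1.0 - RN0) - cR) / ((1.0 - RN0) * (1.0 - d)) # eq. (60)
    pA = (RN0 - cT) / RN0                                      # eq. (61)

    def V(p):
        """Optimal value function, eqs. (53)-(59)."""
        if p > pA:                                   # accept region
            return p
        if p > pT:                                   # test region, eq. (58)
            return -cT + RN0 + (1.0 - RN0) * p
        # reject region: upper envelope of the segments R^i(p), eq. (54)
        return max(-i * cR - cT + RN0 + (1.0 - RN0) * t_i(p, i)
                   for i in range(1, N + 1))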

4.3. RISK SENSITIVE MARKOV DECISION PROCESS

Now we consider the risk sensitive MD process studied by Gheorghe [10]. This paper formulates a model for a Markovian decision process with PO states in which the decision maker bases the decision on risky propositions, and so its risk preference can be represented by a utility function that assigns a real value to each possible outcome. Again we consider the case where the (core) process {x_t, t = 0, 1, ...} can take two possible values. Let P(u_t) = [p_ij(u_t)] be the transition probability matrix, with u_t the control action at time t. Also, let {y_t, t = 0, 1, ...} be the observation process taking two possible values. The observation and the core processes are related by q_jk(u) = Pr{y_{t+1} = k | x_{t+1} = j, u_t = u}, with Σ_{k=1}^2 q_jk(u) = 1, j = 1, 2, for all u. If n = (n_1, n_2) is the information state, and n_i is the probability of being in state i, i = 1, 2, given past observations and actions, then (see [5,10]) the updated probability T(k, n, u) that the state of the system will be the second one, given that outcome k was observed and control u was applied, is given by:

T(k, n, u) = q_{2k}(u) Σ_{i=1}^2 n_i p_{i2}(u) / Σ_{j=1}^2 q_{jk}(u) Σ_{i=1}^2 n_i p_{ij}(u).   (63)

Let c_{ijk} be the reward obtained when the system makes a transition from state i to state j and produces an output k after transition (see [10]). Then, defining V(n) as the utility functional (for the lifetime of the process) if its current information state is n, we have (see [10]) that V(n) satisfies the functional equation:

V(n) = max_{u ∈ U} { Σ_{i=1}^2 Σ_{j=1}^2 Σ_{k=1}^2 n_i p_{ij}(u) q_{jk}(u) e^{−γ c_{ijk}} V(T(k, n, u)) },   (64)

where γ is the risk-aversion coefficient, such that γ < 0 implies risk-preference, γ > 0 risk-aversion and γ = 0 risk-indifference. The problem is to find the optimal control policy to maximize the utility function V(n) over the infinite horizon. We refer the reader to [10] for a more detailed description of the model.

Gheorghe [10] and Satia and Lave [21] propose a branch and bound algorithm to solve eq. (64) (for γ > 0, γ < 0, γ = 0, and for γ = 0, respectively). In order to apply the branch and bound algorithm, upper and lower bounds of V(n) are required.

The upper bound is obtained by assuming that the states of the system are CO [10,21]. The lower bound is obtained by computing the reward associated with any reasonable (see [10]) policy. Specifically, the authors in [10,21] compute the reward associated with the control policy that makes the same decision for every period. Presumably, this control policy is chosen because it is relatively simple to compute the utility function associated with it.

One possible application of the risk-sensitive Markovian decision process model (the authors provide several in [10] and [21]) is the two-action, CU replacement problem considered in section 2.1 (see [21, example p. 477], and [10, example p. 121]). In this case, if q_jk(u) = 0.5 for all u (so that the states are CU), and assuming that θ is the probability of machine failure in one time step, then T(k, n, u) = T(p) (T(p) as defined in section 2.1), k = 1, 2, if the action taken is produce, and T(k, n, u) = 0, k = 1, 2, if the action taken is replace. Hence, properties (P1) through (P4) are satisfied in the risk sensitive model as well, and so we can use the approach presented above to find formulas to compute V(n) (for the CU case). The point we want to make here is that the cost function computed in the CU case can be used as the lower bound required in the branch and bound algorithm, without introducing additional burden to the computational procedure.
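To make the preceding remark concrete, the following sketch evaluates the update (63) and one application of the risk-sensitive operator in (64) for a CU data set. All matrices, rewards and the value of γ are hypothetical stand-ins, not data from [10] or [21].

    import numpy as np

    # Hypothetical two-state, two-action, two-observation data for the CU case;
    # u = 0 is "produce", u = 1 is "replace".
    theta = 0.1                                    # probability of machine failure
    P = {0: np.array([[1.0 - theta, theta],
                      [0.0, 1.0]]),                # produce: good may fail, bad stays bad
         1: np.array([[1.0, 0.0],
                      [1.0, 0.0]])}                # replace: next state is good
    Q = {u: np.full((2, 2), 0.5) for u in (0, 1)}  # q_jk(u) = 0.5: CU observations
    c = np.zeros((2, 2, 2))
    c[1, :, :] = -4.0                              # hypothetical cost while in the bad state
    gamma = 0.5                                    # risk-aversion coefficient (gamma > 0)

    def T(k, n, u):
        """Updated probability that the state is the second one, eq. (63)."""
        num = Q[u][1, k] * (n * P[u][:, 1]).sum()
        den = sum(Q[u][j, k] * (n * P[u][:, j]).sum() for j in range(2))
        return num / den

    def operator(Vfun, n):
        """One application of the risk-sensitive DP operator in eq. (64)."""
        values = []
        for u in (0, 1):
            total = sum(n[i] * P[u][i, j] * Q[u][j, k]
                        * np.exp(-gamma * c[i, j, k])
                        * Vfun(np.array([1.0 - T(k, n, u), T(k, n, u)]))
                        for i in range(2) for j in range(2) for k in range(2))
            values.append(total)
        return max(values)

    # Starting from V = 1, repeated application of the operator is value iteration.
    print(operator(lambda n: 1.0, np.array([0.8, 0.2])))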

Similarly, if one considers the replacement problem of section 2.2, and lets q_jk(u) = 1 for j = k, and q_jk(u) = 0 for j ≠ k when the action taken is to inspect (u = 1), then again one can obtain formulas to easily compute the associated risk utility function, and again it can be used as the lower bound required in the branch and bound algorithm. Note that whether the bounds proposed here are or are not better than those used in [10] and [21] still remains to be established, and we will not address this point here. We only want to mention the problem treated in [10] and [21] as another one where the ideas presented in previous sections could be applied.

4.4. INPUT OPTIMIZATION FOR INFINITE HORIZON PROGRAMS

We now turn our attention to the input optimization problem considered by the authors in [4]. The problem in this case is that if V(x) is the infinite horizon, maximal discounted reward given the initial state x, one is interested in finding the minimal input required to achieve a target reward u; namely:

I(u) = inf{x : V(x) ≥ u}.   (65)

In this case, x is a scalar variable taking values in an interval of the real line. We refer the reader to [4] for conditions (cf. [4, assumptions 2.1-2.3]) that guarantee the existence of V(·) and I(·), as defined above.

The authors in [4] show that I(u), as defined in (65), has certain properties (e.g., it is monotone nondecreasing and lower semicontinuous), and with these results they show that I(u) also satisfies a functional equation.


This in turn allows them to propose a value iteration algorithm to compute I(u). We refer the interested reader to [4] for details on the model and the results obtained.

We note that the problem posed in [4] is deterministic. However, the same problem can be formulated for probabilistic models: for example, consider the replacement model described in section 2.1, and let I(u) = sup{p : V_B(p) ≤ u}. Then, I(u) can be interpreted as the maximal value of the initial probability p of being in the bad state for which one obtains at most a cost u, with V_B(p) the minimal expected cost obtained when an optimal policy is selected given that the machine starts in the bad state with probability p.

From the results obtained in section 2.1 it is clear that we can give an explicit solution for I(u), since we have a formula to compute V_B(p) (that is, given a target cost u, we can give the maximal initial probability p that would allow one to remain below the specified cost u). Similarly, for some of the other models considered in this work, the "dual problem" (i.e., given an output level, find the optimal input commensurable with the corresponding optimal decisions) can also be solved explicitly. These results could be used as a first step in the design of (value iteration) algorithms to find optimal inputs corresponding to costs that cannot be computed explicitly by formulas, but have to be found using the DP algorithm.
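For the probabilistic dual just described, once V_B(·) is available in piecewise linear form, I(u) = sup{p : V_B(p) ≤ u} can be read off segment by segment. The sketch below assumes a nondecreasing piecewise linear cost; the two segments listed are made-up placeholders, not output of the formulas of section 2.

    # I(u) = sup{p : V(p) <= u} for a nondecreasing, piecewise linear cost V.
    # Each row is (slope, intercept, p_low, p_high); the rows are illustrative.
    segments = [
        (18.0, 16.0, 0.0, 0.5),
        (0.0, 25.0, 0.5, 1.0),
    ]

    def I(u):
        """Largest initial probability whose cost does not exceed u (None if V(0) > u)."""
        best = None
        for a, b, lo, hi in segments:        # segments ordered by increasing p
            if a * lo + b > u:
                break                        # V nondecreasing: later segments only cost more
            best = hi if a * hi + b <= u else (u - b) / a
        return best

    print(I(20.0))   # 0.222..., obtained by solving 18 p + 16 = 20 on the first segment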

4.5. OPTIMAL-COST CHECKING WITH NO INFORMATION

We now consider the problem described in [19] by Pollock. The general statement of the problem considered is the following ([19, p. 455]). An event E occurs at time t, t = 1, 2, 3, ..., with known probability p(t). An observation is made at time τ, τ = 1, 2, 3, ..., the result being a random variable x(τ) which has probability density function p_0(x) if t > τ, and p_1(x) if t ≤ τ. Immediately after each observation, one of the following decisions is made: "decide that the event has occurred", or "wait for another observation". Hence, the action space will be denoted by U = {D, W}.

Action D, at time τ, may or may not be a terminal action. If action D is picked and t > τ, then a false cost F is incurred, the knowledge that t > τ is gained, and the process continues. If action D is selected and t ≤ τ, then the process is terminated with cost c(t, τ). The objective is to minimize the total expected cost of the process.

The following assumptions are made: (i) the terminal cost is of the form c(t, τ) = (τ − t)w, with w being interpreted as "late cost" ([19, p. 456]) (it is mentioned in [19] that this assumption is not necessary to obtain a solution, but it is convenient since it reduces the algebraic manipulations); (ii) the occurrence time t of the event E follows a geometric distribution, i.e.,

p(t) = α(1 − α)^{t−1},   t = 1, 2, 3, ...,   (66)


where α is the (constant) probability of occurrence per unit time, given that the event has not yet occurred. We assume 0 < α < 1.

Let P(τ) be the present value of the probability that the event has occurred previous to, or at, time τ (P(τ) will be denoted by P, following the notation in [19]). If one assumes that p_0(x) = p_1(x), then ([19, p. 455]) an observation x does not affect the a posteriori evaluation of P, since no information is gained between decision times. Therefore, the checker's optimal strategy simply consists of either waiting one time unit, or deciding that the event has occurred. Furthermore, denoting by T(P) the updated probability that the event has occurred, we have that T(P) is given by ([19, p. 455]):

T(P) = P(1 − α) + α.   (67)

Let V(P) be defined as the minimum expected cost obtained using an optimal strategy at time τ. Then ([19, eq. (5)]), V(P) satisfies:

V(P) = min{ wP + V(T(P)),  (1 − P)(F + V(α)) },   (68)

where the first choice in the minimization is associated with the "wait" action, and the second one with the "decide" action. Observe that from (68) one obtains that V(0) = V(α), and that V(1) = 0, provided that F > 0 and w > 0. Also from (68) (and [19, p. 462]), V(α) ≤ (1 − α)(F + V(α)), or V(α) ≤ (1 − α)F/α.



Similarly, let ψ(P) be now the cost associated with the policy that "decides for all P ∈ [0, 1]"; applying (68) recursively, one obtains that:

ψ(P) = lim_{n→∞} (1 − P) F Σ_{i=0}^{n} (1 − α)^i = (1 − P)F/α.   (71)

But from (71), ψ(0) = F/α = F + (1 − α)F/α = F + ψ(α) > ψ(α). Hence it cannot be optimal to decide for P = 0.

With these results at hand, it is clear that the procedure of sections 2 and 3 can be used to show the piecewise linearity of V(P) (cf. remark 2.6). Next, we state the equations to compute the optimal cost V(P) and the control limit P*.

Let I be the number of line segments describing V(P)|_[0,P*], the restriction of the optimal cost function to P ∈ [0, P*]. Then, I is the minimum integer n such that:

(1 − α)^n ≤ 1 − B(n, F, w, α)/(w + B(n, F, w, α)),   (72)

where

B(n, F, w, α) = [αF + w(α(n − 1) − (1 − α)(1 − (1 − α)^{n−1}))] / (1 − (1 − α)^n).   (73)

The control limit is given by:

P* = B(I, F, w, α)/(w + B(I, F, w, α)),   (74)

and V(P)|_[0,P*] is described by the line segments:

Q^i(P) = wi + (1 − P)[(F + V(α))(1 − α)^i − w(1 − (1 − α)^i)/α],   i = 1, ..., I,   (75)


where the intersections between adjacent line segments Q^i(·) and Q^{i+1}(·) are given by:

Q_i = [(F + V(α))(1 − α)^i α − w(1 − (1 − α)^i)] / [(F + V(α))(1 − α)^i α + w(1 − α)^i].   (76)

Finally, for P ∈ (P*, 1] the optimal cost is the cost of deciding, namely:

V(P) = (1 − P)(F + V(α)),   P ∈ (P*, 1].   (77)

Remark 4.3

The expression for P* (eq. (74)) is given in [19, eq. (12)]. Also, Pollock gives an expression to compute the optimal cost whenever P = α (cf. [19, eq. (18)]). From Pollock's results, the piecewise linearity of V(P), and hence eqs. (73), (76), (77), are a direct consequence. Our intention here was to include Pollock's problem as another example of the models for which an explicit solution of the DP can be obtained.

We illustrate the use of eqs. (72)-(77) below in section 5, example 7.
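Equation (68) can also be iterated directly on a grid, which gives an independent check of (72)-(77). The sketch below uses the data of example 7 (α = 0.1, w = 1, F = 0.717421) and reads the control limit off the fixed point; the grid size and tolerance are arbitrary choices of ours.

    import numpy as np

    alpha, w, F = 0.1, 1.0, 0.717421      # data of example 7 in section 5
    P = np.linspace(0.0, 1.0, 10001)
    V = np.zeros_like(P)

    for _ in range(20000):                # successive approximations of eq. (68)
        Va = np.interp(alpha, P, V)                           # V(alpha)
        wait = w * P + np.interp(P * (1 - alpha) + alpha, P, V)
        decide = (1.0 - P) * (F + Va)
        V_new = np.minimum(wait, decide)
        if np.abs(V_new - V).max() < 1e-12:
            break
        V = V_new

    P_star = P[np.argmax(decide <= wait)] # first grid point where "decide" is optimal
    print(P_star)                         # compare with P* = 0.2710 in example 7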

4.6. MARKOV DECISION PROCESS WITH LAGGED INFORMATION

Consider the problem addressed by Kim in [14], Kim and Jeong in [15], and White in [34]. In these papers, the authors are interested in POMD processes with lagged information. That is, in addition to the partial observations of the current states, there are some delayed observations of previous states available to the decision maker. This lagged information might also not be perfect. In [14] the author studied the case in which both current and lagged observations are perfect (so that the current information vector has finite dimension), and presents sufficient conditions for a control-limit policy to be optimal. In [15], the authors analyze the problem in which the current and lagged observations are not perfect, for the finite horizon problem. As stated in [15, p. 444], lagged information can also be considered in the infinite horizon discounted case, and applied to, e.g., maintenance or replacement problems.

As in previous sections, we consider the case in which the (core) process {x(t), t = 0, 1, ...} can take two values. The model then becomes similar to that presented in section 4.3, where P(u(t)) = [p_ij(u(t))] is the transition probability matrix describing the evolution of the process, with u(t) the control action at time t. In this case, however, there are two observation processes: {y_c(t), t = 0, 1, ...}, the observation process related to the current states by the probabilities q^c_jk = Pr{y_c(t+1) = k | x(t+1) = j}, with Σ_{k=1}^2 q^c_jk = 1, j = 1, 2, and {y_l(t), t = 0, 1, ...}, the observation process associated with the previous states by the probabilities q^l_jk = Pr{y_l(t+1) = k | x(t) = j}, such that Σ_{k=1}^2 q^l_jk = 1, j = 1, 2 (as in [15], the number of observations need not be the same as the number of values in the state space; we let {y_c(t)} and {y_l(t)} take two values). Since the states are PO, n = (n_1, n_2) is the information state, with n_i the probability of being in state i, i = 1, 2, given past observations and actions.


The authors find a rule for updating the information vector given current and lagged observations. This rule is given by (see [15, eq. (6)]):

T((k, m), n, u) = q^c_{2k} Σ_{i=1}^2 n_i q^l_{im} p_{i2}(u) / Σ_{j=1}^2 q^c_{jk} Σ_{i=1}^2 n_i q^l_{im} p_{ij}(u),   (78)

where T((k, m), n, u) is the updated probability that the state of the process is the second one, given that outcome k was observed, delayed observation m was available, and control u was applied. Compare this expression with the updated probability of the model considered in section 4.3. The problem here is to minimize the cost V(n), which satisfies the functional equation:

V(n) = min_{u ∈ U} { Σ_{i=1}^2 n_i c(i, u) + β Σ_{i,j,k,m} n_i p_{ij}(u) q^c_{jk} q^l_{im} V(T((k, m), n, u)) },   (79)

    1I, ),I(7e)with U the setof admissibleactions,B the discoun t factor, 0 < B < 1, and c(i, u)the cost accruedwhen the system s in state i and action u is applied.Consider now the replacement problem of sections 2.7 and 2.2. The laggedobservationscould be introduced in this problem as (any) extra informationavailable e-9., estswhich are carried out while the machine s working, bu t suchthat the resultsare not a vailable mmediately),and which is taken into accountwith a delay of on e time period. In that case, whether qjo: q!^: 0.5 fo ri , . j , k, m:\ ,2 (no observat ions ie ld informat ion),or Q' i i* :0.5 for j , k:1,2,and ql- :1.0 i f i :m and q!^:0.0 i f i*m, i . m:1.2 ( laggedobservat ionsgive perfect state nforma tion), then the optimal cost function is piecewise inear.In addition, in the former case, he expression or the optimal cost is exactly thatobtained n section2.1.

The previous observations imply that expressions for upper and lower bounds of the optimal cost function in the lagged-observations model can be easily obtained. Thus, these bounds could be used in either numerical procedures aimed at solving the general lagged-observations problem, or as an easy way to find out if it is at all beneficial to use the lagged observations, since, as is pointed out in [34], there are cases in which the lagged information does not improve the optimal cost function.

We conclude this section by observing that here, as in the model of section 4.3, more interesting questions can be addressed when higher dimensional models are considered, and so the work of Wang [30] can be taken as a first step in that direction. The purpose here was to bring attention to the MD process with lagged information as a possible application of the ideas presented in previous sections.

5. Examples

In this section we solve some examples using the formulas presented in the previous sections.


Example 1

Consider the Markovian replacement model of section 2, and let β = 0.9, θ = 0.1, C = 4, I = 5 and R = 10. Using the formulas developed in that section one finds that the optimal policy structure is "produce for p ∈ [0.0, 0.5826], inspect for p ∈ (0.5826, 0.6829] and replace for p ∈ (0.6829, 1]", and the associated cost function is given by:

V_B(p) = 18.99p + 16.79,   p ∈ [0.0000, 0.0304]
         18.51p + 16.80,   p ∈ (0.0304, 0.1273]
         17.97p + 16.88,   p ∈ (0.1273, 0.2146]
         17.17p + 17.04,   p ∈ (0.2146, 0.2931]
         16.26p + 17.30,   p ∈ (0.2931, 0.3638]
         15.15p + 17.72,   p ∈ (0.3638, 0.4274]
         13.76p + 18.31,   p ∈ (0.4274, 0.4847]
         12.04p + 19.14,   p ∈ (0.4847, 0.5362]
          9.93p + 20.27,   p ∈ (0.5362, 0.5826]
          7.32p + 21.79,   p ∈ (0.5826, 0.6829]
         26.79,            p ∈ (0.6829, 1.0000]     (80)

Example 2

For the same model of example 1, take θ = 0.12 and leave the rest of the data unchanged. Now the optimal policy structure is "produce for p ∈ [0.0, 0.6353] and p ∈ (0.6806, 0.7172], inspect for p ∈ (0.6353, 0.6806] and replace for p ∈ (0.7172, 1.0]", with the optimal cost function given by:

V_B(p) = 17.36p + 18.69,   p ∈ [0.0000, 0.1077]
         16.87p + 18.74,   p ∈ (0.1077, 0.2148]
         16.24p + 18.88,   p ∈ (0.2148, 0.3090]
         15.46p + 19.12,   p ∈ (0.3090, 0.3919]
         14.47p + 19.51,   p ∈ (0.3919, 0.4649]
         13.22p + 20.09,   p ∈ (0.4649, 0.5291]
         11.64p + 20.92,   p ∈ (0.5291, 0.5856]
          9.64p + 22.04,   p ∈ (0.5856, 0.6353]
          7.73p + 23.69,   p ∈ (0.6353, 0.6806]
          4.00p + 25.82,   p ∈ (0.6806, 0.7172]
         28.69,            p ∈ (0.7172, 1.0000]     (81)
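Tables such as (80) and (81) are easy to store and query mechanically, which is convenient for the sensitivity analyses mentioned earlier. The helper below is a sketch; only the first two and the last rows of table (80) are listed.

    import bisect

    # Rows are (right endpoint of the interval, slope, intercept), ordered in p.
    # Only the first two and the last segments of table (80) are shown here.
    TABLE_80 = [
        (0.0304, 18.99, 16.79),
        (0.1273, 18.51, 16.80),
        # ... remaining segments of (80) go here ...
        (1.0000, 0.00, 26.79),            # replace region: constant cost
    ]

    def cost(p, table):
        """Evaluate the piecewise linear cost at p."""
        ends = [hi for hi, _, _ in table]
        _, slope, intercept = table[min(bisect.bisect_left(ends, p), len(table) - 1)]
        return slope * p + intercept

    print(cost(0.02, TABLE_80))           # 18.99 * 0.02 + 16.79 = 17.1698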

Example 3

Consider the example presented by Ross in [20]. The data are those of example 1 but with I = 6. Although for iteration n = 7 of algorithm (10) there are four regions in the policy structure (see [20]), the stationary optimal policy structure has only two regions; the cost function is given in [25].

Similarly, for example 1, at iterations n = 14 and n = 15 of algorithm (10) there are four regions in the policy structure, but the stationary optimal policy structure has only three regions.

These examples illustrate the observation in remark 2.8, and (to some extent) motivate the research presented here. Unless one knows in advance the structure of the infinite horizon, optimal policy, it is very difficult to decide when an optimal policy has been reached when using the dynamic programming iterative procedure. Furthermore, as mentioned in [7], a particular structured policy may occur at any time during the iterative procedure, yet fail to be optimal for the infinite horizon problem. Thus, policy structures which are not optimal for some finite horizon cannot be eliminated as suboptimal. In fact, estimation of the minimum number of iterations (for example in algorithm (10)) required to guarantee that a finite horizon, optimal policy is also optimal for the infinite horizon case remains an outstanding problem; see [7, p. 29].

Example 4

Consider the inspection model for standby units analyzed in section 3. Let i = 0.85, r = 0.955, c = 0.85, s = 0.96, b = 0.7, N = 0.07 and M = 0.035. Using eqs. (42) through (45) one finds that the optimal policy is "do nothing for q ∈ [0.0, 0.0982], and replace for q ∈ (0.0982, 1.0]", with the optimal cost function given by:

V(q) = 92.54 − 24.96q,   q ∈ [0.0000, 0.0215]
       92.41 − 18.77q,   q ∈ (0.0215, 0.0606]
       91.89 − 10.18q,   q ∈ (0.0606, 0.0982]
       90.89,            q ∈ (0.0982, 1.0000]     (82)

For this case, we used ξ = 0.99, and as explained in remark 3.4, only l = 113 iterations of the simple procedure suggested were considered.

Example 5

For the inspection model of section 3, let now i = 0.9, r = 0.95, c = 0.9 and leave the other parameters as those in example 4. Using eqs. (45) through (48), we have that for this example the optimal policy is "do nothing for q ∈ (0.0, 0.1029], and replace for q ∈ (0.1029, 1.0]", and the associated optimal cost function is given by a piecewise linear function with one line segment on each of the intervals [0.0000, 0.0266], (0.0266, 0.0656] and (0.0656, 0.1029], and a constant value on the replace region (0.1029, 1.0000].   (83)

In this example, T^l(1 − r) > 1 − i, and therefore only 3l = 339 iterations are required to find l + 1 (compare this with l = 113 in example 4). Also, note that: (a) 1 − s < 1 − r


V(·) is concave (this can be proved using [20, lemma 2.1]), but from (85) we can see that V(·), as opposed to the cost functions of the models of sections 2, 3 and 4.2, is not necessarily monotone (nonincreasing, in this case). Since for this model it must be true that V(α) = V(0) (cf. eq. (68)), V(·) will be nonincreasing if the Ith line segment has zero slope. For illustrative purposes, let α = 0.1, w = 1.0 and F = 0.717421. Then, using eqs. (72) through (77), we find that the optimal policy is to "wait for P ∈ [0.0, 0.2710] and decide for P ∈ (0.2710, 1.0]", and that the optimal cost is given by:

V(P) = 3.0000,             P ∈ [0.0000, 0.1000]
       3.1111 − 1.1111P,   P ∈ (0.1000, 0.1900]
       3.3457 − 2.3457P,   P ∈ (0.1900, 0.2710]
       3.7174 − 3.7174P,   P ∈ (0.2710, 1.0000]     (86)

Note that P* = 0.2710 = T²(α) ∈ {α, T(α), T²(α)}, and so (cf. remark 2.4) the optimal policy is not finitely transient, but the optimal cost function is piecewise linear.

6. Conclusions

We have considered several applications of two state, finite action, infinite horizon, discrete-time, Markov decision processes with partial observations, for two special cases of the observation quality, and shown that the procedure followed in [25] can be used to prove the piecewise linearity of the cost function in each of these cases.

This result is important in its own right because it helps in the overall understanding of this class of problems. In addition, in most of the cases considered, it allows one to obtain either explicit formulas or simplified computational algorithms to find the optimal cost function and the optimal control policies, making the kind of models considered in this work more appealing and suitable for obtaining insight about the behavior of the system under study.

In order to make the results presented here useful for decision making in most practical applications, though, these results should be extended to the closed loop case in which the state space has dimension greater than two. A first step in this direction could be the work of Wang [30] for the replacement, CU problem described in section 2.1. Also, information patterns that do not necessarily involve extremal situations (i.e., CU, CO) should be considered. The hope is that the knowledge gained will enable the solution of several open questions associated with the kind of models considered here, like the one posed in [2, p. 559] for the replacement problem of section 2.2, concerning whether, in the case of partial observations during production and during inspection, the set of structured policies remains the same as that considered above.



References

[1] S.C. Albright, Structural results for partially observable Markov decision processes, Oper. Res. 27 (1979) 1041-1053.
[2] V.A. Andriyanov, I.A. Kogan and G.A. Umnov, Optimal control of a partially observable discrete Markov process, Autom. Remote Contr. 4 (1980) 555-561.
[3] K.J. Astrom, Optimal control of Markov processes with incomplete state information, J. Math. Anal. Appl. 10 (1965) 174-205.
[4] A. Ben-Israel and S.D. Flam, Input optimization for infinite discounted programs, J. Optim. Theory Appl. 61 (1989) 347-357.
[5] D.P. Bertsekas, Dynamic Programming (Prentice Hall, Englewood Cliffs, NJ, 1987).
[6] D.P. Bertsekas and S.E. Shreve, Stochastic Optimal Control: The Discrete Time Case (Academic Press, New York, 1978).
[7] A. Federgruen and P.J. Schweitzer, Discounted and undiscounted value iteration in Markov decision problems: A survey, in Dynamic Programming and its Applications, ed. M. Puterman (Academic Press, 1979) pp. 23-52.
[8] E. Fernandez-Gaucherand, A. Arapostathis and S.I. Marcus, On partially observable Markov decision processes with an average cost criterion, Proc. 28th IEEE Conf. on Decision and Control, Tampa, Florida (1989) 1267-1272.
[9] E. Fernandez-Gaucherand, A. Arapostathis and S.I. Marcus, On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes, this volume.
[10] A. Gheorghe, Partially observable Markov processes with a risk sensitivity decision maker, Rev. Roumaine Math. Pures Appl. 22 (1977) 461-482.
[11] W.J. Hopp and S.C. Wu, Multiaction maintenance under Markovian deterioration and incomplete state information, Naval Res. Log. Quart. 35 (1988) 447-462.
[12] J.S. Hughes, Optimal internal audit timing, The Accounting Review 52 (1977) 56-68.
[13] J.S. Hughes, A note on quality control under Markovian deterioration, Oper. Res. 28 (1980) 421-424.
[14] S.H. Kim, State information lag Markov process with control limit rule, Naval Res. Log. Quart. 32 (1985) 491-496.
[15] S.H. Kim and B.H. Jeong, A partially observable Markov decision process with lagged information, J. Oper. Res. Soc. 38 (1987) 439-446.
[16] P.R. Kumar and T.I. Seidman, On the optimal solution of the one armed bandit adaptive control problem, IEEE Trans. Automatic Control 26 (1981) 1176-1184.
[17] J.J. Martin, Bayesian Decision Problems and Markov Chains (Wiley, New York, 1967).
[18] G. Monahan, Optimal stopping in a partially observable binary-valued Markov chain with costly perfect information, J. Appl. Prob. 19 (1982) 72-81.
[19] S.M. Pollock, Minimum-cost checking using imperfect information, Management Sci. 13 (1967) 454-465.
[20] S.M. Ross, Quality control under Markovian deterioration, Management Sci. 17 (1971) 587-596.
[21] J.K. Satia and R.E. Lave, Markovian decision processes with probabilistic observation of states, Management Sci. 20 (1973) 1-13.
[22] K. Sawaki and A. Ichikawa, Optimal control for partially observable Markov decision processes over an infinite horizon, J. Oper. Res. Soc. Japan 21 (1978) 1-15.
[23] K. Sawaki, Transformation of partially observable Markov decision processes into piecewise linear ones, J. Math. Anal. Appl. 91 (1983) 112-118.
[24] E.L. Sernik and S.I. Marcus, Comments on the sensitivity of the optimal cost and the optimal policy for a discrete Markov decision process, Proc. 27th Annual Allerton Conf. on Communication, Control and Computing, Monticello, Illinois (1989) pp. 935-944.
[25] E.L. Sernik and S.I. Marcus, On the optimal cost and policy for a Markovian replacement problem (1990), to appear in J. Optim. Theory Appl.
[26] E.J. Sondik, The optimal control of partially observable Markov processes, Ph.D. Thesis, Department of Electrical Engineering Systems, Stanford University (1971).
[27] E.J. Sondik, The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs, Oper. Res. 26 (1978) 282-304.
[28] L.C. Thomas, P.A. Jacobs and D.P. Gaver, Optimal inspection policies for standby systems, Comm. Stat. Stochastic Models 3 (1987) 259-273.
[29] R.C. Wang, Computing optimal quality control policies - Two actions, J. Appl. Prob. 13 (1976) 826-832.
[30] R.C. Wang, Optimal replacement policy with unobservable states, J. Appl. Prob. 14 (1977) 340-348.
[31] C.C. White, A Markov quality control process subject to partial observation, Management Sci. 23 (1977) 843-852.
[32] C.C. White, Optimal inspection and repair of a production process subject to deterioration, J. Oper. Res. Soc. 29 (1978) 235-243.
[33] C.C. White, Bounds on the optimal cost for a replacement problem with partial observations, Naval Res. Log. Quart. 26 (1979) 415-422.
[34] C.C. White, Note on "A partially observable Markov decision process with lagged information", J. Oper. Res. Soc. 39 (1988) 217-218.