C R RAO Advanced Institute of Mathematics, Statistics and Computer Science (AIMSCS)
Author (s): B.L.S. PRAKASA RAO
Title of the Notes: Inference for Stochastic Processes:
An Introduction
Lecture Notes No.: LN2013-02
Date: July 15, 2013
Prof. C R Rao Road, University of Hyderabad Campus, Gachibowli, Hyderabad-500046, INDIA.
www.crraoaimscs.org
C R RAO AIMSCS Lecture Notes Series
Inference for Stochastic Processes: An Introduction
B.L.S. Prakasa Rao
CR Rao AIMSCS
July 15, 2013
Preface
These lecture notes consist of introductory lectures on "Inference for Stochastic Processes" delivered by me at the Indian Statistical Institute, Delhi Centre, the University of Pune, the Indian Institute of Technology, Mumbai, the University of Hyderabad, and at the "Summer School" arranged by the University of Bocconi, Italy during July 1-20, 2002 and at the University of Bocconi, Milano, Italy during December 3-16, 2006. The earlier books dealing with this topic were Statistical Inference for Markov Processes by P. Billingsley, University of Chicago Press, Chicago (1961), Statistics of Random Processes: General Theory by R.S. Liptser and A.N. Shiryayev, Springer, New York (1977), Statistics of Random Processes: Applications by R.S. Liptser and A.N. Shiryayev, Springer, New York (1978), Statistical Inference for Stochastic Processes by I.V. Basawa and B.L.S. Prakasa Rao, Academic Press, London (1980), and Abstract Inference by Ulf Grenander, Wiley, New York (1981). A large literature dealing with various aspects of inference for stochastic processes has appeared since then. Some of the important review papers, other papers and books are listed at the end of these notes. The journal "Statistical Inference for Stochastic Processes", edited by Denis Bosq and published since 1998, deals with parametric, semiparametric and nonparametric inference for discrete and continuous time processes.
B.L.S. Prakasa Rao
CR Rao AIMSCS
Hyderabad, India
July 15, 2013
Lecture 1
Stochastic Processes
Let $(\Omega, \mathcal{F}, P)$ be a probability space. A stochastic process $\{X_t, t \in \tau\}$ is a family of random variables defined on $(\Omega, \mathcal{F}, P)$. In general we consider $\tau = [0, \infty)$ or $\tau = \{1, 2, \dots\}$. Let $t_1, t_2, \dots, t_k \in \tau$. The joint distribution of $(X(t_1), \dots, X(t_k))$ is called a finite dimensional distribution of the process. The probability structure of the process is completely known once all the finite dimensional distributions are known. The finite dimensional distributions of the process form a consistent family. If $\tau = [0, \infty)$, then $\{X_t, t \in \tau\}$ is called a continuous time stochastic process. If $\tau = \{1, 2, \dots\}$, we call it a discrete time stochastic process.
Discrete time case
Suppose we have observed X1, . . . , Xn. Is it possible to determine the probability
structure of the process as n→ ∞?
Continuous time case
Suppose the process X(t), 0 ≤ t ≤ T is observed. Is it possible to determine the
probability structure of the process as T → ∞?
For any $n$ and $t_1, t_2, \dots, t_n \in \tau$, specify a probability distribution on $\mathbb{R}^n$ by the joint distribution function $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$. Let $\mathbb{R}^\tau$ be the space of all real-valued functions on $\tau$ and consider the cylinder sets

$$C = \{x \in \mathbb{R}^\tau : (x(t_1), \dots, x(t_n)) \in B\}$$

where $t_1, \dots, t_n \in \tau$, $B$ is a Borel set in $\mathbb{R}^n$, and $n \geq 1$. Let $\mathcal{B}^\tau$ be the $\sigma$-algebra generated by such cylinder sets; a consistent family $F$ of finite dimensional distributions then induces a probability measure on $(\mathbb{R}^\tau, \mathcal{B}^\tau)$, as the following theorem shows.
Kolmogorov’s consistency theorem
The family of finite dimensional distribution functions $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$, $n \geq 1$, $t_k \in \tau$, $1 \leq k \leq n$, induces a probability measure on $(\mathbb{R}^\tau, \mathcal{B}^\tau)$ if and only if

(a) every $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$ is invariant under any permutation of the vector $(x_1, \dots, x_n)$ together with the same permutation of the vector $(t_1, \dots, t_n)$, and

(b) $\lim_{x \to \infty} F_{t_1, \dots, t_n}(x_1, \dots, x_{n-1}, x) = F_{t_1, \dots, t_{n-1}}(x_1, \dots, x_{n-1})$.
Let $(\Omega, \mathcal{F})$ be a measurable space and $\{P_\theta, \theta \in \Theta\}$ be a family of probability measures defined on $(\Omega, \mathcal{F})$. Let $\{X_n, n \geq 1\}$ be a stochastic process defined on $(\Omega, \mathcal{F}, P_\theta)$. Suppose we observe the process $\{X_k, 1 \leq k \leq n\}$. The basic problem is to estimate the parameter $\theta$ based on the observation $\{X_k, 1 \leq k \leq n\}$. Let $F_{X_1, \dots, X_n; \theta}(x_1, \dots, x_n; \theta)$ be the joint distribution function of $(X_1, \dots, X_n)$ when $\theta$ is the true parameter.

We say that the family of probability measures $\{P_\theta\}$ is dominated by the $\sigma$-finite measure $\mu$ if

$$\mu(A) = 0 \Rightarrow P_\theta(A) = 0 \text{ for all } A \in \mathcal{F} \quad (P_\theta \ll \mu).$$

We can write down the likelihood function for $(X_1, \dots, X_n)$ in case $\mu$ is the Lebesgue measure on $\mathbb{R}^n$ or $\mu$ is a counting measure on $\mathbb{R}^n$, and the problem of estimating the parameter $\theta$ by the maximum likelihood method is then well understood.
Suppose $\{X_t, t \geq 0\}$ is a stochastic process defined on $(\Omega, \mathcal{F}, P_\theta)$, $\theta \in \Theta$, and suppose we observe the process $\{X_t, 0 \leq t \leq T\}$. The problem is to estimate $\theta$. The question arises as to what the joint distribution of $\{X_t, 0 \leq t \leq T\}$ is. How does one define the likelihood function? How does one calculate the likelihood function even if it is defined? Let us look at the process as a mapping from $\Omega$ to $\mathbb{R}^T$. By the Kolmogorov consistency theorem, there exists a probability measure $Q_\theta$ generated by $X^T = \{X_t, 0 \leq t \leq T\}$ on $(\mathbb{R}^T, \mathcal{B}^T)$ when $\theta$ is the true parameter. If we know that $Q_{\theta_1} \ll Q_{\theta_2}$ and $Q_{\theta_2} \ll Q_{\theta_1}$ for all $\theta_1, \theta_2$ in $\Theta$, then we can compute the Radon-Nikodym derivative

$$\frac{dQ_\theta}{dQ_{\theta_0}}$$

with respect to a fixed $\theta_0 \in \Theta$ and try to maximize it to obtain an estimator for $\theta$.
We now discuss some concepts leading to such methodology.
Examples of Stochastic Models
Stochastic models are used in scientific research in a spectrum of disciplines. We now
describe a few to indicate their use.
1) Random Walk model for neuron firing (Point process)
The neuron fires when the membrane potential reaches a critical threshold value, say $C$. Excitatory and inhibitory impulses are the inputs for the neuron: these inputs arrive according to a Poisson process. Each excitatory impulse increases, and each inhibitory impulse decreases, the membrane potential by a random quantity $X$ with the same p.d.f. $f(x)$. After each firing, the membrane potential is reset to zero and the process is repeated. Let $Y_1, Y_2, \dots$ denote the times at which the neuron fires. The process of interspike intervals $Y_1, Y_2 - Y_1, Y_3 - Y_2, \dots$ is of interest to the neurologist.
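A minimal simulation of this random-walk model can be sketched as follows; the Poisson rate, the probability of an excitatory impulse, the exponential impulse sizes and the threshold value are all illustrative assumptions, not part of the model above:

```python
import random

def simulate_firing_times(rate=10.0, p_excite=0.7, threshold=5.0,
                          horizon=100.0, seed=0):
    """Random-walk neuron model: impulses arrive as a Poisson process of
    the given rate; each impulse moves the membrane potential up (with
    probability p_excite) or down by an Exp(1) jump.  The neuron fires
    when the potential reaches the threshold C, then resets to zero."""
    rng = random.Random(seed)
    t, potential, firing_times = 0.0, 0.0, []
    while True:
        t += rng.expovariate(rate)          # next impulse arrival time
        if t > horizon:
            break
        jump = rng.expovariate(1.0)         # random impulse size X ~ f(x)
        potential += jump if rng.random() < p_excite else -jump
        if potential >= threshold:          # neuron fires and resets
            firing_times.append(t)
            potential = 0.0
    return firing_times

times = simulate_firing_times()
# interspike intervals Y_1, Y_2 - Y_1, Y_3 - Y_2, ...
interspike = [b - a for a, b in zip([0.0] + times[:-1], times)]
```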
2) Epidemiology (Greenwood Model) (Markov Chain)

Suppose at time $t = 0$ there are $S_0$ susceptibles and $I_0$ infectives. After a certain latent period of the infection, (say) a unit of time, some of the susceptibles are infected. Thus, at time $t = 1$, the initial $S_0$ susceptibles split into two groups: those who are infected, $I_1$ in number say, and the remaining susceptibles, say $S_1$. The process continues until there are no more susceptibles in the population.

Note that

$$S(t) = S(t+1) + I(t+1), \quad t = 0, 1, 2, \dots$$

Suppose the probability of a susceptible being infected is $p$. Then

$$P(S(t+1) = s(t+1) \mid S(t) = s(t)) = \binom{s(t)}{s(t) - s(t+1)} p^{s(t)-s(t+1)} (1-p)^{s(t+1)},$$

since $s(t) - s(t+1)$ individuals are infected and $s(t+1)$ remain susceptible.

The process $\{S(t), t = 0, 1, 2, \dots\}$ is a Markov chain.
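The binomial transition above translates directly into a simulation; the initial count $S_0 = 100$ and $p = 0.1$ below are arbitrary illustrative choices:

```python
import random

def greenwood_epidemic(s0=100, p=0.1, seed=1):
    """Greenwood chain: given S(t) susceptibles, each one independently
    escapes infection with probability 1 - p, so S(t+1) ~ Binomial(S(t), 1-p).
    Stops when no susceptibles remain or when nobody new is infected."""
    rng = random.Random(seed)
    s_path = [s0]
    while s_path[-1] > 0:
        s_t = s_path[-1]
        s_next = sum(1 for _ in range(s_t) if rng.random() < 1 - p)
        s_path.append(s_next)
        if s_next == s_t:          # I(t+1) = 0: epidemic dies out
            break
    return s_path

path = greenwood_epidemic()
```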
3) Population growth model (Branching process)
Suppose an organism produces a random number, say $Y$, of offspring with $p_k = P(Y = k)$, $k = 0, 1, 2, \dots$, $\sum_k p_k = 1$. Each offspring in turn produces offspring independently according to the same distribution $\{p_k\}$. Suppose $Z(0) = 1$. If $Z(t)$ denotes the population size at the $t$-th generation, $t = 0, 1, 2, \dots$, then $\{Z(t)\}$ is a Markov chain with transition probabilities given by

$$P(Z(t) = j \mid Z(t-1) = i) = P(Y_1 + \dots + Y_i = j)$$

where $Y_1, Y_2, \dots$ are i.i.d. with distribution $\{p_k\}$.
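The chain is easy to simulate once a sampler for the offspring law is fixed; the Poisson offspring distribution used below is an illustrative choice, not part of the general model:

```python
import math
import random

def poisson_sample(lam, rng):
    """Inverse-transform sampler for a Poisson(lam) random variable."""
    u, k, term = rng.random(), 0, math.exp(-lam)
    cum = term
    while u > cum:
        k += 1
        term *= lam / k
        cum += term
    return k

def branching_process(mean_offspring=1.2, generations=20, seed=2):
    """Galton-Watson chain Z(t): Z(0) = 1 and, given Z(t-1) = i,
    Z(t) = Y_1 + ... + Y_i with i.i.d. Poisson offspring counts Y_j."""
    rng = random.Random(seed)
    z = [1]
    for _ in range(generations):
        z.append(sum(poisson_sample(mean_offspring, rng) for _ in range(z[-1])))
        if z[-1] == 0:             # extinction is an absorbing state
            break
    return z

z = branching_process()
```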
4) Population genetics (Diffusion process)
Consider a population of $2N$ genes, each of which belongs to one of two genotypes, (say) $A$ and $B$. Let $X(t)$ denote the proportion of type $A$ genes in the $t$-th generation. Assuming that the total number of genes remains the same from one generation to the next (we are neglecting selection and mutation effects), the genes in the $(t+1)$-th generation may be assumed to be a random sample of size $2N$ of genes from the $t$-th generation. The sequence $\{X(t), t = 1, 2, \dots\}$ forms a Markov chain. Conditionally on $X(t-1) = x$, $2N X(t)$ is a binomial random variable with $2N$ as the number of trials and $x$ as the probability of success. One can approximate the Markov chain by a continuous time Markov process with continuous state space $[0, 1]$. Such an approximation is an example of a diffusion process.
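The binomial resampling step can be sketched as follows; $2N = 200$ genes and $X(0) = 0.5$ are illustrative values:

```python
import random

def wright_fisher(two_n=200, x0=0.5, generations=50, seed=3):
    """Wright-Fisher chain: conditionally on X(t-1) = x, the count
    2N * X(t) is Binomial(2N, x), i.e. the next generation is a random
    sample with replacement from the current gene pool."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(generations):
        count = sum(1 for _ in range(two_n) if rng.random() < x[-1])
        x.append(count / two_n)
    return x

proportions = wright_fisher()
```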
5) Storage model
Let $X(t)$ denote the annual random input during the year $(t, t+1)$ and $M$ the non-random amount released at the end of each year. Let $Z(t)$ denote the content of the dam after the release. Then

$$Z(t+1) = \min\{Z(t) + X(t), K\} - \min\{Z(t) + X(t), M\}$$

where $K$ is the capacity of the dam and $t = 0, 1, 2, \dots$. If the inputs $X(t)$ are assumed to be independent random variables, then the sequence $\{Z(t), t = 0, 1, 2, \dots\}$ forms a Markov chain.
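A direct simulation of this recursion looks as follows; the capacity, release amount and exponential inputs are illustrative assumptions:

```python
import random

def dam_content(capacity=10.0, release=3.0, years=100, seed=4):
    """Storage model: Z(t+1) = min(Z(t)+X(t), K) - min(Z(t)+X(t), M),
    with i.i.d. exponential annual inputs X(t) of mean 4."""
    rng = random.Random(seed)
    z = [0.0]
    for _ in range(years):
        total = z[-1] + rng.expovariate(1.0 / 4.0)   # content before release
        z.append(min(total, capacity) - min(total, release))
    return z

levels = dam_content()
```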
6) Compound Poisson Model (Insurance)
Suppose an insurance company receives claims from its clients in accordance with a Poisson process with intensity $\lambda$. Assume that the sizes $Y_k$, $k = 1, 2, \dots$, of the successive claims are independent random variables with common distribution function $F(\cdot)$. Then the total amount $X(t)$ of claims arising in the time interval $[0, t]$ is given by

$$X(t) = Y_1 + \dots + Y_{N(t)}$$

where $N(t)$ is a Poisson random variable with mean $\lambda t$. The process $\{X(t), t \geq 0\}$ is a compound Poisson process.
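One realization of $X(t)$ can be generated by first drawing the Poisson arrivals and then summing the claims; the unit-mean exponential claim law below is an illustrative choice for $F$:

```python
import random

def compound_poisson(lam=2.0, t=10.0, seed=5):
    """Total claims X(t) = Y_1 + ... + Y_{N(t)}: arrivals form a Poisson
    process of rate lam, so N(t) is obtained by summing exponential
    inter-arrival gaps; claim sizes Y_k are i.i.d. Exp(1) here."""
    rng = random.Random(seed)
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:                       # count arrivals in [0, t]
        n += 1
        clock += rng.expovariate(lam)
    return sum(rng.expovariate(1.0) for _ in range(n))

total = compound_poisson()
```

Since $E X(t) = \lambda t \, E(Y_1)$, the Monte Carlo average over many runs should be close to $2 \times 10 \times 1 = 20$ for these parameters.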
7) Queuing Model (for telephone calls)
Suppose the calls arrive at a telephone exchange according to a Poisson process. The durations of successive calls may be assumed to be independent exponential random variables. The capacity of the exchange may be limited to (say) $K$ calls at any given time. The expected waiting time for a call to go through and the queue size at any particular time are of interest.
8) Signal processing
Suppose X(t) is a signal satisfying the equation
X(t+ 1) = aX(t) + ξ(t)
where a is a fixed parameter and ξ(t) represents error. Suppose the true signal is
unobserved but Y (t) is observed where
Y (t) = X(t) + Z(t)
where Z(t) is noise. The problem is to estimate the signal X(n+1) given Y (0), . . . , Y (n).
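Under Gaussian assumptions this is the classical filtering/prediction problem, and the scalar Kalman filter gives the best linear estimate of the signal; the one-step prediction of $X(n+1)$ is then $a$ times the filtered estimate. The parameter values and noise variances below are illustrative assumptions:

```python
import random

def kalman_filter_ar1(a=0.9, q=1.0, r=1.0, n=500, seed=6):
    """Scalar Kalman filter for the model X(t+1) = a X(t) + xi(t),
    Y(t) = X(t) + Z(t), with xi ~ N(0, q) and Z ~ N(0, r).
    Returns the true states, the observations, and the filtered estimates."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n):                       # simulate signal and observations
        x = a * x + rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r ** 0.5))
    est, p, ests = 0.0, 1.0, []
    for y in ys:
        est, p = a * est, a * a * p + q      # predict step
        gain = p / (p + r)                   # Kalman gain
        est, p = est + gain * (y - est), (1.0 - gain) * p   # update step
        ests.append(est)
    return xs, ys, ests

xs, ys, ests = kalman_filter_ar1()
x_next_pred = 0.9 * ests[-1]    # one-step prediction of X(n+1)
```

The filtered estimates should track the signal better than the raw observations do, since the filter averages out part of the observation noise.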
9) Time Series
Let $X(t)$ denote the price of a commodity at time $t$. Suppose we fit an ARMA$(p, q)$ model

$$X(t) + \alpha_1 X(t-1) + \dots + \alpha_p X(t-p) = Z(t) + \beta_1 Z(t-1) + \dots + \beta_q Z(t-q)$$

where the $Z$'s are i.i.d. unobservable random variables. Given $X(0), \dots, X(n)$, we may want to predict $X(n+1)$, and in turn the problem is to estimate the $\alpha$'s and $\beta$'s.
Lecture 2
Discrete parameter martingales
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\{\mathcal{F}_n, n \geq 1\}$ be a non-decreasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$. Suppose $\{Z_n, n \geq 1\}$ is a sequence of random variables defined on $(\Omega, \mathcal{F}, P)$ such that

(i) $Z_n$ is measurable with respect to $\mathcal{F}_n$,

(ii) $E|Z_n| < \infty$,

(iii) $E(Z_n \mid \mathcal{F}_m) = Z_m$ a.s. for all $1 \leq m < n$, $n \geq 1$.

Then the sequence $\{Z_n, n \geq 1\}$ is said to be a martingale with respect to $\{\mathcal{F}_n, n \geq 1\}$, and we say that $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale.

Remark. It is clear that $E(Z_n) = E(Z_m)$ for all $n$ and $m$ if $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale. If (i) and (ii) hold and if, instead of (iii), $E(Z_n \mid \mathcal{F}_m) \geq Z_m$ a.s. for all $1 \leq m \leq n$, then $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is said to be a submartingale. A submartingale $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is said to be $L^1$-bounded if $\sup_n E|Z_n| < \infty$.
Uniform integrability
A sequence of random variables $\{Y_n\}$ is said to be uniformly integrable if

$$\lim_{c \to \infty} \sup_n E[|Y_n| I(|Y_n| > c)] = 0.$$

Remarks: A sufficient condition for uniform integrability of the sequence $\{Y_n\}$ is that $\sup_n E|Y_n|^{1+\varepsilon} < \infty$ for some $\varepsilon > 0$.
Martingale Convergence Theorem: Let $\{Z_n, \mathcal{F}_n, n \geq 1\}$ be an $L^1$-bounded submartingale. Then there exists a random variable $Z$ such that $\lim_{n \to \infty} Z_n = Z$ a.s. and

$$E|Z| \leq \liminf_{n \to \infty} E|Z_n| < \infty.$$

Remarks: (i) If the submartingale is uniformly integrable, then $Z_n \to Z$ in $L^1$, and if $\{Z_n, \mathcal{F}_n\}$ is an $L^2$-bounded martingale, then $E|Z_n - Z|^2 \to 0$. (ii) Any nonnegative martingale converges a.s.
Examples of martingales
1) Suppose $X_1, X_2, \dots$ are independent random variables with $E|X_i| < \infty$ for $i \geq 1$. Define

$$S_n = X_1 + \dots + X_n$$

and let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $X_1, \dots, X_n$. Suppose $E(X_i) = 0$ for all $i$. Then $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a martingale.

2) Suppose $X_1, X_2, \dots$ are independent random variables with $E|X_i| < \infty$ for $i \geq 1$. Let $t$ be a real number and define

$$Z_n = \frac{e^{itS_n}}{E[e^{itS_n}]}, \quad n \geq 1.$$

Then $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $X_1, \dots, X_n$.
3) Let $\{X_n, n \geq 1\}$ be a stochastic process with $f(x_1, \dots, x_n; \theta)$ as the joint density of $(X_1, \dots, X_n)$, where $\theta$ is a scalar parameter. Let $L_n(\theta) = f(X_1, \dots, X_n; \theta)$. Suppose the function $L_n(\theta)$ is differentiable with respect to $\theta$. Let

$$u_n(\theta) = \frac{d}{d\theta}[\log L_n(\theta) - \log L_{n-1}(\theta)]$$

and let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $X_1, \dots, X_n$. Then $\{\sum_{i=1}^n u_i(\theta), \mathcal{F}_n, n \geq 1\}$ forms a martingale under some regularity conditions.
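A toy instance of example 3: for i.i.d. Bernoulli($\theta$) observations the score increments are $u_i(\theta) = X_i/\theta - (1 - X_i)/(1 - \theta)$, which have zero mean, so their partial sums form a martingale. The quick numerical check below is an illustration, not part of the text:

```python
import random

def score_increments(xs, theta):
    """Score increments u_i(theta) = d/dtheta log f(x_i; theta) for i.i.d.
    Bernoulli(theta) observations -- a toy case of example 3."""
    return [x / theta - (1 - x) / (1 - theta) for x in xs]

rng = random.Random(7)
theta = 0.3
xs = [1 if rng.random() < theta else 0 for _ in range(100000)]
u = score_increments(xs, theta)
mean_u = sum(u) / len(u)   # sample mean of the increments; should be near 0
```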
Sketches of proofs
1.

$$E(S_n \mid X_1, \dots, X_{n-1}) = E[S_{n-1} + X_n \mid X_1, \dots, X_{n-1}]$$
$$= S_{n-1} + E[X_n \mid X_1, \dots, X_{n-1}] = S_{n-1} + E(X_n) = S_{n-1},$$

since $X_n$ is independent of $X_1, \dots, X_{n-1}$ and $E(X_n) = 0$.
2.

$$E[Z_n \mid X_1, \dots, X_{n-1}] = E\left[\frac{e^{it(S_{n-1} + X_n)}}{E[e^{itS_{n-1}}]E[e^{itX_n}]} \,\Big|\, X_1, \dots, X_{n-1}\right]$$
$$= \frac{e^{itS_{n-1}}}{E[e^{itS_{n-1}}]E[e^{itX_n}]} \, E[e^{itX_n} \mid X_1, \dots, X_{n-1}] = \frac{e^{itS_{n-1}}}{E[e^{itS_{n-1}}]} = Z_{n-1},$$

using the independence of the $X_i$, so that $E[e^{itS_n}] = E[e^{itS_{n-1}}]E[e^{itX_n}]$ and $E[e^{itX_n} \mid X_1, \dots, X_{n-1}] = E[e^{itX_n}]$.
3. Note that $\int_{-\infty}^{\infty} f(x_n \mid x_1, \dots, x_{n-1}; \theta)\, \mu(dx_n) = 1$. Suppose we assume that differentiation under the integral sign with respect to the parameter $\theta$ is allowed. Then

$$\int_{-\infty}^{\infty} \frac{d f(x_n \mid x_1, \dots, x_{n-1}; \theta)}{d\theta}\, \mu(dx_n) = 0,$$

which implies that

$$\int_{-\infty}^{\infty} \frac{d}{d\theta}[\log L_n(\theta) - \log L_{n-1}(\theta)]\, f(x_n \mid x_1, \dots, x_{n-1}; \theta)\, \mu(dx_n) = 0.$$

Hence

$$E[u_n(\theta) \mid X_1, \dots, X_{n-1}] = 0.$$
Remarks: Note that $\{\sum_{i=1}^n [Z_i - E(Z_i \mid Z_1, \dots, Z_{i-1})], \mathcal{F}_n, n \geq 1\}$ forms a martingale for any sequence of random variables $\{Z_n\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E|Z_n| < \infty$, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $Z_1, \dots, Z_n$.
Lecture 3
Weak law of large numbers (WLLN)
Suppose $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a zero mean martingale with $S_n = \sum_{i=1}^n X_i$. Further suppose that $E(X_i^2) < \infty$ for $i \geq 1$. Then it follows that

$$E(X_i X_j \mid \mathcal{F}_i) = X_i E(X_j \mid \mathcal{F}_i) = 0 \quad \text{for } 1 \leq i < j,$$

and hence $E(X_i X_j) = 0$ for $1 \leq i < j$. Therefore

$$\mathrm{Var}(S_n) = E(S_n^2) = \sum_{i=1}^n \mathrm{Var}(X_i).$$

Hence

$$P(|S_n| \geq \varepsilon) \leq \varepsilon^{-2} E(S_n^2) \quad \text{(by Chebyshev's inequality)},$$

which implies that

$$\frac{S_n}{n} \xrightarrow{p} 0 \quad \text{if} \quad \frac{1}{n^2} \sum_{j=1}^n E X_j^2 \to 0,$$

which can be termed a WLLN for martingales.

Remarks: Weaker conditions can be given for the WLLN to hold.
Strong Law of Large Numbers (SLLN) (Feller (1971), p.242; Loève (1977), p.250)

Suppose $\{S_n\}$ is a zero mean martingale with $E(X_i^2) < \infty$ for $i \geq 1$. Further suppose that there is a sequence $b_n \uparrow \infty$ such that

$$\sum_{n=1}^{\infty} \frac{E X_n^2}{b_n^2} < \infty.$$

Then $\lim_{n \to \infty} \frac{S_n}{b_n} = 0$ a.s.

Remarks: For alternate conditions for the SLLN to hold, see the results stated later in this section.
Central Limit Theorem (CLT)
The following central limit theorem was proved for martingales by Billingsley (1961) and
by Ibragimov (1963).
Theorem: Let $\{Z_n, n \geq 1\}$ be a strictly stationary ergodic process such that $E(Z_1^2)$ is finite, $E(Z_n \mid Z_1, \dots, Z_{n-1}) = 0$ a.s. for $n > 1$, and $E(Z_1) = 0$. Then

$$n^{-1/2} \sum_{k=1}^n Z_k \xrightarrow{L} N(0, E(Z_1^2)) \quad \text{as } n \to \infty.$$
CLT (Brown (1971)): Let $\{S_n, \mathcal{F}_n, n \geq 1\}$ denote a zero mean martingale where $S_n = X_1 + \dots + X_n$. Suppose $E(X_i^2) < \infty$, $i \geq 1$. Let

$$V_n^2 = \sum_{i=1}^n E(X_i^2 \mid \mathcal{F}_{i-1})$$

and

$$s_n^2 = E V_n^2 = E S_n^2.$$

If $\frac{V_n^2}{s_n^2} \xrightarrow{p} 1$ and

$$\frac{1}{s_n^2} \sum_{i=1}^n E\big(X_i^2 I(|X_i| \geq \varepsilon s_n)\big) \to 0$$

as $n \to \infty$ for all $\varepsilon > 0$, then

$$\frac{S_n}{s_n} \xrightarrow{L} N(0, 1) \quad \text{as } n \to \infty.$$
Remark: For more general versions of CLT, see later in this section.
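As a quick numerical illustration of the i.i.d. special case (independent zero-mean variables are trivially martingale differences): for Uniform$(-1,1)$ increments, the normalized sums $n^{-1/2}\sum_{k=1}^n Z_k$ should have standard deviation close to $\sqrt{E(Z_1^2)} = \sqrt{1/3}$. The sample sizes below are arbitrary:

```python
import random
import statistics

def normalized_sum(n, rng):
    """n^{-1/2} * sum of n i.i.d. Uniform(-1,1) martingale differences."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n ** 0.5

rng = random.Random(8)
samples = [normalized_sum(400, rng) for _ in range(3000)]
sd = statistics.stdev(samples)   # should approach sqrt(1/3) ~ 0.577
```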
Maximal inequality
If $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a zero mean martingale, then

$$P\Big(\max_{1 \leq k \leq n} |S_k| \geq \varepsilon\Big) \leq \frac{1}{\varepsilon^2} E(S_n^2).$$
Toeplitz Lemma.
If $a_i$, $i \geq 1$, are positive and $b_n = \sum_{i=1}^n a_i \uparrow \infty$, then $x_n \to x$ implies $b_n^{-1} \sum_{i=1}^n a_i x_i \to x$.
Kronecker’s Lemma
Let $\{x_n\}$ be a real sequence such that $\sum_{n=1}^{\infty} x_n$ converges. Let $\{b_n\}$ be a monotone sequence of positive constants with $b_n \uparrow \infty$. Then

$$\frac{1}{b_n} \sum_{i=1}^n b_i x_i \to 0 \quad \text{as } n \to \infty.$$
The following result involves the classical Lindeberg condition for asymptotic normality of partial sums of independent random variables (Loève, p.280).

Let $X_1, X_2, \dots$ be independent random variables with $E(X_n^2) < \infty$ and $E(X_n) = 0$ for all $n \geq 1$. Let $S_n = X_1 + \dots + X_n$, $\sigma_k^2 = \mathrm{Var}(X_k)$, and $s_n^2 = \sum_{i=1}^n \mathrm{Var}(X_i)$. Then

$$\frac{S_n}{s_n} \xrightarrow{L} N(0,1) \quad \text{and} \quad \max_{1 \leq k \leq n} \frac{\sigma_k}{s_n} \to 0 \quad \text{as } n \to \infty$$

if and only if for every $\varepsilon > 0$,

$$\frac{1}{s_n^2} \sum_{k=1}^n E[X_k^2 I(|X_k| \geq \varepsilon s_n)] \to 0$$

(the "if" part is due to Lindeberg and the "only if" part is due to Feller).
We now discuss some other versions of the WLLN, SLLN and the CLT for martingales.
WLLN: Let $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ be a martingale and $0 < b_n \uparrow \infty$ as $n \to \infty$. Let $X_{ni} = X_i I(|X_i| \leq b_n)$, $1 \leq i \leq n$. Then

$$\frac{S_n}{b_n} \xrightarrow{p} 0$$

if

(i) $\sum_{i=1}^n P(|X_i| > b_n) \to 0$,

(ii) $b_n^{-1} \sum_{i=1}^n E(X_{ni} \mid \mathcal{F}_{i-1}) \xrightarrow{p} 0$, and

(iii) $b_n^{-2} \sum_{i=1}^n \big\{E X_{ni}^2 - E[E(X_{ni} \mid \mathcal{F}_{i-1})]^2\big\} \to 0$.

Proof: See Hall and Heyde (1980), p.30. (See Loève (1977), p.290 for the independent case.)
SLLN: Let $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ be a zero-mean square integrable martingale and $\{U_n, n \geq 1\}$ be a nondecreasing sequence of positive random variables such that $U_n$ is $\mathcal{F}_{n-1}$-measurable. Then

$$\lim_{n \to \infty} U_n^{-1} S_n = 0 \quad \text{a.s.}$$

on the set $\{\lim_{n \to \infty} U_n = \infty, \ \sum_{i=1}^{\infty} U_i^{-2} E(X_i^2 \mid \mathcal{F}_{i-1}) < \infty\}$.
Proof See Hall and Heyde (1980), p.35.
CLT: Let $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n\}$ be a zero-mean square-integrable martingale for each $n \geq 1$. Let $X_{ni} = S_{ni} - S_{n,i-1}$, $1 \leq i \leq k_n$, $S_{n0} = 0$. Suppose $k_n \uparrow \infty$ as $n \to \infty$. Then the double sequence $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n, n \geq 1\}$ is called a martingale array. Let

$$V_{ni}^2 = \sum_{j=1}^i E(X_{nj}^2 \mid \mathcal{F}_{n,j-1})$$

be the conditional variance of $S_{ni}$.

Special case: If $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ is a martingale, then $S_{ni} = \frac{S_i}{s_n}$, $1 \leq i \leq n$, where $s_n$ is the standard deviation of $S_n$, together with $\mathcal{F}_{ni} = \mathcal{F}_i$ and $k_n = n$, forms a martingale array.

Theorem: Suppose $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n, n \geq 1\}$ is a zero mean square integrable martingale array. Further suppose that

$$\mathcal{F}_{ni} \subseteq \mathcal{F}_{n+1,i} \quad \text{for } 1 \leq i \leq k_n, \ n \geq 1 \quad \text{(nested condition)}$$

and the following conditions hold:

(i) for all $\varepsilon > 0$, $\sum_{i=1}^{k_n} E(X_{ni}^2 I(|X_{ni}| > \varepsilon) \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} 0$,

(ii) $V_{nk_n}^2 = \sum_{i=1}^{k_n} E(X_{ni}^2 \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} \eta^2$.

Then

$$S_{nk_n} = \sum_{i=1}^{k_n} X_{ni} \xrightarrow{L} Z = \eta N(0,1) \quad \text{(stably)}$$

where $\eta$ and $N(0,1)$ are independent random variables.

Remarks: Note that the random variable $Z$ has the characteristic function $E(e^{-\frac{1}{2}\eta^2 t^2})$. In fact $\frac{S_{nk_n}}{V_{nk_n}} \xrightarrow{L} N(0,1)$ as $n \to \infty$ provided $P(\eta^2 > 0) = 1$.
(For the definition of stable convergence, see p.13)
Remarks: The nested condition holds automatically in case the martingale array is
built out of a single martingale as in the special case discussed above.
Sholomitski (Theory of Probability and its Applications, 43 (1999) 434-448) discussed
necessary conditions for normal convergence of a martingale.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{X_{jn}, 1 \leq j \leq k_n < \infty\}$ be a double array of random variables defined on $(\Omega, \mathcal{F}, P)$. Let

$$\mathcal{F}_{jn} = \sigma(X_{1n}, \dots, X_{jn}) \quad \text{with} \quad \mathcal{F}_{0n} = \{\phi, \Omega\}.$$

Suppose

(1) $E(X_{jn} \mid \mathcal{F}_{j-1,n}) = 0$ a.s. $[P]$.

Further suppose that the $X_{jn}$ are square integrable and

(2) $\sum_{j=1}^{k_n} E(X_{jn}^2 \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} \sigma^2 > 0$.

Then the "conditional Lindeberg condition"

(3) $\Lambda_n(\varepsilon) = \sum_{j=1}^{k_n} E(X_{jn}^2 I(|X_{jn}| \geq \varepsilon) \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} 0$

as $n \to \infty$ for every $\varepsilon > 0$, implies that

(4) $\sum_{j=1}^{k_n} X_{jn} \xrightarrow{L} N(0, \sigma^2)$ as $n \to \infty$

(cf. Brown (1971), Ann. Math. Statist. 42, 59-66).

Conversely, suppose that

(5) $\max_{1 \leq j \leq k_n} E(X_{jn}^2 \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} 0$ as $n \to \infty$

and the condition (2) holds. If, as $n \to \infty$,

(6) $\sum_{j=1}^{k_n} c_{jn} X_{jn} \xrightarrow{L} N(0, \sigma^2)$

for any double array $\{c_{jn}\}$ of $\pm 1$'s, then the conditional Lindeberg condition stated in (3) holds. If the conditional distribution of $X_{jn}$ given $\mathcal{F}_{j-1,n}$ is symmetric a.s., then the condition (4) itself implies (6), and hence the conditional Lindeberg condition (3) holds in the presence of the conditions (2) and (5).
Stable Convergence (Renyi (1963))

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Suppose $Y_n \xrightarrow{L} Y$. Then $Y_n \xrightarrow{L} Y$ (stably) if for all continuity points $y$ of the distribution function of $Y$ and all events $E \in \mathcal{F}$,

$$\lim_{n \to \infty} P(\{Y_n \leq y\} \cap E) = Q_y(E)$$

exists and $Q_y(E) \to P(E)$ as $y \to \infty$ (note that $Q_y(E)$ is a measure on $(\Omega, \mathcal{F})$ if it exists).

Theorem: Suppose that $Y_n \xrightarrow{L} Y$ where all the $Y_n$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$. Then $Y_n \xrightarrow{L} Y$ (stably) if and only if there exists a random variable $Y'$ with the same distribution as that of $Y$ (possibly on an extension of $(\Omega, \mathcal{F}, P)$) such that for all real $t$,

$$\exp(itY_n) \to Z(t) = \exp(itY') \quad \text{weakly in } L^1 \text{ as } n \to \infty,$$

and $E[Z(t)I(E)]$ is a continuous function of $t$ for all $E \in \mathcal{F}$.
Remarks: This theorem is a consequence of the continuity theorem for characteristic
functions.
Note: A sequence $\{Z_n\}$ on $(\Omega, \mathcal{F}, P)$ is said to converge weakly in $L^1$ to an integrable random variable $Z$ on $(\Omega, \mathcal{F}, P)$ if for all $E \in \mathcal{F}$,

$$E(Z_n I(E)) \to E(Z I(E)), \quad \text{that is,} \quad \int_E Z_n \, dP \to \int_E Z \, dP,$$

and we write $Z_n \to Z$ weakly in $L^1$.

Remarks: (1) Convergence weakly in $L^1$ is weaker than $L^1$-convergence. In fact $Z_n \to Z$ weakly in $L^1$ implies $E(Z_n X) \to E(ZX)$ for all bounded $\mathcal{F}$-measurable $X$.

Remarks: (2) If for all $E \in \mathcal{F}$ and for all continuity points $y$ of the distribution function of $Y$,

$$P(\{Y_n \leq y\} \cap E) \to P(Y \leq y) P(E),$$

then $Y_n \xrightarrow{L} Y$ (mixing). In other words, the $Y_n$ are asymptotically independent of each event $E \in \mathcal{F}$. Mixing convergence is a special case of stable convergence.
Remarks: (3) (Continuation of the theorem on martingale arrays on p.12.)

Suppose that $P(\eta^2 > 0) = 1$. Since $S_{nk_n} \to Z$ (stably), where $Z = \eta N(0,1)$, for any real $t$ it follows that $e^{itS_{nk_n}} \to e^{itZ}$ weakly in $L^1$. Hence $E[e^{itS_{nk_n}} X] \to E[e^{itZ} X]$ for any bounded random variable $X$ which is $\mathcal{F}$-measurable. Let $X = e^{iu\eta + ivI(E)}$ where $-\infty < u, v < \infty$ and $E \in \mathcal{F}$. Then it follows that the joint characteristic function of $(S_{nk_n}, \eta, I(E))$ converges to that of $(\eta N, \eta, I(E))$, where $N$ is a standard normal random variable independent of $(\eta, I(E))$. Therefore

$$(\eta^{-1} S_{nk_n}, I(E)) \xrightarrow{L} (N, I(E))$$

and hence, if

$$V_{nk_n}^2 = \sum_{i=1}^{k_n} E(X_{ni}^2 \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} \eta^2,$$

as in the martingale limit theorem on p.12, it follows that

$$(V_{nk_n}^{-1} S_{nk_n}, I(E)) \xrightarrow{L} (N, I(E)),$$

which implies that

$$V_{nk_n}^{-1} S_{nk_n} \xrightarrow{L} N \quad \text{(stably)}$$

as $n \to \infty$.
Remarks: (4) The notion of stable convergence is helpful in interchanging random norming and non-random norming when obtaining limit theorems for partial sums of martingale differences in the martingale central limit theory.
Lecture 4
Likelihood ratio
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n\}$ be a sequence of sub-$\sigma$-algebras of $\mathcal{F}$ such that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, $n \geq 1$, and $\mathcal{F}_n \uparrow \mathcal{F}$. Let $P^\star$ be another probability measure defined on $(\Omega, \mathcal{F})$. Note that $P^\star$ is absolutely continuous with respect to $P$ ($P^\star \ll P$) if $P(A) = 0 \Rightarrow P^\star(A) = 0$ for any $A \in \mathcal{F}$. If $P^\star \ll P$, then there exists a random variable

$$Z = \frac{dP^\star}{dP},$$

which is $\mathcal{F}$-measurable, such that

$$P^\star(A) = \int_A Z \, dP, \quad A \in \mathcal{F}.$$

The random variable $Z$ is the density (Radon-Nikodym derivative) of $P^\star$ with respect to $P$. If $Z > 0$ a.s. $[P]$, then the measures $P^\star$ and $P$ are equivalent and we write $P \simeq P^\star$.

Let $P_n^\star$ denote the restriction of $P^\star$ to $\mathcal{F}_n$ and $P_n$ the restriction of $P$ to $\mathcal{F}_n$. If $P_n^\star \ll P_n$ for every $n$, then we say that $P^\star$ is locally absolutely continuous with respect to $P$ and write $P^\star \overset{loc}{\ll} P$. Suppose $P^\star \overset{loc}{\ll} P$. Let

$$Z_n = \frac{dP_n^\star}{dP_n}.$$
$Z_n$ is called the local density. For any $A \in \mathcal{F}_n$,

$$\int_A Z_{n+1} \, dP = \int_A \frac{dP_{n+1}^\star}{dP_{n+1}} \, dP = \int_A dP_{n+1}^\star = P_{n+1}^\star(A) = P_n^\star(A) \quad (\text{since } A \in \mathcal{F}_n)$$
$$= \int_A dP_n^\star = \int_A \frac{dP_n^\star}{dP_n} \, dP = \int_A Z_n \, dP.$$

Hence $E(Z_{n+1} \mid \mathcal{F}_n) = Z_n$, which implies that $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, and it is a nonnegative martingale. Hence $Z_n \to Z$ a.s. $[P]$. If $E[Z] = 1$, then $E|Z_n - Z| \to 0$ by Scheffé's theorem, $P^\star \ll P$ and $Z = \frac{dP^\star}{dP}$; in fact $Z_n = E(Z \mid \mathcal{F}_n)$. In general, $Z$ is the density of the absolutely continuous component of $P^\star$ with respect to $P$.
As an application of the above idea, one can obtain the following result, which gives a method for calculating the Radon-Nikodym derivative (Gikhman and Skorokhod (1974), Theory of Stochastic Processes).

Theorem: Let $(\Omega, \mathcal{F}, P)$ be a probability space and $Q$ be another probability measure on $(\Omega, \mathcal{F})$ absolutely continuous with respect to $P$. Let $\{A_{nk}, k \geq 1\}$ be a measurable partition of $\Omega$ for each $n \geq 1$. Suppose the sequence of partitions is nested. Let

$$g_n(\omega) = \frac{Q(A_{n,k(\omega)})}{P(A_{n,k(\omega)})}$$

if $P(A_{n,k(\omega)}) > 0$, where $A_{n,k(\omega)}$ is that set of the sequence $\{A_{nk}, k \geq 1\}$ which contains $\omega$; if $P(A_{n,k(\omega)}) = 0$, let $g_n(\omega) = 0$. Then the sequence $\{g_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, where $\mathcal{F}_n = \sigma(A_{n1}, A_{n2}, \dots)$. Suppose $\mathcal{F}_n \uparrow \mathcal{F}$ as $n \to \infty$. Then there exists a limiting function $g(\omega)$ such that

$$g_n(\omega) \to g(\omega) \quad \text{a.s. as } n \to \infty,$$

independent of the sequence of partitions $\{A_{nk}, k \geq 1; n \geq 1\}$, and for arbitrary $B \in \mathcal{F}$,

$$Q(B) = \int_B g(\omega) \, P(d\omega).$$
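A concrete illustration on $\Omega = [0,1)$ with the nested dyadic partitions: take $P$ to be the uniform distribution and $Q$ the measure with density $2x$, so $g_n(\omega)$ should converge to $dQ/dP(\omega) = 2\omega$. The specific choice of measures is illustrative:

```python
def rn_derivative_dyadic(omega, n):
    """Approximate dQ/dP at omega in [0,1) via the level-n dyadic partition:
    P = Uniform[0,1], Q has density q(x) = 2x, so Q([a,b]) = b^2 - a^2.
    Returns g_n(omega) = Q(A_n(omega)) / P(A_n(omega)), which tends to
    2 * omega as n grows."""
    width = 2.0 ** (-n)
    k = int(omega / width)              # index of the dyadic cell containing omega
    a, b = k * width, (k + 1) * width
    return (b * b - a * a) / width      # Q(cell) / P(cell) = a + b

approx = rn_derivative_dyadic(0.3, 20)  # close to 2 * 0.3 = 0.6
```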
Estimation by the maximum likelihood method
Consider a stochastic process $\{X_n, n \geq 1\}$ such that the finite dimensional distributions of the process are known but for a scalar parameter $\theta$. Suppose $\theta \in \Theta$, an open set. Let us suppose that the process is observed up to time $n$.

Let $L_n(\theta)$ be the likelihood function associated with the observation $(X_1, \dots, X_n)$, and let $p_n(x_1, \dots, x_n; \theta) = L_n(\theta)$ be the joint probability (density) function of $(X_1, \dots, X_n)$. Note that

$$p_n(x_1, \dots, x_n; \theta) = p_1(x_1; \theta) \frac{p_2(x_1, x_2; \theta)}{p_1(x_1; \theta)} \cdots \frac{p_n(x_1, \dots, x_n; \theta)}{p_{n-1}(x_1, \dots, x_{n-1}; \theta)}$$
$$= p_1(x_1; \theta) \, p_2(x_2; \theta \mid x_1) \cdots p_n(x_n; \theta \mid x_1, \dots, x_{n-1}).$$

Hence

$$\log p_n(x_1, \dots, x_n; \theta) = \log p_1(x_1; \theta) + [\log p_2(x_1, x_2; \theta) - \log p_1(x_1; \theta)] + \cdots + [\log p_n(x_1, \dots, x_n; \theta) - \log p_{n-1}(x_1, \dots, x_{n-1}; \theta)].$$

In other words,

$$\log L_n(\theta) = \log L_1(\theta) + [\log L_2(\theta) - \log L_1(\theta)] + \cdots + [\log L_n(\theta) - \log L_{n-1}(\theta)].$$

For convenience, let us define $L_0(\theta) \equiv 1$. Then

$$\log L_n(\theta) = \sum_{i=1}^n [\log L_i(\theta) - \log L_{i-1}(\theta)].$$

Assume that

$$p_n(x_n; \theta \mid x_1, \dots, x_{n-1}) = \frac{p_n(x_1, \dots, x_n; \theta)}{p_{n-1}(x_1, \dots, x_{n-1}; \theta)} = \frac{L_n(\theta)}{L_{n-1}(\theta)}$$

is twice differentiable with respect to $\theta$ under the (summation) integral sign, and that

$$E_\theta \left( \frac{d \log L_n(\theta)}{d\theta} \right)^2 < \infty, \quad \theta \in \Theta.$$
Note that

$$\frac{d \log L_n(\theta)}{d\theta} = \sum_{i=1}^n \frac{d}{d\theta}[\log L_i(\theta) - \log L_{i-1}(\theta)] = \sum_{i=1}^n u_i(\theta) \quad \text{(say)}.$$

Then

$$E_\theta(u_i(\theta) \mid \mathcal{F}_{i-1}) = E_\theta \left( \frac{d}{d\theta} \log p_i(X_i; \theta \mid X_1, \dots, X_{i-1}) \,\Big|\, \mathcal{F}_{i-1} \right) = 0 \quad \text{a.s.} \tag{7}$$

and

$$E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = -E_\theta \left( \frac{d u_i(\theta)}{d\theta} \,\Big|\, \mathcal{F}_{i-1} \right) \tag{8}$$

in view of the assumption made above. Let

$$I_n(\theta) = \sum_{i=1}^n E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}). \tag{9}$$

Observe that $I_n(\theta)$ is the partial sum of the conditional information in $X_i$ given $X_1, \dots, X_{i-1}$, summed over $1 \leq i \leq n$. Let

$$J_n(\theta) = \sum_{i=1}^n v_i(\theta) \quad \text{where} \quad v_i(\theta) = \frac{d u_i(\theta)}{d\theta}. \tag{10}$$

In view of (7),

$$\left\{ \frac{d \log L_n(\theta)}{d\theta}, \mathcal{F}_n, n \geq 1 \right\} \tag{11}$$

is a martingale. Furthermore,

$$E_\theta(u_i^2(\theta) + v_i(\theta) \mid \mathcal{F}_{i-1}) = 0 \quad \text{a.s.} \tag{12}$$
Existence of a consistent solution of the likelihood equation
Observe that

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta') = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) \sum_{i=1}^n \frac{d u_i(\theta)}{d\theta} \Big|_{\theta^\star}$$
$$= \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\star)$$
$$= \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta) + (\theta' - \theta)(J_n(\theta^\star) + I_n(\theta)) \tag{13}$$

where $\theta^\star = \theta + \gamma(\theta' - \theta)$ with $|\gamma| < 1$. Let $X_i = u_i(\theta)$ and $U_n = I_n(\theta)$. Applying the SLLN stated on p.11, it follows that

$$\frac{\sum_{i=1}^n u_i(\theta)}{I_n(\theta)} \to 0 \quad \text{a.s. as } n \to \infty \tag{14}$$

provided

$$I_n(\theta) \to \infty \quad \text{a.s. as } n \to \infty \tag{15}$$

and

$$\sum_{i=1}^{\infty} I_i^{-2}(\theta) E(u_i^2(\theta) \mid \mathcal{F}_{i-1}) < \infty \quad \text{a.s.} \tag{16}$$
Let $\{a_n\}$ be any sequence of positive numbers and $b_n = \sum_{j=1}^n a_j$ with $b_n \uparrow \infty$. Then

$$\sum_{n=1}^{\infty} \Big( \sum_{j=1}^n a_j \Big)^{-2} a_n = \sum_{n=1}^{\infty} b_n^{-2}(b_n - b_{n-1}) \quad (b_0 \equiv 0)$$
$$= \sum_{n=1}^{\infty} b_n (b_n^{-2} - b_{n+1}^{-2}) = \sum_{n=1}^{\infty} b_n (b_n^{-1} - b_{n+1}^{-1})(b_n^{-1} + b_{n+1}^{-1})$$
$$\leq 2 \sum_{n=1}^{\infty} (b_n^{-1} - b_{n+1}^{-1}) \quad (\text{since } b_n \leq b_{n+1})$$
$$\leq \frac{2}{b_1} < \infty.$$
It can now be checked that the condition (15) implies the condition (16) (see Hall and Heyde, p.158), since $I_n(\theta) = \sum_{j=1}^n E(u_j^2(\theta) \mid \mathcal{F}_{j-1})$. Equation (13) implies that

$$\frac{1}{I_n(\theta)} \frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \frac{1}{I_n(\theta)} \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) + (\theta' - \theta) \frac{J_n(\theta^\star) + I_n(\theta)}{I_n(\theta)}. \tag{17}$$

Relation (17) implies that the likelihood equation

$$\frac{d \log L_n(\theta)}{d\theta} = 0$$

has a solution in $[\theta - \delta, \theta + \delta]$ a.s. if

(C1) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$, and

(C2) $\overline{\lim}_{n \to \infty} \frac{|I_n(\theta) + J_n(\theta^\star)|}{I_n(\theta)} < 1$ a.s.
Remarks: Another set of sufficient conditions for the existence of a strongly consistent root is

(C1) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$, and

(C3) for any $\delta > 0$ such that $(\theta - \delta, \theta + \delta) \subset \Theta$, there exist $K(\delta) > 0$ and $h(\delta) \downarrow 0$ such that

$$\liminf_{n \to \infty} P_\theta \left\{ \sup_{|\theta' - \theta| \geq \delta} \frac{1}{I_n(\theta)} [\log L_n(\theta') - \log L_n(\theta)] < -K(\delta) \right\} \geq 1 - h(\delta).$$
Asymptotic Normality
Let us now consider the equation (13), viz.,

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\star).$$

Let $\theta' = \hat{\theta}_n$ be an MLE. Then

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \hat{\theta}_n} = 0,$$

and hence

$$\sum_{i=1}^n u_i(\theta) = (\theta - \hat{\theta}_n) J_n(\theta^\star)$$

where $|\theta^\star - \theta| \leq |\theta - \hat{\theta}_n|$. Dividing both sides by $[I_n(\theta)]^{1/2}$, we have

$$\frac{1}{(I_n(\theta))^{1/2}} \sum_{i=1}^n u_i(\theta) = (I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \left[ \frac{-J_n(\theta^\star)}{I_n(\theta)} \right].$$

Under some conditions,

$$\frac{1}{I_n^{1/2}(\theta)} \sum_{i=1}^n u_i(\theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$

If $\frac{J_n(\theta^\star)}{I_n(\theta)} \xrightarrow{p} -1$ as $n \to \infty$, then it follows that

$$(I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Theorem: Suppose the following conditions hold:

(C1) (i) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$;

(ii) $\frac{I_n(\theta)}{E_\theta I_n(\theta)} \xrightarrow{p} \eta^2(\theta) > 0$ for some random variable $\eta(\theta)$;

(iii) $\frac{J_n(\theta)}{I_n(\theta)} \xrightarrow{p} -1$ as $n \to \infty$, uniformly on compact subsets of $\Theta$.

(C2) For $\delta > 0$, suppose $|\theta_n - \theta| \leq \delta / (E_\theta I_n(\theta))^{1/2}$. Then

(i) $E_{\theta_n} I_n(\theta_n) = E_\theta I_n(\theta)(1 + o(1))$ as $n \to \infty$;

(ii) $I_n(\theta_n) = I_n(\theta)(1 + o(1))$ a.s. as $n \to \infty$;

(iii) $J_n(\theta_n) = J_n(\theta) + o(I_n(\theta))$ a.s. as $n \to \infty$.

Then

$$\left( (E_\theta I_n(\theta))^{-1/2} \frac{d \log L_n(\theta)}{d\theta}, \ \frac{I_n(\theta)}{E_\theta I_n(\theta)} \right) \xrightarrow{L} (\eta(\theta) N(0,1), \eta^2(\theta))$$

where $\eta(\theta)$ and $N$ are independent. Furthermore,

$$\hat{\theta}_n \xrightarrow{a.s.} \theta \quad \text{as } n \to \infty$$

and

$$I_n^{1/2}(\theta)(\hat{\theta}_n - \theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Proof: Fix $c > 0$. Let $\theta_n = \theta + c(E_\theta I_n(\theta))^{-1/2}$ and let

$$\Lambda_n = \log \frac{L_n(\theta_n)}{L_n(\theta)}.$$

Applying Taylor's expansion,

$$\Lambda_n = (\theta_n - \theta) \sum_{i=1}^n u_i(\theta) + \frac{1}{2} (\theta_n - \theta)^2 J_n(\theta_n^\star), \quad \text{where } |\theta_n^\star - \theta| \leq |\theta_n - \theta|.$$

Let

$$W_n(\theta) = (E_\theta I_n(\theta))^{-1/2} \sum_{i=1}^n u_i(\theta) = \frac{(\theta_n - \theta)}{c} \sum_{i=1}^n u_i(\theta)$$

and

$$V_n(\theta) = -(E_\theta I_n(\theta))^{-1} J_n(\theta_n^\star) = -\frac{(\theta_n - \theta)^2}{c^2} J_n(\theta_n^\star).$$

Note that

$$\frac{L_n(\theta_n)}{L_n(\theta)} = e^{\Lambda_n}, \quad \text{where } \Lambda_n = c W_n(\theta) - \frac{1}{2} c^2 V_n(\theta).$$

In other words,

$$e^{c W_n(\theta)} = \frac{L_n(\theta_n)}{L_n(\theta)} \, e^{\frac{c^2}{2} V_n(\theta)}. \quad (\star)$$
Let $x_0$ be a continuity point of the distribution function of $\eta^2(\theta)$. Assumptions (C1) and (C2) imply that

$$V_n(\theta) \xrightarrow{p} \eta^2(\theta)$$

under $P_{\theta_n}$, and

$$P_{\theta_n}(|V_n(\theta)| \leq x_0) \to P_\theta(\eta^2(\theta) \leq x_0).$$

Let $f$ be a bounded continuous function on $(-\infty, \infty)$ with $f(x) = 0$ for $|x| > x_0$. Then

$$E_\theta \left[ f(V_n(\theta)) e^{c W_n(\theta)} \,\big|\, |V_n(\theta)| \leq x_0 \right] = E_{\theta_n} \left[ f(V_n(\theta)) e^{c^2 V_n(\theta)/2} \,\big|\, |V_n(\theta)| \leq x_0 \right] \quad (\text{from } (\star))$$
$$\to E_\theta \left[ f(\eta^2(\theta)) e^{c^2 \eta^2(\theta)/2} \,\big|\, \eta^2(\theta) \leq x_0 \right] = E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c^2 \eta_{x_0}^2(\theta)/2} \right]$$

where $\eta_{x_0}(\theta)$ has the distribution of $\eta(\theta)$ conditional on $\eta^2(\theta) \leq x_0$. But

$$E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c^2 \eta_{x_0}^2(\theta)/2} \right] = E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c \eta_{x_0}(\theta) N(0,1)} \right]$$

where $\eta_{x_0}$ is independent of $N$. Hence the joint distribution of $(W_n(\theta), V_n(\theta))$, conditional on $|V_n(\theta)| \leq x_0$, converges to that of $(\eta_{x_0}(\theta) N(0,1), \eta_{x_0}^2(\theta))$. Letting $x_0 \to \infty$, we obtain that

$$g\left( (E_\theta I_n(\theta))^{-1/2} \frac{d \log L_n(\theta)}{d\theta}, \ \frac{I_n(\theta)}{E_\theta I_n(\theta)} \right) \xrightarrow{L} g\left( \eta(\theta) N(0,1), \eta^2(\theta) \right)$$

for bounded continuous $g$, by the continuous mapping theorem.
Remarks: If

$$\frac{I_n(\theta)}{E_\theta I_n(\theta)} \xrightarrow{p} \eta^2(\theta) > 0 \quad \text{as } n \to \infty,$$

then one can replace the random norming $I_n(\theta)$ by the non-random norming $E_\theta I_n(\theta)$, and we obtain that

$$(E_\theta I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \xrightarrow{L} \eta^{-1}(\theta) N(0,1)$$

where $\eta(\theta)$ and $N$ are independent.
Definition: An estimator $T_n$ of $\theta$ is said to be asymptotically first order efficient if

$$I_n^{1/2}(\theta) \left[ T_n - \theta - r(\theta) I_n^{-1}(\theta) \frac{d \log L_n(\theta)}{d\theta} \right] \xrightarrow{p} 0$$

as $n \to \infty$ for some $r(\theta)$ not depending on $n$ or the observations.

Remarks: It can be checked that the MLE is asymptotically first order efficient in the above sense under the conditions stated above.
Lecture 5
Note that

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\ast)$$
$$= \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta) + (\theta' - \theta)(J_n(\theta^\ast) + I_n(\theta)).$$

Suppose that $J_n(\theta^\ast) + I_n(\theta) = 0$ a.s. Then

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta). \quad (\alpha)$$

Substituting $\theta' = \hat{\theta}_n$, we have

$$0 = \frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \hat{\theta}_n} = \sum_{i=1}^n u_i(\theta) - (\hat{\theta}_n - \theta) I_n(\theta). \quad (\beta)$$

Subtracting $(\beta)$ from $(\alpha)$, we have

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = (\hat{\theta}_n - \theta') I_n(\theta),$$

and in general

$$\frac{d \log L_n(\theta)}{d\theta} = (\hat{\theta}_n - \theta) I_n(\theta).$$
Special case (Conditional exponential family):

Suppose

$$\frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta), \quad \theta \in \Theta, \ n \geq 1. \tag{18}$$

Then $\hat{\theta}_n$ is the MLE. Differentiating both sides of the equation (18) with respect to $\theta$, we obtain

$$\frac{d^2 \log L_n(\theta)}{d\theta^2} = I_n'(\theta)(\hat{\theta}_n - \theta) - I_n(\theta) \tag{19}$$

and

$$E_\theta \left( \frac{d^2 \log L_n(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right) = I_n'(\theta) \, E_\theta(\hat{\theta}_n - \theta \mid \mathcal{F}_{n-1}) - I_n(\theta)$$
$$= I_n'(\theta) \, E_\theta \left( \frac{d \log L_n(\theta)}{d\theta} \frac{1}{I_n(\theta)} \,\Big|\, \mathcal{F}_{n-1} \right) - I_n(\theta) \quad \text{(from (18))} \tag{20}$$
$$= \frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_{n-1}(\theta)}{d\theta} - I_n(\theta) \quad \text{(by the martingale property)}. \tag{21}$$
But

$$E_\theta \left( \frac{d^2 \log L_n(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right) = E_\theta \left( \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + \frac{d^2 \log L_n(\theta)}{d\theta^2} - \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right)$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + E_\theta \left( \frac{d^2}{d\theta^2} [\log L_n(\theta) - \log L_{n-1}(\theta)] \,\Big|\, \mathcal{F}_{n-1} \right)$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + E_\theta(v_n(\theta) \mid \mathcal{F}_{n-1})$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - E_\theta(u_n^2(\theta) \mid \mathcal{F}_{n-1})$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - (I_n(\theta) - I_{n-1}(\theta)). \tag{22}$$

Relations (21) and (22) imply that

$$\frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_{n-1}(\theta)}{d\theta} - I_n(\theta) = \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - (I_n(\theta) - I_{n-1}(\theta)).$$

Hence

$$\frac{I_n'(\theta)}{I_n(\theta)} = \frac{\dfrac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + I_{n-1}(\theta)}{\dfrac{d \log L_{n-1}(\theta)}{d\theta}}. \tag{23}$$

Relations (18) and (19) imply that

$$\frac{d^2 \log L_n(\theta)}{d\theta^2} = \frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_n(\theta)}{d\theta} - I_n(\theta),$$

and hence

$$I_n(\theta) \frac{d^2 \log L_n(\theta)}{d\theta^2} = I_n'(\theta) \frac{d \log L_n(\theta)}{d\theta} - I_n^2(\theta),$$

which implies that

$$I_n(\theta) \left[ \frac{d^2 \log L_n(\theta)}{d\theta^2} + I_n(\theta) \right] = I_n'(\theta) \frac{d \log L_n(\theta)}{d\theta}.$$

Therefore

$$\frac{I_n'(\theta)}{I_n(\theta)} = \frac{\dfrac{d^2 \log L_n(\theta)}{d\theta^2} + I_n(\theta)}{\dfrac{d \log L_n(\theta)}{d\theta}}. \tag{24}$$

Comparing (23) and (24), we obtain that

$$\frac{I_n'(\theta)}{I_n(\theta)} = C(\theta) \quad \text{for all } n$$
for some $C(\theta)$. This implies that

$$I_n(\theta) = \phi(\theta) H_n(X_1, \dots, X_{n-1})$$

for some function $\phi(\theta)$, since $I_n(\theta)$ is $\mathcal{F}_{n-1}$-measurable. Therefore, from the equation (18), it follows that

$$\frac{d \log L_n(\theta)}{d\theta} = \phi(\theta)(\hat{\theta}_n - \theta) H_n(X_1, \dots, X_{n-1}), \tag{25}$$

which implies that

$$\log L_n(\theta) = H_n(X_1, \dots, X_{n-1}) \left[ \hat{\theta}_n \int \phi(\theta) \, d\theta - \int \phi(\theta) \theta \, d\theta \right] + K_n(X_1, \dots, X_n).$$

By the factorization theorem, it follows that $(\hat{\theta}_n, H_n(X_1, \dots, X_{n-1}))$ is a sufficient statistic for $\theta$. Furthermore,

$$L_n(\theta) = \exp\{ H_n(X_1, \dots, X_{n-1})(r_1(\theta) \hat{\theta}_n + r_2(\theta)) + K_n(X_1, \dots, X_n) \}.$$
Special case of a Markov process: Suppose $\{X_n, n \geq 0\}$ is a time-homogeneous Markov process and the conditional probability (density) function of $X_n$ given $X_{n-1}$ is $f(x_n \mid x_{n-1}, \theta)$. Then the likelihood function of $(x_1, \ldots, x_n)$ is given by
\[
L_n^\star(x_1, \ldots, x_n; \theta) \equiv L_n^\star(\theta) = g(x_0) \prod_{i=1}^{n} f(x_i \mid x_{i-1}, \theta)
\]
(we assume that the initial density $g(\cdot)$ of $X_0$ does not depend on $\theta$). Since $X_0$ carries no information about the parameter $\theta$, let us take the likelihood function to be
\[
L_n(\theta) \equiv \prod_{i=1}^{n} f(x_i \mid x_{i-1}, \theta).
\]
Hence
\[
\frac{d \log L_n(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{d}{d\theta} \log f(x_i \mid x_{i-1}, \theta) = \sum_{i=1}^{n} u_i(\theta), \qquad u_i(\theta) = \frac{d}{d\theta} \log f(x_i \mid x_{i-1}, \theta).
\]
Suppose that
\[
(26) \qquad \frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta).
\]
In particular, for $n = 1$, we have from (25)
\[
\frac{d}{d\theta} \log f(X_1 \mid X_0, \theta) = I_1(\theta)(\hat{\theta}_1 - \theta) = \phi(\theta) H(X_0)(\hat{\theta}_1 - \theta).
\]
Note that $\hat{\theta}_1$ depends on $X_0$ and $X_1$ and is a solution of the equation
\[
\frac{d}{d\theta} \log f(X_1 \mid X_0, \theta) = 0.
\]
Let $\hat{\theta}_1 = m(x, y)$ be the solution of the equation
\[
\frac{d}{d\theta} \log f(y \mid x, \theta) = 0. \qquad (25a)
\]
Then
\[
\frac{d}{d\theta} \log f(y \mid x, \theta) = \phi(\theta) H(x)(m(x, y) - \theta) \qquad (25b)
\]
and hence
\[
\log f(y \mid x, \theta) = H(x)\, m(x, y) \int \phi(\theta)\,d\theta - H(x) \int \theta\,\phi(\theta)\,d\theta + K(x, y),
\]
or equivalently
\[
(27) \qquad f(y \mid x, \theta) = \exp\{ H(x)\, m(x, y)\, J_1(\theta) - H(x)\, J_2(\theta) \}\, K^\star(x, y).
\]
Such a family of distributions is called a conditional exponential family. Relation (25b) implies that
\[
(28) \qquad \frac{d \log L_n(\theta)}{d\theta} = \phi(\theta) \sum_{i=1}^{n} H(X_{i-1})[m(X_{i-1}, X_i) - \theta]
\]
and
\[
u_i(\theta) = \frac{d}{d\theta} \log f(X_i \mid X_{i-1}, \theta) = \phi(\theta) H(X_{i-1})[m(X_{i-1}, X_i) - \theta].
\]
Hence
\[
E_\theta[u_i(\theta) \mid \mathcal{F}_{i-1}] = \phi(\theta) H(X_{i-1})[E_\theta(m(X_{i-1}, X_i) \mid \mathcal{F}_{i-1}) - \theta] = 0 \ \text{a.s.}
\]
by earlier remarks. Therefore
\[
(29) \qquad E_\theta[m(X_{i-1}, X_i) \mid \mathcal{F}_{i-1}] - \theta = 0 \ \text{a.s.}
\]
It also follows from (28) that
\[
(30) \qquad \hat{\theta}_n = \left[ \sum_{i=1}^{n} H(X_{i-1}) \right]^{-1} \left[ \sum_{i=1}^{n} H(X_{i-1})\, m(X_{i-1}, X_i) \right].
\]
Note that
\[
E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = -E_\theta\left( \frac{d^2}{d\theta^2} \log f(X_i \mid X_{i-1}, \theta) \,\Big|\, \mathcal{F}_{i-1} \right).
\]
But
\[
\frac{d^2 \log f(X_i \mid X_{i-1}, \theta)}{d\theta^2} = \phi'(\theta) H(X_{i-1})[m(X_{i-1}, X_i) - \theta] - \phi(\theta) H(X_{i-1}).
\]
Hence, by (29),
\[
(31) \qquad -E_\theta\left( \frac{d^2 \log f(X_i \mid X_{i-1}, \theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{i-1} \right) = \phi(\theta) H(X_{i-1}).
\]
Hence
\[
(32) \qquad I_n(\theta) = \sum_{i=1}^{n} E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = \sum_{i=1}^{n} \phi(\theta) H(X_{i-1}) = \left\{ \sum_{i=1}^{n} H(X_{i-1}) \right\} \phi(\theta).
\]
Using equations (29) and (31), it can be checked from (27) that
\[
(33) \qquad \frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta).
\]
In other words, relation (26) is a necessary and sufficient condition for the transition probability (density) function to belong to a conditional exponential family.
Lecture 6

Bienaymé-Galton-Watson branching process (Guttorp (1991))

Let $\{X_{ij}, i \geq 1, j \geq 1\}$ be i.i.d. random variables taking values in the nonnegative integers with the probability generating function (p.g.f.)
\[
g(s) = \sum_{k=0}^{\infty} s^k p_k, \quad -1 < s \leq 1.
\]
Let $X$ be a random variable with $P(X = k) = p_k$, $k = 0, 1, 2, \ldots$ The distribution of $X$ is called the offspring distribution. We define the branching process $\{Z_k, k \geq 0\}$ with the offspring distribution $\{p_k, k \geq 0\}$ recursively by
\[
(34) \qquad Z_0 = z_0, \qquad Z_k = \sum_{i=1}^{Z_{k-1}} X_{ik}.
\]
We will assume that $z_0 = 1$ and $p_0 + p_1 < 1$ in the following discussion.

The conditional p.g.f. of $Z_k$ given $Z_{k-1} = z$ is
\[
E[s^{Z_k} \mid Z_{k-1} = z] = g(s)^z
\]
due to the independence of the random variables $X_{ij}$, and hence the p.g.f. of $Z_k$ is
\[
g_k(s) = E[g(s)^{Z_{k-1}}] = g_{k-1}(g(s)).
\]
This implies that $E(Z_k) = \theta^k$ provided $\theta = E(X_{ij}) < \infty$, and
\[
V(Z_k) = \sigma^2 \theta^{k-1} \sum_{j=0}^{k-1} \theta^j \quad \text{provided } \sigma^2 = V(X_{ij}) < \infty.
\]
Remarks: Note that once a generation is extinct, all the following generations are extinct as well. Extinction occurs in or before the $k$-th generation in one of the following ways: the ancestor has

(i) 0 children;

(ii) one child whose family becomes extinct in or before the $(k-1)$-th generation;

(iii) two children both of whose families become extinct in or before the $(k-1)$-th generation;

and so on. The probability $q_k$ of extinction within $k$ generations is therefore
\[
(35) \qquad q_k = P(Z_k = 0) = \sum_{j=0}^{\infty} p_j\, g_{k-1}(0)^j = g(g_{k-1}(0)) = g(q_{k-1}).
\]
Assume that $p_0 > 0$ and $p_1 > 0$. Then the function $g(\cdot)$ is strictly increasing on the interval $[0, 1]$. Hence the sequence $q_k = g(q_{k-1})$ forms a strictly increasing sequence of positive numbers bounded by one. Therefore $q_k$ has a limit, say $q$, with $p_0 \leq q \leq 1$. Taking limits in (35), we get that
\[
(36) \qquad q = g(q).
\]
Hence
\[
\frac{g(q) - g(q_k)}{q - q_k} = \frac{q - q_{k+1}}{q - q_k} < 1.
\]
Letting $k \to \infty$, we observe that $g'(q) \leq 1$. Note that $g'(s)$ is a strictly increasing function on $(0, 1)$ and hence $g(s)$ is convex.

Proposition: The extinction probability $q$ is the smallest nonnegative root of the equation $g(s) = s$. If $\theta > 1$, then $0 \leq q < 1$, with $q = 0$ if and only if $p_0 = 0$. If $\theta \leq 1$, then $q = 1$ unless $p_1 = 1$, in which case $q = 0$.
Example: Suppose the offspring distribution is geometric:
\[
p_k = p(1 - p)^k, \quad k = 0, 1, \ldots,
\]
so that
\[
g(s) = \frac{p}{1 - (1 - p)s}.
\]
Therefore $\theta = g'(1) = \frac{1 - p}{p}$. The process becomes extinct with probability one if $p \geq \frac{1}{2}$. If $p < \frac{1}{2}$, then the extinction probability $q$ is a solution of the equation
\[
\frac{p}{1 - (1 - p)s} = s,
\]
which has the solutions $1$ and $\frac{1}{\theta}$. Hence $q = \frac{1}{\theta}$.
Proposition: If $p_1 \neq 1$, then $Z_n \to \infty$ as $n \to \infty$ with probability $1 - q$.

Proof: If $q = 1$, there is nothing to prove. Suppose $q < 1$. Note that
\[
g_k(s) = g_{k-1}(g(s))
\]
and hence
\[
g_k'(s) = g_{k-1}'(g(s))\, g'(s).
\]
If $s = q$, then $g(s) = s$ and hence
\[
g_k'(q) = g_{k-1}'(q)\, g'(q) \quad \text{for all } k \geq 1.
\]
Hence
\[
g_k'(q) = [g'(q)]^k \quad \text{for all } k \geq 1.
\]
Furthermore, since $q < 1$,
\begin{align*}
P(1 \leq Z_n \leq k) &= \sum_{j=1}^{k} P(Z_n = j) \leq \sum_{j=1}^{k} P(Z_n = j)\, \frac{j q^{j-1}}{q^k} \\
&\leq \frac{g_n'(q)}{q^k} = \frac{[g'(q)]^n}{q^k}.
\end{align*}
Therefore
\[
\sum_{n=1}^{\infty} P(1 \leq Z_n \leq k) \leq \frac{1}{q^k} \sum_{n=1}^{\infty} [g'(q)]^n < \infty
\]
provided $g'(q) < 1$. Hence, by the Borel-Cantelli lemma, it follows that
\[
P(1 \leq Z_n \leq k \ \text{infinitely often}) = 0 \quad \text{for any } k,
\]
that is, almost surely either $Z_n = 0$ eventually or $Z_n \to \infty$ as $n \to \infty$; hence $Z_n \to \infty$ with probability $1 - q$.
(Ref.: Guttorp, Statistical Inference for Branching Processes.)

As pointed out earlier, we assume that $Z_0 \equiv 1$ in the following discussion.

Theorem: Suppose that $\sigma^2 = \mathrm{Var}\, Z_1 < \infty$ and $E Z_1 = \theta$. Let $W_n = \frac{Z_n}{\theta^n}$ and $\mathcal{F}_n = \sigma(Z_0, \ldots, Z_n)$. Then

(i) $\{W_n, \mathcal{F}_n, n \geq 0\}$ is a martingale;

(ii) $W_n \to W$ a.s., where $P(W = 0) = q$ and $q$ is the smallest nonnegative root of the equation $g(s) = s$ (here $g(\cdot)$ is the probability generating function of the offspring distribution, that is, $g(s) = \sum_{j=0}^{\infty} s^j p_j$);

(iii) $\{W > 0\} = \{Z_n \to \infty\}$ a.s.;

(iv) if $\theta > 1$, then $EW = 1$ and $V(W) = \frac{\sigma^2}{\theta(\theta - 1)}$;

(v) the Laplace transform $\phi(s) = E[e^{-sW}]$ satisfies the equation $\phi(\theta s) = g(\phi(s))$; and

(vi) if $\theta > 1$ and $\sigma^2 < \infty$, then the distribution of $W$ is absolutely continuous except for a jump of size $q$ at $0$.
Special case: Suppose a random variable $X$ has the power series offspring distribution
\[
P(X = j) = p_j = \frac{a_j \lambda^j}{f(\lambda)}, \quad j \geq 0,
\]
where
\[
f(\lambda) = \sum_{j=0}^{\infty} a_j \lambda^j, \qquad f'(\lambda) = \sum_{j=1}^{\infty} a_j\, j\, \lambda^{j-1}, \qquad f''(\lambda) = \sum_{j=2}^{\infty} a_j\, j(j-1)\, \lambda^{j-2}.
\]
Then
\[
E(X) = \sum_{j=0}^{\infty} j p_j = \sum_{j=1}^{\infty} \frac{j a_j \lambda^j}{f(\lambda)} = \lambda \sum_{j=1}^{\infty} \frac{j a_j \lambda^{j-1}}{f(\lambda)} = \frac{\lambda f'(\lambda)}{f(\lambda)}.
\]
Hence
\[
E(X) = \frac{\lambda f'(\lambda)}{f(\lambda)} = \theta \qquad \text{and} \qquad \frac{d\theta}{d\lambda} = \frac{f(\lambda)[\lambda f''(\lambda) + f'(\lambda)] - \lambda f'^2(\lambda)}{f^2(\lambda)}.
\]
Furthermore,
\[
\sigma^2 = \mathrm{Var}(X) = E(X^2) - (E(X))^2 = E(X(X-1)) + EX - (EX)^2.
\]
Now
\[
E(X(X-1)) = \sum_{j=0}^{\infty} j(j-1) p_j = \sum_{j=2}^{\infty} \frac{j(j-1) a_j \lambda^j}{f(\lambda)} = \frac{\lambda^2 f''(\lambda)}{f(\lambda)}.
\]
Hence
\begin{align*}
\sigma^2 &= \frac{\lambda^2 f''(\lambda)}{f(\lambda)} + \frac{\lambda f'(\lambda)}{f(\lambda)} - \frac{\lambda^2 f'^2(\lambda)}{f^2(\lambda)}
= \frac{\lambda^2 f''(\lambda) f(\lambda) + \lambda f'(\lambda) f(\lambda) - \lambda^2 (f'(\lambda))^2}{f^2(\lambda)} \\
&= \lambda\, \frac{\lambda f''(\lambda) f(\lambda) + f'(\lambda) f(\lambda) - \lambda f'^2(\lambda)}{f^2(\lambda)}
= \lambda\, \frac{d\theta}{d\lambda}.
\end{align*}
Example (branching process, Bienaymé-Galton-Watson process): Let $Z_0, Z_1, \ldots, Z_n, \ldots$ be the consecutive generation sizes with $Z_0 = 1$. Let $\theta = E(Z_1)$ and suppose that $1 < \theta < \infty$. Assume that $\sigma^2 = \mathrm{Var}(Z_1) < \infty$. Let
\[
p_j = P(Z_1 = j), \quad j = 0, 1, 2, \ldots
\]
Assume that $\{p_j\}$ belongs to the family of power series distributions discussed above. Then
\[
p_j = \frac{a_j \lambda^j}{f(\lambda)}, \quad j = 0, 1, 2, \ldots, \quad \text{where } \lambda > 0 \text{ is a fixed constant,}
\]
$a_j \geq 0$ and $f(\lambda) = \sum_{j=0}^{\infty} a_j \lambda^j$. We have noted that
\[
\theta = \frac{\lambda f'(\lambda)}{f(\lambda)}, \qquad \sigma^2 = \lambda \frac{d\theta}{d\lambda}.
\]
We assume that $\theta > 1$. Then $Z_n \to \infty$ with probability $1 - q$, where $q$ is the probability of extinction (Biometrika 62, 49-59 (1975)). Let $p(x \mid y, \lambda)$ be the transition probability function of the process $\{Z_k, k \geq 1\}$. It can be checked that
\[
\frac{d}{d\theta} \log p(x \mid y, \lambda) = \sigma^{-2}(x - \theta y)
\]
and
\[
I_n(\theta) = \sigma^{-2} \sum_{i=0}^{n-1} Z_i.
\]
Note that $Z_0 = 1, Z_1, \ldots, Z_n$ is a realization of a Markov chain with the transition probabilities
\[
p(Z_k \mid Z_{k-1}, \lambda) \propto \lambda^{Z_k} f(\lambda)^{-Z_{k-1}},
\]
and hence the likelihood function is
\[
L_n(\lambda) = \prod_{k=1}^{n} p(Z_k \mid Z_{k-1}, \lambda) \propto f(\lambda)^{-\sum_{k=1}^{n} Z_{k-1}}\, \lambda^{\sum_{k=1}^{n} Z_k}.
\]
Therefore
\begin{align*}
\frac{d \log L_n(\lambda)}{d\lambda} &= \frac{d}{d\lambda} \left[ \left( -\sum_{k=1}^{n} Z_{k-1} \right) \log f(\lambda) + \left( \sum_{k=1}^{n} Z_k \right) \log \lambda \right] \\
&= \left( -\sum_{k=1}^{n} Z_{k-1} \right) \frac{f'(\lambda)}{f(\lambda)} + \left( \sum_{k=1}^{n} Z_k \right) \frac{1}{\lambda} \\
&= \frac{1}{\lambda} \left[ \left( -\sum_{k=1}^{n} Z_{k-1} \right) \frac{\lambda f'(\lambda)}{f(\lambda)} + \sum_{k=1}^{n} Z_k \right].
\end{align*}
Hence
\[
\frac{d \log L_n(\lambda)}{d\lambda} = \frac{1}{\lambda} \left[ \sum_{k=1}^{n} Z_k - \theta \sum_{k=1}^{n} Z_{k-1} \right] = 0
\]
provided
\[
\theta = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}};
\]
the MLE of $\theta$ is therefore $\hat{\theta}_n = \sum_{k=1}^{n} Z_k \big/ \sum_{k=1}^{n} Z_{k-1}$.
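The MLE derived above can be evaluated on a simulated path. The sketch below is illustrative: the Poisson offspring law, the Knuth sampler, and all parameter values are choices made here, not part of the notes.

```python
import math
import random

def mle_theta(Z):
    """MLE of the offspring mean from generation sizes Z_0, ..., Z_n:
    theta_hat_n = sum_{k=1}^n Z_k / sum_{k=1}^n Z_{k-1}."""
    return sum(Z[1:]) / sum(Z[:-1])

def simulate_bgw(theta, n, seed=0):
    """Simulate a BGW process with Poisson(theta) offspring and Z_0 = 1,
    stopping early on extinction."""
    rng = random.Random(seed)
    def poisson(lam):
        # Knuth's multiplication method; adequate for small lam
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    Z = [1]
    for _ in range(n):
        Z.append(sum(poisson(theta) for _ in range(Z[-1])))
        if Z[-1] == 0:
            break
    return Z

print(mle_theta([1, 2, 4, 8]))              # 2.0 on this deterministic path
print(mle_theta(simulate_bgw(2.0, 12, seed=3)))
```

On the set of non-extinction the simulated estimate stabilizes near the true $\theta$ as the number of generations grows, in line with the consistency result below.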
Note that $\sigma^2 = \lambda \frac{d\theta}{d\lambda}$ and hence $\frac{d\theta}{d\lambda} = \frac{\sigma^2}{\lambda} > 0$. Therefore $\theta(\cdot)$ is a strictly increasing function of $\lambda$ and we can reparametrize the problem through $\theta$. Observe that
\begin{align*}
\frac{d \log L_n(\lambda)}{d\theta} &= \frac{d \log L_n(\lambda)}{d\lambda} \frac{d\lambda}{d\theta} = \frac{d \log L_n(\lambda)}{d\lambda} \frac{\lambda}{\sigma^2} \\
&= \frac{1}{\sigma^2} \left[ \sum_{k=1}^{n} Z_k - \theta \sum_{k=1}^{n} Z_{k-1} \right] = \frac{1}{\sigma^2} \sum_{k=1}^{n} (Z_k - \theta Z_{k-1}) \\
&= \sum_{k=1}^{n} u_k(\theta), \qquad u_k(\theta) = \frac{Z_k - \theta Z_{k-1}}{\sigma^2}.
\end{align*}
Note that $E[u_k(\theta) \mid Z_{k-1}] = 0$ since $E(Z_k \mid Z_{k-1}) = \theta Z_{k-1}$, and the conditional information is given by
\[
I_n(\theta) = \sum_{k=1}^{n} E[u_k^2(\theta) \mid \mathcal{F}_{k-1}] = -\sum_{k=1}^{n} E\left[ \frac{d u_k(\theta)}{d\theta} \,\Big|\, \mathcal{F}_{k-1} \right] = \sum_{k=1}^{n} \frac{Z_{k-1}}{\sigma^2} = \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1}.
\]
Note that $\zeta_n(\theta) \equiv E(I_n(\theta)) = \frac{1}{\sigma^2} \sum_{k=1}^{n} \theta^{k-1} = \frac{1}{\sigma^2} \frac{\theta^n - 1}{\theta - 1}$. It is easy to see that $\left\{ \frac{Z_n}{\theta^n}, \mathcal{F}_n, n \geq 0 \right\}$ is a martingale, since
\[
E\left( \frac{Z_n}{\theta^n} \,\Big|\, \mathcal{F}_{n-1} \right) = \frac{Z_{n-1}}{\theta^{n-1}},
\]
and it is a nonnegative martingale. Hence
\[
W_n \equiv \frac{Z_n}{\theta^n} \xrightarrow{a.s.} W \ \text{(say)} \quad \text{as } n \to \infty,
\]
where $W \geq 0$ a.s. We now apply the Toeplitz lemma (Loeve (1963)), viz.,
\[
x_n \to x \;\Rightarrow\; \frac{1}{\sigma_n} \sum_{k=0}^{n} a_k x_k \to x \quad \text{if } \sigma_n = \sum_{k=0}^{n} a_k \uparrow \infty \text{ as } n \to \infty.
\]
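The Toeplitz lemma can be checked numerically. In the sketch below, the geometric weights $a_k = 2^k$ and the sequence $x_k = 1 + 1/(k+1) \to 1$ are hypothetical choices made for the demonstration only.

```python
def toeplitz_average(a, x):
    """Weighted average (1/sigma_n) sum a_k x_k with sigma_n = sum a_k;
    if x_k -> x and sigma_n -> infinity, the average also tends to x."""
    return sum(ak * xk for ak, xk in zip(a, x)) / sum(a)

n = 60
a = [2.0 ** k for k in range(n)]              # weights with sigma_n -> infinity
x = [1.0 + 1.0 / (k + 1) for k in range(n)]   # x_k -> 1
print(toeplitz_average(a, x))                 # close to the limit 1
```

With rapidly growing weights the average is dominated by the late (already converged) terms, which is exactly the mechanism used in the consistency argument that follows.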
Note that
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}} = \frac{\sum_{k=0}^{n} Z_k - Z_0}{\sum_{k=0}^{n-1} Z_k} = \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} - \frac{1}{\sum_{k=0}^{n-1} Z_k} \simeq \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} \quad \text{on } [W > 0],
\]
and hence, applying the Toeplitz lemma with $a_k = \theta^k$ and $x_k = Z_k/\theta^k \to W$,
\[
\frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} = \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n} \theta^k} \cdot \frac{\sum_{k=0}^{n} \theta^k}{\sum_{k=0}^{n-1} \theta^k} \cdot \frac{\sum_{k=0}^{n-1} \theta^k}{\sum_{k=0}^{n-1} Z_k} \to W \cdot \lim_{n \to \infty} \frac{\theta^{n+1} - 1}{\theta^n - 1} \cdot \frac{1}{W} = \theta \quad \text{whenever } W > 0.
\]
Therefore
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}} \to \theta \ \text{a.s. on the set } [W > 0].
\]
This proves the strong consistency of the estimator on the set $[W > 0]$. Strong consistency might not hold on the set $[W = 0]$, which might have positive probability. Furthermore,
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} = \frac{\sum_{k=1}^{n} Z_{k-1}}{\sum_{k=1}^{n} \theta^{k-1}} = \left\{ \sum_{k=1}^{n} \theta^{k-1} \cdot \frac{Z_{k-1}}{\theta^{k-1}} \right\} \frac{1}{\sum_{k=1}^{n} \theta^{k-1}} \to W \ \text{a.s. as } n \to \infty.
\]
Note that
\[
(I_n(\theta))^{-\frac{1}{2}} \frac{d \log L_n(\theta)}{d\theta} = \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1} \right)^{-\frac{1}{2}} \sum_{k=1}^{n} \frac{Z_k - \theta Z_{k-1}}{\sigma^2} = \frac{\sum_{k=1}^{n} u_k(\theta)}{\left\{ \sum_{k=1}^{n} E(u_k^2(\theta) \mid \mathcal{F}_{k-1}) \right\}^{\frac{1}{2}}} \xrightarrow{\mathcal{L}} Z \sim N(0, 1)
\]
as $n \to \infty$, by the martingale central limit theorem, while
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} \to W \ \text{a.s. as } n \to \infty.
\]
Note also that
\[
(I_n(\theta))^{-\frac{1}{2}} \frac{d \log L_n(\theta)}{d\theta} = \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1} \right)^{\frac{1}{2}} \frac{\sum_{k=1}^{n} (Z_k - \theta Z_{k-1})}{\sum_{k=1}^{n} Z_{k-1}} = (I_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta).
\]
Hence
\[
(I_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta) \xrightarrow{\mathcal{L}} N(0, 1) \quad \text{as } n \to \infty \quad \text{(random norming)}
\]
and
\[
(\zeta_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta) \xrightarrow{\mathcal{L}} W^{-\frac{1}{2}} N(0, 1) \quad \text{as } n \to \infty \quad \text{(nonrandom norming)},
\]
where $\zeta_n(\theta) = E[I_n(\theta)]$, and $W$ and the standard normal limit are independent. In other words, the asymptotic distribution of the maximum likelihood estimator is not normal. Such models are called non-ergodic models.
Special case: Suppose that $\theta > 1$ and
\[
(\star) \qquad p_j = P(Z_1 = j \mid Z_0 = 1) = \frac{1}{\theta}\left(1 - \frac{1}{\theta}\right)^{j-1}, \quad j = 1, 2, \ldots;
\]
that is, the offspring distribution is geometric on the positive integers. Then $E(Z_1) = \theta$ and the probability of extinction is $q = 0$. Furthermore,
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} = \frac{\sigma^{-2} \sum_{i=0}^{n-1} Z_i}{\sigma^{-2} \sum_{i=0}^{n-1} \theta^i} = \frac{\sum_{i=0}^{n-1} Z_i}{\sum_{i=0}^{n-1} \theta^i} \xrightarrow{a.s.} W \quad \text{as } n \to \infty,
\]
where $W$ is standard exponential. In fact, $\phi(s) = E[e^{-sW}]$ satisfies the equation
\[
(\star\star) \qquad \phi(\theta s) = \frac{\frac{1}{\theta} \phi(s)}{1 - (1 - \frac{1}{\theta}) \phi(s)}.
\]
A solution of the equation $(\star\star)$ is $\phi(s) = \frac{\lambda}{\lambda + s}$ with $\lambda > 0$. Since $E[W] = 1$, it follows that $\lambda = 1$ and $W$ is exponential with mean $1$.
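The claim that $\phi(s) = 1/(1 + s)$ solves $(\star\star)$ can be checked numerically; the evaluation points and the value of $\theta$ below are arbitrary illustrative choices.

```python
def phi(s):
    """Laplace transform of a standard exponential: E[exp(-s W)] = 1/(1+s)."""
    return 1.0 / (1.0 + s)

def rhs(s, theta):
    """Right-hand side of (**): (1/theta) phi(s) / (1 - (1 - 1/theta) phi(s))."""
    return (phi(s) / theta) / (1.0 - (1.0 - 1.0 / theta) * phi(s))

theta = 2.5
checks = [abs(phi(theta * s) - rhs(s, theta)) for s in (0.1, 1.0, 7.3)]
print(max(checks))   # zero up to floating-point rounding
```

Algebraically, the right-hand side simplifies to $1/(\theta s + 1) = \phi(\theta s)$, so the check succeeds for every $s$ and $\theta$.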
Bayesian estimation: Suppose the offspring distribution is Poisson with mean $\theta$. Further assume that the parameter $\theta$ has a prior density which is Gamma with parameters $\alpha$ and $\beta$, that is,
\[
p(\theta) = \frac{e^{-\theta \beta}\, \theta^{\alpha - 1}\, \beta^{\alpha}}{\Gamma(\alpha)}, \quad 0 < \theta < \infty,
\]
and $p(\theta) = 0$ otherwise, where $\alpha > 0$ and $\beta > 0$ are known. We have seen that the likelihood function $L_n(\theta)$ is proportional to
\[
\exp\left( -\theta \sum_{k=1}^{n} Z_{k-1} \right) \theta^{\sum_{k=1}^{n} Z_k}.
\]
Hence the posterior density of $\theta$, given $(Z_0, \ldots, Z_n)$, is proportional to
\[
\exp\left( -\theta \left( \beta + \sum_{k=1}^{n} Z_{k-1} \right) \right) \theta^{\alpha + \sum_{k=1}^{n} Z_k - 1}.
\]
Therefore the posterior density of the parameter $\theta$ is again Gamma, with parameters $\alpha + \sum_{k=1}^{n} Z_k$ and $\beta + \sum_{k=1}^{n} Z_{k-1}$. The mean of the posterior density is the Bayes estimator of the parameter $\theta$ under the quadratic loss function. It is given by
\[
\tilde{\theta}_n = \frac{\alpha + \sum_{k=1}^{n} Z_k}{\beta + \sum_{k=1}^{n} Z_{k-1}}.
\]
It can be checked that $\tilde{\theta}_n$ is asymptotically equivalent to the MLE
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}}
\]
on the set of non-extinction, that is, on the set $[W > 0]$.
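The conjugate Gamma update above amounts to two running sums. A minimal sketch (the function name and the toy path are illustrative):

```python
def bayes_estimate(Z, alpha, beta):
    """Posterior mean of theta for Poisson offspring with a Gamma(alpha, beta)
    prior: the posterior is Gamma(alpha + sum Z_k, beta + sum Z_{k-1})."""
    a_post = alpha + sum(Z[1:])       # alpha + sum_{k=1}^n Z_k
    b_post = beta + sum(Z[:-1])       # beta + sum_{k=1}^n Z_{k-1}
    return a_post / b_post

print(bayes_estimate([1, 2, 4], alpha=1.0, beta=1.0))   # (1+6)/(1+3) = 1.75
```

As the sums grow on the set of non-extinction, the fixed $\alpha$ and $\beta$ become negligible and the estimate merges with the MLE, as stated above.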
Least squares approach: Let us again consider the BGW branching process discussed earlier. We have seen that
\[
E(Z_{n+1} \mid Z_n) = \theta Z_n \qquad \text{and} \qquad \mathrm{Var}(Z_{n+1} \mid Z_n) = \sigma^2 Z_n.
\]
Let $U_{n+1}$ be defined by the relation
\[
(*) \qquad Z_{n+1} = \theta Z_n + Z_n^{1/2} U_{n+1}, \quad n \geq 0, \; Z_0 = 1.
\]
Check that (i) $E(U_k) = 0, k \geq 1$; (ii) $\mathrm{Var}(U_k) = \sigma^2, k \geq 1$; and (iii) $E(U_k U_j) = 0$ for $1 \leq j \leq k - 1$, and $E(U_k Z_{k-1}) = 0, k \geq 1$.

The relation (*) is an autoregressive-type model for the process $\{Z_k, k \geq 0\}$. Since the error terms in (*) satisfy the classical assumptions in the theory of least squares, we may consider the least squares approach for the estimation of the parameter $\theta$. This is done by minimizing the error sum of squares $\sum_{k=1}^{n} U_k^2$ with respect to $\theta$, which gives the estimator
\[
\theta_n^* = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}},
\]
the same as the MLE when the offspring distribution is a power series distribution. The variance $\sigma^2$ can be estimated by the residual sum of squares, namely,
\[
\sigma^{2*} = \frac{1}{n} \sum_{k=1}^{n} \frac{(Z_k - \theta_n^* Z_{k-1})^2}{Z_{k-1}}.
\]
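A minimal sketch of the least squares estimators $\theta_n^*$ and $\sigma^{2*}$ (the function name and the toy path are illustrative choices):

```python
def cls_estimates(Z):
    """Least squares estimate theta* = sum Z_k / sum Z_{k-1} and the
    residual-based variance estimate sigma2* from (Z_0, ..., Z_n)."""
    n = len(Z) - 1
    theta_star = sum(Z[1:]) / sum(Z[:-1])
    sigma2_star = sum((Z[k] - theta_star * Z[k - 1]) ** 2 / Z[k - 1]
                      for k in range(1, n + 1)) / n
    return theta_star, sigma2_star

print(cls_estimates([1, 2, 4, 8]))   # exact doubling: theta* = 2, residuals 0
```

On the exactly doubling toy path every residual vanishes, so the variance estimate is zero, which makes the structure of the two formulas easy to check.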
Lecture 7

Estimation by conditional least squares

Let $\{X_n, n \geq 1\}$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, P_\theta)$, $\theta = (\theta_1, \ldots, \theta_p) \in \Theta \subset R^p$, $\Theta$ open. Consider
\[
Q_n(\theta) = \sum_{k=1}^{n} [X_k - E_\theta(X_k \mid \mathcal{F}_{k-1})]^2.
\]
We estimate $\theta$ by minimizing $Q_n(\theta)$ over $\Theta$. We assume that $Q_n(\theta)$ has partial derivatives with respect to $\theta_i, 1 \leq i \leq p$.

Assume that $E_\theta(X_n \mid \mathcal{F}_{n-1})$ is a.s. twice continuously differentiable with respect to $\theta$ in some neighbourhood $S$ of the true parameter, say $\bar{\theta} = (\bar{\theta}_1, \ldots, \bar{\theta}_p) \in \Theta$. Applying the Taylor series expansion, we have
\[
(\star) \qquad Q_n(\theta) = Q_n(\bar{\theta}) + (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + \frac{1}{2} (\theta - \bar{\theta})' \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \theta^\star} (\theta - \bar{\theta}),
\]
where $\|\bar{\theta} - \theta^\star\| \leq \|\theta - \bar{\theta}\|$. Hence
\[
(37) \qquad Q_n(\theta) = Q_n(\bar{\theta}) + (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + \frac{1}{2} (\theta - \bar{\theta})' V_n (\theta - \bar{\theta}) + \frac{1}{2} (\theta - \bar{\theta})' T_n(\theta^\star) (\theta - \bar{\theta}),
\]
where
\[
(38) \qquad T_n(\theta^\star) = \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \theta^\star} - V_n \qquad \text{and} \qquad V_n = \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \bar{\theta}}.
\]
Theorem 1 (Klimko and Nelson (1978)) (Consistency): Suppose that

(i)
\[
(39) \qquad \lim_{n \to \infty} \lim_{\delta \downarrow 0} \sup_{\|\theta^\star - \bar{\theta}\| \leq \delta} \frac{1}{n\delta} |T_n(\theta^\star)_{ij}| < \infty, \quad 1 \leq i, j \leq p;
\]
(ii)
\[
(40) \qquad (2n)^{-1} V_n \xrightarrow{a.s.} V,
\]
where $V$ is a positive definite (symmetric) $p \times p$ matrix of constants; and

(iii)
\[
(41) \qquad \frac{1}{n} \left. \frac{\partial Q_n(\theta)}{\partial \theta_i} \right|_{\theta = \bar{\theta}} \xrightarrow{a.s.} 0, \quad 1 \leq i \leq p.
\]
Then there exists a sequence of estimators $\hat{\theta}_n$ such that
\[
\hat{\theta}_n \xrightarrow{a.s.} \bar{\theta},
\]
and for any $\varepsilon > 0$ there exist an event $E$ with $P(E) > 1 - \varepsilon$ and an $n_0$ such that on $E$, for $n > n_0$, $\hat{\theta}_n$ satisfies the equation
\[
(42) \qquad \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n} = 0
\]
and $Q_n$ attains a local minimum at $\hat{\theta}_n$.

Proof: Given $\varepsilon > 0$ and conditions (39)-(41), applying Egorov's theorem, we can find an event $E$ with $P(E) > 1 - \varepsilon$, constants $0 < \delta^\star < \delta$, $M > 0$, and an $n_0$ such that on $E$, for any $n > n_0$ and $\theta \in N_{\delta^\star}$ (an open sphere with center $\bar{\theta}$ and radius $\delta^\star$),

(a) $\left| (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \right| < n\delta^3$;

(b) the minimum eigenvalue of $(2n)^{-1} V_n$ is greater than some $\Delta > 0$; and

(c) $\frac{1}{2} \left| (\theta - \bar{\theta})' T_n(\theta^\star) (\theta - \bar{\theta}) \right| < n M \delta^3$.

Using (37) for $\theta$ on the boundary of $N_{\delta^\star}$, we have
\[
Q_n(\theta) \geq Q_n(\bar{\theta}) + n(-\delta^3 + \delta^2 \Delta - M \delta^3) = Q_n(\bar{\theta}) + n\delta^2(-\delta + \Delta - M\delta).
\]
Since $\Delta - \delta - M\delta$ can be made positive by choosing $\delta$ sufficiently small, $Q_n(\theta)$ must attain a minimum at some $\hat{\theta}_n = (\hat{\theta}_{n1}, \ldots, \hat{\theta}_{np})$ in $N_{\delta^\star}$, at which point the least squares equation (42) must be satisfied on $E$ for any $n > n_0$.

Replace $\varepsilon$ by $\varepsilon_k = 2^{-k}$ and $\delta$ by $\delta_k = \frac{1}{k}$, $k \geq 1$, to determine a sequence of events $E_k$ and an increasing sequence $n_k$ such that the equation (42) has a solution on $E_k$ for any $n > n_k$. For $n_k < n \leq n_{k+1}$, define $\hat{\theta}_n$ on $E_k$ to be a solution of (42) within $\delta_k$ of $\bar{\theta}$ at which $Q_n$ attains a relative minimum, and define $\hat{\theta}_n$ to be zero off $E_k$. Then
\[
\hat{\theta}_n \to \bar{\theta} \quad \text{on } \liminf_{k \to \infty} E_k,
\]
but
\[
1 - P\left( \liminf_{k \to \infty} E_k \right) = P\left( \limsup_{k \to \infty} E_k^c \right) = \lim_{k \to \infty} P\left( \cup_{j=k}^{\infty} E_j^c \right) \leq \lim_{k \to \infty} \sum_{j=k}^{\infty} 2^{-j} = 0.
\]
Asymptotic normality

Asymptotic normality of the estimator $\hat{\theta}_n$ can be obtained if the linear term in (37) has an asymptotically multivariate normal distribution. This can be verified by the Cramer-Wold technique and an appropriate central limit theorem for martingales. Note that
\[
(43) \qquad n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} = -2 n^{-\frac{1}{2}} \sum_{k=1}^{n} \left[ \sum_{i=1}^{p} \lambda_i \left. \frac{\partial E_\theta(X_k \mid \mathcal{F}_{k-1})}{\partial \theta_i} \right|_{\theta = \bar{\theta}} \right] (X_k - E_{\bar{\theta}}(X_k \mid \mathcal{F}_{k-1})),
\]
where $\lambda = (\lambda_1, \ldots, \lambda_p) \in R^p$ is an arbitrary nonzero vector. Furthermore,
\[
\sum_{i=1}^{p} \lambda_i \left. \frac{\partial E_\theta(X_k \mid \mathcal{F}_{k-1})}{\partial \theta_i} \right|_{\theta = \bar{\theta}}
\]
is an $\mathcal{F}_{k-1}$-measurable function. Hence, from (43), it follows that
\[
\left\{ n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}}, \mathcal{F}_n, n \geq 1 \right\}
\]
is a martingale. If
\[
\frac{1}{2} n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, \lambda' W \lambda)
\]
for any $\lambda \neq 0$, where $W$ is a $p \times p$ covariance matrix, then
\[
(44) \qquad \frac{1}{2} n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, W) \quad \text{as } n \to \infty.
\]
Theorem 2: Suppose the conditions of Theorem 1 hold. In addition, suppose that
\[
\lim_{n \to \infty} \lim_{\delta \downarrow 0} \sup_{\|\theta^\star - \bar{\theta}\| \leq \delta} \frac{1}{n\delta} |T_n(\theta^\star)_{ij}| = 0, \quad 1 \leq i, j \leq p,
\]
and
\[
(45) \qquad \frac{1}{2} n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, W)
\]
as $n \to \infty$, where $V$ is as defined in (40). Then
\[
n^{1/2} (\hat{\theta}_n - \bar{\theta}) \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1})
\]
as $n \to \infty$.
Proof: Let $\hat{\theta}_n$ be as given by Theorem 1. Note that $\hat{\theta}_n$ satisfies (42). Expanding
\[
n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n}
\]
in a Taylor expansion about $\bar{\theta}$, we have, by (38),
\[
0 = n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n} = n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + n^{-1} \left( V_n + T_n(\theta^\star) \right) n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta}).
\]
Since $n^{-1}(V_n + T_n(\theta^\star)) \xrightarrow{a.s.} 2V$ as $n \to \infty$ by (40) and the condition above, it follows that $n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta})$ has the same asymptotic distribution as
\[
-(2V)^{-1} n^{-\frac{1}{2}} \left. \frac{\partial Q_n}{\partial \theta} \right|_{\theta = \bar{\theta}}.
\]
This proves that
\[
n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta}) \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1})
\]
in view of (45).
Example (BGW process with immigration): Consider a subcritical BGW process $\{Z_n, n = 0, 1, \ldots\}$ with immigration. Suppose that the process has an initial distribution for $Z_0$ with $E Z_0^2 < \infty$. Let $m$ and $\lambda$ be the means of the offspring distribution and the immigration distribution respectively. Assume that these distributions have finite variances. The problem is to estimate $\theta = (m, \lambda)$ based on $Z_0, \ldots, Z_n$. Note that the $(n+1)$-th generation is obtained from the independent reproduction of each of the individuals in the $n$-th generation plus an independent immigration input. Thus,
\[
E_\theta(Z_{n+1} \mid \mathcal{F}_n) = m Z_n + \lambda.
\]
Let
\[
Q_n(\theta) = \sum_{k=1}^{n} [Z_k - E_\theta(Z_k \mid \mathcal{F}_{k-1})]^2 = \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda]^2.
\]
Then
\[
\frac{\partial Q_n(\theta)}{\partial m} = 2 \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda](-Z_{k-1})
\]
and
\[
\frac{\partial Q_n(\theta)}{\partial \lambda} = 2 \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda](-1).
\]
Equating these to zero, we obtain
\[
\hat{m}_n = \frac{n \sum_{i=1}^{n} Z_{i-1} Z_i - \left( \sum_{i=1}^{n} Z_{i-1} \right) \left( \sum_{i=1}^{n} Z_i \right)}{n \sum_{i=1}^{n} Z_{i-1}^2 - \left( \sum_{i=1}^{n} Z_{i-1} \right)^2}
\]
and
\[
\hat{\lambda}_n = \frac{1}{n} \left[ \sum_{i=1}^{n} Z_i - \hat{m}_n \sum_{i=1}^{n} Z_{i-1} \right].
\]
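Solving the two normal equations above is ordinary least squares regression of $Z_k$ on $Z_{k-1}$. A minimal sketch follows; the exactly linear toy path $Z_k = 2 Z_{k-1} + 3$ is constructed here so the estimates are recovered without error.

```python
def cls_immigration(Z):
    """Conditional least squares estimates of (m, lambda) for a BGW process
    with immigration, from the normal equations for Z_k on Z_{k-1}."""
    n = len(Z) - 1
    x, y = Z[:-1], Z[1:]                 # (Z_{k-1}, Z_k) pairs, k = 1..n
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    m_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    lam_hat = (sy - m_hat * sx) / n
    return m_hat, lam_hat

print(cls_immigration([1, 5, 13, 29, 61]))   # recovers (2.0, 3.0) exactly
```

On noisy simulated data the same two formulas give the strongly consistent estimators whose limits are discussed next.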
It can be shown that the process $\{Z_n\}$ is a Markov process with a stationary distribution. If $Z_0$ has this stationary distribution, then the process $\{Z_n\}$ is stationary and ergodic, the ergodic theorem can be applied, and we have
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i \xrightarrow{a.s.} E(Z_0) = \lambda (1 - m)^{-1} \equiv r_1 \ \text{(say)},
\]
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i^2 \xrightarrow{a.s.} E(Z_0^2) = c^2 (1 - m^2)^{-1} + r_1^2 \equiv r_2 \ \text{(say)},
\]
and
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i Z_{i-1} \xrightarrow{a.s.} m c^2 (1 - m^2)^{-1} + r_1^2,
\]
where $c^2 = b^2 + \sigma^2 \lambda (1 - m)^{-1}$, and $\sigma^2$ and $b^2$ are the variances of the offspring and immigration distributions. Note that $r_1$ and $r_2$ are the first and second moments of $Z_0$.

In fact, the above results hold even if the initial distribution is a general distribution$^{(\star)}$, and we get that
\[
\hat{m}_n \xrightarrow{a.s.} m, \qquad \hat{\lambda}_n \xrightarrow{a.s.} \lambda.
\]
($\star$: Follows from Billingsley (1961), Statistical Inference for Markov Processes; Revesz (1968), Laws of Large Numbers.)

Suppose the offspring and the immigration distributions have finite third moments. Then $r_3 = E(Z_0^3) < \infty$. It can be shown that
\[
n^{\frac{1}{2}} \begin{pmatrix} \hat{m}_n - m \\ \hat{\lambda}_n - \lambda \end{pmatrix} \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1}),
\]
where
\[
V^{-1} = c^{-2}(1 - m^2) \begin{pmatrix} 1 & -r_1 \\ -r_1 & r_2 \end{pmatrix}
\]
and
\[
W = \begin{pmatrix} \sigma_1^2 r_3 + \sigma_2^2 r_2 & \sigma_1^2 r_2 + \sigma_2^2 r_1 \\ \sigma_1^2 r_2 + \sigma_2^2 r_1 & \sigma_1^2 r_1 + \sigma_2^2 \end{pmatrix}.
\]
Here $\sigma_1^2$ and $\sigma_2^2$ are defined by the relation
\[
\mathrm{Var}(Z_1 \mid Z_0) = \sigma_1^2 Z_0 + \sigma_2^2.
\]
(Ref.: Hall and Heyde (1980), pp. 180-181.)
Method of moments

This method does not generally lead to an estimator with any "optimal property", but it is easy to implement. We illustrate the method through two examples.

Example 1: Let $Z_0 = 1, Z_1, Z_2, \ldots$ be a supercritical BGW branching process. Let $\theta = E Z_1 > 1$ and $0 < \mathrm{Var}\, Z_1 = \sigma^2 < \infty$. Suppose the problem is to estimate $\theta$ and $\sigma^2$ on the basis of a single realization $\{Z_k, 0 \leq k \leq n + 1\}$. Since
\[
Z_{k+1} = X_{k1} + \cdots + X_{k Z_k},
\]
where, conditional on $Z_k$, the $X_{ki}, 1 \leq i \leq Z_k$, are i.i.d. random variables each with the distribution of $Z_1$, we have
\[
E(Z_{k+1} \mid Z_k) = \theta Z_k \ \text{a.s.}, \quad \text{i.e.,} \quad E\left( \frac{Z_{k+1}}{Z_k} \,\Big|\, Z_k \right) = \theta,
\]
and
\[
E((Z_{k+1} - \theta Z_k)^2 \mid Z_k) = \sigma^2 Z_k \ \text{a.s.}, \quad \text{i.e.,} \quad E\left( \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} \,\Big|\, Z_k \right) = \sigma^2 \ \text{a.s.}
\]
Suppose that $P(Z_1 = 0) = 0$. Note that
\[
\left\{ \frac{Z_n}{\theta^n}, \mathcal{F}_n, n \geq 0 \right\}
\]
is a nonnegative martingale and
\[
\frac{Z_n}{\theta^n} \xrightarrow{a.s.} W \ \text{(say)} \quad \text{as } n \to \infty.
\]
It is known that $W$ is non-degenerate and positive a.s. (Harris (1963)), and hence
\[
\hat{\theta}_n = \frac{Z_{n+1}}{Z_n} \to \theta \ \text{a.s.} \quad \text{as } n \to \infty.
\]
Let $\tilde{\theta}_n = \frac{1}{n} \sum_{j=0}^{n-1} Z_{j+1} Z_j^{-1}$. Then $\tilde{\theta}_n$ can be considered a moment estimator for $\theta$. In fact, $\tilde{\theta}_n \to \theta$ a.s. as $n \to \infty$. However, $\hat{\theta}_n$ is a better estimator than $\tilde{\theta}_n$ as far as the rate of convergence to $\theta$ is concerned (Heyde and Leslie (1971), Bull. Austral. Math. Soc. 5, 145-155). An estimator by the method of moments for $\sigma^2$ is
\[
\hat{\sigma}_n^2 = \frac{1}{n} \sum_{k=0}^{n} \frac{(Z_{k+1} - \hat{\theta}_n Z_k)^2}{Z_k}.
\]
It is clear that
\[
E\left( \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \,\Big|\, Z_0, Z_1, \ldots, Z_k \right) = 0,
\]
and hence
\[
\left\{ M_n = \sum_{k=0}^{n} \left[ \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \right], \mathcal{F}_{n+1}, n \geq 0 \right\}
\]
forms a martingale, where $\mathcal{F}_k = \sigma(Z_0, Z_1, \ldots, Z_k)$. An application of the SLLN proves that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \left[ \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \right] = 0 \ \text{a.s.}
\]
One can prove that "$\theta$" in the above equation can be replaced by "$\hat{\theta}_n$" by applying the result
\[
\hat{\theta}_n - \theta = \sigma \zeta(n) (2 Z_n^{-1} \log n)^{\frac{1}{2}},
\]
where $\limsup_n \zeta(n) = 1$ a.s. and $\liminf_n \zeta(n) = -1$ a.s. (Heyde (1974), Advances in Appl. Prob. 3, 421-433).
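A minimal sketch of the three estimators of this example: the ratio estimator $\hat{\theta}_n = Z_{n+1}/Z_n$ (often called the Lotka-Nagaev estimator), the averaged-ratio moment estimator $\tilde{\theta}_n$, and the residual-based estimator of $\sigma^2$. Function names, the exact summation ranges, and the toy path are illustrative choices.

```python
def lotka_nagaev(Z):
    """Ratio estimator theta_hat_n = Z_{n+1} / Z_n."""
    return Z[-1] / Z[-2]

def ratio_average(Z):
    """Moment estimator theta_tilde_n = (1/n) sum_j Z_{j+1} / Z_j."""
    return sum(Z[j + 1] / Z[j] for j in range(len(Z) - 1)) / (len(Z) - 1)

def sigma2_moment(Z, theta_hat):
    """Moment estimator of sigma^2 from squared normalized residuals."""
    n = len(Z) - 1
    return sum((Z[k + 1] - theta_hat * Z[k]) ** 2 / Z[k]
               for k in range(n)) / n

Z = [1, 2, 4, 8]                       # deterministic doubling path
print(lotka_nagaev(Z), ratio_average(Z), sigma2_moment(Z, 2.0))
```

On the deterministic doubling path both mean estimators return exactly $2$ and the variance estimate is $0$, which makes each formula easy to verify by hand.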
Example 2: Consider a stochastic process $\{X_n\}$ governed by the model
\[
X_n = \varepsilon_n + \alpha X_{n-1} \varepsilon_{n-1}, \quad n \geq 1,
\]
where $\{\varepsilon_i, i \geq 1\}$ are i.i.d. with $E\varepsilon_0 = 0$, $\sigma^2 = E\varepsilon_0^2 < \infty$ and $E\varepsilon_0^3 = 0$. We assume that $\alpha^2 \sigma^2 < 1$. Note that, for $k$ large,
\[
a = E X_k = \alpha \sigma^2
\]
and
\[
b = E(X_k X_{k-1}) = \alpha [E\varepsilon_0^3 + 2\alpha \sigma^4] = 2\alpha^2 \sigma^4.
\]
Let
\[
a_n = \frac{1}{n} \sum_{k=1}^{n} X_k \qquad \text{and} \qquad b_n = \frac{1}{n} \sum_{k=1}^{n} X_k X_{k-1}.
\]
It can be shown that
\[
(\star) \qquad a_n \to a \ \text{a.s.}, \qquad b_n \to b \ \text{a.s.},
\]
and one can estimate $\alpha$ and $\sigma^2$ by solving the moment equations. In fact,
\[
n^{\frac{1}{2}} (a_n - a) \xrightarrow{\mathcal{L}} N(0, \mathrm{Var}(X_0) + 2\,\mathrm{cov}(X_0, X_1)),
\]
and if $E\varepsilon_0^6 < \infty$, then
\[
n^{\frac{1}{2}} (b_n - b) \xrightarrow{\mathcal{L}} N(0, \mathrm{Var}(X_0 X_1) + 2\,\mathrm{cov}(X_0 X_1, X_1 X_2) + 2\,\mathrm{cov}(X_0 X_1, X_2 X_3)).
\]
Remarks: If the process $\{X_k, -\infty < k < \infty\}$ is stationary, then
\[
X_n = \varepsilon_n + \alpha \varepsilon_{n-1}^2 + \sum_{k=2}^{\infty} \alpha^k \varepsilon_{n-k}^2 \prod_{j=1}^{k-1} \varepsilon_{n-j},
\]
and the results given in $(\star)$ can be proved.
Lecture 8

Likelihood ratios in abstract spaces

Let $\mathcal{X}$ be the sample space and $\Theta = \{0, 1\}$ the parameter space. Let $P_0$ and $P_1$ be probability measures defined on a measurable space $(\mathcal{X}, \mathcal{B})$. A fundamental problem is to test the null hypothesis
\[
H_0: \theta = 0 \quad \text{against} \quad H_1: \theta = 1.
\]
In the Neyman-Pearson formulation, we choose a critical region $W \subset \mathcal{X}$ such that if the observed $x \in W$, we reject $H_0$, and if $x \notin W$, we do not reject $H_0$ (accept $H_0$). The performance of the test is determined by
\[
\alpha = \text{level of significance of the test} = P_0(W)
\]
and the power $\gamma = P_1(W)$. The Neyman-Pearson lemma gives a method to find a test which maximizes $\gamma$ for a given $\alpha$; it gives a test based on the likelihood ratio.

In a general abstract space, there is no measure analogous to the Lebesgue measure on $R^n$ and hence the concept of likelihood cannot be formulated directly. We let the Radon-Nikodym derivative play the role of the likelihood ratio. The basic problem is therefore to find a method for calculating the Radon-Nikodym derivative whenever it exists.

Given $P_0$ and $P_1$, there exist a measurable set $H \subset \mathcal{X}$ with $P_0(H) = 0$ and a non-negative function $f$, integrable with respect to $P_0$, such that for any measurable set $E \subset \mathcal{X}$,
\[
(\star) \qquad P_1(E) = \int_E f(x)\, P_0(dx) + P_1(E \cap H).
\]
This result is known as the Lebesgue decomposition. The function $f$ is the Radon-Nikodym derivative and will be denoted by $\frac{dP_1}{dP_0}(x)$. Note that $\frac{dP_1}{dP_0}(x)$ is the Radon-Nikodym derivative of the absolutely continuous component of $P_1$ with respect to $P_0$.

Recall that if the set $H$ has $P_1(H) = 1$, then the measures $P_0$ and $P_1$ are singular with respect to each other. If this happens, then the critical region $W = H$ allows perfect (probability one) discrimination between $H_0$ and $H_1$: the test gives the correct result, with both the first and second kind of errors equal to zero.

If $P_0$ and $P_1$ are both absolutely continuous with respect to each other, then they are said to be equivalent. In such a case $P_1(H) = 0$ and $f > 0$ with $P_0$-probability one.

It is possible that $P_0$ and $P_1$ are neither singular nor equivalent. However, if $P_0$ and $P_1$ are Gaussian measures, then they are either equivalent or singular with respect to each other (Hajek (1958), Feldman (1958)).
Let $\mathcal{X} = R^\infty$, write $x^{(n)} = (x_1, \ldots, x_n)$, and let $g_{ni}(x^{(n)})$ be the joint p.d.f. of $(X_1, \ldots, X_n)$ with respect to the Lebesgue measure on $R^n$ under $P_i$, for $i = 0, 1$. Let $\mathcal{B}$ be the $\sigma$-algebra generated by all the cylinder sets with finite dimensional base and $\mathcal{B}_n$ the Borel $\sigma$-algebra in $R^n$. For $x = (x_1, x_2, \ldots) \in \mathcal{X} = R^\infty$, let
\[
f_n(x) = \frac{g_{n1}(x^{(n)})}{g_{n0}(x^{(n)})}.
\]
Suppose $f_n(x)$ is defined a.e. $[P_0]$. Let $H$ be the set defined above in $(\star)$. Then

(i) $f_n(x) \xrightarrow{a.s.} f(x)$ $[P_0]$;

(ii) $f_n(x) \xrightarrow{p} f(x)$ $[P_1]$ on $H^c$; and

(iii) $f_n(x) \to +\infty$ $[P_1]$ on $H$.

(Ref.: Grenander (1950).)

Proof: (i) Suppose $H$ is such that $P_1(H) = 0$, so that $P_0$ and $P_1$ are absolutely continuous with respect to each other. Then the sequence $\{f_n\}$ forms a martingale, and by the martingale convergence theorem,
\[
f_n(x) \xrightarrow{a.s.} f(x) \ [P_0] \quad \text{as } n \to \infty,
\]
where $f(x) = \frac{dP_1}{dP_0}(x)$ as in $(\star)$. For the case $0 < P_1(H) \leq 1$ and for proofs of (ii) and (iii), see Grenander (1950), pp. 108-110.
Neyman-Pearson lemma: Suppose $P_1$ is absolutely continuous with respect to $P_0$. Let $f(x) = \frac{dP_1}{dP_0}(x)$. If the critical region $W$ is of the form
\[
W = \{x : f(x) \geq c\} \subset \mathcal{X},
\]
then $W$ is the "best" critical region of its size: no other critical region at the same level of significance has greater power.

Proof: Let $V \subset \mathcal{X}$ be such that $P_0(V) = P_0(W)$. Note that
\[
(46) \qquad P_0(W \cap V^c) = P_0(W) - P_0(W \cap V) = P_0(V) - P_0(W \cap V) = P_0(V \cap W^c).
\]
Hence
\begin{align*}
P_1(W \cap V^c) &= \int_{W \cap V^c} f(x)\, P_0(dx) \\
&\geq c \int_{W \cap V^c} P_0(dx) \quad \text{since } W = \{x : f(x) \geq c\} \\
&= c\, P_0(W \cap V^c) = c\, P_0(V \cap W^c) \quad \text{(by (46))}. \qquad (47)
\end{align*}
However,
\begin{align*}
P_1(V \cap W^c) &= \int_{V \cap W^c} f(x)\, P_0(dx) \\
&\leq c \int_{V \cap W^c} P_0(dx) \quad \text{since } W^c = \{x : f(x) < c\} \\
&= c\, P_0(V \cap W^c). \qquad (48)
\end{align*}
Combining (47) and (48), we get that
\[
(49) \qquad P_1(W \cap V^c) \geq P_1(V \cap W^c).
\]
Adding $P_1(W \cap V)$ to both sides, we get that
\[
P_1(W) \geq P_1(V),
\]
which completes the proof.
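On a finite sample space the lemma can be verified by brute force: among all critical regions of a given size, a likelihood-ratio region attains the maximum power. The two distributions below are arbitrary illustrative choices, not part of the notes.

```python
from itertools import combinations

P0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # null distribution
P1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # alternative distribution
points = list(P0)

def size(W):  return sum(P0[x] for x in W)   # level of significance P0(W)
def power(W): return sum(P1[x] for x in W)   # power P1(W)

# All critical regions with P0-size 0.3 (up to rounding)
regions = [set(W) for r in range(len(points) + 1)
           for W in combinations(points, r)
           if abs(size(W) - 0.3) < 1e-9]

best = max(regions, key=power)
print(best, power(best))   # the likelihood-ratio region {2, 3}
```

The winning region $\{2, 3\}$ is exactly $\{x : f(x) \geq c\}$ for the ratio $f = P_1/P_0$ with a suitable $c$, in agreement with the lemma.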
We have the following theorem for the best Bayesian test.

Theorem: If $P_1$ is absolutely continuous with respect to $P_0$, and if the a priori probabilities of the two hypotheses $H_0$ and $H_1$ are $\pi_0$ and $\pi_1$ respectively, then the "best" test, in the sense of minimizing the probability of an error, is given by the critical region
\[
W = \left[ x : f(x) > \frac{\pi_0}{\pi_1} \right].
\]
Proof: The probability of the test leading to the wrong result is
\[
\alpha = \pi_0\, P_0(\text{reject } H_0) + \pi_1\, P_1(\text{reject } H_1) = \pi_0 P_0(W) + \pi_1 P_1(W^c)
\]
when $W$ is the critical region. Hence
\[
\alpha = \pi_0 \int_W P_0(dx) + \pi_1 \int_{W^c} P_1(dx) = \pi_0 \int_W P_0(dx) + \pi_1 \int_{W^c} f(x)\, P_0(dx) = \pi_1 + \int_W (\pi_0 - \pi_1 f(x))\, P_0(dx),
\]
since
\[
\int_{W^c} f(x)\, P_0(dx) = 1 - \int_W f(x)\, P_0(dx).
\]
To minimize $\alpha$, we should choose $W$ in such a way that the integral is as small as possible. This can be done by choosing $W$ to be the set where the integrand is negative. Note that the integrand is negative when
\[
\pi_0 - \pi_1 f(x) < 0, \quad \text{that is,} \quad f(x) > \frac{\pi_0}{\pi_1}.
\]
Remarks: The best critical region need not be unique.
Lecture 9

Representation of a second order stochastic process

Let $\{X(t), t \in T\}$ be a stochastic process with $E[X(t)]^2 < \infty$ for all $t \in T$. Let $m(t) = E[X(t)]$ and $r(s, t) = \mathrm{cov}(X(s), X(t))$. The fundamental problem is how to represent a stochastic process, possibly with a complicated dependence structure, as a linear combination of "simple" elements. Here "simple" means orthogonal (uncorrelated).

Mercer's theorem: Consider a symmetric non-negative definite continuous function $r(s, t)$ on $[a, b] \times [a, b]$ and the integral equation
\[
(\star) \qquad \lambda \phi(t) = \int_a^b r(s, t)\, \phi(s)\, ds.
\]
The eigenvalues $\lambda_1, \lambda_2, \ldots$ and the associated normalized eigenfunctions $\phi_1, \phi_2, \ldots$ satisfy
\[
r(s, t) = \sum_{i=1}^{\infty} \lambda_i\, \phi_i(s)\, \phi_i(t)
\]
in the $L^2$-sense as well as with absolute and uniform convergence. Note that the $\phi_v$ are orthonormal.

Remarks: Note that $r(\cdot, t) \in L^2([a, b])$. Hence, for fixed $t$,
\[
r(\cdot, t) = \sum_{v=1}^{\infty} \rho_v(t)\, \phi_v(\cdot),
\]
where
\[
\rho_v(t) = \int_a^b r(s, t)\, \phi_v(s)\, ds = \lambda_v \phi_v(t).
\]
Hence $r(s, t) = \sum_{i=1}^{\infty} \lambda_i \phi_i(s) \phi_i(t)$.

Karhunen-Loeve expansion (Karhunen (1947), Loeve (1946)): Let $\{X(t), t \in T = [a, b]\}$ be a second order process continuous in the mean on $[a, b]$, that is,
\[
E|X(t + h) - X(t)|^2 \to 0 \quad \text{as } h \to 0.
\]
Define $\lambda_v$ and $\phi_v$ for $v \geq 1$ as above through the covariance function $r(s, t)$ of the process $\{X(t), t \in T\}$, and introduce the variables
\[
Z_v = \int_a^b X(t)\, \phi_v(t)\, dt.
\]
Note that the $Z_v$ are uncorrelated. Form the expansion
\[
Z(t) = \sum_{v=1}^{\infty} \phi_v(t)\, Z_v.
\]
Then the expansion holds in the $L^2$-mean and $P(Z(t) = X(t)) = 1$ for each $t \in T$.
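The eigensystem in $(\star)$ can be approximated by discretizing the kernel on a grid. For the Brownian-motion covariance $r(s, t) = \min(s, t)$ on $[0, 1]$ the eigenvalues are known in closed form, $\lambda_v = 1/((v - \tfrac{1}{2})^2 \pi^2)$, which gives a check on the numerics; the grid size below is an arbitrary choice.

```python
import numpy as np

n = 500
t = (np.arange(n) + 0.5) / n                    # midpoint grid on [0, 1]
K = np.minimum.outer(t, t)                      # Brownian covariance min(s, t)
eig = np.sort(np.linalg.eigvalsh(K / n))[::-1]  # discretized integral operator

# Closed-form eigenvalues lambda_v = 1 / ((v - 1/2)^2 pi^2), v = 1, 2, 3
analytic = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)
print(eig[:3])
print(analytic)
```

The leading numerical eigenvalues agree with the closed form to well under one percent, so the same discretization can be trusted for kernels without a known spectrum.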
Applications:

Example (test for the mean value function of a Gaussian process $\{X(t), t \in [0, 1]\}$ with known covariance function $r(s, t)$): We want to test the hypothesis
\[
H_0: E[X(t)] = m_0(t) \quad \text{against the alternative} \quad H_1: E[X(t)] = m_1(t).
\]
We take as observables the coordinates of the process
\[
Z_v = \int_0^1 X(t)\, \phi_v(t)\, dt,
\]
where the $\phi_v(t)$ are as defined by $(\star)$. The random variables $Z_v$ are independent normal random variables. In fact, under $H_i$, $Z_v \sim N(a_{iv}, \lambda_v)$, where
\[
a_{iv} = \int_0^1 m_i(t)\, \phi_v(t)\, dt.
\]
It is clear that $E_i(Z_v) = a_{iv}$. Let
\[
Y(t) = X(t) - m_i(t), \quad 0 \leq t \leq 1.
\]
Note that $E[Y(t) Y(s)] = r(t, s)$ and
\begin{align*}
\mathrm{Var}(Z_v) &= \int_0^1 \int_0^1 E[Y(t) Y(s)]\, \phi_v(t)\, \phi_v(s)\, dt\, ds = \int_0^1 \int_0^1 r(t, s)\, \phi_v(t)\, \phi_v(s)\, dt\, ds \\
&= \int_0^1 \lambda_v\, \phi_v(t)\, \phi_v(t)\, dt \quad \text{(by } (\star)\text{)} = \lambda_v \int_0^1 \phi_v^2(t)\, dt = \lambda_v.
\end{align*}
Suppose that $\lambda_v \neq 0$ for all $v$; in other words, assume that the covariance function $r(s, t)$ is positive definite. Then
\[
p_n(z) = \frac{\prod_{v=1}^{n} (2\pi\lambda_v)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\lambda_v}(z_v - a_{1v})^2 \right\}}{\prod_{v=1}^{n} (2\pi\lambda_v)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\lambda_v}(z_v - a_{0v})^2 \right\}} = \exp\{q_n(z)\},
\]
where
\[
q_n(z) = \sum_{v=1}^{n} \left[ z_v \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right) - \frac{a_{1v}^2 - a_{0v}^2}{2\lambda_v} \right] = \sum_{v=1}^{n} \zeta_v \ \text{(say)}.
\]
Suppose that
\[
\sum_{v=1}^{\infty} \frac{(a_{1v} - a_{0v})^2}{\lambda_v} < \infty.
\]
Then
\[
E_0(\zeta_v) = -\frac{(a_{1v} - a_{0v})^2}{2\lambda_v}, \qquad E_1(\zeta_v) = \frac{(a_{1v} - a_{0v})^2}{2\lambda_v}, \qquad \mathrm{Var}(\zeta_v) = \frac{(a_{1v} - a_{0v})^2}{\lambda_v}.
\]
Note that $\sum_{v=1}^{\infty} |E_i(\zeta_v)| < \infty$ and $\sum_{v=1}^{\infty} \mathrm{Var}(\zeta_v) < \infty$, and $\{\zeta_v, v \geq 1\}$ are independent random variables. Hence the series $\sum_{v=1}^{\infty} \zeta_v$ converges a.s. under $P_0$ and $P_1$, and the Radon-Nikodym derivative $p$ of $P_1$ with respect to $P_0$ is the limit of $p_n$. The most powerful test for testing $P_0$ versus $P_1$ is given by
\[
\{z : p(z) \geq c\},
\]
or equivalently, by
\[
\{z : q(z) \geq c^\star\}.
\]
Let
\[
f_n(t) = \sum_{v=1}^{n} \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right) \phi_v(t).
\]
Then
\[
q_n(Z) = \int_0^1 f_n(t) \left\{ X(t) - \frac{m_0(t) + m_1(t)}{2} \right\} dt.
\]
Under the additional condition $\sum_{v=1}^{\infty} \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right)^2 < \infty$, it can be shown that $f_n \to f$ in the $L^2$-mean and the test can be written in the form $\{z : q(z) \geq c^\star\}$, where
\[
q(z) = \int_0^1 f(t) \left\{ X(t) - \frac{m_0(t) + m_1(t)}{2} \right\} dt \qquad \text{and} \qquad \int_0^1 r(s, t)\, f(t)\, dt = m_1(s) - m_0(s).
\]
Example: Consider a nonhomogeneous Poisson process $N(t)$ on $[0, 1]$ with positive and continuous intensity $\lambda(t)$. Then
\[
P(N(t) = k) = \frac{\left( \int_0^t \lambda(u)\, du \right)^k e^{-\int_0^t \lambda(u)\, du}}{k!}, \quad k = 0, 1, 2, \ldots,
\]
and
\[
P(N(t) - N(s) = k) = \frac{\left( \int_s^t \lambda(u)\, du \right)^k e^{-\int_s^t \lambda(u)\, du}}{k!}, \quad k = 0, 1, 2, \ldots
\]
As the observables here, we take the numbers of events in the intervals
\[
I_v = \left[ \frac{v}{n}, \frac{v+1}{n} \right), \quad v = 0, 1, \ldots, n - 1.
\]
Let us test $H_0: \lambda(t) = \lambda_0(t)$ against $H_1: \lambda(t) = \lambda_1(t)$. The likelihood ratio is given by
\[
p_n = \prod_{v=0}^{n-1} \frac{\left( \int_{v/n}^{(v+1)/n} \lambda_1(u)\, du \right)^{f_v} \exp\left( -\int_{v/n}^{(v+1)/n} \lambda_1(u)\, du \right)}{\left( \int_{v/n}^{(v+1)/n} \lambda_0(u)\, du \right)^{f_v} \exp\left( -\int_{v/n}^{(v+1)/n} \lambda_0(u)\, du \right)},
\]
where $f_v$ is the number of events in the interval $\left[ \frac{v}{n}, \frac{v+1}{n} \right)$. This follows from the fact that the Poisson process has independent increments. As $n \to \infty$, the sequence $\{p_n\}$ converges a.s. $[P_0]$ and $[P_1]$ to the Radon-Nikodym derivative of $P_1$ with respect to $P_0$, namely
\[
p = \left[ \prod_{k=1}^{N} \frac{\lambda_1(t_k)}{\lambda_0(t_k)} \right] \exp\left\{ -\int_0^1 (\lambda_1(t) - \lambda_0(t))\, dt \right\},
\]
where $N$ is the number of events that occurred in the interval $[0, 1]$ and $t_1, \ldots, t_N$ are the corresponding time points of occurrence. The most powerful test for testing $H_0$ versus $H_1$ is given by the critical region
\[
\left[ \prod_{k=1}^{N} \frac{\lambda_1(t_k)}{\lambda_0(t_k)} > c \right].
\]
Remarks: In both the examples described above, the observables are independent random variables.
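The limit $p$ above is easy to evaluate for given intensities. In the minimal sketch below, the helper assumes the caller supplies the two integrated intensities over $[0, 1]$; all names and numerical values are illustrative.

```python
import math

def poisson_lr(event_times, lam0, lam1, Lam0_total, Lam1_total):
    """Radon-Nikodym derivative dP1/dP0 for a nonhomogeneous Poisson process
    observed on [0, 1]: the product of intensity ratios at the event times,
    times exp(-(integral of lam1 - lam0) over [0, 1])."""
    ratio = 1.0
    for t in event_times:
        ratio *= lam1(t) / lam0(t)
    return ratio * math.exp(-(Lam1_total - Lam0_total))

# Constant intensities lam0 = 1, lam1 = 2; three observed events
p = poisson_lr([0.2, 0.5, 0.9], lambda t: 1.0, lambda t: 2.0, 1.0, 2.0)
print(p)   # 2^3 * exp(-1)
```

Rejecting when the product of ratios (equivalently, $p$) exceeds a threshold reproduces the most powerful test stated above.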
Lecture 10

The following theorem, due to Kakutani (1948), gives a necessary and sufficient condition for the equivalence of two product measures.

Theorem: Consider two product measures $P_0 = P_0^{(1)} \times P_0^{(2)} \times \cdots$ and $P_1 = P_1^{(1)} \times P_1^{(2)} \times \cdots$ defined on a product space $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots$ with the associated product $\sigma$-algebra. Suppose that the probability measure $P_0^{(n)}$ is equivalent to $P_1^{(n)}$ for all $n \geq 1$. Let
\[
\rho(P_0^{(n)}, P_1^{(n)}) = \int_{\mathcal{X}_n} \sqrt{f_n(x_n)}\, P_0^{(n)}(dx_n), \qquad \text{where } f_n(x_n) = \frac{dP_1^{(n)}}{dP_0^{(n)}}(x_n).
\]
(Note that $0 \leq \rho(P_0^{(n)}, P_1^{(n)}) \leq 1$, and $\rho = 1$ if and only if $f_n = 1$ a.s., which holds if and only if $P_1^{(n)} = P_0^{(n)}$.) Then $P_0$ and $P_1$ are equivalent if and only if
\[
\prod_{n=1}^{\infty} \rho(P_0^{(n)}, P_1^{(n)}) > 0.
\]
Remarks: The quantity $\rho(P_0^{(n)}, P_1^{(n)})$ is the Hellinger integral (affinity) between the probability measures $P_0^{(n)}$ and $P_1^{(n)}$:
\[
\rho(P_0^{(n)}, P_1^{(n)}) = \int_{\mathcal{X}_n} \sqrt{\frac{dP_1^{(n)}(x_n)}{dP_0^{(n)}(x_n)}}\, dP_0^{(n)}(x_n) = \int_{\mathcal{X}_n} \sqrt{dP_1^{(n)}(x_n)\, dP_0^{(n)}(x_n)}.
\]
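Kakutani's criterion can be explored numerically by accumulating $\log \rho_n$. Below, two hypothetical Gaussian coordinate sequences illustrate the equivalent case (square-summable mean differences) and the singular case (a constant mean difference); the affinity formula used is the Gaussian one derived later in this lecture.

```python
import math

def kakutani_product(rho, n_terms=10_000):
    """Partial product rho(1) * ... * rho(n_terms) of Hellinger affinities;
    P0 and P1 are equivalent iff the infinite product stays positive."""
    log_prod = sum(math.log(rho(n)) for n in range(1, n_terms + 1))
    return math.exp(log_prod)

# Gaussian coordinates N(a_i(n), 1): rho(n) = exp(-(a_1(n) - a_0(n))^2 / 8)
equiv = kakutani_product(lambda n: math.exp(-(1.0 / n) ** 2 / 8))  # sum (1/n)^2 < inf
sing  = kakutani_product(lambda n: math.exp(-1.0 / 8))             # constant difference
print(equiv, sing)   # first stays bounded away from 0, second underflows to 0
```

In the first case the partial products converge to a positive limit (the measures are equivalent); in the second the products decay geometrically to zero (the measures are singular).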
Proof: Let
fn(xn) =dP
(n)1 (xn)
dP(n)0 (xn)
and ρn = ρ(P(n)0 , P
(n)1 ).
Then
ΠNn=1ρn =
∫X
√RN(x)P0(dx) whereRN(x) = ΠN
n=1fn(xn)
since P0 is a product measure on the space X . By the martingale convergence
theorem,
RN(x)a.s.→ f(x) (say) asN → ∞
with respect to the probability measure P0. Hence, by the Fatou’s Lemma,
0 ≤ EP0( limN→∞
√RN(x)) ≤ lim inf
N→∞EP0(
√Rn(x)) = liminf
N→∞ΠN
n=1ρn
which implies that
0 ≤ EP0(√f(x)) ≤ Π∞
n=1ρn.
55
If this infinite product is zero, then f(x) = 0 a.s. [P0], so that P1 and P0 are singular with respect to each other. Suppose the infinite product is positive. Let M > N and consider

(E_{P0}|R_N − R_M|)^2 ≤ E_{P0}[(R_N^{1/2} − R_M^{1/2})^2] E_{P0}[(R_N^{1/2} + R_M^{1/2})^2]

by the Cauchy-Schwarz inequality. But

E_{P0}[(R_N^{1/2} − R_M^{1/2})^2] = E_{P0}[(1 − Π_{n=N+1}^{M} √(fn))^2 R_N]
= E_{P0}[R_N + R_M − 2 Π_{n=N+1}^{M} √(fn) R_N]
= 2(1 − Π_{n=N+1}^{M} ρn) → 0

as M and N → ∞. Furthermore,

E_{P0}[(R_N^{1/2} + R_M^{1/2})^2] ≤ 2 E_{P0}(R_N + R_M) = 4

since (x + y)^2 ≤ 2(x^2 + y^2). Hence {R_N^{1/2}, N ≥ 1} is a Cauchy sequence in L2(X, P0). But the L2-space is complete, so R_N^{1/2} converges in L2; hence R_N converges in L1, which implies that

∫_X f(x) P0(dx) = 1.

Hence P1 is absolutely continuous with respect to P0.
Fatou’s Lemma: Let (Ω,F , P ) be a probability space. Suppose fn ≥ g and fn → f
a.s as n → ∞. Further suppose that E(fn) < ∞ and Eg is finite. Then E(f) ≤lim infn→∞
E(fn).
Remarks: Note that the lemma holds if fn ≥ 0 and fn → f a.s. as n→ ∞.
Example (Grenander, p. 269): Let {X(t), t ∈ T}, T = [a, b], −∞ < a < b < ∞, be a Gaussian process with continuous covariance function r(s, t) and continuous mean function mi(t), i = 0, 1, under the probability measures P0 and P1 respectively. Let λv be the eigenvalues with corresponding eigenfunctions φv such that

λv φv(s) = ∫_a^b r(s, t) φv(t) dt.

Here we choose the φv to be orthonormal. Let

Zv = ∫_a^b X(t) φv(t) dt.
Then {Zv, v ≥ 1} are independent random variables. Let P_i^(v) be the probability measure of Zv under Pi. Then Zv ∼ N(aiv, λv) under Pi, where aiv = ∫_a^b mi(t) φv(t) dt. Let

ρ(P0^(v), P1^(v)) = ∫_{−∞}^{∞} √(P0^(v)(dx) P1^(v)(dx)) = ∫_{−∞}^{∞} √(dP1^(v)(x)/dP0^(v)(x)) dP0^(v)(x)

= (1/√(2πλv)) ∫_{−∞}^{∞} exp{−(1/4)[(x − a0v)^2/λv + (x − a1v)^2/λv]} dx

= exp{−(a0v − a1v)^2/(8λv)}.

Hence, by Kakutani's theorem, P0 and P1 are equivalent if and only if

Π_{v=1}^{∞} ρ(P0^(v), P1^(v)) > 0,

or equivalently

Σ_{v=1}^{∞} (a0v − a1v)^2/λv < ∞.
Hence we have the following result.

Theorem: Let Pi be the probability measure generated by a Gaussian process {X(t), t ∈ T}, T = [a, b], −∞ < a < b < ∞, with a continuous covariance function r(s, t) and a continuous mean function mi(t) for i = 0, 1. Then the Gaussian probability measures P0 and P1 with continuous mean functions m0(·) and m1(·) and the common continuous covariance function r(·, ·) are equivalent if and only if

Σ_{v=1}^{∞} (a0v − a1v)^2/λv < ∞.
Remarks: If the Gaussian measures P0 and P1 are equivalent, then the Radon-Nikodym derivative is given by

lim_{n→∞} fn(x) = lim_{n→∞} [Π_{v=1}^{n} (1/√(2πλv)) exp{−(zv − a1v)^2/(2λv)}] / [Π_{v=1}^{n} (1/√(2πλv)) exp{−(zv − a0v)^2/(2λv)}]

= exp{Σ_{v=1}^{∞} ((a1v − a0v)/λv)(zv − cv)}

where cv = (a1v + a0v)/2.
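As a numerical sanity check (an illustrative sketch, not part of the original notes), the closed form ρ = exp{−(a0v − a1v)^2/(8λv)} for the Hellinger affinity of two normal laws with common variance can be verified by direct numerical integration; the parameter values below are arbitrary choices.

```python
import numpy as np

def hellinger_affinity(a0, a1, lam, grid=200001, span=30.0):
    """Numerically evaluate rho = integral of sqrt(p0(x) p1(x)) dx for the
    normal densities N(a0, lam) and N(a1, lam) via a Riemann sum on a wide grid."""
    x = np.linspace(min(a0, a1) - span, max(a0, a1) + span, grid)
    dx = x[1] - x[0]
    p0 = np.exp(-(x - a0) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)
    p1 = np.exp(-(x - a1) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)
    return float(np.sum(np.sqrt(p0 * p1)) * dx)

a0, a1, lam = 0.3, 1.1, 0.5          # arbitrary illustrative values
numeric = hellinger_affinity(a0, a1, lam)
closed_form = float(np.exp(-(a0 - a1) ** 2 / (8.0 * lam)))
print(numeric, closed_form)          # the two values agree closely
```

The rapid decay of the integrand makes the Riemann sum essentially exact here.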
Example (Test for the covariance function of a Gaussian process; Sagdar (1974)): Let {X(t), t ∈ [a, b]} be a zero-mean Gaussian process under the probability measures P1 and P2 with covariance functions r1(s, t) and r2(s, t) respectively. We want to test the hypothesis

H0 : r(s, t) = r1(s, t) against H1 : r(s, t) = r2(s, t).
Consider the integral equation

λ ∫_a^b r2(s, t) φ(t) dt = ∫_a^b r1(s, t) φ(t) dt.

Let {λk} and {φk} be the sequences of nonzero eigenvalues and corresponding eigenfunctions satisfying the above integral equation. Consider the integral equation

r2(s, t) − r1(s, t) = ∫_a^b r1(s, u) c(u, t) du.

Then

∫_a^b [r2(s, t) − r1(s, t)] φk(s) ds = ∫_a^b φk(s) [∫_a^b r1(s, u) c(u, t) du] ds
= ∫_a^b [∫_a^b r1(s, u) φk(s) ds] c(u, t) du
= λk ∫_a^b [∫_a^b r2(s, u) φk(s) ds] c(u, t) du.

Let

gk(u) = ∫_a^b r2(s, u) φk(s) ds.

Hence

(1 − λk) ∫_a^b r2(s, t) φk(s) ds = λk ∫_a^b gk(u) c(u, t) du,

which implies that

(1 − λk) gk(t) = λk ∫_a^b gk(u) c(u, t) du.
Hence gk(t) is an eigenfunction of the kernel c(u, t). Now

∫_a^b φk(s) gk(s) ds = ∫_a^b φk(s) [∫_a^b r2(s, t) φk(t) dt] ds = ∫_a^b ∫_a^b φk(s) φk(t) r2(s, t) ds dt = ak ≠ 0

for some constant ak and, for k ≠ j,

∫_a^b φj(s) gk(s) ds = ∫_a^b φj(s) [∫_a^b r2(s, t) φk(t) dt] ds = ∫_a^b ∫_a^b r2(s, t) φj(s) φk(t) dt ds = 0.
Let us normalize the bi-orthogonal system {φk} and {gk} so that ak = 1, k ≥ 1. Then

c(s, t) = Σ_{k=1}^{∞} ((1 − λk)/λk) φk(s) gk(t).
Define

Zk = ∫_a^b X(t) φk(t) dt.

Then the Zk, k ≥ 1, are independent with Zk ∼ N(0, λk) under P1 and independent with Zk ∼ N(0, 1) under P2. Hence the likelihood ratio, given Z1, . . . , Zn, is

Ln = (λ1 · · · λn)^{1/2} exp{(1/2) Σ_{k=1}^{n} ((1 − λk)/λk) Zk^2}.

Suppose that

Σ_{k=1}^{∞} (1 − λk)^2/λk < ∞.

Then the probability measures P1 and P2 are equivalent and the best test for H0 against H1 is of the form

Σ_{k=1}^{∞} ((1 − λk)/λk) Zk^2 ≥ u.
Let

η(t) = ∫_a^b c(s, t) X(s) ds.

Suppose there exists a solution ζ(s) such that

∫_a^b r2(s, t) ζ(s) ds = η(t).

Then

∫_a^b X(s) ζ(s) ds = Σ_{k=1}^{∞} ((1 − λk)/λk) Zk^2

and the best critical region for testing the hypothesis H0 against H1 is given by

∫_a^b X(s) ζ(s) ds ≥ u.
The following result due to Baxter (1956) gives sufficient conditions for checking the singularity of two Gaussian measures.

Theorem: Let {X(t), t ∈ [0, 1]} be a Gaussian process whose mean function m(t) has a bounded derivative. Let the covariance function r(s, t) of the process be continuous with uniformly bounded second partial derivatives for s ≠ t. Let

f(t) = D⁻(t) − D⁺(t)

where

D⁻(t) = lim_{s↑t} (r(t, t) − r(s, t))/(t − s) and D⁺(t) = lim_{s↓t} (r(t, t) − r(s, t))/(t − s).

Then

lim_{n→∞} Σ_{k=1}^{2^n} [X(k/2^n) − X((k−1)/2^n)]^2 = ∫_0^1 f(t) dt a.s.
As a consequence of the above theorem, we have the following result.

Corollary: Let {X(t), t ∈ [0, 1]} be a Gaussian process under the probability measures P0 and P1 for which the conditions of the theorem given above hold. Define f0 and f1 as before. If

∫_0^1 f0(t) dt ≠ ∫_0^1 f1(t) dt,

then the probability measures P0 and P1 are singular with respect to each other.
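Baxter's theorem can be illustrated by simulation (a sketch with assumed parameters, not part of the notes): for the standard Wiener process, r(s, t) = min(s, t), so D⁻(t) = 1, D⁺(t) = 0 and f(t) ≡ 1, and the dyadic sum of squared increments over [0, 1] should be close to ∫_0^1 f(t) dt = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def dyadic_quadratic_variation(n):
    """Sum of squared increments of a simulated Brownian path over the
    dyadic grid k/2^n, k = 1, ..., 2^n, of [0, 1]."""
    m = 2 ** n
    increments = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
    return float(np.sum(increments ** 2))

qv = dyadic_quadratic_variation(20)
print(qv)  # close to 1 = ∫_0^1 f(t) dt for the Wiener process
```

Since the increments on the dyadic grid are i.i.d. N(0, 2^{-n}), only their values (not the whole path) need to be simulated here.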
Lecture 11
Stochastic Integrals and Stochastic Differential Equations
Let {W(t), t ≥ 0} be the standard Wiener process, that is, {W(t), t ≥ 0} is a Gaussian process with (i) W(0) = 0, (ii) W(t) − W(s) ∼ N(0, |t − s|), and (iii) the increments W(t2) − W(t1) and W(t4) − W(t3) independent whenever 0 ≤ t1 < t2 ≤ t3 < t4 < ∞.
Remarks : (i) A Wiener process has a version which has continuous sample paths almost
surely.
(ii) A Wiener process has unbounded variation on any finite interval almost surely.
(iii) The sample paths of a Wiener process are nowhere differentiable almost surely.
Let C[0, T] be the space of continuous functions on [0, T] with the topology generated by the uniform metric. The Wiener process {W(t), 0 ≤ t ≤ T} generates a probability measure on C[0, T]; let us denote it by P_W^T.
Theorem (Doob (1953)): Let {ξ(t), t ≥ 0} be a stochastic process defined on a probability space (Ω, F, P) with continuous sample paths almost surely and let {Ft} be a family of sub-σ-algebras of F such that Ft ⊂ Fs if t ≤ s. Suppose that

(i) for all t ≥ 0, ξ(t) is Ft-measurable;

(ii) E[ξ(t + h) − ξ(t)|Ft] = 0 a.s. for all t ≥ 0, h ≥ 0, that is, {ξ(t), Ft, t ≥ 0} is a martingale; and

(iii) E[(ξ(t + h) − ξ(t))^2|Ft] = h a.s. for all t ≥ 0 and h ≥ 0.

Then {ξ(t), t ≥ 0} is a standard Wiener process.
Stochastic integral

Let (Ω, F, P) be a probability space. We want to define a stochastic integral

∫_0^T f(t) dW(t)

for a suitable class of random functions {f(t), t ≥ 0} with respect to the Wiener process {W(t), t ≥ 0}. The integral cannot be defined in the Lebesgue-Stieltjes sense since the Wiener process {W(t), t ≥ 0} is of unbounded variation a.s. on any finite interval [0, T]. Let {Ft} be a family of sub-σ-algebras of F satisfying

(i) t1 < t2 ⇒ Ft1 ⊂ Ft2,

(ii) W(t) is Ft-measurable, and

(iii) W(t + s) − W(t) is independent of Ft for any t ≥ 0 and for every s ≥ 0.
Let H[0, T] be the class of all random functions {f(t), 0 ≤ t ≤ T} such that f(t) is Ft-measurable for 0 ≤ t ≤ T and

∫_0^T f^2(t) dt < ∞ a.s.

Case (i): Suppose f ∈ H[0, T] and f is a step function, that is, there exists a partition

0 = t0 < t1 < · · · < tm = T

such that

f(t) = f(ti) for ti ≤ t < ti+1, 0 ≤ i ≤ m − 1, and f(t) = f(tm−1) for tm−1 ≤ t ≤ tm.

Then define

∫_0^T f(t) dW(t) = Σ_{k=0}^{m−1} f(tk)[W(tk+1) − W(tk)].
Case (ii): Consider the class of f ∈ H[0, T] for which

∫_0^T E(f^2(t)) dt < ∞.

It can be shown that any such f can be approximated by a sequence of step functions fn ∈ H[0, T] such that

lim_{n→∞} E[∫_0^T |f(t) − fn(t)|^2 dt] = 0

(Liptser and Shiryayev (1977), Statistics of Random Processes). We define

∫_0^T f(t) dW(t) = lim_{n→∞} ∫_0^T fn(t) dW(t).

Here the limit is taken in the sense of quadratic mean. One can show that the limit is independent of the choice of the sequence of step functions.
Case (iii): Let f ∈ H[0, T]. Then there exists a sequence gn ∈ H[0, T] such that

∫_0^T E(gn^2(t)) dt < ∞

and

∫_0^T [gn(t) − f(t)]^2 dt → 0 in probability as n → ∞.

Define

∫_0^T f(t) dW(t) = lim_{n→∞} ∫_0^T gn(t) dW(t)

where the limit is in the sense of convergence in probability. It can be proved that the limit is independent of the choice of {gn}.
Properties:

(i) Suppose f1, f2 ∈ H[0, T] and α1 and α2 are random variables such that α1 f1 + α2 f2 ∈ H[0, T]. Then

∫_0^T [α1 f1(t) + α2 f2(t)] dW(t) = α1 ∫_0^T f1(t) dW(t) + α2 ∫_0^T f2(t) dW(t).

(ii) Let f ∈ H[0, T] for which ∫_0^T E(f^2(t)) dt < ∞. Then

E[∫_0^T f(t) dW(t)] = 0 and E[∫_0^T f(t) dW(t)]^2 = ∫_0^T E[f^2(t)] dt.

(iii) Let f ∈ H[0, T]. Then, for any ε > 0 and δ > 0,

P{|∫_0^T f(t) dW(t)| > ε} ≤ P{∫_0^T f^2(t) dt > δ} + δ/ε^2.

(iv) Let f ∈ H[0, T] for which ∫_0^T E(f^2(t)) dt < ∞. Then

E[∫_α^β f(t) dW(t) | Fα] = 0

and

E[(∫_α^β f(t) dW(t))^2 | Fα] = ∫_α^β E(f^2(t) | Fα) dt

whenever 0 ≤ α < β ≤ T.
Here ∫_α^β f(t) dW(t) is defined to be ∫_0^T χ_{[α,β]}(t) f(t) dW(t), where χ_{[α,β]}(t) is the indicator function of the interval [α, β]. For f ∈ H[0, T], define

I(t) = ∫_0^t f(s) dW(s).

Then {I(t), Ft, t ≥ 0} is a martingale and has continuous sample paths a.s.
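The step-function definition and the isometry in Property (ii) can be checked by Monte Carlo simulation (an illustrative sketch; the step function f and all sample sizes below are arbitrary choices): the simulated integrals should have mean ≈ 0 and variance ≈ ∫_0^T E f^2(t) dt.

```python
import numpy as np

rng = np.random.default_rng(1)

# deterministic step function f on a uniform grid of [0, 1], so f ∈ H[0, 1]
m, reps = 100, 20000
t = np.linspace(0.0, 1.0, m + 1)
f_vals = np.sin(2.0 * np.pi * t[:-1])        # f(t_k) on [t_k, t_{k+1})
dW = rng.normal(0.0, np.sqrt(1.0 / m), size=(reps, m))

# the defining sum: sum_k f(t_k) [W(t_{k+1}) - W(t_k)], one value per path
I = np.sum(f_vals * dW, axis=1)

mean_I = float(I.mean())
var_I = float(I.var())
isometry = float(np.sum(f_vals ** 2) / m)    # Riemann sum of ∫ f^2(t) dt
print(mean_I, var_I, isometry)               # mean ≈ 0, var ≈ isometry value
```

For a deterministic step integrand the isometry holds exactly, so the only error here is Monte Carlo noise.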
Stochastic differential
Suppose the process {ζ(t), 0 ≤ t ≤ T} satisfies the equation

ζ(t2) − ζ(t1) = ∫_{t1}^{t2} a(t) dt + ∫_{t1}^{t2} b(t) dW(t), 0 ≤ t1 ≤ t2 ≤ T,

where ∫_0^T |a(t)| dt < ∞ a.s. and ∫_0^T b^2(t) dt < ∞ a.s. Then the process ζ(t) is said to have the stochastic differential

dζ(t) = a(t) dt + b(t) dW(t), 0 ≤ t ≤ T.
Suppose f ∈ H[0, T] and ζ is a random variable such that P[0 ≤ ζ ≤ T] = 1. Then we define

∫_0^ζ f(t) dW(t) = I(ζ)

where I(t) is as defined above. If the random variables ζ1 and ζ2 are such that P[0 ≤ ζ1 ≤ ζ2 ≤ T] = 1, then define

∫_{ζ1}^{ζ2} f(t) dW(t) = ∫_0^{ζ2} f(t) dW(t) − ∫_0^{ζ1} f(t) dW(t).

Most often we choose the random variables ζ to be stopping times. A random variable ζ is a stopping time with respect to the family {Ft, t ≥ 0} if [ζ ≤ t] ∈ Ft for every t ≥ 0. Examples of stopping times are

ζ1 = inf{t ≥ 0 : W(t) ≥ a} for a fixed constant a,

and

ζ2 = inf{t ≥ 0 : ∫_0^t f(u) dW(u) ≥ a} for a fixed constant a.
Theorem: Suppose f ∈ H[0, T] for every T > 0 and

∫_0^∞ f^2(s) ds = ∞ a.s.

Let

τt = inf{u ≥ 0 : ∫_0^u f^2(s) ds ≥ t}.

Then

ζt = ∫_0^{τt} f(s) dW(s), t ≥ 0,

is a Wiener process.

Central Limit Theorem (CLT): Suppose f ∈ H[0, T] for every T > 0 and

(1/T) ∫_0^T f^2(s) ds → σ^2 in probability as T → ∞.

Then

(1/√T) ∫_0^T f(s) dW(s) → N(0, σ^2) in law as T → ∞.
Stochastic Differential Equations

Theorem (Existence of a solution): Suppose there exists a constant K such that

(i) |a(t, x) − a(t, y)| + |σ(t, x) − σ(t, y)| ≤ K|x − y|, x, y ∈ R,

(ii) |a(t, x)|^2 + |σ(t, x)|^2 ≤ K^2(1 + |x|^2), and

(iii) η(0) is independent of the Wiener process {W(t), t ≥ 0} with Eη^2(0) < ∞.

Then there exists a solution {η(t), 0 ≤ t ≤ T} satisfying the SDE

(i) dη(t) = a(t, η(t)) dt + σ(t, η(t)) dW(t), 0 ≤ t ≤ T,

(ii) η(t) is continuous a.s. on [0, T] with η(t) = η(0) for t = 0,

(iii) sup_{0 ≤ t ≤ T} Eη^2(t) < ∞, and

(iv) η(t) is unique in the sense that if η1(t) and η2(t) are two such processes satisfying (i), (ii) and (iii), then

P{sup_{0 ≤ t ≤ T} |η1(t) − η2(t)| = 0} = 1.

Remarks: The coefficient a(·, ·) is called the drift coefficient and the coefficient σ(·, ·) is called the diffusion coefficient. The problem in statistical inference for diffusion processes is the estimation of these coefficients given the process {η(t), 0 ≤ t ≤ T}.
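Numerically, a solution under the Lipschitz and growth conditions above is usually approximated by the Euler-Maruyama scheme; the sketch below (with an Ornstein-Uhlenbeck-type drift chosen only for illustration) is not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(2)

def euler_maruyama(a, sigma, x0, T, n):
    """Euler-Maruyama approximation of dX = a(t, X)dt + sigma(t, X)dW on [0, T];
    the scheme converges to the unique solution under the Lipschitz and growth
    conditions of the existence theorem."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    t = 0.0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))   # Wiener increment over [t, t + dt]
        x[k + 1] = x[k] + a(t, x[k]) * dt + sigma(t, x[k]) * dw
        t += dt
    return x

# illustrative choice: drift a(t, x) = -x and unit diffusion coefficient
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 1.0, 5.0, 5000)
print(path[-1])
```

The discretization step T/n controls the (strong) approximation error of the scheme.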
Absolute continuity of measures generated by diffusion processes

Let (Ω, F, P) be a probability space and {Ft, 0 ≤ t ≤ 1} be a nondecreasing family of σ-algebras contained in F. Suppose {Wt, 0 ≤ t ≤ 1} is a standard Wiener process such that Wt is Ft-measurable. For instance, one can choose Ft = σ{Ws : 0 ≤ s ≤ t}. Let C[0, 1] be the space of continuous functions on [0, 1] endowed with the sup-norm. Let {ξt, 0 ≤ t ≤ 1} be a stochastic process defined on (Ω, F, P) such that ξt is Ft-measurable and ξt is continuous a.s. on [0, 1]. Let µξ denote the probability measure generated by {ξt, 0 ≤ t ≤ 1} on C[0, 1] and µW denote the probability measure generated by the Wiener process on C[0, 1].
Let B be the Borel σ-algebra on C[0, 1] and Bt = σ{x : xs, s ≤ t}. Let τ be the σ-algebra of sets in [0, 1] × C[0, 1] that are independent of the future, that is, whose t-sections are Bt-measurable for every 0 ≤ t ≤ 1.

Definition: A continuous process {ξt, Ft, 0 ≤ t ≤ 1} defined on (Ω, F, P) is called a process of diffusion type if there exists a τ-measurable function αt(x) such that

P{∫_0^1 |αt(ξ)| dt < ∞} = 1

and, for each 0 ≤ t ≤ 1,

dξt = αt(ξ) dt + dWt, ξ0 = 0.
Theorem: If a process is of diffusion type, then

P{∫_0^1 α_t^2(ξ) dt < ∞} = 1

if and only if

µξ << µW,

and in such a case

dµξ/dµW = exp{∫_0^1 αt(ξ) dξt − (1/2) ∫_0^1 α_t^2(ξ) dt} a.s. [P].
Proof: See Liptser and Shiryayev(1977). The proof depends on the Girsanov theorem
stated below.
Theorem (Girsanov): Let {Wt, Ft, t ≥ 0} be a standard Wiener process on a probability space (Ω, F, P). Let the process {Yt, Ft, t ≥ 0} be such that

P{∫_0^1 Y_t^2 dt < ∞} = 1.

Let

φ = exp{∫_0^1 Yt dWt − (1/2) ∫_0^1 Y_t^2 dt}.

If E_P φ = 1, then the process {ξt, Ft, P̃}, where

ξt = −∫_0^t Ys ds + Wt, 0 ≤ t ≤ 1,

and the probability measure P̃ is defined by

dP̃/dP = φ,

is a Wiener process relative to the probability space (Ω, F, P̃).
Heuristics for computation of the Radon-Nikodym derivative for diffusion processes

Consider the stochastic differential equations

dXt = a(Xt) dt + σ(Xt) dWt, 0 ≤ t ≤ 1, under H0 (measure µ0)

and

dXt = σ(Xt) dWt, 0 ≤ t ≤ 1, under H1 (measure µ1).

Let 0 = t0 < t1 < · · · < tn < tn+1 = 1 be a subdivision of [0, 1], and discretize the above stochastic differential equations. Then

X(tk+1) − X(tk) − a(X(tk))(tk+1 − tk) ≃ N(0, σ^2(X(tk))(tk+1 − tk))

and these increments X(tk+1) − X(tk), k = 0, 1, . . . , n, can be considered independent. Hence the log-likelihood ratio fn can be written in the form

fn = −(1/2) Σ_{k=0}^{n} [X(tk+1) − X(tk) − a(X(tk))(tk+1 − tk)]^2 / [σ^2(X(tk))(tk+1 − tk)]
+ (1/2) Σ_{k=0}^{n} [X(tk+1) − X(tk)]^2 / [σ^2(X(tk))(tk+1 − tk)]

= Σ_{k=0}^{n} [X(tk+1) − X(tk)] a(X(tk))/σ^2(X(tk)) − (1/2) Σ_{k=0}^{n} a^2(X(tk))(tk+1 − tk)/σ^2(X(tk))

≃ ∫_0^1 (a(X(t))/σ^2(X(t))) dX(t) − (1/2) ∫_0^1 (a^2(X(t))/σ^2(X(t))) dt,

and

dµ0/dµ1 ≃ exp{∫_0^1 (a(X(t))/σ^2(X(t))) dX(t) − (1/2) ∫_0^1 (a^2(X(t))/σ^2(X(t))) dt}.
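The algebraic step from the difference of quadratic forms to the expanded sum can be checked numerically (a sketch; σ ≡ 1 and the hypothetical drift a(x) = −0.8x are both arbitrary choices). On any simulated path the two forms of fn agree term by term.

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate X under H1: dX = sigma dW with sigma = 1 (illustrative choice)
n, dt = 2000, 1.0 / 2000
a = lambda x: -0.8 * x          # hypothetical drift under H0
sigma2 = 1.0
x = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
dx = np.diff(x)
xk = x[:-1]

# form 1: the difference of the two quadratic forms
f1 = (-0.5 * np.sum((dx - a(xk) * dt) ** 2 / (sigma2 * dt))
      + 0.5 * np.sum(dx ** 2 / (sigma2 * dt)))
# form 2: the expanded form that converges to the Ito-integral expression
f2 = np.sum(dx * a(xk) / sigma2) - 0.5 * np.sum(a(xk) ** 2 * dt / sigma2)
print(f1, f2)  # identical up to floating-point rounding
```

The agreement is exact algebra, not an approximation; the Ito-integral form only enters when the mesh of the subdivision tends to zero.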
Lecture 12
Ito’s Lemma: Let F (t, x) be a continuous function on [0, T ] × R with continuous
derivatives ∂F∂t(t, x), ∂F
∂x(t, x), ∂2F
∂x2 (t, x) and Y (t), 0 ≤ t ≤ T be a stochastic process
satisfying the stochastic differential equation (SDE)
dY (t) = a(t) dt+ b(t) dW (t), Y (0) = η, 0 ≤ t ≤ T.
Then the random process Z(t) = F (t, Y (t)) satisfies the SDE
dZ(t) = [∂F
∂t(t, Y (t)) +
∂F
∂y(t, Y (t))a(t) +
1
2
∂2F
∂y2(t, Y (t))b2(t))]dt
+∂F
∂y(t, Y (t))b(t)dW (t), Z(0) = F (0, η), 0 ≤ t ≤ T.
Heuristics : Note that
Z(t+ h)− Z(t) = F (t+ h, Y (t+ h))− F (t, Y (t))
≃ (t+ h− t)∂F
∂t(t, Y (t))
+(Y (t+ h)− Y (t)))∂F
∂y(t, Y (t))
+1
2(t+ h− t)2
∂2F
∂t2(t, Y (t))
+1
2(Y (t+ h)− Y (t))2
∂2F
∂y2(t, Y (t))
+(t+ h− t)(Y (t+ h)− Y (t))∂2F
∂t∂y(t, Y (t))
≃ h∂F
∂t+ (Y (t+ h)− Y (t))
∂F
∂y
+1
2h2∂2F
∂t2+
1
2(Y (t+ h)− Y (t))2
∂2F
∂y2
+h(Y (t+ h)− Y (t))∂2F
∂t∂y.
Note that
Y (t+ h)− Y (t) ≃ a(t)h+ b(t)[W (t+ h)−W (t)].
Hence

Z(t + h) − Z(t) ≃ h ∂F/∂t + {a(t) h + b(t)(W(t + h) − W(t))} ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a(t) h + b(t)(W(t + h) − W(t))}² ∂²F/∂y²
+ h {a(t) h + b(t)(W(t + h) − W(t))} ∂²F/∂t∂y

= h {∂F/∂t + a(t) ∂F/∂y} + b(t)(W(t + h) − W(t)) ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a²(t) h² + b²(t)(W(t + h) − W(t))² + 2 a(t) b(t) h (W(t + h) − W(t))} ∂²F/∂y²
+ h {a(t) h + b(t)(W(t + h) − W(t))} ∂²F/∂t∂y

≃ h {∂F/∂t + a(t) ∂F/∂y} + b(t)(W(t + h) − W(t)) ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a²(t) h² + b²(t) h + Op(|h|^{3/2})} ∂²F/∂y²
+ h {a(t) h + b(t) Op(|h|^{1/2})} ∂²F/∂t∂y

since E(W(t + h) − W(t))² = |h| and E|W(t + h) − W(t)| ≃ |h|^{1/2}. Hence

[Z(t + h) − Z(t)]/h ≃ ∂F/∂t + a(t) ∂F/∂y + b(t) [(W(t + h) − W(t))/h] ∂F/∂y + (1/2) b²(t) ∂²F/∂y² + Op(|h|^{1/2}).

Therefore

dZ(t) ≃ [∂F/∂t + a(t) ∂F/∂y + (1/2) b²(t) ∂²F/∂y²] dt + b(t) (∂F/∂y) dW(t).
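A quick numerical check of Ito's lemma (illustrative, with F(t, y) = y² and Y = W, i.e. a ≡ 0, b ≡ 1): the lemma gives d(W²) = dt + 2W dW, so W²(T) − T should match the Ito sum 2 Σ W(t_k)(W(t_{k+1}) − W(t_k)) up to the fluctuation of the discrete quadratic variation around T.

```python
import numpy as np

rng = np.random.default_rng(4)

n, T = 200000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)     # Wiener increments on a fine grid
W = np.concatenate([[0.0], np.cumsum(dW)])

lhs = W[-1] ** 2 - T                     # W^2(T) - T, from Ito's lemma
rhs = 2.0 * np.sum(W[:-1] * dW)          # the Ito sum 2 * sum W(t_k) dW_k
print(lhs, rhs)                          # the two agree up to O(n^{-1/2})
```

The exact discrete identity is W²(T) = Σ(2W_k ΔW_k + (ΔW_k)²), so the gap between the two sides is Σ(ΔW_k)² − T, which vanishes as the mesh shrinks.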
We now consider sufficient conditions under which the solution of an SDE is an ergodic process.

Theorem (Maruyama and Tanaka (1957)): Consider the SDE

dX(t) = a(X(t)) dt + b(X(t)) dW(t), X(0) = X0, t ≥ 0.

Define

φ(x) = 2 ∫_0^x (a(y)/b²(y)) dy.

Suppose

g = ∫_{−∞}^{∞} (e^{φ(x)}/b²(x)) dx < ∞.

Define

µ(x) = (1/g) ∫_{−∞}^{x} (e^{φ(y)}/b²(y)) dy, −∞ < x < ∞.

Then the process is ergodic with stationary distribution having distribution function µ(·) and the strong law of large numbers holds; that is, if f is a function such that

∫_{−∞}^{∞} |f(x)| µ(dx) < ∞,

then

(1/T) ∫_0^T f(X(t)) dt → ∫_{−∞}^{∞} f(x) µ(dx) a.s. as T → ∞.

(For a proof, see Gikhman and Skorokhod, Stochastic Differential Equations.)
(For proof, see Gikhman and Skorokhod: Stochastic Differential Equations).
Example: Suppose

dX(t) = −θX(t) dt + dW(t), X(0) = X0, 0 ≤ t ≤ T,

where θ ∈ Θ ⊂ R. Let F(t, x) = e^{θt} x. Then

∂F/∂t = θ e^{θt} x, ∂F/∂x = e^{θt}, ∂²F/∂x² = 0.

Hence, by Ito's lemma,

dF(t, X(t)) = [θ e^{θt} X(t) + e^{θt}(−θX(t))] dt + e^{θt} dW(t) = e^{θt} dW(t).

Therefore

d(e^{θt} X(t)) = e^{θt} dW(t),

which implies that

e^{θt} X(t) − X(0) = ∫_0^t e^{θs} dW(s),

or equivalently

(⋆) X(t) = ∫_0^t e^{−θ(t−s)} dW(s) + X(0) e^{−θt}.

If θ > 0, then the process X(t) is ergodic by the above theorem (Maruyama and Tanaka, "Some properties of one-dimensional diffusion processes", Mem. Fac. Sci. Kyushu Univ. 11 (1957) 117-141) and the ergodic theorem holds; that is, for any measurable function f integrable with respect to the stationary measure µ,

lim_{T→∞} (1/T) ∫_0^T f(X(t)) dt = ∫_{−∞}^{∞} f(x) µ(dx) a.s.
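For the example above with drift a(x) = −θx and b ≡ 1, φ(x) = −θx², so the stationary law µ is N(0, 1/(2θ)). The following simulation (an Euler-discretized sketch with arbitrary parameter values) checks the ergodic theorem for f(x) = x².

```python
import numpy as np

rng = np.random.default_rng(5)

theta, T, n = 1.5, 400.0, 400_000      # illustrative values, theta > 0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
x, acc = 0.0, 0.0
for k in range(n):
    acc += x * x * dt                  # accumulate ∫ X^2(t) dt
    x += -theta * x * dt + dW[k]       # Euler step for dX = -theta X dt + dW
time_avg = acc / T
print(time_avg, 1.0 / (2.0 * theta))   # time average ≈ stationary E[X^2]
```

The time average converges at rate O(T^{-1/2}), so moderate discrepancies at finite T are expected.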
Suppose the process {X(t), 0 ≤ t ≤ T} is observed. Let Pθ be the probability measure generated by the process on C[0, T] and PW be the measure generated by the Wiener process. Then

LT(θ) ≡ dPθ/dPW = exp{−θ ∫_0^T X(t) dX(t) − (θ²/2) ∫_0^T X²(t) dt}.

Note that the MLE θT of θ is given by

θT = −∫_0^T X(t) dX(t) / ∫_0^T X²(t) dt = −[X²(T) − X²(0) − T] / [2 ∫_0^T X²(t) dt].

(It can be shown that ∫_0^T X(t) dX(t) = [X²(T) − X²(0) − T]/2 by applying Ito's lemma to the function F(t, x) = x².) Note that
VT(θ) = ∂ log LT(θ)/∂θ = −∫_0^T X(t) dX(t) − θ ∫_0^T X²(t) dt
= −∫_0^T X(t)[dX(t) + θX(t) dt]
= −∫_0^T X(t) dW(t)

is the score function and {Vt(θ), Ft, 0 ≤ t ≤ T} is a zero-mean martingale. Let θ be the true parameter. Then

θT − θ = −∫_0^T X(t) dX(t)/∫_0^T X²(t) dt − θ
= [−∫_0^T X(t) dX(t) − θ ∫_0^T X²(t) dt]/∫_0^T X²(t) dt
= −∫_0^T X(t) dW(t)/∫_0^T X²(t) dt.

Hence

θT − θ = VT(θ)/IT(θ), where IT(θ) = ∫_0^T X²(t) dt.

In other words,

VT(θ) = IT(θ)(θT − θ).
Suppose the process is ergodic (θ > 0). Then

(1/T) ∫_0^T X²(t) dt → ∫_{−∞}^{∞} x² µ(dx) = σ² (say) a.s.,

where µ is the stationary distribution of the process, which is the normal distribution with mean zero and variance (2θ)^{-1}. Hence, by the CLT for stochastic integrals, it follows that

(1/√T) ∫_0^T X(t) dW(t) → N(0, σ²) in law, σ² = (2θ)^{-1},

which implies that

√T (θT − θ) → N(0, 2θ) in law as T → ∞.
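The closed-form MLE and its consistency can be illustrated by simulation (a sketch; the Euler discretization and all parameter values are arbitrary choices, and the stochastic integral ∫ X dX is evaluated through the identity [X²(T) − X²(0) − T]/2):

```python
import numpy as np

rng = np.random.default_rng(6)

theta, T, n = 1.0, 500.0, 500_000      # true parameter and illustrative grid
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):                     # Euler path of dX = -theta X dt + dW
    x[k + 1] = x[k] - theta * x[k] * dt + dW[k]

int_x2 = float(np.sum(x[:-1] ** 2) * dt)
theta_hat = -(x[-1] ** 2 - x[0] ** 2 - T) / (2.0 * int_x2)
print(theta_hat)                       # close to the true value theta = 1
```

By the asymptotic normality above, the estimation error is of order √(2θ/T), roughly 0.06 for these values.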
Suppose θ < 0. Note that

{e^{θt} X(t) − X(0), Ft, t ≥ 0}

is a zero-mean martingale which is L²-bounded. Hence, by the martingale convergence theorem, we note that

e^{θt} X(t) − X(0) → Z a.s. as t → ∞

for some random variable Z < ∞ a.s., and

e^{2θt} X²(t) → (Z + X(0))² a.s. as t → ∞.

Applying an integral version of the Toeplitz lemma, we have

(⋆⋆) e^{2θt} It(θ) = e^{2θt} ∫_0^t X²(s) ds → −(1/(2θ))(Z + X0)² a.s. as t → ∞.

Hence It(θ) → ∞ a.s. as t → ∞. By the martingale central limit theorem, it follows that

IT(θ)^{1/2} (θT − θ) → N(0, 1) in law as T → ∞.

Note that

(θT − θ) e^{−θT} = (e^{2θT} ∫_0^T X²(s) ds)^{-1} (−e^{θT} ∫_0^T X(s) dW(s)).

Check that Z ∼ N(0, −1/(2θ)) from (⋆) and that

(−2θ)^{-1} e^{−θT}(θT − θ) → N(0, 1) in law as T → ∞

from (⋆⋆), since e^{2θt} E(It(θ)) → −(1/(2θ)) E(Z + X(0))² as t → ∞.

If θ = 0, then

L(θT − 0) = L(−∫_0^T W(t) dW(t) / ∫_0^T W²(t) dt) = L(−[W²(T) − T] / [2 ∫_0^T W²(s) ds]).

In this case, the random variable θT does not have an asymptotically normal distribution.
Remarks on the structure of continuous parameter martingales: Let (Ω, F, P) be a probability space and {Ft, t ≥ 0} be a right-continuous nondecreasing family of sub-σ-algebras of F such that F0 is complete with respect to the probability measure P. Suppose {Vt, Ft, t ≥ 0} is a square integrable martingale with mean zero and the process {Vt, t ≥ 0} has right-continuous sample paths almost surely. Then Vt is Ft-measurable, E[Vt] = 0, E[V²t] < ∞ and E[Vt|Fs] = Vs a.s. for 0 ≤ s ≤ t. It is known that there exists a right-continuous increasing process {It, t ≥ 0} such that It is Ft-measurable and

E[(Vt − Vs)²|Fs] = E(It − Is|Fs) a.s., 0 ≤ s ≤ t (∗)

(cf. Meyer (1962)). The process {It, t ≥ 0} is the continuous analogue of the conditional variance In = Σ_{j=1}^{n} E(X²j|Fj−1) for a discrete parameter square integrable martingale Sn = Σ_{j=1}^{n} Xj, n ≥ 1. In analogy with the definition of In, one can formally define

It = ∫_0^t E([dVs]²|Fs),

and this can be used as a check for computing It. Suppose there exists a process {ζt, t ≥ 0} such that ζt is Ft-measurable and

It = ∫_0^t ζ²u du a.s. (∗∗)

Theorem 1 (SLLN): If {Vt, Ft, t ≥ 0} satisfies (∗) and the condition (∗∗) holds, then

Vt/It → 0 a.s. on [It → ∞].

Theorem 2 (Kunita and Watanabe (1967)): If {Vt, Ft, t ≥ 0} satisfies (∗) and has continuous sample paths almost surely, then there exists a standard Wiener process {Wt, t ≥ 0} such that

Vt = W_{It} a.s., t ≥ 0.

Theorem 3: Suppose the conditions stated in Theorems 1 and 2 hold and there exists a function mt ↑ ∞ as t → ∞ such that

It/mt → η² in probability,

where P(η² > 0) > 0. Then

Vt I_t^{-1/2} → N(0, 1) in law

as t → ∞, and the convergence holds with respect to any probability measure µ on (Ω, F) which is absolutely continuous with respect to the conditional probability measure PB(·) = P(·|B), where B = [η² > 0].
Lecture 13
Estimation from discrete sampling
Let us again consider the process

dX(t) = θX(t) dt + dW(t), t ≥ 0, X0 = 0.

We have seen that the MLE of θ is given by

θT = ∫_0^T X(t) dX(t) / ∫_0^T X²(t) dt

when a continuous sample path {X(t), 0 ≤ t ≤ T} is available. Suppose the process is observed at the time points 0 = t0 < t1 < . . . < tN = T, say. In order to estimate the parameter θ, one can either consider the likelihood function of the Markov chain {X_{ti}, 0 ≤ i ≤ N} and then estimate by the maximum likelihood method, provided the transition function can be computed explicitly, or discretize the likelihood estimator from the continuous-sample version, or apply other methods of estimation such as conditional least squares.

Le Breton (1976): Suppose we approximate

∫_0^T X(t) dX(t) by Σ_{i=1}^{N} (X(ti) − X(ti−1)) X(ti−1)

and

∫_0^T X²(t) dt by Σ_{i=1}^{N} X²(ti)(ti − ti−1).

Then the estimator θT can be approximated by

θ̃_{N,T} = Σ_{i=1}^{N} (X(ti) − X(ti−1)) X(ti−1) / Σ_{i=1}^{N} X²(ti)(ti − ti−1).

Let

δN = max_{1 ≤ i ≤ N} |ti − ti−1|.
Theorem (Le Breton (1975)): Suppose δN → 0 as N → ∞. Then

(i) θ̃_{N,T} → θT in probability as N → ∞,

(ii) δ_N^{-1/2} (θ̃_{N,T} − θT) = Op(1).
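The discretized estimator can be illustrated on a simulated path (a sketch; the model dX = θX dt + dW with θ = −0.5 and all grid sizes are arbitrary choices). Computing θ̃ from the full fine grid and from a 100-fold coarser subsample of the same path shows the two discretizations staying close, in line with the theorem.

```python
import numpy as np

rng = np.random.default_rng(7)

theta, T, n_fine = -0.5, 100.0, 400_000
dt = T / n_fine
dW = rng.normal(0.0, np.sqrt(dt), n_fine)
x = np.empty(n_fine + 1)
x[0] = 0.0
for k in range(n_fine):                   # Euler path of dX = theta X dt + dW
    x[k + 1] = x[k] + theta * x[k] * dt + dW[k]

def theta_tilde(path, delta):
    """Discretized estimator: sum (X(ti)-X(ti-1)) X(ti-1) / sum X(ti)^2 delta."""
    dx = np.diff(path)
    return float(np.sum(dx * path[:-1]) / np.sum(path[1:] ** 2 * delta))

fine = theta_tilde(x, dt)                 # near the continuous-record MLE
coarse = theta_tilde(x[::100], 100 * dt)  # same estimator, coarser sampling
print(fine, coarse)
```

The gap between the two values is Op(δ^{1/2}) in the coarse mesh δ, as stated in the theorem.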
In general, suppose we consider the SDE

dXt = a(θ, Xt) dt + dWt, X0 = x0, t ≥ 0.

Consider

Σ_{i=1}^{N} [X(ti) − X(ti−1) − a(θ, X(ti−1))(ti − ti−1)]²

and choose θ minimizing this expression. It can be shown that the estimator so obtained is consistent if

T/N → 0 as N → ∞ (Dorogovcev (1976))

and asymptotically normal if

T/√N → 0 as N → ∞ (Prakasa Rao (1983)).

Kasonga (1988) suggested the following approach. Let Uk(θ, t) be the solution of the ordinary differential equation

dxt/dt = a(θ, xt) on [tk−1, tk], x_{tk−1} = X_{tk−1}.

Let Q(θ) = Σ_{k=1}^{N} |X(tk) − Uk(θ, tk)|² with

Uk(θ, t) = X(tk−1) + ∫_{tk−1}^{t} a(θ, Uk(θ, s)) ds for tk−1 ≤ t ≤ tk.

Choose θ to minimize Q(θ). Let θ⋆_{N,T} be such an estimator.

Theorem (Kasonga (1988)): Suppose that, for every θ1 ≠ θ2,

p-lim_{N→∞} (1/N) Σ_{k=1}^{N} |Uk(θ1, tk) − Uk(θ2, tk)|² > 0

and δN = max_{1 ≤ i ≤ N} |ti − ti−1| → 0 as N → ∞. Then θ⋆_{N,T} → θ in probability as N → ∞ and T → ∞, where θ is the true parameter.
Remarks: Consider the SDE

dXt = θXt dt + σ dWt, t ≥ 0, X0 = 0.

It is known that

lim_{N→∞} Σ_{i=1}^{2^N} [W(it/2^N) − W((i−1)t/2^N)]² = t a.s. (Doob (1953), p. 395).

Applying this result, it can be shown that

lim_{N→∞} Σ_{i=1}^{2^N} [X(it/2^N) − X((i−1)t/2^N)]² = σ²t a.s. (Basawa and Prakasa Rao (1980), p. 242).
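The quadratic-variation limit gives a simple consistent estimator of σ² from high-frequency data, sketched below with arbitrary parameter values (θ = 0.7, σ = 2, t = 1) on an Euler-simulated path:

```python
import numpy as np

rng = np.random.default_rng(8)

theta, sigma, t, N = 0.7, 2.0, 1.0, 18   # illustrative values
m = 2 ** N
dt = t / m
dW = rng.normal(0.0, np.sqrt(dt), m)
x = np.empty(m + 1)
x[0] = 0.0
for k in range(m):                        # Euler path of dX = theta X dt + sigma dW
    x[k + 1] = x[k] + theta * x[k] * dt + sigma * dW[k]

qv = float(np.sum(np.diff(x) ** 2))       # realized quadratic variation on [0, t]
print(qv, sigma ** 2 * t)                 # realized QV ≈ sigma^2 t
```

The drift contributes only terms of order dt to the squared increments, so it is asymptotically negligible in the sum.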
Parametric estimation for linear SDE

Consider the SDE

dX_t = θ X_t dt + G dW_t, t ≥ 0, X_0 = 0,

where {X_t} is an n-dimensional vector-valued process, θ ∈ Θ, Θ a subset of the space of square matrices of order n × n, G ∈ ζ where ζ is a subset of the space of nonsingular matrices of order n × n, and {W_t, t ≥ 0} is an n-dimensional stochastic process with independent standard Wiener processes as its components. Let µ^T_{θ,G} be the probability measure induced by the process {X_t, 0 ≤ t ≤ T} on the space C([0, T], R^n) of continuous functions from [0, T] to R^n. Using Girsanov's theorem, it can be shown that

dµ^T_{θ,G}/dµ^T_{0,G} = exp{∫_0^T ⟨θX_t, (GG′)^{-1} dX_t⟩ − (1/2) ∫_0^T ⟨θX_t, (GG′)^{-1} θX_t⟩ dt}

where ⟨·, ·⟩ denotes the inner product in R^n and M′ denotes the transpose of the matrix M. Maximization of the Radon-Nikodym derivative given above with respect to the parameter θ leads to a system of linear equations which can be solved to obtain the MLE θ_{T,G}. Furthermore, for any (θ, G) ∈ Θ × ζ and for 0 ≤ t ≤ T,

lim_{N→∞} Σ_{i=1}^{2^N} (X_{it2^{-N}} − X_{(i−1)t2^{-N}})(X_{it2^{-N}} − X_{(i−1)t2^{-N}})′ = GG′ t a.s.

(Ref: Basawa and Prakasa Rao (1980), p. 212).

Remarks: If the true value θ0 of the parameter θ is a real stable matrix, that is, the eigenvalues of θ0 have negative real parts, then the MLE θ_{T,G} is consistent and asymptotically normal. In fact,

T^{1/2}(θ_{T,G} − θ0) → K_{θ0} in law as T → ∞,

where K_{θ0} = ((K^{θ0}_{ij})) is a Gaussian matrix with mean zero and covariances given by

E_{θ0}(K^{θ0}_{ij} K^{θ0}_{kl}) = (GG′)_{ik} (Q^{-1}_{θ0})_{jl}

and Q_{θ0} is a positive definite matrix satisfying the relation

θ0 Q_{θ0} + Q_{θ0} θ0′ = −GG′.
Remarks: Most of the results discussed above can be extended to stochastic differential equations of the type

dX_t = θ A(t, X) dt + G dW_t, t ≥ 0, X_0 = 0.
Sequential estimation for linear SDE

Consider an SDE of the form

dξ(t) = λ A(t, ξ) dt + dWt, t ≥ 0, ξ0 = 0,

where the unknown parameter is λ, −∞ < λ < ∞, and A(t, ξ) is Ft-measurable for every t ≥ 0. Further suppose that, for every x(·) ∈ C[0, ∞) with x(0) = 0, there exists ε = ε(x) > 0 such that

∫_0^{ε(x)} A²(t, x) dt < ∞,

and that, for every λ and for every t ≥ 0,

Pλ{∫_0^t A²(s, ξ) ds < ∞} = 1. (∗)

Here Pλ is the probability measure generated by the process {ξt, t ≥ 0} when λ is the true parameter. The measures Pλ and P0 are equivalent under the condition (∗). Note that the probability measure P0 is the Wiener measure. Let P^t_λ denote the probability measure generated by the process {ξ(u), 0 ≤ u ≤ t} on the space C[0, t]. Observe that

dP^t_λ/dP^t_0 = exp{λ ∫_0^t A(s, ξ) dξs − (1/2) λ² ∫_0^t A²(s, ξ) ds}.

It is now easy to check that the MLE of the parameter λ, given the observations {ξ(s), 0 ≤ s ≤ T}, is

λT(ξ) = ∫_0^T A(s, ξ) dξs / ∫_0^T A²(s, ξ) ds.
Observe that

Eλ[λT(ξ)] = Eλ[∫_0^T A(s, ξ) dξs / ∫_0^T A²(s, ξ) ds]
= Eλ[(λ ∫_0^T A²(s, ξ) ds + ∫_0^T A(s, ξ) dWs) / ∫_0^T A²(s, ξ) ds]
= λ + Eλ[∫_0^T A(s, ξ) dWs / ∫_0^T A²(s, ξ) ds].

Suppose that

Pλ{∫_0^∞ A²(s, ξ) ds = ∞} = 1, −∞ < λ < ∞.

For any H ≥ 0, define

τ(H) = inf{t ≥ 0 : ∫_0^t A²(s, ξ) ds = H}.

Define

λ(H) = λ_{τ(H)} = ∫_0^{τ(H)} A(s, ξ) dξs / ∫_0^{τ(H)} A²(s, ξ) ds = (1/H) ∫_0^{τ(H)} A(s, ξ) dξs.

The estimator λ(H) is called a sequential maximum likelihood estimator of the parameter λ. Note that

λ(H) = (1/H) ∫_0^{τ(H)} A(s, ξ) dξs
= (1/H){λ ∫_0^{τ(H)} A²(s, ξ) ds + ∫_0^{τ(H)} A(s, ξ) dWs}
= λ + (1/H) ∫_0^{τ(H)} A(s, ξ) dWs.

Hence the distribution of the estimator λ(H) is N(λ, 1/H) from the properties of stochastic integrals with respect to a standard Wiener process.
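The exact N(λ, 1/H) law of the sequential estimator can be checked by simulation (a sketch with A(t, ξ) = ξ(t), i.e. dξ = λξ dt + dW, and arbitrary values λ = −0.5, H = 4; the Euler time step introduces a small overshoot of the stopping boundary):

```python
import numpy as np

rng = np.random.default_rng(9)

lam, H, dt, reps = -0.5, 4.0, 0.01, 500
estimates = []
for _ in range(reps):
    x, info, score = 0.0, 0.0, 0.0
    while info < H:                      # observe until ∫ xi^2 ds reaches H
        dw = rng.normal(0.0, np.sqrt(dt))
        dx = lam * x * dt + dw           # Euler increment of d(xi) = lam xi dt + dW
        score += x * dx                  # accumulate ∫ xi d(xi)
        info += x * x * dt               # accumulate the observed information
        x += dx
    estimates.append(score / H)          # sequential MLE lambda(H)
est = np.array(estimates)
print(est.mean(), est.var())             # mean ≈ lam, variance ≈ 1/H
```

The empirical mean and variance of the replicated estimates should approach λ and 1/H respectively, up to Monte Carlo noise and the discretization overshoot.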
Cramer-Rao inequality

Let us consider a sequential plan (τ(ξ), λτ(ξ)) for estimating a function h(λ) such that

Eλ[λτ(ξ)] = h(λ).

Note that τ(ξ) is the stopping time of the sequential plan (τ(ξ), λτ(ξ)). Suppose that h(λ) is differentiable and that differentiation with respect to λ under the expectation operator is permissible in the above equation. Further suppose that

Eλ ∫_0^{τ(ξ)} A²(t, ξ) dt < ∞.

Theorem (Cramer-Rao inequality): Under the conditions stated above,

Varλ(λτ(ξ)) ≥ [h′(λ)]² / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].
Proof: Let Pλ be the probability measure generated by the process {ξ(s), 0 ≤ s ≤ t} corresponding to the parameter λ and P^{τ(ξ)}_λ be the probability measure generated by the process {ξ(t), 0 ≤ t ≤ τ(ξ)}. Applying Sudakov's lemma (cf. Basawa and Prakasa Rao (1980)), it can be shown that dP^{τ(ξ)}_λ / dP^{τ(ξ)}_{λ0} exists and

dP^{τ(ξ)}_λ / dP^{τ(ξ)}_{λ0} = exp{(λ − λ0) ∫_0^{τ(ξ)} A(t, ξ) dξ(t) − (1/2)(λ² − λ0²) ∫_0^{τ(ξ)} A²(t, ξ) dt}.

Note that Eλ[λτ(ξ)] = h(λ), and hence

∫ λτ(ξ) dPλ = h(λ),

which can also be written in the form

∫ λτ(ξ) (dPλ/dPλ0) dPλ0 = h(λ).

Differentiating under the integral sign with respect to λ, we get that

∫ λτ(ξ) (d/dλ)(dPλ/dPλ0) dPλ0 = h′(λ).

Hence

∫ λτ(ξ) (dPλ/dPλ0)(∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt) dPλ0 = h′(λ).

Therefore

Eλ[λτ(ξ)(∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt)] = h′(λ).
Observe that

∫_0^T A(t, ξ) dξ(t) = λ ∫_0^T A²(t, ξ) dt + ∫_0^T A(t, ξ) dWt

and hence

Eλ[∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt] = Eλ[∫_0^{τ(ξ)} A(t, ξ) dWt] = 0.

The above relations imply that

Eλ[(λτ(ξ) − h(λ))(∫_0^{τ(ξ)} A(t, ξ) dWt)] = h′(λ).

Applying the Cauchy-Schwarz inequality, we have

[h′(λ)]² ≤ Varλ(λτ(ξ)) Eλ[(∫_0^{τ(ξ)} A(t, ξ) dWt)²] = Varλ(λτ(ξ)) Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

Hence

Varλ(λτ(ξ)) ≥ [h′(λ)]² / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

In particular, if h(λ) ≡ λ, then

Varλ(λτ(ξ)) ≥ 1 / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

Definition: A sequential plan (τ(ξ), λτ(ξ)) is said to be efficient if the variance of the corresponding estimator λτ(ξ) attains the Cramer-Rao lower bound.

Observe that, for the sequential plan defined by the stopping time τ(H),

Varλ(λτ(H)) = 1/H,

which is the Cramer-Rao lower bound for the variance of unbiased estimators of λ. Hence the estimator λτ(H) is an efficient estimator of the parameter λ.
the estimator λτ(H) is an efficient estimator for estimating the parameter λ.
MLE of the drift parameter of a diffusion process

Suppose (Ω, F, P) is a probability space and {Xt, t ≥ 0} is a stochastic process defined on it satisfying the SDE

dXt = a(t, Xt, θ) dt + dWt, X0 = 0, t ≥ 0, θ ∈ Θ ⊂ R.

The problem is to estimate the parameter θ based on the observations {Xs, 0 ≤ s ≤ T}. We assume that

(A0) Pθ1 ≠ Pθ2 whenever θ1 ≠ θ2 ∈ Θ (identifiability condition), and

(A1) Pθ(∫_0^T a²(t, Xt, θ) dt < ∞) = 1, θ ∈ Θ, T ≥ 0.

Let P^T_θ denote the probability measure generated by the process {Xs, 0 ≤ s ≤ T} and P^T_W denote the probability measure generated by the standard Wiener process {Ws, 0 ≤ s ≤ T}. Then

dP^T_θ/dP^T_W = exp{∫_0^T a(t, Xt, θ) dXt − (1/2) ∫_0^T a²(t, Xt, θ) dt}.
A maximum likelihood estimator (MLE) θT(X^T) maximizes the likelihood function LT(θ) = dP^T_θ/dP^T_W. If Θ is compact and LT(θ) is continuous in θ, then there exists a measurable MLE (cf. Schmetterer (1974), Prakasa Rao (1987)). We assume the existence of a measurable MLE in the following discussion. Let

F(t, x, θ) = ∫_0^x a(t, y, θ) dy. (1)

(A2)(i) Suppose the function a(t, x, θ) is continuous in x and the function F(t, x, θ) is jointly continuous in (t, x) with partial derivatives Fx, Ft and Fxx.

Observe that Fx = a and Fxx = ax. Applying Ito's lemma, we have

dF(t, Xt, θ) = [Ft(t, Xt, θ) + (1/2) ax(t, Xt, θ)] dt + a(t, Xt, θ) dXt.

Hence

∫_0^T a(t, Xt, θ) dXt = F(T, XT, θ) − ∫_0^T f(t, Xt, θ) dt

where

f(t, x, θ) = Ft(t, x, θ) + (1/2) ax(t, x, θ).

Therefore

ℓT(θ) = log LT(θ) = F(T, XT, θ) − ∫_0^T [f(t, Xt, θ) + (1/2) a²(t, Xt, θ)] dt. (2)

(A2)(ii) Suppose the function ℓT(θ) = log LT(θ) is twice differentiable in θ.

Observe that

ℓ′T(θ) = ∫_0^T a′(t, Xt, θ)(dXt − a(t, Xt, θ) dt) = ∫_0^T a′(t, Xt, θ) dW^θ_t

where

W^θ_t = Xt − ∫_0^t a(s, Xs, θ) ds
is a Wiener process under the parameter θ (here and below, ′ denotes differentiation with respect to θ). Similarly,

ℓ″T(θ) = ∫_0^T a″ dXt − ∫_0^T (a a″ + (a′)²) dt
= ∫_0^T a″ (dXt − a dt) − ∫_0^T (a′)² dt
= ∫_0^T a″ dW^θ_t − ∫_0^T (a′)² dt.

(A2)(iii) Suppose that the function ℓ″T(θ) is continuous in a neighbourhood Vθ of θ for every θ ∈ Θ and

Eθ[∫_0^T (a′(t, Xt, θ))² dt] < ∞, Eθ[∫_0^T (a″(t, Xt, θ))² dt] < ∞.

Further suppose that

(A3) for every θ, there exists a neighbourhood Vθ of θ in Θ such that

Pθ(∫_0^∞ (a(t, Xt, θ′) − a(t, Xt, θ))² dt = ∞) = 1

for every θ′ ∈ Vθ − {θ}.

Let

IT(θ) = ∫_0^T (a′(t, Xt, θ))² dt

and

YT(θ) = ∫_0^T (a″(t, Xt, θ))² dt.

(A4) Suppose that there exists a function mt ↑ ∞ such that

IT(θ)/mT → η²(θ) and YT(θ)/mT → ζ²(θ)

in probability under the Pθ-measure as T → ∞, where Pθ(η²(θ) > 0) > 0.

Theorem: Under the conditions (A0)-(A4) stated above, there exists a solution of the likelihood equation ℓ′T(θ) = 0 which is strongly consistent as T → ∞. Furthermore,

(IT(θ))^{1/2}(θT − θ) → N(0, 1) in law

as T → ∞ conditionally with respect to any probability measure µ << P^A_θ where P^A_θ(·) = Pθ(·|A) and A = [η²(θ) > 0].
Proof: For a detailed proof, see Prakasa Rao (1999), p. 16. We sketch it here. Let δ > 0 be such that θ and θ + δ belong to Θ. Then

ℓT(θ + δ) − ℓT(θ) = [∫_0^T a(t, Xt, θ + δ) dXt − (1/2) ∫_0^T a²(t, Xt, θ + δ) dt]
− [∫_0^T a(t, Xt, θ) dXt − (1/2) ∫_0^T a²(t, Xt, θ) dt]
= ∫_0^T [a(t, Xt, θ + δ) − a(t, Xt, θ)] dXt − (1/2) ∫_0^T [a²(t, Xt, θ + δ) − a²(t, Xt, θ)] dt
= ∫_0^T A^{θ+δ}_t dXt − (1/2) ∫_0^T [a²(t, Xt, θ + δ) − a²(t, Xt, θ)] dt

where

A^{θ+δ}_t = a(t, Xt, θ + δ) − a(t, Xt, θ).

It is easy to check that

ℓT(θ + δ) − ℓT(θ) = ∫_0^T A^{θ+δ}_t dW^θ_t − (1/2) ∫_0^T (A^{θ+δ}_t)² dt.

Let

KT = ∫_0^T (A^{θ+δ}_t)² dt.

Then

[ℓT(θ + δ) − ℓT(θ)]/KT = ∫_0^T A^{θ+δ}_t dW^θ_t / ∫_0^T (A^{θ+δ}_t)² dt − 1/2.

Applying Lepingle's strong law of large numbers (cf. Prakasa Rao (1999)), it follows that

∫_0^T A^{θ+δ}_t dW^θ_t / ∫_0^T (A^{θ+δ}_t)² dt → 0 a.s.

as T → ∞, since

∫_0^T (A^{θ+δ}_t)² dt → ∞ a.s.

as T → ∞ by the condition (A3). Hence, for every θ and δ and for almost every ω ∈ Ω, there exists T0 depending on θ, δ and ω such that for every T ≥ T0,

ℓT(θ + δ) < ℓT(θ). (3)

Similarly we can show that

ℓT(θ − δ) < ℓT(θ) (4)
83
for sufficiently large T. Since the function ℓT (θ) is continuous on the closed interval
[θ− δ, θ+ δ], it has a local maximum and the maximum is attained at some point θT in
the open interval (θ−δ, θ+δ) in view of inequalities (3) and (4). Furthermore ℓ′T (θT ) = 0.
This proves that
θTa.s.→ θ
as T → ∞ under Pθ -measure. This proves the existence and strong consistency of a
maximum likelihood estimator.
Applying Taylor's expansion to the function $\ell'_T(\theta)$ at $\theta_T$, we get
\[
\ell'_T(\theta) = \ell'_T(\theta_T) + (\theta - \theta_T)\,\ell''_T(\theta_T^*)
\]
where $|\theta_T^* - \theta| \le |\theta_T - \theta|$. Hence
\[
\frac{\ell'_T(\theta)}{\sqrt{I_T(\theta)}} = \frac{(\theta - \theta_T)\,\ell''_T(\theta_T^*)}{\sqrt{I_T(\theta)}} \simeq \frac{(\theta - \theta_T)\,\ell''_T(\theta)}{\sqrt{I_T(\theta)}}
\]
as $T \to \infty$, since $\theta_T^* \xrightarrow{a.s.} \theta$ and $I_T(\theta) \xrightarrow{a.s.} \infty$ as $T \to \infty$ and $\ell''_T(\theta)$ is continuous. Let $\mathcal{F}_T$ be the sub-$\sigma$-algebra generated by the process $\{X_s, 0 \le s \le T\}$. Note that the process $\{\ell'_T(\theta), \mathcal{F}_T, T \ge 0\}$ is a martingale and, by the earlier remarks,
\[
\frac{\ell'_T(\theta)}{\sqrt{I_T(\theta)}} \xrightarrow{\mathcal{L}} N(0, 1)
\]
as $T \to \infty$ under the $P_\theta^A$-measure. Hence
\[
\frac{(\theta - \theta_T)\,\ell''_T(\theta)}{\sqrt{I_T(\theta)}} \xrightarrow{\mathcal{L}} N(0, 1).
\]
Observe that
\[
\frac{\ell''_T(\theta)}{I_T(\theta)} = \frac{\int_0^T a''(t, X_t, \theta)\,dW_t^\theta - \int_0^T (a'(t, X_t, \theta))^2\,dt}{I_T(\theta)} \xrightarrow{a.s.} -1
\]
as $T \to \infty$ under the $P_\theta^A$-measure (cf. Feigin (1976)). In particular,
\[
\sqrt{I_T(\theta)}\,(\theta_T - \theta) \xrightarrow{\mathcal{L}} N(0, 1)
\]
as $T \to \infty$ under the $P_\theta^A$-measure. This result proves the asymptotic normality of the MLE under random norming.
Example: Consider the SDE
\[
dX_t = \theta t X_t\,dt + dW_t, \quad X_0 = 0,\; t \ge 0.
\]
Check that
\[
X_t = e^{\theta t^2/2}\int_0^t e^{-\theta s^2/2}\,dW_s, \quad t \ge 0,
\]
and that the MLE is strongly consistent and asymptotically normal after random normalization.
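For this example, the drift is $a(t, x, \theta) = \theta t x$ with $a' = t x$, so the likelihood equation $\ell'_T(\theta) = 0$ has the explicit solution $\hat\theta_T = \int_0^T t X_t\,dX_t \big/ \int_0^T t^2 X_t^2\,dt$. The following sketch simulates the SDE by the Euler-Maruyama scheme and evaluates this estimator with left-point Riemann-Ito sums; the parameter value, horizon, step size and seed are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_and_estimate(theta=0.5, T=5.0, n=100_000):
    """Euler-Maruyama simulation of dX_t = theta * t * X_t dt + dW_t, X_0 = 0,
    followed by the explicit MLE
        theta_hat = (int_0^T t X_t dX_t) / (int_0^T t^2 X_t^2 dt),
    with both integrals approximated by left-point Riemann-Ito sums."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    X = np.zeros(n + 1)
    dW = rng.normal(0.0, np.sqrt(dt), n)
    for i in range(n):
        X[i + 1] = X[i] + theta * t[i] * X[i] * dt + dW[i]
    dX = np.diff(X)
    num = np.sum(t[:-1] * X[:-1] * dX)         # int_0^T t X_t dX_t
    den = np.sum((t[:-1] * X[:-1]) ** 2) * dt  # int_0^T t^2 X_t^2 dt
    return num / den

est = simulate_and_estimate()
print(est)  # close to the true value 0.5
```

The random norming in the theorem appears here through the denominator $\int_0^T t^2 X_t^2\,dt = I_T(\theta)$, which grows rapidly along each sample path and makes the estimator very precise.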
Remarks: For the vector parameter case, see Prakasa Rao (1999), p. 20. In order to find "efficient" estimators, as in the classical problems of estimation in the finite-dimensional case, we now obtain an analogue of the Cramer-Rao lower bound and discuss the concept of locally asymptotically normal (LAN) families of distributions.
Cramer-Rao lower bound

Consider the SDE
\[
dX(t) = a(\theta, t, X)\,dt + dW_t, \quad X(0) = X_0,\; t \ge 0,\; \theta \in \Theta \subset \mathbb{R}.
\]
Suppose that
\[
P_\theta\Big(\int_0^T a^2(\theta, t, X)\,dt < \infty\Big) = 1
\]
and $a(\theta, t, x)$ is differentiable with respect to $\theta$. Let
\[
I(\theta_1, \theta_2) = E_{\theta_1}\Big[\int_0^T a_\theta^2(\theta_2, t, X)\,dt\Big].
\]
Here $a_\theta$ denotes the partial derivative of the function $a(\theta, t, x)$ with respect to $\theta$.

Theorem: Suppose that $I(\theta_1, \theta_2) > 0$ for all $\theta_1, \theta_2 \in \Theta$ and $I(\theta, \theta)$ is continuous in $\theta$. Let $\theta_T^*$ be any estimator of the parameter $\theta$, based on the observation $X^T = \{X_s, 0 \le s \le T\}$, such that $E_\theta(\theta_T^* - \theta)^2$ is bounded over compact subsets of $\Theta$. Let $b(\theta) = E_\theta(\theta_T^* - \theta)$. Then $b(\theta)$ is differentiable almost everywhere and
\[
E_\theta(\theta_T^* - \theta)^2 \ge \frac{(1 + b'(\theta))^2}{I(\theta, \theta)} + b^2(\theta)
\]
where $b'(\theta)$ denotes the derivative of $b(\theta)$ with respect to $\theta$ whenever it exists.

For a proof, see Prakasa Rao (1999), p. 28.
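In particular, for an unbiased estimator the bias function $b(\theta)$ vanishes identically, so $b'(\theta) = 0$ and the bound reduces to the familiar Cramer-Rao form
\[
E_\theta(\theta_T^* - \theta)^2 \;\ge\; \frac{1}{I(\theta, \theta)} = \Big(E_\theta\Big[\int_0^T a_\theta^2(\theta, t, X)\,dt\Big]\Big)^{-1},
\]
so the quantity $I(\theta, \theta)$ plays the role of the Fisher information for continuous observation over $[0, T]$.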
Local Asymptotic Normality (LAN):

Let $(\Omega, \mathcal{F}, P)$ be a probability space and, for $\epsilon \in (0, 1]$, let $\mathcal{F}^{(\epsilon)} = \{\mathcal{F}_t^{(\epsilon)}, 0 \le t \le 1\}$ be a filtration, that is, a nondecreasing family of sub-$\sigma$-algebras contained in $\mathcal{F}$. Let $X^\epsilon = \{X^\epsilon(t), 0 \le t \le T_\epsilon\}$ be a diffusion process satisfying the SDE
\[
dX^\epsilon(t) = a_\epsilon(\theta, t, X^\epsilon)\,dt + dW_\epsilon(t), \quad X^\epsilon(0) = \eta_\epsilon,\; 0 \le t \le T_\epsilon
\]
where $\eta_\epsilon$ is an $\mathcal{F}_0^{(\epsilon)}$-measurable random variable and $\theta \in \Theta$, an open set in $\mathbb{R}$. Let $P_\theta^{(\epsilon)}$ be the probability measure generated by the process $X^\epsilon$. Suppose that
\[
P_\theta^{(\epsilon)}\Big(\int_0^{T_\epsilon} a_\epsilon^2(\theta, t, X^\epsilon)\,dt < \infty\Big) = 1, \quad \theta \in \Theta,\; 0 < \epsilon \le 1.
\]
Let $\theta_0 \in \Theta$ and suppose further that $\phi_\epsilon(\theta_0) \to 0$ as $\epsilon \to 0$. It can be shown that the measures $P_{\theta_0 + \phi_\epsilon(\theta_0)u}^{(\epsilon)}$ and $P_{\theta_0}^{(\epsilon)}$ are absolutely continuous with respect to each other in a neighbourhood of $\theta_0$. Let
\[
Z_\epsilon(u) = \frac{dP_{\theta_0 + \phi_\epsilon(\theta_0)u}^{(\epsilon)}}{dP_{\theta_0}^{(\epsilon)}}(X^\epsilon).
\]

Definition: A family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ is said to be locally asymptotically normal (LAN) at $\theta_0 \in \Theta$ if
\[
\log Z_\epsilon(u) = u\,\Delta_\epsilon(\theta_0, X^\epsilon) - \frac{1}{2}u^2 + \psi_\epsilon(\theta_0, u, X^\epsilon)
\]
where
\[
\Delta_\epsilon(\theta_0, X^\epsilon) \xrightarrow{\mathcal{L}} N(0, 1) \qquad\text{and}\qquad \psi_\epsilon(\theta_0, u, X^\epsilon) \xrightarrow{p} 0
\]
as $\epsilon \to 0$ under the $P_{\theta_0}^{(\epsilon)}$-measure.

Remarks: The function $\phi_\epsilon(\theta_0)$ is called the normalization. Local asymptotic normality of the family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ implies that the likelihood ratio process
\[
\frac{dP_\theta^{(\epsilon)}}{dP_{\theta_0}^{(\epsilon)}}(X^\epsilon)
\]
has the properties of the process
\[
Z(u) = \exp\Big\{u\zeta - \frac{1}{2}u^2\Big\}, \quad -\infty < u < \infty,
\]
where $\zeta$ is $N(0, 1)$, whenever $\theta$ is close to $\theta_0$ and $\epsilon$ is small. Typically, the normalization is $\phi_\epsilon(\theta) = (I_\epsilon(\theta))^{-1/2}$ where $I_\epsilon(\theta)$ is the Fisher information. Under some conditions, it
Hajek-LeCam inequality

Suppose the family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ is LAN with normalizing function $\phi_\epsilon(\theta)$. Let $\ell(\cdot)$ be a symmetric function, continuous at zero, such that the set $\{x : \ell(x) < c\}$ is convex for all $c > 0$. Further suppose that, for any $h > 0$, $\ell(x) < e^{h x^2}$ for $|x|$ large. Then, for every $\gamma \in (0, 1)$,
\[
\liminf_{\epsilon \to 0}\; \inf_{\theta_\epsilon^*}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\ell\Big(\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big)\Big] \ge E[\ell(\xi)] \tag{*}
\]
where the random variable $\xi$ has the standard normal distribution. For a proof of this result, see Kutoyants (1984). If $\ell(x) = x^2$, then the inequality (*) reduces to
\[
\liminf_{\epsilon \to 0}\; \inf_{\theta_\epsilon^*}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big]^2 \ge 1.
\]

Definition: An estimator $\theta_\epsilon^*$ is said to be asymptotically efficient if
\[
\lim_{\epsilon \to 0}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big]^2 = 1.
\]

Example: Consider the SDE
\[
dX(t) = -\theta X(t)\,dt + dW_t, \quad X(0) = 0,\; \theta \in (\alpha, \beta),\; \alpha > 0.
\]
Then the family of probability measures $\{P_\theta^T, \theta \in \Theta\}$ is LAN with the normalizing function $\phi_T(\theta) = \sqrt{2\theta}\,T^{-1/2}$ as $T \to \infty$. If instead $\theta \in (\alpha, \beta)$ with $\beta < 0$, then the family of probability measures $\{P_\theta^T, \theta \in \Theta\}$ is LAN as $T \to \infty$ with the normalizing function $\phi_T(\theta) = 2\theta e^{\theta T}$.
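As a heuristic check on the first normalization (not a proof): for $\theta > 0$ the process is ergodic with stationary variance $1/(2\theta)$, and here $a_\theta(\theta, t, X) = -X(t)$, so treating the stationary approximation as exact gives
\[
I_T(\theta) = E_\theta\Big[\int_0^T X^2(t)\,dt\Big] \approx \frac{T}{2\theta}, \qquad \phi_T(\theta) = (I_T(\theta))^{-1/2} \approx \sqrt{2\theta}\,T^{-1/2},
\]
which agrees with the normalizing function stated above.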
Parametric estimation for diffusion type processes from sampled data

Consider the SDE
\[
dX_t = a(X_t, \theta)\,dt + \sigma(X_t)\,dW_t, \quad t \ge 0.
\]
We now describe some methods of estimation of the parameter when the process $\{X_t\}$ is sampled at discrete time points at equal time intervals. For a detailed exposition, see Prakasa Rao (1999).
Estimation based on discretization by the Euler method

Suppose the drift and diffusion coefficients are constant over the interval $[t, t + \Delta t)$. Then
\[
X_{t+\Delta t} - X_t = a(X_t, \theta)\,\Delta t + \sigma(X_t)\,(W_{t+\Delta t} - W_t).
\]
This discretized process is considered as a local approximation to the original process. Note that $\sigma(X_t)(W_{t+\Delta t} - W_t)$ has a normal distribution with mean zero and variance $\sigma^2(X_t)\Delta t$, and the transition density function of the discretized process is
\[
p(X_{t+\Delta t} \mid X_t = x_t) = (2\pi\sigma^2(x_t)\Delta t)^{-1/2}\exp\Big\{-\frac{(X_{t+\Delta t} - x_t - a(x_t, \theta)\Delta t)^2}{2\sigma^2(x_t)\Delta t}\Big\}.
\]
Suppose we observe the process $\{X_t, t \ge 0\}$ at the points $t + i\Delta t$, $0 \le i \le N$. Let $Z_i = X_{t+i\Delta t}$. Then the joint probability density function of the random vector $(Z_0, \ldots, Z_N)$ is
\[
p(z_0, \ldots, z_N) = \prod_{i=1}^N p(z_i \mid z_{i-1})\,p(z_0)
\]
and the parameters $\theta$ and $\sigma$ can be estimated by the method of maximum likelihood.
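A minimal sketch of the Euler pseudo-likelihood method follows, under assumptions not in the text: an Ornstein-Uhlenbeck drift $a(x, \theta) = -\theta x$, a known constant $\sigma$, data generated by the same Euler scheme, and a crude grid search in place of a numerical optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model for illustration: Ornstein-Uhlenbeck drift a(x, theta) = -theta * x.
def drift(x, theta):
    return -theta * x

def euler_neg_loglik(theta, sigma, X, dt):
    """Negative log pseudo-likelihood built from the Euler transition
    density N(x_i + a(x_i, theta) * dt, sigma**2 * dt)."""
    resid = X[1:] - (X[:-1] + drift(X[:-1], theta) * dt)
    var = sigma ** 2 * dt
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid ** 2 / var)

# Simulate sampled data with the same Euler scheme.
theta_true, sigma, dt, n = 1.0, 0.5, 0.01, 50_000
X = np.zeros(n + 1)
shocks = rng.normal(0.0, sigma * np.sqrt(dt), n)
for i in range(n):
    X[i + 1] = X[i] + drift(X[i], theta_true) * dt + shocks[i]

# Crude grid search for the maximizer; a real fit would use an optimizer.
grid = np.linspace(0.1, 3.0, 291)
theta_hat = grid[np.argmin([euler_neg_loglik(th, sigma, X, dt) for th in grid])]
print(theta_hat)  # close to theta_true
```

The same `euler_neg_loglik` function works for any differentiable drift; only the `drift` function and the parameter grid need to change.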
Estimation based on the local linearization method of Shoji-Ozaki

Consider the SDE
\[
dX_t = a(X_t)\,dt + \sigma\,dW_t, \quad t \ge 0.
\]
Suppose the diffusion parameter $\sigma$ is a constant and the drift function $a(\cdot)$ is possibly nonlinear and differentiable. We try to approximate the above SDE by a linear SDE. Consider the ordinary differential equation
\[
\frac{dx_t}{dt} = a(x_t).
\]
Suppose the function $x_t$ is twice differentiable with respect to $t$. Then
\[
\frac{d^2 x_t}{dt^2} = a'(x_t)\,\frac{dx_t}{dt}.
\]
Suppose $a'(x)$ is constant over the interval $[t, t + \Delta t)$ and let $u \in [t, t + \Delta t)$. Then
\[
\frac{dx_s}{ds}\Big|_{s=u} = \frac{dx_t}{dt}\,e^{a'(x_t)(u - t)}
\]
and
\[
x_{t+\Delta t} = x_t + \frac{a(x_t)}{a'(x_t)}\big[e^{a'(x_t)\Delta t} - 1\big].
\]
Suppose we approximate the drift function $a(x)$ by a linear function $Lx$ on $[t, t + \Delta t)$. Then we have the SDE
\[
dX_t = L X_t\,dt + \sigma\,dW_t
\]
where $L$ is a constant on the interval $[t, t + \Delta t)$. Applying Ito's lemma, we get
\[
X_{t+\Delta t} = X_t e^{L\Delta t} + \sigma\int_t^{t+\Delta t} e^{L(t+\Delta t-u)}\,dW_u. \tag{*}
\]
Let us choose $L$ such that the conditional mean $E[X_{t+\Delta t} \mid X_t]$ coincides with the mean of the process given by (*). Hence
\[
X_t e^{L\Delta t} = X_t + \frac{a(X_t)}{a'(X_t)}\big[e^{a'(X_t)\Delta t} - 1\big]
\]
or
\[
L = \frac{1}{\Delta t}\log\Big[1 + \frac{a(X_t)}{X_t\,a'(X_t)}\big(e^{a'(X_t)\Delta t} - 1\big)\Big].
\]
Observe that the constant $L$ depends on $t$; denote it by $L_t$. The process discretized by the local linearization method is then
\[
X_{t+\Delta t} = X_t e^{L_t\Delta t} + \sigma\int_t^{t+\Delta t} e^{L_t(t+\Delta t-u)}\,dW_u.
\]
Since the random variable
\[
\int_t^{t+\Delta t} e^{L_t(t+\Delta t-u)}\,dW_u
\]
has the normal distribution with mean zero and variance $(e^{2L_t\Delta t} - 1)/(2L_t)$, we can now write the transition density function of the discretized observations $Y_i = X_{t+i\Delta t}$ given $Y_{i-1}$ for $0 \le i \le N$ and compute the likelihood function. The MLEs of $\theta$ and $\sigma$ can now be obtained.
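The one-step Gaussian transition implied by the local linearization can be sketched as follows; the cubic drift $a(x) = -x^3$ and the numerical values are illustrative assumptions, not from the text.

```python
import numpy as np

# Hypothetical nonlinear drift for illustration: a(x) = -x**3.
def a(x):  return -x ** 3
def a1(x): return -3 * x ** 2   # derivative a'(x)

def shoji_ozaki_step(x, sigma, dt):
    """One-step transition N(mean, var) of the Shoji-Ozaki local linearization:
        L_t  = (1/dt) * log[1 + a(x)/(x a'(x)) * (e^{a'(x) dt} - 1)],
        mean = x * e^{L_t dt},
        var  = sigma^2 * (e^{2 L_t dt} - 1) / (2 L_t)."""
    L = np.log(1.0 + a(x) / (x * a1(x)) * np.expm1(a1(x) * dt)) / dt
    mean = x * np.exp(L * dt)
    var = sigma ** 2 * np.expm1(2 * L * dt) / (2 * L)
    return mean, var

mean, var = shoji_ozaki_step(x=1.0, sigma=0.5, dt=0.01)
print(mean, var)  # mean slightly below 1, var close to sigma**2 * dt
```

Chaining these Gaussian transitions over the observation times gives the likelihood of the sampled path, exactly as in the Euler method but with a drift-adapted mean and variance. Note that `np.expm1` is used to evaluate $e^z - 1$ stably for small $z$.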
Estimation via martingale estimating functions

Consider the SDE
\[
dX_t = a(X_t, \theta)\,dt + \sigma(X_t, \theta)\,dW_t, \quad X(0) = X_0,\; t \ge 0,\; \theta \in \Theta \subset \mathbb{R}.
\]
(i) Let us first consider the case when the diffusion function $\sigma(x, \theta)$ does not depend on the parameter $\theta$. This is the case in all the earlier discussions on estimation of the drift parameter $\theta$. If the process $\{X_s, 0 \le s \le t\}$ is observed completely, then the log-likelihood function based on the observations is
\[
\ell_t(\theta) = \log L_t(\theta) = \int_0^t \frac{a(X_s, \theta)}{\sigma^2(X_s)}\,dX_s - \frac{1}{2}\int_0^t \frac{a^2(X_s, \theta)}{\sigma^2(X_s)}\,ds.
\]
Suppose now the process is observed at times $i\Delta$, $0 \le i \le n$. We approximate the integrals in the above expression by Riemann-type sums to obtain an approximate log-likelihood function. It is given by
\[
\ell_n(\theta) = \sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \frac{1}{2}\sum_{i=1}^n \frac{a^2(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,\Delta.
\]
Suppose the function $a(x, \theta)$ is differentiable with respect to $\theta$. Then
\[
\ell'_n(\theta) = \sum_{i=1}^n \frac{a'(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \Delta\sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,a'(X_{(i-1)\Delta}, \theta).
\]
The process $\{\ell'_n(\theta)\}$ is a zero-mean martingale with respect to the filtration $\{\mathcal{F}_i\}$, where $\mathcal{F}_i$ is generated by the set $\{X_0, X_\Delta, \ldots, X_{i\Delta}\}$. Solving the equation
\[
\ell'_n(\theta) = 0,
\]
which is called a martingale estimating equation, we can estimate the parameter $\theta$.

(ii) Let us now consider the case when the diffusion coefficient $\sigma(x, \theta)$ depends on $\theta$. Consider the analogue of the function $\ell'_n(\theta)$ given by
\[
J_n(\theta) = \sum_{i=1}^n \frac{a'(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta}, \theta)}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \Delta\sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta}, \theta)}\,a'(X_{(i-1)\Delta}, \theta).
\]
This process is not a martingale with respect to the filtration $\{\mathcal{F}_i\}$. Define
\[
G_n(\theta) = J_n(\theta) - \sum_{i=1}^n E_\theta[J_i(\theta) - J_{i-1}(\theta) \mid \mathcal{F}_{i-1}].
\]
The process $\{G_n(\theta)\}$ is a zero-mean martingale with respect to the filtration $\{\mathcal{F}_i\}$, where $\mathcal{F}_i$ is generated by the set $\{X_0, X_\Delta, \ldots, X_{i\Delta}\}$. Solving the martingale estimating equation
\[
G_n(\theta) = 0,
\]
we can estimate the parameter $\theta$ whether or not the function $\sigma$ depends on $\theta$.
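A minimal sketch of case (i), under assumptions chosen for illustration (a linear drift $a(x, \theta) = -\theta x$, $\sigma \equiv 1$, and data generated by an Euler scheme): the martingale estimating equation $\ell'_n(\theta) = 0$ is solved by bisection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model for illustration: a(x, theta) = -theta * x, sigma(x) = 1.
def a(x, theta):  return -theta * x
def a1(x, theta): return -x           # derivative of a with respect to theta
sig2 = lambda x: np.ones_like(x)      # sigma^2(x)

def score(theta, X, dt):
    """Discretized score l'_n(theta): the martingale estimating function."""
    x, dx = X[:-1], np.diff(X)
    return (np.sum(a1(x, theta) / sig2(x) * dx)
            - dt * np.sum(a(x, theta) * a1(x, theta) / sig2(x)))

# Simulate sampled data.
theta_true, dt, n = 1.0, 0.01, 50_000
X = np.zeros(n + 1); X[0] = 1.0
for i in range(n):
    X[i + 1] = X[i] + a(X[i], theta_true) * dt + np.sqrt(dt) * rng.normal()

# Solve score(theta) = 0 by bisection; for this linear drift the root
# is explicitly theta_hat = -sum(x * dx) / (dt * sum(x**2)).
lo, hi = 0.01, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if score(lo, X, dt) * score(mid, X, dt) <= 0:
        hi = mid
    else:
        lo = mid
theta_hat = 0.5 * (lo + hi)
print(theta_hat)  # close to theta_true
```

For nonlinear drifts there is no closed form, but the same root-finding step applies unchanged; only `a` and `a1` need to be redefined.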
The following list of references contains bibliographic details of some books and some important review papers published in the area of "Statistical Inference for Stochastic Processes", including several that are not cited in the text.
References:
Aalen, O.O. (1975) Statistical Inference for a family of Counting Processes, Ph.D. Thesis,
University of California, Berkeley.
Aubry, C. (1997) Estimation parametrique par la methode de la distance minimale pour
des processus de Poisson et de diffusion. Ph.D. Thesis, Universite du Maine, Le
Mans.
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes, Springer, New York.
Arato, M. (1982) Linear Stochastic Systems with Constant Coefficients: A Statistical Approach, Lecture Notes in Control and Information Sciences, 45, Springer, Berlin.
Bar-Shalom, Y. (1971) On the asymptotic properties of maximum likelihood estimate
obtained from dependent observations, J. Roy. Statist. Soc. Ser. B, 33, 72-77.
Basawa, I.V. and Prabhu, N.U. (1994) Statistical Inference in Stochastic Processes, Spe-
cial Issue of Journal of Statistical Planning and Inference, 39, No. 2, pp. 135-352.
Basawa, I.V. and Prakasa Rao, B.L.S. (1980) Statistical Inference for Stochastic Processes, Academic Press, London.
Basawa, I.V. and Prakasa Rao, B.L.S. (1980) Asymptotic inference for stochastic pro-
cesses, Stoch. Proc. Appl., 10, 221-254.
Basawa, I.V. and Scott, D.J. (1983) Asymptotic Optimal Inference for Non-ergodic Mod-
els, Lecture Notes in Statistics, 17, Springer, Heidelberg.
Baxter, G. (1956) A strong limit theorem for Gaussian processes, Proc. Amer. Math.
Soc., 7, 522-527.
Bhat, B.R. (1974) On the method of maximum likelihood for dependent observations,
J. Roy. Statist. Soc. Ser. B, 36, 48-53.
Bhat, B.R. (1996) Tests based on estimating functions, In Stochastic Processes and
Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New Age International,
New Delhi, pp. 20-38.
Billingsley, P. (1961) Statistical Inference for Markov processes, University of Chicago
Press, Chicago.
Bishwal, J.P.N. (2000) Asymptotic Theory of Estimation of the Drift Parameter in Dif-
fusion Processes, Ph.D. Thesis, Sambalpur University, Sambalpur, India.
Borwanker, J.D. Kallianpur, G. and Prakasa Rao, B.L.S. (1971) The Bernstein-von Mises
theorem for Markov Processes, Ann. Math. Statist., 42, 1241-1253.
Bose, A. and Politis, D.N. (1996) A review of the bootstrap for dependent samples, In
Stochastic Processes and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R.
Bhat, New Age International, New Delhi, pp. 73-89.
Brody, E. (1971) An elementary proof of the Gaussian dichotomy theorem, Z. Wahrs. ,
20, 217-226.
Brown, B.M. (1971) Martingale central limit theorems, Ann. Math. Statist., 42, 59-66.
Bosq, D. (1998) Nonparametric Statistics for Stochastic Processes, Lecture Notes in
Statistics, 110, Springer, New York.
Bosq, D. (2012) Statistique Mathematique et Statistique des Processus, hermes-science
publications, Lavoisier, Cachan.
Brillinger, D. (1975) Statistical inference for stationary point processes, In Stochastic
Processes and Related Topics, Ed. M.L. Puri, Academic Press, New York, pp. 55-
99.
Cox, D.R. and Lewis, P.A.W. (1966) The Statistical Analysis of Series of Events, Methuen
and Barnes and Nobel, New York.
Cressie, N. (1991) Statistics of Spatial Data, Wiley, New York.
Dalalyan, A. (2001) Estimation non-parametrique pour les processus de diffusion er-
godiques, Ph.D. Thesis, Universite du Maine, Le Mans.
Dewan, Isha and Prakasa Rao, B.L.S. (2001) Associated sequences and related inference
problems, In Handbook of Statistics: Stochastic Processes;Theory and Methods, 19,
Ed. D.N.Shanbhag and C.R.Rao , Elsevier Science B.V., Amsterdam, pp. 693-731.
Dion, J.-P. (1974) Estimation des probabilites initiales et de la moyenne d’un processus
de Galton-Watson, Ph.D. Thesis, Universite de Montreal, Montreal.
Dion, J.-P. and Keiding, N. (1978) Statistical inference in branching processes, In Branch-
ing Processes, Ed. A. Joffe and P. Ney, Marcel Dekker, New York, pp. 105-140.
Doob, J.L. (1953) Stochastic Processes, Wiley, New York.
Dorogovcev, A. Ja. (1976) The consistency of an estimate of a parameter of stochastic
differential equation, Theory Prob. Math. Statist., 10, 73-82.
Foutz, R. (1974) Studies in Large sample Theory, Ph.D. Thesis, Ohio State University,
Columbus.
Feigin, P.D. (1975) Maximum likelihood estimation for continuous time stochastic pro-
cesses - A Martingale approach, Ph.D. Thesis, Australian National University, Can-
berra.
Feigin, P.D. (1976) Maximum likelihood estimation for continuous time stochastic pro-
cesses, Adv. Appl. Prob., 8, 712-736.
Feldman, J. (1958) Equivalence and perpendicularity of Gaussian processes, Pacific J.
Math., 8 , 699-708, correction, ibid. 9, 1295-1296.
Feller, W. (1971) An Introduction to Probability Theory and its Applications, Vol.II, 2nd
ed., Wiley, New York.
Fleisher, I. and Kooharian, A. (1958) On the statistical treatment of stochastic processes,
Ann. Math. Statist., 29, 544-549.
Fleming, T.R. and Harrington, D.P. (1991) Counting Processes and Survival analysis,
Wiley, New York.
Gallant, A.R. and Tauchen, G. (1996) A Unified Theory of Estimation and Inference for
Nonlinear Dynamic Models, Basil Blackwell, Oxford.
Girsanov, I.V. (1960) On transforming a certain class of stochastic processes by abso-
lutely continuous substitution of measures, Theory Prob. Appl., 5, 285-301.
Grenander, U. (1950) Stochastic processes and statistical inference, Arkiv. fur Mathe-
matik, 1, 195-227.
Grenander, U. (1968) Eight lectures on statistical inference in stochastic processes, Tech.
Report No.2, Division of Appl. Math., Brown University, Providence, Rhode Island.
Grenander, U. (1981) Abstract Inference, Wiley, New York.
Guttorp, P. (1991) Statistical Inference for Branching Processes, Wiley, New York.
Hall, P. and Heyde, C.C. (1980) Martingale Limit Theory and its Application, Academic
Press, London.
Hajek, J. (1958) On a property of normal distributions of any stochastic process, Czech. Math. J., 8, 610-618.
Heyde, C.C. (1974) On estimating the variance of the offspring distribution in a simple
branching process, Adv. Appl. Probab., 6, 421-433.
Heyde, C.C. (1997) Quasi-Likelihood and its Applications: A General Approach to Op-
timal Parameter estimation, Springer, New York.
Ibragimov, I.A. (1963) A central limit theorem for a class of dependent random variables, Theory Prob. Appl., 8, 83-89.
Jacobsen, M. (1982) Statistical Analysis of Counting Processes, Lecture Notes in Statis-
tics No. 12, Springer, New York.
Kalman, R.E. (1960) A new approach to linear filtering and prediction theory, J. Basic
Engg., 82, 35-45.
Kalman, R.E. and Bucy, R.S. (1961) New results in linear filtering and prediction theory,
J. Basic. Engg., 83, 95-108.
Kakutani, S. (1948) On the equivalence of infinite product measures, Ann. Math., 49, 214-224.
Karhunen, K. (1947) Uber lineare methoden in der wahrscheinlichkeitsrechnung, Ann.
Acad. Sci. Finn. a1, 37, 1-79.
Karr, A. (1991) Point Processes and their Statistical Inference, Marcel Dekker, New
York.
Kasonga, R. (1988) The consistency of a nonlinear least squares estimator for diffusion
processes, Stoch. Proc. Appl. , 30, 263-275.
Klimko, L.A. and Nelson, P.I. (1978) On conditional least squares estimation for stochas-
tic processes, Ann. Statist., 6, 629-642.
Krickeberg, K. (1980) Statistical Problems on Point Processes, Banach Center Publica-
tions No. 6, pp.197-223.
Krickeberg, K. (1982) Processus ponctuels en statistique, In Lecture Notes Math. , 929,
Springer, Berlin, pp. 205-313.
Kuchler, U. and Sorensen, M. (1997) Exponential Families of Stochastic Processes, Springer, New York.
Kunita, H. and Watanabe, S. (1967) On square integrable martingales, Nagoya Math.
J., 30, 209-245.
Kutoyants, Yu. A. (1984) Parameter Estimation for Stochastic Processes, Translated
and Ed. B.L.S. Prakasa Rao, Heldermann, Berlin.
Kutoyants, Yu. A. (1994) Identification of Dynamical Systems with Small Noise, Kluwer,
Dordrecht.
Kutoyants, Yu. A. (1998) Statistical Inference for Spatial Poisson Processes, Lecture
Notes in Statistics, 134,Springer, New York.
Kutoyants, Yu. A. (2004) Statistical Inference for Ergodic Diffusion Processes, Springer,
London.
Le Breton, A. (1976) On continuous and discrete sampling for parameter estimation in
diffusion type process, In Mathematical Programming Studies, 5, 124-144.
Lewis, P.A.W. (1972) Stochastic Point Processes: Statistical Analysis, Theory and Ap-
plications, Wiley, New York.
Linkov, Y.N. (2001) Asymptotic Methods in the Statistics of Stochastic Processes, Amer-
ican Mathematical Society, Providence, Rhode Island.
Liptser, R.S. and Shiryayev, A.N. (1977) Statistics of Random Processes: General The-
ory, Springer, New York.
Liptser, R.S. and Shiryayev, A.N. (1978) Statistics of Random Processes: Applications,
Springer, New York.
Loeve, M. (1946) Fonctions aleatoires de second ordre, C.R. Acad. Sci. Paris, 222.
Loeve, M. (1977) Probability Theory I, 4th ed., Springer, Berlin.
Maruyama, G. and Tanaka, H. (1957) Some properties of one-dimensional diffusion pro-
cesses, Mem. Fac. Kyushu Univ., 11, 117-141.
Meyer, P. (1962) A decomposition theorem for supermartingales, Illinois J. Math., 6,
193-205.
Naik-Nimbalkar, U.V. (1996) In Stochastic Processes and Statistical Inference, Ed. B.L.S.
Prakasa Rao and B.R. Bhat, New Age International, New Delhi, pp.52-72.
Negri, I. (1998) Efficacite globale de la fonction de repartition empirique dans le cas d’un
processus de diffusion ergodique, Ph.D. Thesis, Universite du Maine, Le Mans.
Norman, M.F. (1971) Statistical inference with dependent observations: extensions of classical procedures, J. Mathematical Psychology, 8, 444-451.
Novikov, A.A. and Shiryayev, A.N. (1994) Statistics and control of random processes,
Proceedings of Steklov Institute of Mathematics, 202, Amer. Math. Soc. Provi-
dence, Rhode Island, USA.
Prabhu, N.U. (1988) Statistical Inference from Stochastic Processes, Contemporary Mathematics, 80, American Mathematical Society, Providence, Rhode Island.
Prabhu, N.U. and Basawa, I.V. (1991) Statistical Inference in Stochastic Processes, Marcel Dekker, New York.
Prakasa Rao, B.L.S. (1972) Maximum likelihood estimation for Markov processes, Ann.
Inst. Statist. Math, 24, 333-345.
Prakasa Rao, B.L.S. (1974) Statistical inference for stochastic processes, Tech. Report
CRM-465, Centre de Recherches Mathematiques, Universite de Montreal.
Prakasa Rao, B.L.S. (1983) Asymptotic theory for nonlinear least squares estimator for
diffusion processes, Math. Oper. Stat. Series Statistik, 14, 195-209.
Prakasa Rao, B.L.S. (1987) Asymptotic Theory of Statistical Inference, Wiley, New York.
Prakasa Rao, B.L.S. (1988) Statistical inference from sampled data for stochastic pro-
cesses, In Contemporary Mathematics, 80, American Mathematical Society, Provi-
dence, Rhode Island, pp. 249-284.
Prakasa Rao, B.L.S. (1990) Nonparametric density estimation for stochastic processes
from sampled data, Publ. Inst. Stat. Univ. de Paris., 35, 51-83.
Prakasa Rao, B.L.S. (1991) Asymptotic theory of weighted maximum likelihood estima-
tion for growth models, In Statistical Inference in Stochastic Processes, Ed. N.U.
Prabhu and I.V. Basawa, Marcel Dekker, New York, pp. 183-208.
Prakasa Rao, B.L.S. (1996) Optimal asymptotic tests of composite hypotheses for con-
tinuous time stochastic processes, Sankhya Ser. A, 58, 8-24.
Prakasa Rao, B.L.S. (1996) Nonparametric approach to time series analysis, In Stochastic
Processes and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New
Age International, New Delhi, pp.73-89.
Prakasa Rao, B.L.S. (1999) Statistical Inference for Diffusion Type Processes, Kendall’s
Library of Statistics No.8, Arnold, London and Oxford University Press, New York.
Prakasa Rao, B.L.S. (1999) Semimartingales and their Statistical Inference, Chapman and Hall/CRC Press, Boca Raton, Florida.
Prakasa Rao, B.L.S. (2001) Statistical inference for stochastic partial differential equa-
tions, In Selected Proceedings of the Symposium on Inference for Stochastic Pro-
cesses, Ed. I.V. Basawa, C.C.Heyde, and R.L.Taylor, IMS Monograph Series, 37,
pp.47-70.
Prakasa Rao, B.L.S. (2001) Nonparametric inference for parabolic stochastic partial
differential equations, Random Operators and Stochastic Equations, 9, 329-338.
Prakasa Rao, B.L.S. (2002) Nonparametric inference for a class of stochastic partial
differential equations based on discrete observations, Sankhya Ser.A, 64, 1-15.
Prakasa Rao, B.L.S. (2002) On some problems of estimation for some stochastic partial
differential equations, In Uncertainty and Optimality, Ed. J.C.Misra (2002) World
Scientific, Singapore, pp. 71-154.
Prakasa Rao, B.L.S. (2003) Parametric estimation for linear stochastic differential equa-
tions driven by fractional Brownian motion, Random Operators and Stochastic Equa-
tions, 11, 229-242.
Prakasa Rao, B.L.S. (2004) Self-similar processes, fractional Brownian motion and sta-
tistical inference, In Festschrift for Herman Rubin , Ed. A. Das Gupta, Institute of
Mathematical Statistics, Lecture Notes and Monograph Series, 45, 98-125.
Prakasa Rao, B.L.S. (2009) Conditional independence, conditional mixing and condi-
tional association, Ann. Inst. Statist. Math., 61, pp. 441-460.
Prakasa Rao, B.L.S. (2010) Statistical Inference for Fractional Diffusion Processes, Wi-
ley, London.
Prakasa Rao, B.L.S. (2012) Associated Sequences, Demimartingales and Nonparametric
Inference, Birkhauser, Springer, Basel.
Prakasa Rao, B.L.S. and Bhat, B.R. (1996) Stochastic Processes and Statistical Inference,
New Age International, New Delhi.
Prakasa Rao, B.L.S. and Prasad, M.S. (1976) Maximum likelihood estimation for de-
pendent random variables, J. Indian Statist. Assoc., 14, 75-79.
Prasad, M. S. (1971) Some Contribution to the theory of Maximum likelihood Estimation
for Dependent Random Variables, Ph.D. Thesis, Indian Institute of Technology,
Kanpur.
Rajarshi, M.B. (1996) Resampling methods for stochastic processes, In Stochastic Pro-
cesses and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New Age
International, New Delhi, pp.90-120.
Rao, M.M. (2000) Stochastic Processes: Inference Theory, Kluwer, Dordrecht.
Renyi, A. (1963) On stable sequences of events, Sankhya Series A, 25, 293-302.
Revesz, P. (1968) The Laws of Large Numbers, Academic Press, New York.
Ripley, B.D. (1988) Statistical Inference for Spatial Point Processes, Cambridge University Press, Cambridge, UK.
Sagdar, D. (1974) On an approximate test of hypotheses about the correlation function
of a Gaussian random process, Theor. Probab. Math. Stat., 2, 231-238.
Sarma, Y.R. (1976) Sur les tests et sur l’estimation de parametres pour certains processus
stochastiques stationnaires, Publ. Inst. Statist. Univ. Paris, 17, 1-124.
Sagirow, P. (1970) Stochastic Methods in the Dynamics of Satellites, CISM Courses and
Lectures, 57, Springer, Berlin.
Schmetterer, L. (1974) Introduction to Mathematical Statistics, Springer, Berlin.
Silvey, S.D. (1961) A note on the maximum likelihood in the case of dependent obser-
vations, J. Roy. Statist. Soc. ser. B, 23, 444-452.
Striebel, C.T. (1959) Densities for stochastic processes, Ann. Math. Statist.,30, 559-567.
Swensen , A. (1980) Asymptotic Inference for a Class of Stochastic Processes, Ph.D.
Thesis, University of California, Berkeley.
Wald, A. (1948) Asymptotic properties of the maximum likelihood estimate of an unknown parameter of a discrete stochastic process, Ann. Math. Statist., 19, 40-46.
Winnicki, J. (1988) Estimation theory for the branching process with immigration, In
Contemporary Mathematics, 80, pp. 301-322.
Woerner, J. (2001) Statistical Analysis for Discretely Observed Levy Process, Ph.D. The-
sis, Albert-Ludwig-Universitat, Freiburg.
Yanev, N.M. (1975) On the statistics of branching processes, Theory Prob. Appl., 20,
612-622.