C R RAO Advanced Institute of Mathematics, Statistics and Computer Science (AIMSCS)
Author (s): B.L.S. PRAKASA RAO
Title of the Notes: Inference for Stochastic Processes:
An Introduction
Lecture Notes No.: LN2013-02
Date: July 15, 2013
Prof. C R Rao Road, University of Hyderabad Campus, Gachibowli, Hyderabad-500046, INDIA.
www.crraoaimscs.org
C R RAO AIMSCS Lecture Notes Series
Inference for Stochastic Processes: An Introduction
B.L.S. Prakasa Rao
CR Rao AIMSCS
July 15, 2013
Preface
These lecture notes consist of introductory lectures on "Inference for Stochastic Processes" delivered by me at the Indian Statistical Institute, Delhi Centre, the University of Pune, the Indian Institute of Technology, Mumbai, the University of Hyderabad, and at the "Summer School" arranged by the University of Bocconi, Italy during July 1-20, 2002 and at the University of Bocconi, Milano, Italy during December 3-16, 2006. The earlier books dealing with this topic were Statistical Inference for Markov Processes by P. Billingsley, University of Chicago Press, Chicago (1961), Statistics of Random Processes: General Theory by R.S. Liptser and A.N. Shiryayev, Springer, New York (1977), Statistics of Random Processes: Applications by R.S. Liptser and A.N. Shiryayev, Springer, New York (1978), Statistical Inference for Stochastic Processes by I.V. Basawa and B.L.S. Prakasa Rao, Academic Press, London (1980), and Abstract Inference by Ulf Grenander, Wiley, New York (1981). A large literature dealing with various aspects of inference for stochastic processes has appeared since then. Some of the important review papers, other papers and books are listed at the end of these notes. The journal "Statistical Inference for Stochastic Processes", edited by Denis Bosq and published since 1998, deals with parametric, semiparametric and nonparametric inference for discrete and continuous time processes.
B.L.S. Prakasa Rao
CR Rao AIMSCS
Hyderabad, India
July 15, 2013
Lecture 1
Stochastic Processes
Let $(\Omega, \mathcal{F}, P)$ be a probability space. A stochastic process $\{X_t, t \in \tau\}$ is a family of random variables defined on $(\Omega, \mathcal{F}, P)$. In general we consider $\tau = [0, \infty)$ or $\tau = \{1, 2, \dots\}$. Let $t_1, t_2, \dots, t_k \in \tau$. The joint distribution of $(X(t_1), \dots, X(t_k))$ is called a finite dimensional distribution of the process. The probability structure of the process is completely known once all the finite dimensional distributions are known. The finite dimensional distributions of the process form a consistent family. If $\tau = [0, \infty)$, then $\{X_t, t \in \tau\}$ is called a continuous time stochastic process. If $\tau = \{1, 2, \dots\}$, we call it a discrete time stochastic process.
Discrete time case
Suppose we have observed X1, . . . , Xn. Is it possible to determine the probability
structure of the process as n→ ∞?
Continuous time case
Suppose the process X(t), 0 ≤ t ≤ T is observed. Is it possible to determine the
probability structure of the process as T → ∞?
For any $n$ and $t_1, t_2, \dots, t_n \in \tau$, specify a probability distribution on $\mathbb{R}^n$ by the joint distribution function $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$. Let $\mathbb{R}^\tau$ be the space of all real-valued functions on $\tau$ and consider the cylinder sets

$$C = \{x \in \mathbb{R}^\tau : (x(t_1), \dots, x(t_n)) \in B\}$$

where $t_1, \dots, t_n \in \tau$, $B$ is a Borel set in $\mathbb{R}^n$, and $n \geq 1$. Let $\mathcal{B}^\tau$ be the $\sigma$-algebra generated by such cylinder sets; a consistent family $F$ of finite dimensional distributions then induces a probability measure on $(\mathbb{R}^\tau, \mathcal{B}^\tau)$, as the following theorem shows.
Kolmogorov’s consistency theorem
The family of finite dimensional distribution functions $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$, $n \geq 1$, $t_k \in \tau$, $1 \leq k \leq n$, induces a probability measure on $(\mathbb{R}^\tau, \mathcal{B}^\tau)$ if and only if

(a) every $F_{t_1, \dots, t_n}(x_1, \dots, x_n)$ is invariant under any permutation of the vector $(x_1, \dots, x_n)$ together with the same permutation of the vector $(t_1, \dots, t_n)$, and

(b) $\lim_{x \to \infty} F_{t_1, \dots, t_n}(x_1, \dots, x_{n-1}, x) = F_{t_1, \dots, t_{n-1}}(x_1, \dots, x_{n-1})$.
Let $(\Omega, \mathcal{F})$ be a measurable space and $\{P_\theta, \theta \in \Theta\}$ be a family of probability measures defined on $(\Omega, \mathcal{F})$. Let $\{X_n, n \geq 1\}$ be a stochastic process defined on $(\Omega, \mathcal{F}, P_\theta)$. Suppose we observe the process $\{X_k, 1 \leq k \leq n\}$. The basic problem is to estimate the parameter $\theta$ based on the observation $\{X_k, 1 \leq k \leq n\}$. Let $F_{X_1, \dots, X_n; \theta}(x_1, \dots, x_n; \theta)$ be the joint distribution function of $(X_1, \dots, X_n)$ when $\theta$ is the true parameter.

We say that the family of probability measures $\{P_\theta\}$ is dominated by the $\sigma$-finite measure $\mu$ if

$$\mu(A) = 0 \Rightarrow P_\theta(A) = 0 \text{ for all } A \in \mathcal{F} \quad (P_\theta \ll \mu).$$

We can write down the likelihood function for $(X_1, \dots, X_n)$ in case $\mu$ is the Lebesgue measure on $\mathbb{R}^n$ or $\mu$ is a counting measure on $\mathbb{R}^n$, and the problem of estimating the parameter $\theta$ by the maximum likelihood method is then well understood.
Suppose $\{X_t, t \geq 0\}$ is a stochastic process defined on $(\Omega, \mathcal{F}, P_\theta)$, $\theta \in \Theta$, and suppose we observe the process $\{X_t, 0 \leq t \leq T\}$. The problem is to estimate $\theta$. The question arises as to what the joint distribution of $\{X_t, 0 \leq t \leq T\}$ is. How does one define the likelihood function? How does one calculate the likelihood function even if it is defined? Let us look at the process as a mapping from $\Omega$ to $\mathbb{R}^T$. By the Kolmogorov consistency theorem, there exists a probability measure $Q_\theta$ generated by $X^T = \{X_t, 0 \leq t \leq T\}$ on $(\mathbb{R}^T, \mathcal{B}^T)$ when $\theta$ is the true parameter. If we know that $Q_{\theta_1} \ll Q_{\theta_2}$ and $Q_{\theta_2} \ll Q_{\theta_1}$ for all $\theta_1, \theta_2$ in $\Theta$, then we can compute the Radon-Nikodym derivative

$$\frac{dQ_\theta}{dQ_{\theta_0}}$$

with respect to a fixed $\theta_0 \in \Theta$ and try to maximize it to obtain an estimator for $\theta$.
We now discuss some concepts leading to such methodology.
Examples of Stochastic Models
Stochastic models are used in scientific research in a spectrum of disciplines. We now
describe a few to indicate their use.
1) Random Walk model for neuron firing (Point process)
The neuron fires when the membrane potential reaches a critical threshold value, say $C$. Excitatory and inhibitory impulses are the inputs for the neuron: these inputs arrive according to a Poisson process. Each excitatory impulse increases, and each inhibitory impulse decreases, the membrane potential by a random quantity $X$ with the same p.d.f. $f(x)$. After each firing, the membrane potential is reset to zero and the process is repeated. Let $Y_1, Y_2, \dots$ denote the times at which the neuron fires. The process of interspike intervals $Y_1, Y_2 - Y_1, Y_3 - Y_2, \dots$ is of interest to the neurologist.
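A minimal simulation of this random-walk model can be sketched as follows; the Poisson rate, the probability of an excitatory impulse, the exponential impulse sizes and the threshold value are all illustrative assumptions, not part of the model above:

```python
import random

def simulate_firing_times(rate=10.0, p_excite=0.7, threshold=5.0,
                          horizon=100.0, seed=0):
    """Random-walk neuron model: impulses arrive as a Poisson process of
    the given rate; each impulse moves the membrane potential up (with
    probability p_excite) or down by an Exp(1) jump.  The neuron fires
    when the potential reaches the threshold C, then resets to zero."""
    rng = random.Random(seed)
    t, potential, firing_times = 0.0, 0.0, []
    while True:
        t += rng.expovariate(rate)          # next impulse arrival time
        if t > horizon:
            break
        jump = rng.expovariate(1.0)         # random impulse size X ~ f(x)
        potential += jump if rng.random() < p_excite else -jump
        if potential >= threshold:          # neuron fires and resets
            firing_times.append(t)
            potential = 0.0
    return firing_times

times = simulate_firing_times()
# interspike intervals Y_1, Y_2 - Y_1, Y_3 - Y_2, ...
interspike = [b - a for a, b in zip([0.0] + times[:-1], times)]
```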
2) Epidemiology (Greenwood Model) (Markov Chain)

Suppose at time $t = 0$ there are $S_0$ susceptibles and $I_0$ infectives. After a certain latent period of the infection, (say) a unit of time, some of the susceptibles are infected. Thus, at time $t = 1$, the initial $S_0$ susceptibles split into two groups: those who are infected, $I_1$ in number say, and the remaining susceptibles, say $S_1$. The process continues until there are no more susceptibles in the population.

Note that

$$S(t) = S(t+1) + I(t+1), \quad t = 0, 1, 2, \dots$$

Suppose the probability of a susceptible being infected is $p$. Then

$$P(S(t+1) = s(t+1) \mid S(t) = s(t)) = \binom{s(t)}{s(t) - s(t+1)} p^{s(t)-s(t+1)} (1-p)^{s(t+1)},$$

since $s(t) - s(t+1)$ individuals are infected and $s(t+1)$ remain susceptible.

The process $\{S(t), t = 0, 1, 2, \dots\}$ is a Markov chain.
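The binomial transition above translates directly into a simulation; the initial count $S_0 = 100$ and $p = 0.1$ below are arbitrary illustrative choices:

```python
import random

def greenwood_epidemic(s0=100, p=0.1, seed=1):
    """Greenwood chain: given S(t) susceptibles, each one independently
    escapes infection with probability 1 - p, so S(t+1) ~ Binomial(S(t), 1-p).
    Stops when no susceptibles remain or when nobody new is infected."""
    rng = random.Random(seed)
    s_path = [s0]
    while s_path[-1] > 0:
        s_t = s_path[-1]
        s_next = sum(1 for _ in range(s_t) if rng.random() < 1 - p)
        s_path.append(s_next)
        if s_next == s_t:          # I(t+1) = 0: epidemic dies out
            break
    return s_path

path = greenwood_epidemic()
```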
3) Population growth model (Branching process)
Suppose an organism produces a random number, say $Y$, of offspring with $p_k = P(Y = k)$, $k = 0, 1, 2, \dots$, $\sum_k p_k = 1$. Each offspring in turn produces offspring independently according to the same distribution $\{p_k\}$. Suppose $Z(0) = 1$. If $Z(t)$ denotes the population size at the $t$-th generation, $t = 0, 1, 2, \dots$, then $\{Z(t)\}$ is a Markov chain with transition probabilities given by

$$P(Z(t) = j \mid Z(t-1) = i) = P(Y_1 + \dots + Y_i = j)$$

where $Y_1, Y_2, \dots$ are i.i.d. with distribution $\{p_k\}$.
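The chain is easy to simulate once a sampler for the offspring law is fixed; the Poisson offspring distribution used below is an illustrative choice, not part of the general model:

```python
import math
import random

def poisson_sample(lam, rng):
    """Inverse-transform sampler for a Poisson(lam) random variable."""
    u, k, term = rng.random(), 0, math.exp(-lam)
    cum = term
    while u > cum:
        k += 1
        term *= lam / k
        cum += term
    return k

def branching_process(mean_offspring=1.2, generations=20, seed=2):
    """Galton-Watson chain Z(t): Z(0) = 1 and, given Z(t-1) = i,
    Z(t) = Y_1 + ... + Y_i with i.i.d. Poisson offspring counts Y_j."""
    rng = random.Random(seed)
    z = [1]
    for _ in range(generations):
        z.append(sum(poisson_sample(mean_offspring, rng) for _ in range(z[-1])))
        if z[-1] == 0:             # extinction is an absorbing state
            break
    return z

z = branching_process()
```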
4) Population genetics (Diffusion process)
Consider a population of $2N$ genes, each of which belongs to one of two genotypes, (say) $A$ and $B$. Let $X(t)$ denote the proportion of type $A$ genes in the $t$-th generation. Assuming that the total number of genes remains the same from one generation to the next (we are neglecting selection and mutation effects), the genes in the $(t+1)$-th generation may be assumed to be a random sample of size $2N$ of genes from the $t$-th generation. The sequence $\{X(t), t = 1, 2, \dots\}$ forms a Markov chain. Conditionally on $X(t-1) = x$, $2N X(t)$ is a binomial random variable with $2N$ as the number of trials and $x$ as the probability of success. One can approximate the Markov chain by a continuous time Markov process with continuous state space $[0, 1]$. Such an approximation is an example of a diffusion process.
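The binomial resampling step can be sketched as follows; $2N = 200$ genes and $X(0) = 0.5$ are illustrative values:

```python
import random

def wright_fisher(two_n=200, x0=0.5, generations=50, seed=3):
    """Wright-Fisher chain: conditionally on X(t-1) = x, the count
    2N * X(t) is Binomial(2N, x), i.e. the next generation is a random
    sample with replacement from the current gene pool."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(generations):
        count = sum(1 for _ in range(two_n) if rng.random() < x[-1])
        x.append(count / two_n)
    return x

proportions = wright_fisher()
```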
5) Storage model
Let $X(t)$ denote the annual random input during the year $(t, t+1)$ and $M$ the non-random amount released at the end of each year. Let $Z(t)$ denote the content of the dam after the release. Then

$$Z(t+1) = \min\{Z(t) + X(t), K\} - \min\{Z(t) + X(t), M\}$$

where $K$ is the capacity of the dam and $t = 0, 1, 2, \dots$. If the inputs $X(t)$ are assumed to be independent random variables, then the sequence $\{Z(t), t = 0, 1, 2, \dots\}$ forms a Markov chain.
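A direct simulation of this recursion looks as follows; the capacity, release amount and exponential inputs are illustrative assumptions:

```python
import random

def dam_content(capacity=10.0, release=3.0, years=100, seed=4):
    """Storage model: Z(t+1) = min(Z(t)+X(t), K) - min(Z(t)+X(t), M),
    with i.i.d. exponential annual inputs X(t) of mean 4."""
    rng = random.Random(seed)
    z = [0.0]
    for _ in range(years):
        total = z[-1] + rng.expovariate(1.0 / 4.0)   # content before release
        z.append(min(total, capacity) - min(total, release))
    return z

levels = dam_content()
```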
6) Compound Poisson Model (Insurance)
Suppose an insurance company receives claims from its clients in accordance with a Poisson process with intensity $\lambda$. Assume that the sizes $Y_k$, $k = 1, 2, \dots$, of the successive claims are independent random variables with common distribution function $F(\cdot)$. Then the total amount $X(t)$ of claims arising in the time interval $[0, t]$ is given by

$$X(t) = Y_1 + \dots + Y_{N(t)}$$

where $N(t)$ is a Poisson random variable with mean $\lambda t$. The process $\{X(t), t \geq 0\}$ is a compound Poisson process.
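One realization of $X(t)$ can be generated by first drawing the Poisson arrivals and then summing the claims; the unit-mean exponential claim law below is an illustrative choice for $F$:

```python
import random

def compound_poisson(lam=2.0, t=10.0, seed=5):
    """Total claims X(t) = Y_1 + ... + Y_{N(t)}: arrivals form a Poisson
    process of rate lam, so N(t) is obtained by summing exponential
    inter-arrival gaps; claim sizes Y_k are i.i.d. Exp(1) here."""
    rng = random.Random(seed)
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:                       # count arrivals in [0, t]
        n += 1
        clock += rng.expovariate(lam)
    return sum(rng.expovariate(1.0) for _ in range(n))

total = compound_poisson()
```

Since $E X(t) = \lambda t \, E(Y_1)$, the Monte Carlo average over many runs should be close to $2 \times 10 \times 1 = 20$ for these parameters.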
7) Queuing Model (for telephone calls)
Suppose the calls arrive at a telephone exchange according to a Poisson process. The durations of successive calls may be assumed to be independent exponential random variables. The capacity of the exchange may be limited to (say) $K$ calls at any given time. The expected waiting time for a call to go through and the queue size at any particular time are of interest.
8) Signal processing
Suppose X(t) is a signal satisfying the equation
X(t+ 1) = aX(t) + ξ(t)
where a is a fixed parameter and ξ(t) represents error. Suppose the true signal is
unobserved but Y (t) is observed where
Y (t) = X(t) + Z(t)
where Z(t) is noise. The problem is to estimate the signal X(n+1) given Y (0), . . . , Y (n).
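Under Gaussian assumptions this is the classical filtering/prediction problem, and the scalar Kalman filter gives the best linear estimate of the signal; the one-step prediction of $X(n+1)$ is then $a$ times the filtered estimate. The parameter values and noise variances below are illustrative assumptions:

```python
import random

def kalman_filter_ar1(a=0.9, q=1.0, r=1.0, n=500, seed=6):
    """Scalar Kalman filter for the model X(t+1) = a X(t) + xi(t),
    Y(t) = X(t) + Z(t), with xi ~ N(0, q) and Z ~ N(0, r).
    Returns the true states, the observations, and the filtered estimates."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n):                       # simulate signal and observations
        x = a * x + rng.gauss(0.0, q ** 0.5)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r ** 0.5))
    est, p, ests = 0.0, 1.0, []
    for y in ys:
        est, p = a * est, a * a * p + q      # predict step
        gain = p / (p + r)                   # Kalman gain
        est, p = est + gain * (y - est), (1.0 - gain) * p   # update step
        ests.append(est)
    return xs, ys, ests

xs, ys, ests = kalman_filter_ar1()
x_next_pred = 0.9 * ests[-1]    # one-step prediction of X(n+1)
```

The filtered estimates should track the signal better than the raw observations do, since the filter averages out part of the observation noise.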
9) Time Series
Let $X(t)$ denote the price of a commodity at time $t$. Suppose we fit an ARMA$(p, q)$ model

$$X(t) + \alpha_1 X(t-1) + \dots + \alpha_p X(t-p) = Z(t) + \beta_1 Z(t-1) + \dots + \beta_q Z(t-q)$$

where the $Z$'s are i.i.d. unobservable random variables. Given $X(0), \dots, X(n)$, we may want to predict $X(n+1)$, and in turn the problem is to estimate the $\alpha$'s and $\beta$'s.
Lecture 2
Discrete parameter martingales
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\{\mathcal{F}_n, n \geq 1\}$ be a non-decreasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$. Suppose $\{Z_n, n \geq 1\}$ is a sequence of random variables defined on $(\Omega, \mathcal{F}, P)$ such that

(i) $Z_n$ is measurable with respect to $\mathcal{F}_n$,

(ii) $E|Z_n| < \infty$,

(iii) $E(Z_n \mid \mathcal{F}_m) = Z_m$ a.s. for all $1 \leq m < n$, $n \geq 1$.

Then the sequence $\{Z_n, n \geq 1\}$ is said to be a martingale with respect to $\{\mathcal{F}_n, n \geq 1\}$, and we say that $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale.

Remark. It is clear that $E(Z_n) = E(Z_m)$ for all $n$ and $m$ if $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale. If (i) and (ii) hold and if, instead of (iii), $E(Z_n \mid \mathcal{F}_m) \geq Z_m$ a.s. for all $1 \leq m \leq n$, then $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is said to be a submartingale. A submartingale $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is said to be $L^1$-bounded if $\sup_n E|Z_n| < \infty$.
Uniform integrability
A sequence of random variables $\{Y_n\}$ is said to be uniformly integrable if

$$\lim_{c \to \infty} \sup_n E[|Y_n| I(|Y_n| > c)] = 0.$$

Remarks: A sufficient condition for uniform integrability of the sequence $\{Y_n\}$ is that $\sup_n E|Y_n|^{1+\varepsilon} < \infty$ for some $\varepsilon > 0$.
Martingale Convergence Theorem: Let $\{Z_n, \mathcal{F}_n, n \geq 1\}$ be an $L^1$-bounded submartingale. Then there exists a random variable $Z$ such that $\lim_{n \to \infty} Z_n = Z$ a.s. and

$$E|Z| \leq \liminf_{n \to \infty} E|Z_n| < \infty.$$

Remarks: (i) If the submartingale is uniformly integrable, then $Z_n \to Z$ in $L^1$, and if $\{Z_n, \mathcal{F}_n\}$ is an $L^2$-bounded martingale, then $E|Z_n - Z|^2 \to 0$. (ii) Any nonnegative martingale converges a.s.
Examples of martingales
1) Suppose $X_1, X_2, \dots$ are independent random variables with $E|X_i| < \infty$ for $i \geq 1$. Define

$$S_n = X_1 + \dots + X_n$$

and let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $X_1, \dots, X_n$. Suppose $E(X_i) = 0$ for all $i$. Then $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a martingale.

2) Suppose $X_1, X_2, \dots$ are independent random variables with $E|X_i| < \infty$ for $i \geq 1$. Let $t$ be a real number and define

$$Z_n = \frac{e^{itS_n}}{E[e^{itS_n}]}, \quad n \geq 1.$$

Then $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $X_1, \dots, X_n$.
3) Let $\{X_n, n \geq 1\}$ be a stochastic process with $f(x_1, \dots, x_n; \theta)$ as the joint density of $(X_1, \dots, X_n)$, where $\theta$ is a scalar parameter. Let $L_n(\theta) = f(X_1, \dots, X_n; \theta)$. Suppose the function $L_n(\theta)$ is differentiable with respect to $\theta$. Let

$$u_n(\theta) = \frac{d}{d\theta}[\log L_n(\theta) - \log L_{n-1}(\theta)]$$

and let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $X_1, \dots, X_n$. Then $\{\sum_{i=1}^n u_i(\theta), \mathcal{F}_n, n \geq 1\}$ forms a martingale under some regularity conditions.
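A toy instance of example 3: for i.i.d. Bernoulli($\theta$) observations the score increments are $u_i(\theta) = X_i/\theta - (1 - X_i)/(1 - \theta)$, which have zero mean, so their partial sums form a martingale. The quick numerical check below is an illustration, not part of the text:

```python
import random

def score_increments(xs, theta):
    """Score increments u_i(theta) = d/dtheta log f(x_i; theta) for i.i.d.
    Bernoulli(theta) observations -- a toy case of example 3."""
    return [x / theta - (1 - x) / (1 - theta) for x in xs]

rng = random.Random(7)
theta = 0.3
xs = [1 if rng.random() < theta else 0 for _ in range(100000)]
u = score_increments(xs, theta)
mean_u = sum(u) / len(u)   # sample mean of the increments; should be near 0
```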
Sketches of proofs
1.

$$E(S_n \mid X_1, \dots, X_{n-1}) = E[S_{n-1} + X_n \mid X_1, \dots, X_{n-1}]$$
$$= S_{n-1} + E[X_n \mid X_1, \dots, X_{n-1}] = S_{n-1} + E(X_n) = S_{n-1},$$

since $X_n$ is independent of $X_1, \dots, X_{n-1}$ and $E(X_n) = 0$.
2.

$$E[Z_n \mid X_1, \dots, X_{n-1}] = E\left[\frac{e^{it(S_{n-1} + X_n)}}{E[e^{itS_{n-1}}]E[e^{itX_n}]} \,\Big|\, X_1, \dots, X_{n-1}\right]$$
$$= \frac{e^{itS_{n-1}}}{E[e^{itS_{n-1}}]E[e^{itX_n}]} \, E[e^{itX_n} \mid X_1, \dots, X_{n-1}] = \frac{e^{itS_{n-1}}}{E[e^{itS_{n-1}}]} = Z_{n-1},$$

using the independence of the $X_i$, so that $E[e^{itS_n}] = E[e^{itS_{n-1}}]E[e^{itX_n}]$ and $E[e^{itX_n} \mid X_1, \dots, X_{n-1}] = E[e^{itX_n}]$.
3. Note that $\int_{-\infty}^{\infty} f(x_n \mid x_1, \dots, x_{n-1}; \theta)\, \mu(dx_n) = 1$. Suppose we assume that differentiation under the integral sign with respect to the parameter $\theta$ is allowed. Then

$$\int_{-\infty}^{\infty} \frac{d f(x_n \mid x_1, \dots, x_{n-1}; \theta)}{d\theta}\, \mu(dx_n) = 0,$$

which implies that

$$\int_{-\infty}^{\infty} \frac{d}{d\theta}[\log L_n(\theta) - \log L_{n-1}(\theta)]\, f(x_n \mid x_1, \dots, x_{n-1}; \theta)\, \mu(dx_n) = 0.$$

Hence

$$E[u_n(\theta) \mid X_1, \dots, X_{n-1}] = 0.$$
Remarks: Note that $\{\sum_{i=1}^n [Z_i - E(Z_i \mid Z_1, \dots, Z_{i-1})], \mathcal{F}_n, n \geq 1\}$ forms a martingale for any sequence of random variables $\{Z_n\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with $E|Z_n| < \infty$, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $Z_1, \dots, Z_n$.
Lecture 3
Weak law of large numbers (WLLN)
Suppose $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a zero mean martingale with $S_n = \sum_{i=1}^n X_i$. Further suppose that $E(X_i^2) < \infty$ for $i \geq 1$. Then it follows that

$$E(X_i X_j \mid \mathcal{F}_i) = X_i E(X_j \mid \mathcal{F}_i) = 0 \quad \text{for } 1 \leq i < j,$$

and hence $E(X_i X_j) = 0$ for $1 \leq i < j$. Therefore

$$\mathrm{Var}(S_n) = E(S_n^2) = \sum_{i=1}^n \mathrm{Var}(X_i).$$

Hence

$$P(|S_n| \geq \varepsilon) \leq \varepsilon^{-2} E(S_n^2) \quad \text{(by Chebyshev's inequality)},$$

which implies that

$$\frac{S_n}{n} \xrightarrow{p} 0 \quad \text{if} \quad \frac{1}{n^2} \sum_{j=1}^n E X_j^2 \to 0,$$

which can be termed a WLLN for martingales.

Remarks: Weaker conditions can be given for the WLLN to hold.
Strong Law of Large Numbers (SLLN) (Feller (1971), p.242; Loève (1977), p.250)

Suppose $\{S_n\}$ is a zero mean martingale with $E(X_i^2) < \infty$ for $i \geq 1$. Further suppose that there is a sequence $b_n \uparrow \infty$ such that

$$\sum_{n=1}^{\infty} \frac{E X_n^2}{b_n^2} < \infty.$$

Then $\lim_{n \to \infty} \frac{S_n}{b_n} = 0$ a.s.

Remarks: For alternate conditions for the SLLN to hold, see the results stated later in this section.
Central Limit Theorem (CLT)
The following central limit theorem was proved for martingales by Billingsley (1961) and
by Ibragimov (1963).
Theorem: Let $\{Z_n, n \geq 1\}$ be a strictly stationary ergodic process such that $E(Z_1^2)$ is finite, $E(Z_n \mid Z_1, \dots, Z_{n-1}) = 0$ a.s. for $n > 1$, and $E(Z_1) = 0$. Then

$$n^{-1/2} \sum_{k=1}^n Z_k \xrightarrow{L} N(0, E(Z_1^2)) \quad \text{as } n \to \infty.$$
CLT (Brown (1971)): Let $\{S_n, \mathcal{F}_n, n \geq 1\}$ denote a zero mean martingale where $S_n = X_1 + \dots + X_n$. Suppose $E(X_i^2) < \infty$, $i \geq 1$. Let

$$V_n^2 = \sum_{i=1}^n E(X_i^2 \mid \mathcal{F}_{i-1})$$

and

$$s_n^2 = E V_n^2 = E S_n^2.$$

If $\frac{V_n^2}{s_n^2} \xrightarrow{p} 1$ and

$$\frac{1}{s_n^2} \sum_{i=1}^n E\big(X_i^2 I(|X_i| \geq \varepsilon s_n)\big) \to 0$$

as $n \to \infty$ for all $\varepsilon > 0$, then

$$\frac{S_n}{s_n} \xrightarrow{L} N(0, 1) \quad \text{as } n \to \infty.$$
Remark: For more general versions of CLT, see later in this section.
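As a quick numerical illustration of the i.i.d. special case (independent zero-mean variables are trivially martingale differences): for Uniform$(-1,1)$ increments, the normalized sums $n^{-1/2}\sum_{k=1}^n Z_k$ should have standard deviation close to $\sqrt{E(Z_1^2)} = \sqrt{1/3}$. The sample sizes below are arbitrary:

```python
import random
import statistics

def normalized_sum(n, rng):
    """n^{-1/2} * sum of n i.i.d. Uniform(-1,1) martingale differences."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n ** 0.5

rng = random.Random(8)
samples = [normalized_sum(400, rng) for _ in range(3000)]
sd = statistics.stdev(samples)   # should approach sqrt(1/3) ~ 0.577
```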
Maximal inequality
If $\{S_n, \mathcal{F}_n, n \geq 1\}$ is a zero mean martingale, then

$$P\Big(\max_{1 \leq k \leq n} |S_k| \geq \varepsilon\Big) \leq \frac{1}{\varepsilon^2} E(S_n^2).$$
Toeplitz Lemma.
If $a_i$, $i \geq 1$, are positive and $b_n = \sum_{i=1}^n a_i \uparrow \infty$, then $x_n \to x$ implies $b_n^{-1} \sum_{i=1}^n a_i x_i \to x$.
Kronecker’s Lemma
Let $\{x_n\}$ be a real sequence such that $\sum_{n=1}^{\infty} x_n$ converges. Let $\{b_n\}$ be a monotone sequence of positive constants with $b_n \uparrow \infty$. Then

$$\frac{1}{b_n} \sum_{i=1}^n b_i x_i \to 0 \quad \text{as } n \to \infty.$$
The following result involves the classical Lindeberg condition for asymptotic normality of partial sums of independent random variables (Loève, p.280).

Let $X_1, X_2, \dots$ be independent random variables with $E(X_n^2) < \infty$ and $E(X_n) = 0$ for all $n \geq 1$. Let $S_n = X_1 + \dots + X_n$, $\sigma_k^2 = \mathrm{Var}(X_k)$, and $s_n^2 = \sum_{i=1}^n \mathrm{Var}(X_i)$. Then

$$\frac{S_n}{s_n} \xrightarrow{L} N(0,1) \quad \text{and} \quad \max_{1 \leq k \leq n} \frac{\sigma_k}{s_n} \to 0 \quad \text{as } n \to \infty$$

if and only if for every $\varepsilon > 0$,

$$\frac{1}{s_n^2} \sum_{k=1}^n E[X_k^2 I(|X_k| \geq \varepsilon s_n)] \to 0$$

(the "if" part is due to Lindeberg and the "only if" part is due to Feller).
We now discuss some other versions of the WLLN, SLLN and the CLT for martingales.
WLLN: Let $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ be a martingale and $0 < b_n \uparrow \infty$ as $n \to \infty$. Let $X_{ni} = X_i I(|X_i| \leq b_n)$, $1 \leq i \leq n$. Then

$$\frac{S_n}{b_n} \xrightarrow{p} 0$$

if

(i) $\sum_{i=1}^n P(|X_i| > b_n) \to 0$,

(ii) $b_n^{-1} \sum_{i=1}^n E(X_{ni} \mid \mathcal{F}_{i-1}) \xrightarrow{p} 0$, and

(iii) $b_n^{-2} \sum_{i=1}^n \big\{E X_{ni}^2 - E[E(X_{ni} \mid \mathcal{F}_{i-1})]^2\big\} \to 0$.

Proof: See Hall and Heyde (1980), p.30. (See Loève (1977), p.290 for the independent case.)
SLLN: Let $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ be a zero-mean square integrable martingale and $\{U_n, n \geq 1\}$ be a nondecreasing sequence of positive random variables such that $U_n$ is $\mathcal{F}_{n-1}$-measurable. Then

$$\lim_{n \to \infty} U_n^{-1} S_n = 0 \quad \text{a.s.}$$

on the set $\{\lim_{n \to \infty} U_n = \infty, \ \sum_{i=1}^{\infty} U_i^{-2} E(X_i^2 \mid \mathcal{F}_{i-1}) < \infty\}$.
Proof See Hall and Heyde (1980), p.35.
CLT: Let $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n\}$ be a zero-mean square-integrable martingale for each $n \geq 1$. Let $X_{ni} = S_{ni} - S_{n,i-1}$, $1 \leq i \leq k_n$, $S_{n0} = 0$. Suppose $k_n \uparrow \infty$ as $n \to \infty$. Then the double sequence $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n, n \geq 1\}$ is called a martingale array. Let

$$V_{ni}^2 = \sum_{j=1}^i E(X_{nj}^2 \mid \mathcal{F}_{n,j-1})$$

be the conditional variance of $S_{ni}$.

Special case: If $\{S_n = \sum_{i=1}^n X_i, \mathcal{F}_n, n \geq 1\}$ is a martingale, then $S_{ni} = \frac{S_i}{s_n}$, $1 \leq i \leq n$, where $s_n$ is the standard deviation of $S_n$, together with $\mathcal{F}_{ni} = \mathcal{F}_i$ and $k_n = n$, forms a martingale array.

Theorem: Suppose $\{S_{ni}, \mathcal{F}_{ni}, 1 \leq i \leq k_n, n \geq 1\}$ is a zero mean square integrable martingale array. Further suppose that

$$\mathcal{F}_{ni} \subseteq \mathcal{F}_{n+1,i} \quad \text{for } 1 \leq i \leq k_n, \ n \geq 1 \quad \text{(nested condition)}$$

and the following conditions hold:

(i) for all $\varepsilon > 0$, $\sum_{i=1}^{k_n} E(X_{ni}^2 I(|X_{ni}| > \varepsilon) \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} 0$,

(ii) $V_{nk_n}^2 = \sum_{i=1}^{k_n} E(X_{ni}^2 \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} \eta^2$.

Then

$$S_{nk_n} = \sum_{i=1}^{k_n} X_{ni} \xrightarrow{L} Z = \eta N(0,1) \quad \text{(stably)}$$

where $\eta$ and $N(0,1)$ are independent random variables.

Remarks: Note that the random variable $Z$ has the characteristic function $E(e^{-\frac{1}{2}\eta^2 t^2})$. In fact $\frac{S_{nk_n}}{V_{nk_n}} \xrightarrow{L} N(0,1)$ as $n \to \infty$ provided $P(\eta^2 > 0) = 1$.
(For the definition of stable convergence, see p.13)
Remarks: The nested condition holds automatically in case the martingale array is
built out of a single martingale as in the special case discussed above.
Sholomitski (Theory of Probability and its Applications, 43 (1999) 434-448) discussed
necessary conditions for normal convergence of a martingale.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{X_{jn}, 1 \leq j \leq k_n < \infty\}$ be a double array of random variables defined on $(\Omega, \mathcal{F}, P)$. Let

$$\mathcal{F}_{jn} = \sigma(X_{1n}, \dots, X_{jn}) \quad \text{with} \quad \mathcal{F}_{0n} = \{\phi, \Omega\}.$$

Suppose

(1) $E(X_{jn} \mid \mathcal{F}_{j-1,n}) = 0$ a.s. $[P]$.

Further suppose that the $X_{jn}$ are square integrable and

(2) $\sum_{j=1}^{k_n} E(X_{jn}^2 \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} \sigma^2 > 0$.

Then the "conditional Lindeberg condition"

(3) $\Lambda_n(\varepsilon) = \sum_{j=1}^{k_n} E(X_{jn}^2 I(|X_{jn}| \geq \varepsilon) \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} 0$

as $n \to \infty$ for every $\varepsilon > 0$, implies that

(4) $\sum_{j=1}^{k_n} X_{jn} \xrightarrow{L} N(0, \sigma^2)$ as $n \to \infty$

(cf. Brown (1971), Ann. Math. Statist. 42, 59-66).

Conversely, suppose that

(5) $\max_{1 \leq j \leq k_n} E(X_{jn}^2 \mid \mathcal{F}_{j-1,n}) \xrightarrow{p} 0$ as $n \to \infty$

and the condition (2) holds. If, as $n \to \infty$,

(6) $\sum_{j=1}^{k_n} c_{jn} X_{jn} \xrightarrow{L} N(0, \sigma^2)$

for any double array $\{c_{jn}\}$ of $\pm 1$'s, then the conditional Lindeberg condition stated in (3) holds. If the conditional distribution of $X_{jn}$ given $\mathcal{F}_{j-1,n}$ is symmetric a.s., then the condition (4) itself implies (6), and hence the conditional Lindeberg condition (3) holds in the presence of the conditions (2) and (5).
Stable Convergence (Renyi (1963))

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Suppose $Y_n \xrightarrow{L} Y$. Then $Y_n \xrightarrow{L} Y$ (stably) if for all continuity points $y$ of the distribution function of $Y$ and all events $E \in \mathcal{F}$,

$$\lim_{n \to \infty} P(\{Y_n \leq y\} \cap E) = Q_y(E)$$

exists and $Q_y(E) \to P(E)$ as $y \to \infty$ (note that $Q_y(E)$ is a measure on $(\Omega, \mathcal{F})$ if it exists).

Theorem: Suppose that $Y_n \xrightarrow{L} Y$ where all the $Y_n$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$. Then $Y_n \xrightarrow{L} Y$ (stably) if and only if there exists a random variable $Y'$ with the same distribution as that of $Y$ (possibly on an extension of $(\Omega, \mathcal{F}, P)$) such that for all real $t$,

$$\exp(itY_n) \to Z(t) = \exp(itY') \quad \text{weakly in } L^1 \text{ as } n \to \infty,$$

and $E[Z(t)I(E)]$ is a continuous function of $t$ for all $E \in \mathcal{F}$.
Remarks: This theorem is a consequence of the continuity theorem for characteristic
functions.
Note: A sequence $\{Z_n\}$ on $(\Omega, \mathcal{F}, P)$ is said to converge weakly in $L^1$ to an integrable random variable $Z$ on $(\Omega, \mathcal{F}, P)$ if for all $E \in \mathcal{F}$,

$$E(Z_n I(E)) \to E(Z I(E)), \quad \text{that is,} \quad \int_E Z_n \, dP \to \int_E Z \, dP,$$

and we write $Z_n \to Z$ weakly in $L^1$.

Remarks: (1) Convergence weakly in $L^1$ is weaker than $L^1$-convergence. In fact $Z_n \to Z$ weakly in $L^1$ implies $E(Z_n X) \to E(ZX)$ for all bounded $\mathcal{F}$-measurable $X$.

Remarks: (2) If for all $E \in \mathcal{F}$ and for all continuity points $y$ of the distribution function of $Y$,

$$P(\{Y_n \leq y\} \cap E) \to P(Y \leq y) P(E),$$

then $Y_n \xrightarrow{L} Y$ (mixing). In other words, the $Y_n$ are asymptotically independent of each event $E \in \mathcal{F}$. Mixing convergence is a special case of stable convergence.
Remarks: (3) (Continuation of the theorem on martingale arrays on p.12.)

Suppose that $P(\eta^2 > 0) = 1$. Since $S_{nk_n} \to Z$ (stably), where $Z = \eta N(0,1)$, for any real $t$ it follows that $e^{itS_{nk_n}} \to e^{itZ}$ weakly in $L^1$. Hence $E[e^{itS_{nk_n}} X] \to E[e^{itZ} X]$ for any bounded random variable $X$ which is $\mathcal{F}$-measurable. Let $X = e^{iu\eta + ivI(E)}$ where $-\infty < u, v < \infty$ and $E \in \mathcal{F}$. Then it follows that the joint characteristic function of $(S_{nk_n}, \eta, I(E))$ converges to that of $(\eta N, \eta, I(E))$, where $N$ is a standard normal random variable independent of $(\eta, I(E))$. Therefore

$$(\eta^{-1} S_{nk_n}, I(E)) \xrightarrow{L} (N, I(E))$$

and hence, if

$$V_{nk_n}^2 = \sum_{i=1}^{k_n} E(X_{ni}^2 \mid \mathcal{F}_{n,i-1}) \xrightarrow{p} \eta^2,$$

as in the martingale limit theorem on p.12, it follows that

$$(V_{nk_n}^{-1} S_{nk_n}, I(E)) \xrightarrow{L} (N, I(E)),$$

which implies that

$$V_{nk_n}^{-1} S_{nk_n} \xrightarrow{L} N \quad \text{(stably)}$$

as $n \to \infty$.
Remarks: (4) The notion of stable convergence is helpful in interchanging random norming and non-random norming when obtaining limit theorems for partial sums of martingale differences in the martingale central limit theory.
Lecture 4
Likelihood ratio
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n\}$ be a sequence of sub-$\sigma$-algebras of $\mathcal{F}$ such that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, $n \geq 1$, and $\mathcal{F}_n \uparrow \mathcal{F}$. Let $P^\star$ be another probability measure defined on $(\Omega, \mathcal{F})$. Note that $P^\star$ is absolutely continuous with respect to $P$ ($P^\star \ll P$) if $P(A) = 0 \Rightarrow P^\star(A) = 0$ for any $A \in \mathcal{F}$. If $P^\star \ll P$, then there exists a random variable

$$Z = \frac{dP^\star}{dP},$$

which is $\mathcal{F}$-measurable, such that

$$P^\star(A) = \int_A Z \, dP, \quad A \in \mathcal{F}.$$

The random variable $Z$ is the density (Radon-Nikodym derivative) of $P^\star$ with respect to $P$. If $Z > 0$ a.s. $[P]$, then the measures $P^\star$ and $P$ are equivalent and we write $P \simeq P^\star$.

Let $P_n^\star$ denote the restriction of $P^\star$ to $\mathcal{F}_n$ and $P_n$ the restriction of $P$ to $\mathcal{F}_n$. If $P_n^\star \ll P_n$ for every $n$, then we say that $P^\star$ is locally absolutely continuous with respect to $P$ and write $P^\star \overset{loc}{\ll} P$. Suppose $P^\star \overset{loc}{\ll} P$. Let

$$Z_n = \frac{dP_n^\star}{dP_n}.$$
$Z_n$ is called the local density. For any $A \in \mathcal{F}_n$,

$$\int_A Z_{n+1} \, dP = \int_A \frac{dP_{n+1}^\star}{dP_{n+1}} \, dP = \int_A dP_{n+1}^\star = P_{n+1}^\star(A) = P_n^\star(A) \quad (\text{since } A \in \mathcal{F}_n)$$
$$= \int_A dP_n^\star = \int_A \frac{dP_n^\star}{dP_n} \, dP = \int_A Z_n \, dP.$$

Hence $E(Z_{n+1} \mid \mathcal{F}_n) = Z_n$, which implies that $\{Z_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, and it is a nonnegative martingale. Hence $Z_n \to Z$ a.s. $[P]$. If $E[Z] = 1$, then $E|Z_n - Z| \to 0$ by Scheffé's theorem, $P^\star \ll P$ and $Z = \frac{dP^\star}{dP}$; in fact $Z_n = E(Z \mid \mathcal{F}_n)$. In general, $Z$ is the density of the absolutely continuous component of $P^\star$ with respect to $P$.
As an application of the above idea, one can obtain the following result, which gives a method for calculating the Radon-Nikodym derivative (Gikhman and Skorokhod (1974), Theory of Stochastic Processes).

Theorem: Let $(\Omega, \mathcal{F}, P)$ be a probability space and $Q$ be another probability measure on $(\Omega, \mathcal{F})$ absolutely continuous with respect to $P$. Let $\{A_{nk}, k \geq 1\}$ be a measurable partition of $\Omega$ for each $n \geq 1$. Suppose the sequence of partitions is nested. Let

$$g_n(\omega) = \frac{Q(A_{n,k(\omega)})}{P(A_{n,k(\omega)})}$$

if $P(A_{n,k(\omega)}) > 0$, where $A_{n,k(\omega)}$ is that set of the sequence $\{A_{nk}, k \geq 1\}$ which contains $\omega$; if $P(A_{n,k(\omega)}) = 0$, let $g_n(\omega) = 0$. Then the sequence $\{g_n, \mathcal{F}_n, n \geq 1\}$ is a martingale, where $\mathcal{F}_n = \sigma(A_{n1}, A_{n2}, \dots)$. Suppose $\mathcal{F}_n \uparrow \mathcal{F}$ as $n \to \infty$. Then there exists a limiting function $g(\omega)$ such that

$$g_n(\omega) \to g(\omega) \quad \text{a.s. as } n \to \infty,$$

independent of the sequence of partitions $\{A_{nk}, k \geq 1; n \geq 1\}$, and for arbitrary $B \in \mathcal{F}$,

$$Q(B) = \int_B g(\omega) \, P(d\omega).$$
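A concrete illustration on $\Omega = [0,1)$ with the nested dyadic partitions: take $P$ to be the uniform distribution and $Q$ the measure with density $2x$, so $g_n(\omega)$ should converge to $dQ/dP(\omega) = 2\omega$. The specific choice of measures is illustrative:

```python
def rn_derivative_dyadic(omega, n):
    """Approximate dQ/dP at omega in [0,1) via the level-n dyadic partition:
    P = Uniform[0,1], Q has density q(x) = 2x, so Q([a,b]) = b^2 - a^2.
    Returns g_n(omega) = Q(A_n(omega)) / P(A_n(omega)), which tends to
    2 * omega as n grows."""
    width = 2.0 ** (-n)
    k = int(omega / width)              # index of the dyadic cell containing omega
    a, b = k * width, (k + 1) * width
    return (b * b - a * a) / width      # Q(cell) / P(cell) = a + b

approx = rn_derivative_dyadic(0.3, 20)  # close to 2 * 0.3 = 0.6
```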
Estimation by the maximum likelihood method
Consider a stochastic process $\{X_n, n \geq 1\}$ such that the finite dimensional distributions of the process are known but for a scalar parameter $\theta$. Suppose $\theta \in \Theta$, an open set. Let us suppose that the process is observed up to time $n$.

Let $L_n(\theta)$ be the likelihood function associated with the observation $(X_1, \dots, X_n)$, and let $p_n(x_1, \dots, x_n; \theta) = L_n(\theta)$ be the joint probability (density) function of $(X_1, \dots, X_n)$. Note that

$$p_n(x_1, \dots, x_n; \theta) = p_1(x_1; \theta) \frac{p_2(x_1, x_2; \theta)}{p_1(x_1; \theta)} \cdots \frac{p_n(x_1, \dots, x_n; \theta)}{p_{n-1}(x_1, \dots, x_{n-1}; \theta)}$$
$$= p_1(x_1; \theta) \, p_2(x_2; \theta \mid x_1) \cdots p_n(x_n; \theta \mid x_1, \dots, x_{n-1}).$$

Hence

$$\log p_n(x_1, \dots, x_n; \theta) = \log p_1(x_1; \theta) + [\log p_2(x_1, x_2; \theta) - \log p_1(x_1; \theta)] + \cdots + [\log p_n(x_1, \dots, x_n; \theta) - \log p_{n-1}(x_1, \dots, x_{n-1}; \theta)].$$

In other words,

$$\log L_n(\theta) = \log L_1(\theta) + [\log L_2(\theta) - \log L_1(\theta)] + \cdots + [\log L_n(\theta) - \log L_{n-1}(\theta)].$$

For convenience, let us define $L_0(\theta) \equiv 1$. Then

$$\log L_n(\theta) = \sum_{i=1}^n [\log L_i(\theta) - \log L_{i-1}(\theta)].$$

Assume that

$$p_n(x_n; \theta \mid x_1, \dots, x_{n-1}) = \frac{p_n(x_1, \dots, x_n; \theta)}{p_{n-1}(x_1, \dots, x_{n-1}; \theta)} = \frac{L_n(\theta)}{L_{n-1}(\theta)}$$

is twice differentiable with respect to $\theta$ under the (summation) integral sign, and that

$$E_\theta \left( \frac{d \log L_n(\theta)}{d\theta} \right)^2 < \infty, \quad \theta \in \Theta.$$
Note that

$$\frac{d \log L_n(\theta)}{d\theta} = \sum_{i=1}^n \frac{d}{d\theta}[\log L_i(\theta) - \log L_{i-1}(\theta)] = \sum_{i=1}^n u_i(\theta) \quad \text{(say)}.$$

Then

$$E_\theta(u_i(\theta) \mid \mathcal{F}_{i-1}) = E_\theta \left( \frac{d}{d\theta} \log p_i(X_i; \theta \mid X_1, \dots, X_{i-1}) \,\Big|\, \mathcal{F}_{i-1} \right) = 0 \quad \text{a.s.} \tag{7}$$

and

$$E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = -E_\theta \left( \frac{d u_i(\theta)}{d\theta} \,\Big|\, \mathcal{F}_{i-1} \right) \tag{8}$$

in view of the assumption made above. Let

$$I_n(\theta) = \sum_{i=1}^n E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}). \tag{9}$$

Observe that $I_n(\theta)$ is the partial sum of the conditional information in $X_i$ given $X_1, \dots, X_{i-1}$, summed over $1 \leq i \leq n$. Let

$$J_n(\theta) = \sum_{i=1}^n v_i(\theta) \quad \text{where} \quad v_i(\theta) = \frac{d u_i(\theta)}{d\theta}. \tag{10}$$

In view of (7),

$$\left\{ \frac{d \log L_n(\theta)}{d\theta}, \mathcal{F}_n, n \geq 1 \right\} \tag{11}$$

is a martingale. Furthermore,

$$E_\theta(u_i^2(\theta) + v_i(\theta) \mid \mathcal{F}_{i-1}) = 0 \quad \text{a.s.} \tag{12}$$
Existence of a consistent solution of the likelihood equation
Observe that

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta') = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) \sum_{i=1}^n \frac{d u_i(\theta)}{d\theta} \Big|_{\theta^\star}$$
$$= \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\star)$$
$$= \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta) + (\theta' - \theta)(J_n(\theta^\star) + I_n(\theta)) \tag{13}$$

where $\theta^\star = \theta + \gamma(\theta' - \theta)$ with $|\gamma| < 1$. Let $X_i = u_i(\theta)$ and $U_n = I_n(\theta)$. Applying the SLLN stated on p.11, it follows that

$$\frac{\sum_{i=1}^n u_i(\theta)}{I_n(\theta)} \to 0 \quad \text{a.s. as } n \to \infty \tag{14}$$

provided

$$I_n(\theta) \to \infty \quad \text{a.s. as } n \to \infty \tag{15}$$

and

$$\sum_{i=1}^{\infty} I_i^{-2}(\theta) E(u_i^2(\theta) \mid \mathcal{F}_{i-1}) < \infty \quad \text{a.s.} \tag{16}$$
Let $\{a_n\}$ be any sequence of positive numbers and $b_n = \sum_{j=1}^n a_j$ with $b_n \uparrow \infty$. Then

$$\sum_{n=1}^{\infty} \Big( \sum_{j=1}^n a_j \Big)^{-2} a_n = \sum_{n=1}^{\infty} b_n^{-2}(b_n - b_{n-1}) \quad (b_0 \equiv 0)$$
$$= \sum_{n=1}^{\infty} b_n (b_n^{-2} - b_{n+1}^{-2}) = \sum_{n=1}^{\infty} b_n (b_n^{-1} - b_{n+1}^{-1})(b_n^{-1} + b_{n+1}^{-1})$$
$$\leq 2 \sum_{n=1}^{\infty} (b_n^{-1} - b_{n+1}^{-1}) \quad (\text{since } b_n \leq b_{n+1})$$
$$\leq \frac{2}{b_1} < \infty.$$
It can now be checked that the condition (15) implies the condition (16) (see Hall and Heyde, p.158), since $I_n(\theta) = \sum_{j=1}^n E(u_j^2(\theta) \mid \mathcal{F}_{j-1})$. Equation (13) implies that

$$\frac{1}{I_n(\theta)} \frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \frac{1}{I_n(\theta)} \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) + (\theta' - \theta) \frac{J_n(\theta^\star) + I_n(\theta)}{I_n(\theta)}. \tag{17}$$

Relation (17) implies that the likelihood equation

$$\frac{d \log L_n(\theta)}{d\theta} = 0$$

has a solution in $[\theta - \delta, \theta + \delta]$ a.s. if

(C1) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$, and

(C2) $\overline{\lim}_{n \to \infty} \frac{|I_n(\theta) + J_n(\theta^\star)|}{I_n(\theta)} < 1$ a.s.
Remarks: Another set of sufficient conditions for the existence of a strongly consistent root is

(C1) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$, and

(C3) for any $\delta > 0$ such that $(\theta - \delta, \theta + \delta) \subset \Theta$, there exist $K(\delta) > 0$ and $h(\delta) \downarrow 0$ such that

$$\liminf_{n \to \infty} P_\theta \left\{ \sup_{|\theta' - \theta| \geq \delta} \frac{1}{I_n(\theta)} [\log L_n(\theta') - \log L_n(\theta)] < -K(\delta) \right\} \geq 1 - h(\delta).$$
Asymptotic Normality
Let us now consider the equation (13), viz.,

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\star).$$

Let $\theta' = \hat{\theta}_n$ be an MLE. Then

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \hat{\theta}_n} = 0,$$

and hence

$$\sum_{i=1}^n u_i(\theta) = (\theta - \hat{\theta}_n) J_n(\theta^\star)$$

where $|\theta^\star - \theta| \leq |\theta - \hat{\theta}_n|$. Dividing both sides by $[I_n(\theta)]^{1/2}$, we have

$$\frac{1}{(I_n(\theta))^{1/2}} \sum_{i=1}^n u_i(\theta) = (I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \left[ \frac{-J_n(\theta^\star)}{I_n(\theta)} \right].$$

Under some conditions,

$$\frac{1}{I_n^{1/2}(\theta)} \sum_{i=1}^n u_i(\theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$

If $\frac{J_n(\theta^\star)}{I_n(\theta)} \xrightarrow{p} -1$ as $n \to \infty$, then it follows that

$$(I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Theorem: Suppose the following conditions hold:

(C1) (i) $I_n(\theta) \xrightarrow{a.s.} \infty$ as $n \to \infty$;

(ii) $\frac{I_n(\theta)}{E_\theta I_n(\theta)} \xrightarrow{p} \eta^2(\theta) > 0$ for some random variable $\eta(\theta)$;

(iii) $\frac{J_n(\theta)}{I_n(\theta)} \xrightarrow{p} -1$ as $n \to \infty$, uniformly on compact subsets of $\Theta$.

(C2) For $\delta > 0$, suppose $|\theta_n - \theta| \leq \delta / (E_\theta I_n(\theta))^{1/2}$. Then

(i) $E_{\theta_n} I_n(\theta_n) = E_\theta I_n(\theta)(1 + o(1))$ as $n \to \infty$;

(ii) $I_n(\theta_n) = I_n(\theta)(1 + o(1))$ a.s. as $n \to \infty$;

(iii) $J_n(\theta_n) = J_n(\theta) + o(I_n(\theta))$ a.s. as $n \to \infty$.

Then

$$\left( (E_\theta I_n(\theta))^{-1/2} \frac{d \log L_n(\theta)}{d\theta}, \ \frac{I_n(\theta)}{E_\theta I_n(\theta)} \right) \xrightarrow{L} (\eta(\theta) N(0,1), \eta^2(\theta))$$

where $\eta(\theta)$ and $N$ are independent. Furthermore,

$$\hat{\theta}_n \xrightarrow{a.s.} \theta \quad \text{as } n \to \infty$$

and

$$I_n^{1/2}(\theta)(\hat{\theta}_n - \theta) \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Proof: Fix $c > 0$. Let $\theta_n = \theta + c(E_\theta I_n(\theta))^{-1/2}$ and let

$$\Lambda_n = \log \frac{L_n(\theta_n)}{L_n(\theta)}.$$

Applying Taylor's expansion,

$$\Lambda_n = (\theta_n - \theta) \sum_{i=1}^n u_i(\theta) + \frac{1}{2} (\theta_n - \theta)^2 J_n(\theta_n^\star), \quad \text{where } |\theta_n^\star - \theta| \leq |\theta_n - \theta|.$$

Let

$$W_n(\theta) = (E_\theta I_n(\theta))^{-1/2} \sum_{i=1}^n u_i(\theta) = \frac{(\theta_n - \theta)}{c} \sum_{i=1}^n u_i(\theta)$$

and

$$V_n(\theta) = -(E_\theta I_n(\theta))^{-1} J_n(\theta_n^\star) = -\frac{(\theta_n - \theta)^2}{c^2} J_n(\theta_n^\star).$$

Note that

$$\frac{L_n(\theta_n)}{L_n(\theta)} = e^{\Lambda_n}, \quad \text{where } \Lambda_n = c W_n(\theta) - \frac{1}{2} c^2 V_n(\theta).$$

In other words,

$$e^{c W_n(\theta)} = \frac{L_n(\theta_n)}{L_n(\theta)} \, e^{\frac{c^2}{2} V_n(\theta)}. \quad (\star)$$
Let $x_0$ be a continuity point of the distribution function of $\eta^2(\theta)$. Assumptions (C1) and (C2) imply that

$$V_n(\theta) \xrightarrow{p} \eta^2(\theta)$$

under $P_{\theta_n}$, and

$$P_{\theta_n}(|V_n(\theta)| \leq x_0) \to P_\theta(\eta^2(\theta) \leq x_0).$$

Let $f$ be a bounded continuous function on $(-\infty, \infty)$ with $f(x) = 0$ for $|x| > x_0$. Then

$$E_\theta \left[ f(V_n(\theta)) e^{c W_n(\theta)} \,\big|\, |V_n(\theta)| \leq x_0 \right] = E_{\theta_n} \left[ f(V_n(\theta)) e^{c^2 V_n(\theta)/2} \,\big|\, |V_n(\theta)| \leq x_0 \right] \quad (\text{from } (\star))$$
$$\to E_\theta \left[ f(\eta^2(\theta)) e^{c^2 \eta^2(\theta)/2} \,\big|\, \eta^2(\theta) \leq x_0 \right] = E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c^2 \eta_{x_0}^2(\theta)/2} \right]$$

where $\eta_{x_0}(\theta)$ has the distribution of $\eta(\theta)$ conditional on $\eta^2(\theta) \leq x_0$. But

$$E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c^2 \eta_{x_0}^2(\theta)/2} \right] = E_\theta \left[ f(\eta_{x_0}^2(\theta)) e^{c \eta_{x_0}(\theta) N(0,1)} \right]$$

where $\eta_{x_0}$ is independent of $N$. Hence the joint distribution of $(W_n(\theta), V_n(\theta))$, conditional on $|V_n(\theta)| \leq x_0$, converges to that of $(\eta_{x_0}(\theta) N(0,1), \eta_{x_0}^2(\theta))$. Letting $x_0 \to \infty$, we obtain that

$$g\left( (E_\theta I_n(\theta))^{-1/2} \frac{d \log L_n(\theta)}{d\theta}, \ \frac{I_n(\theta)}{E_\theta I_n(\theta)} \right) \xrightarrow{L} g\left( \eta(\theta) N(0,1), \eta^2(\theta) \right)$$

for bounded continuous $g$, by the continuous mapping theorem.
Remarks: If

$$\frac{I_n(\theta)}{E_\theta I_n(\theta)} \xrightarrow{p} \eta^2(\theta) > 0 \quad \text{as } n \to \infty,$$

then one can replace the random norming $I_n(\theta)$ by the non-random norming $E_\theta I_n(\theta)$, and we obtain that

$$(E_\theta I_n(\theta))^{1/2} (\hat{\theta}_n - \theta) \xrightarrow{L} \eta^{-1}(\theta) N(0,1)$$

where $\eta(\theta)$ and $N$ are independent.
Definition: An estimator $T_n$ of $\theta$ is said to be asymptotically first order efficient if

$$I_n^{1/2}(\theta) \left[ T_n - \theta - r(\theta) I_n^{-1}(\theta) \frac{d \log L_n(\theta)}{d\theta} \right] \xrightarrow{p} 0$$

as $n \to \infty$ for some $r(\theta)$ not depending on $n$ or the observations.

Remarks: It can be checked that the MLE is asymptotically first order efficient in the above sense under the conditions stated above.
Lecture 5
Note that

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) + (\theta' - \theta) J_n(\theta^\ast)$$
$$= \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta) + (\theta' - \theta)(J_n(\theta^\ast) + I_n(\theta)).$$

Suppose that $J_n(\theta^\ast) + I_n(\theta) = 0$ a.s. Then

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = \sum_{i=1}^n u_i(\theta) - (\theta' - \theta) I_n(\theta). \quad (\alpha)$$

Substituting $\theta' = \hat{\theta}_n$, we have

$$0 = \frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \hat{\theta}_n} = \sum_{i=1}^n u_i(\theta) - (\hat{\theta}_n - \theta) I_n(\theta). \quad (\beta)$$

Subtracting $(\beta)$ from $(\alpha)$, we have

$$\frac{d \log L_n(\theta)}{d\theta} \Big|_{\theta = \theta'} = (\hat{\theta}_n - \theta') I_n(\theta),$$

and in general

$$\frac{d \log L_n(\theta)}{d\theta} = (\hat{\theta}_n - \theta) I_n(\theta).$$
Special case (Conditional exponential family):

Suppose

$$\frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta), \quad \theta \in \Theta, \ n \geq 1. \tag{18}$$

Then $\hat{\theta}_n$ is the MLE. Differentiating both sides of the equation (18) with respect to $\theta$, we obtain

$$\frac{d^2 \log L_n(\theta)}{d\theta^2} = I_n'(\theta)(\hat{\theta}_n - \theta) - I_n(\theta) \tag{19}$$

and

$$E_\theta \left( \frac{d^2 \log L_n(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right) = I_n'(\theta) \, E_\theta(\hat{\theta}_n - \theta \mid \mathcal{F}_{n-1}) - I_n(\theta)$$
$$= I_n'(\theta) \, E_\theta \left( \frac{d \log L_n(\theta)}{d\theta} \frac{1}{I_n(\theta)} \,\Big|\, \mathcal{F}_{n-1} \right) - I_n(\theta) \quad \text{(from (18))} \tag{20}$$
$$= \frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_{n-1}(\theta)}{d\theta} - I_n(\theta) \quad \text{(by the martingale property)}. \tag{21}$$
But

$$E_\theta \left( \frac{d^2 \log L_n(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right) = E_\theta \left( \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + \frac{d^2 \log L_n(\theta)}{d\theta^2} - \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{n-1} \right)$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + E_\theta \left( \frac{d^2}{d\theta^2} [\log L_n(\theta) - \log L_{n-1}(\theta)] \,\Big|\, \mathcal{F}_{n-1} \right)$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + E_\theta(v_n(\theta) \mid \mathcal{F}_{n-1})$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - E_\theta(u_n^2(\theta) \mid \mathcal{F}_{n-1})$$
$$= \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - (I_n(\theta) - I_{n-1}(\theta)). \tag{22}$$

Relations (21) and (22) imply that

$$\frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_{n-1}(\theta)}{d\theta} - I_n(\theta) = \frac{d^2 \log L_{n-1}(\theta)}{d\theta^2} - (I_n(\theta) - I_{n-1}(\theta)).$$

Hence

$$\frac{I_n'(\theta)}{I_n(\theta)} = \frac{\dfrac{d^2 \log L_{n-1}(\theta)}{d\theta^2} + I_{n-1}(\theta)}{\dfrac{d \log L_{n-1}(\theta)}{d\theta}}. \tag{23}$$

Relations (18) and (19) imply that

$$\frac{d^2 \log L_n(\theta)}{d\theta^2} = \frac{I_n'(\theta)}{I_n(\theta)} \frac{d \log L_n(\theta)}{d\theta} - I_n(\theta),$$

and hence

$$I_n(\theta) \frac{d^2 \log L_n(\theta)}{d\theta^2} = I_n'(\theta) \frac{d \log L_n(\theta)}{d\theta} - I_n^2(\theta),$$

which implies that

$$I_n(\theta) \left[ \frac{d^2 \log L_n(\theta)}{d\theta^2} + I_n(\theta) \right] = I_n'(\theta) \frac{d \log L_n(\theta)}{d\theta}.$$

Therefore

$$\frac{I_n'(\theta)}{I_n(\theta)} = \frac{\dfrac{d^2 \log L_n(\theta)}{d\theta^2} + I_n(\theta)}{\dfrac{d \log L_n(\theta)}{d\theta}}. \tag{24}$$

Comparing (23) and (24), we obtain that

$$\frac{I_n'(\theta)}{I_n(\theta)} = C(\theta) \quad \text{for all } n$$
for some $C(\theta)$. This implies that

$$I_n(\theta) = \phi(\theta) H_n(X_1, \dots, X_{n-1})$$

for some function $\phi(\theta)$, since $I_n(\theta)$ is $\mathcal{F}_{n-1}$-measurable. Therefore, from the equation (18), it follows that

$$\frac{d \log L_n(\theta)}{d\theta} = \phi(\theta)(\hat{\theta}_n - \theta) H_n(X_1, \dots, X_{n-1}), \tag{25}$$

which implies that

$$\log L_n(\theta) = H_n(X_1, \dots, X_{n-1}) \left[ \hat{\theta}_n \int \phi(\theta) \, d\theta - \int \phi(\theta) \theta \, d\theta \right] + K_n(X_1, \dots, X_n).$$

By the factorization theorem, it follows that $(\hat{\theta}_n, H_n(X_1, \dots, X_{n-1}))$ is a sufficient statistic for $\theta$. Furthermore,

$$L_n(\theta) = \exp\{ H_n(X_1, \dots, X_{n-1})(r_1(\theta) \hat{\theta}_n + r_2(\theta)) + K_n(X_1, \dots, X_n) \}.$$
Special case of a Markov process: Suppose $\{X_n, n \geq 0\}$ is a time-homogeneous Markov process and the conditional probability (density) function of $X_n$ given $X_{n-1}$ is $f(x_n \mid x_{n-1}, \theta)$. Then the likelihood function of $(x_1, \ldots, x_n)$ is given by
\[
L_n^\star(x_1, \ldots, x_n; \theta) \equiv L_n^\star(\theta) = g(x_0) \prod_{i=1}^{n} f(x_i \mid x_{i-1}, \theta)
\]
(we assume that the initial density $g(\cdot)$ of $X_0$ does not depend on $\theta$). Since $X_0$ carries no information about the parameter $\theta$, let us take the likelihood function to be
\[
L_n(\theta) \equiv \prod_{i=1}^{n} f(x_i \mid x_{i-1}, \theta).
\]
Hence
\[
\frac{d \log L_n(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{d}{d\theta} \log f(x_i \mid x_{i-1}, \theta) = \sum_{i=1}^{n} u_i(\theta), \qquad u_i(\theta) = \frac{d}{d\theta} \log f(x_i \mid x_{i-1}, \theta).
\]
Suppose that
\[
(26) \qquad \frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta).
\]
In particular, for $n = 1$, we have from (25)
\[
\frac{d}{d\theta} \log f(X_1 \mid X_0, \theta) = I_1(\theta)(\hat{\theta}_1 - \theta) = \phi(\theta) H(X_0)(\hat{\theta}_1 - \theta).
\]
Note that $\hat{\theta}_1$ depends on $X_0$ and $X_1$ and is a solution of the equation
\[
\frac{d}{d\theta} \log f(X_1 \mid X_0, \theta) = 0.
\]
Let $\hat{\theta}_1 = m(x, y)$ be the solution of the equation
\[
\frac{d}{d\theta} \log f(y \mid x, \theta) = 0. \qquad (25a)
\]
Then
\[
\frac{d}{d\theta} \log f(y \mid x, \theta) = \phi(\theta) H(x)(m(x, y) - \theta) \qquad (25b)
\]
and hence
\[
\log f(y \mid x, \theta) = H(x)\, m(x, y) \int \phi(\theta)\,d\theta - H(x) \int \theta\,\phi(\theta)\,d\theta + K(x, y),
\]
or equivalently
\[
(27) \qquad f(y \mid x, \theta) = \exp\{ H(x)\, m(x, y)\, J_1(\theta) - H(x)\, J_2(\theta) \}\, K^\star(x, y).
\]
Such a family of distributions is called a conditional exponential family. Relation (25b) implies that
\[
(28) \qquad \frac{d \log L_n(\theta)}{d\theta} = \phi(\theta) \sum_{i=1}^{n} H(X_{i-1})[m(X_{i-1}, X_i) - \theta]
\]
and
\[
u_i(\theta) = \frac{d}{d\theta} \log f(X_i \mid X_{i-1}, \theta) = \phi(\theta) H(X_{i-1})[m(X_{i-1}, X_i) - \theta].
\]
Hence
\[
E_\theta[u_i(\theta) \mid \mathcal{F}_{i-1}] = \phi(\theta) H(X_{i-1})[E_\theta(m(X_{i-1}, X_i) \mid \mathcal{F}_{i-1}) - \theta] = 0 \ \text{a.s.}
\]
by earlier remarks. Therefore
\[
(29) \qquad E_\theta[m(X_{i-1}, X_i) \mid \mathcal{F}_{i-1}] - \theta = 0 \ \text{a.s.}
\]
It also follows from (28) that
\[
(30) \qquad \hat{\theta}_n = \left[ \sum_{i=1}^{n} H(X_{i-1}) \right]^{-1} \left[ \sum_{i=1}^{n} H(X_{i-1})\, m(X_{i-1}, X_i) \right].
\]
Note that
\[
E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = -E_\theta\left( \frac{d^2}{d\theta^2} \log f(X_i \mid X_{i-1}, \theta) \,\Big|\, \mathcal{F}_{i-1} \right).
\]
But
\[
\frac{d^2 \log f(X_i \mid X_{i-1}, \theta)}{d\theta^2} = \phi'(\theta) H(X_{i-1})[m(X_{i-1}, X_i) - \theta] - \phi(\theta) H(X_{i-1}).
\]
Hence, by (29),
\[
(31) \qquad -E_\theta\left( \frac{d^2 \log f(X_i \mid X_{i-1}, \theta)}{d\theta^2} \,\Big|\, \mathcal{F}_{i-1} \right) = \phi(\theta) H(X_{i-1}).
\]
Hence
\[
(32) \qquad I_n(\theta) = \sum_{i=1}^{n} E_\theta(u_i^2(\theta) \mid \mathcal{F}_{i-1}) = \sum_{i=1}^{n} \phi(\theta) H(X_{i-1}) = \left\{ \sum_{i=1}^{n} H(X_{i-1}) \right\} \phi(\theta).
\]
Using equations (29) and (31), it can be checked from (27) that
\[
(33) \qquad \frac{d \log L_n(\theta)}{d\theta} = I_n(\theta)(\hat{\theta}_n - \theta).
\]
In other words, relation (26) is a necessary and sufficient condition for the transition probability (density) function to belong to a conditional exponential family.
Lecture 6

Bienaymé-Galton-Watson branching process (Guttorp (1991))

Let $\{X_{ij}, i \geq 1, j \geq 1\}$ be i.i.d. random variables taking values in the nonnegative integers with the probability generating function (p.g.f.)
\[
g(s) = \sum_{k=0}^{\infty} s^k p_k, \quad -1 < s \leq 1.
\]
Let $X$ be a random variable with $P(X = k) = p_k$, $k = 0, 1, 2, \ldots$ The distribution of $X$ is called the offspring distribution. We define the branching process $\{Z_k, k \geq 0\}$ with the offspring distribution $\{p_k, k \geq 0\}$ recursively by
\[
(34) \qquad Z_0 = z_0, \qquad Z_k = \sum_{i=1}^{Z_{k-1}} X_{ik}.
\]
We will assume that $z_0 = 1$ and $p_0 + p_1 < 1$ in the following discussion.

The conditional p.g.f. of $Z_k$ given $Z_{k-1} = z$ is
\[
E[s^{Z_k} \mid Z_{k-1} = z] = g(s)^z
\]
due to the independence of the random variables $X_{ij}$, and hence the p.g.f. of $Z_k$ is
\[
g_k(s) = E[g(s)^{Z_{k-1}}] = g_{k-1}(g(s)).
\]
This implies that $E(Z_k) = \theta^k$ provided $\theta = E(X_{ij}) < \infty$, and
\[
V(Z_k) = \sigma^2 \theta^{k-1} \sum_{j=0}^{k-1} \theta^j \quad \text{provided } \sigma^2 = V(X_{ij}) < \infty.
\]
Remarks: Note that once a generation is extinct, all the following generations are extinct as well. Extinction occurs in or before the $k$-th generation in one of the following ways: the ancestor has

(i) 0 children;

(ii) one child whose family becomes extinct in or before the $(k-1)$-th generation;

(iii) two children both of whose families become extinct in or before the $(k-1)$-th generation;

and so on. The probability $q_k$ of extinction within $k$ generations is therefore
\[
(35) \qquad q_k = P(Z_k = 0) = \sum_{j=0}^{\infty} p_j\, g_{k-1}(0)^j = g(g_{k-1}(0)) = g(q_{k-1}).
\]
Assume that $p_0 > 0$ and $p_1 > 0$. Then the function $g(\cdot)$ is strictly increasing on the interval $[0, 1]$. Hence the sequence $q_k = g(q_{k-1})$ forms a strictly increasing sequence of positive numbers bounded by one. Therefore $q_k$ has a limit, say $q$, with $p_0 \leq q \leq 1$. Taking limits in (35), we get that
\[
(36) \qquad q = g(q).
\]
Hence
\[
\frac{g(q) - g(q_k)}{q - q_k} = \frac{q - q_{k+1}}{q - q_k} < 1.
\]
Letting $k \to \infty$, we observe that $g'(q) \leq 1$. Note that $g'(s)$ is a strictly increasing function on $(0, 1)$ and hence $g(s)$ is convex.

Proposition: The extinction probability $q$ is the smallest nonnegative root of the equation $g(s) = s$. If $\theta > 1$, then $0 \leq q < 1$, with $q = 0$ if and only if $p_0 = 0$. If $\theta \leq 1$, then $q = 1$ unless $p_1 = 1$, in which case $q = 0$.
Example: Suppose the offspring distribution is geometric:
\[
p_k = p(1 - p)^k, \quad k = 0, 1, \ldots,
\]
so that
\[
g(s) = \frac{p}{1 - (1 - p)s}.
\]
Therefore $\theta = g'(1) = \frac{1 - p}{p}$. The process becomes extinct with probability one if $p \geq \frac{1}{2}$. If $p < \frac{1}{2}$, then the extinction probability $q$ is a solution of the equation
\[
\frac{p}{1 - (1 - p)s} = s,
\]
which has the solutions $1$ and $\frac{1}{\theta}$. Hence $q = \frac{1}{\theta}$.
Proposition: If $p_1 \neq 1$, then $Z_n \to \infty$ as $n \to \infty$ with probability $1 - q$.

Proof: If $q = 1$, there is nothing to prove. Suppose $q < 1$. Note that
\[
g_k(s) = g_{k-1}(g(s))
\]
and hence
\[
g_k'(s) = g_{k-1}'(g(s))\, g'(s).
\]
If $s = q$, then $g(s) = s$ and hence
\[
g_k'(q) = g_{k-1}'(q)\, g'(q) \quad \text{for all } k \geq 1.
\]
Hence
\[
g_k'(q) = [g'(q)]^k \quad \text{for all } k \geq 1.
\]
Furthermore, since $q < 1$,
\begin{align*}
P(1 \leq Z_n \leq k) &= \sum_{j=1}^{k} P(Z_n = j) \leq \sum_{j=1}^{k} P(Z_n = j)\, \frac{j q^{j-1}}{q^k} \\
&\leq \frac{g_n'(q)}{q^k} = \frac{[g'(q)]^n}{q^k}.
\end{align*}
Therefore
\[
\sum_{n=1}^{\infty} P(1 \leq Z_n \leq k) \leq \frac{1}{q^k} \sum_{n=1}^{\infty} [g'(q)]^n < \infty
\]
provided $g'(q) < 1$. Hence, by the Borel-Cantelli lemma, it follows that
\[
P(1 \leq Z_n \leq k \ \text{infinitely often}) = 0 \quad \text{for any } k,
\]
that is, almost surely either $Z_n = 0$ eventually or $Z_n \to \infty$ as $n \to \infty$; hence $Z_n \to \infty$ with probability $1 - q$.
(Ref.: Guttorp, Statistical Inference for Branching Processes.)

As pointed out earlier, we assume that $Z_0 \equiv 1$ in the following discussion.

Theorem: Suppose that $\sigma^2 = \mathrm{Var}\, Z_1 < \infty$ and $E Z_1 = \theta$. Let $W_n = \frac{Z_n}{\theta^n}$ and $\mathcal{F}_n = \sigma(Z_0, \ldots, Z_n)$. Then

(i) $\{W_n, \mathcal{F}_n, n \geq 0\}$ is a martingale;

(ii) $W_n \to W$ a.s., where $P(W = 0) = q$ and $q$ is the smallest nonnegative root of the equation $g(s) = s$ (here $g(\cdot)$ is the probability generating function of the offspring distribution, that is, $g(s) = \sum_{j=0}^{\infty} s^j p_j$);

(iii) $\{W > 0\} = \{Z_n \to \infty\}$ a.s.;

(iv) if $\theta > 1$, then $EW = 1$ and $V(W) = \frac{\sigma^2}{\theta(\theta - 1)}$;

(v) the Laplace transform $\phi(s) = E[e^{-sW}]$ satisfies the equation $\phi(\theta s) = g(\phi(s))$; and

(vi) if $\theta > 1$ and $\sigma^2 < \infty$, then the distribution of $W$ is absolutely continuous except for a jump of size $q$ at $0$.
Special case: Suppose a random variable $X$ has the power series offspring distribution
\[
P(X = j) = p_j = \frac{a_j \lambda^j}{f(\lambda)}, \quad j \geq 0,
\]
where
\[
f(\lambda) = \sum_{j=0}^{\infty} a_j \lambda^j, \qquad f'(\lambda) = \sum_{j=1}^{\infty} a_j\, j\, \lambda^{j-1}, \qquad f''(\lambda) = \sum_{j=2}^{\infty} a_j\, j(j-1)\, \lambda^{j-2}.
\]
Then
\[
E(X) = \sum_{j=0}^{\infty} j p_j = \sum_{j=1}^{\infty} \frac{j a_j \lambda^j}{f(\lambda)} = \lambda \sum_{j=1}^{\infty} \frac{j a_j \lambda^{j-1}}{f(\lambda)} = \frac{\lambda f'(\lambda)}{f(\lambda)}.
\]
Hence
\[
E(X) = \frac{\lambda f'(\lambda)}{f(\lambda)} = \theta \qquad \text{and} \qquad \frac{d\theta}{d\lambda} = \frac{f(\lambda)[\lambda f''(\lambda) + f'(\lambda)] - \lambda f'^2(\lambda)}{f^2(\lambda)}.
\]
Furthermore,
\[
\sigma^2 = \mathrm{Var}(X) = E(X^2) - (E(X))^2 = E(X(X-1)) + EX - (EX)^2.
\]
Now
\[
E(X(X-1)) = \sum_{j=0}^{\infty} j(j-1) p_j = \sum_{j=2}^{\infty} \frac{j(j-1) a_j \lambda^j}{f(\lambda)} = \frac{\lambda^2 f''(\lambda)}{f(\lambda)}.
\]
Hence
\begin{align*}
\sigma^2 &= \frac{\lambda^2 f''(\lambda)}{f(\lambda)} + \frac{\lambda f'(\lambda)}{f(\lambda)} - \frac{\lambda^2 f'^2(\lambda)}{f^2(\lambda)}
= \frac{\lambda^2 f''(\lambda) f(\lambda) + \lambda f'(\lambda) f(\lambda) - \lambda^2 (f'(\lambda))^2}{f^2(\lambda)} \\
&= \lambda\, \frac{\lambda f''(\lambda) f(\lambda) + f'(\lambda) f(\lambda) - \lambda f'^2(\lambda)}{f^2(\lambda)}
= \lambda\, \frac{d\theta}{d\lambda}.
\end{align*}
Example (branching process, Bienaymé-Galton-Watson process): Let $Z_0, Z_1, \ldots, Z_n, \ldots$ be the consecutive generation sizes with $Z_0 = 1$. Let $\theta = E(Z_1)$ and suppose that $1 < \theta < \infty$. Assume that $\sigma^2 = \mathrm{Var}(Z_1) < \infty$. Let
\[
p_j = P(Z_1 = j), \quad j = 0, 1, 2, \ldots
\]
Assume that $\{p_j\}$ belongs to the family of power series distributions discussed above. Then
\[
p_j = \frac{a_j \lambda^j}{f(\lambda)}, \quad j = 0, 1, 2, \ldots, \quad \text{where } \lambda > 0 \text{ is a fixed constant,}
\]
$a_j \geq 0$ and $f(\lambda) = \sum_{j=0}^{\infty} a_j \lambda^j$. We have noted that
\[
\theta = \frac{\lambda f'(\lambda)}{f(\lambda)}, \qquad \sigma^2 = \lambda \frac{d\theta}{d\lambda}.
\]
We assume that $\theta > 1$. Then $Z_n \to \infty$ with probability $1 - q$, where $q$ is the probability of extinction (Biometrika 62, 49-59 (1975)). Let $p(x \mid y, \lambda)$ be the transition probability function of the process $\{Z_k, k \geq 1\}$. It can be checked that
\[
\frac{d}{d\theta} \log p(x \mid y, \lambda) = \sigma^{-2}(x - \theta y)
\]
and
\[
I_n(\theta) = \sigma^{-2} \sum_{i=0}^{n-1} Z_i.
\]
Note that $Z_0 = 1, Z_1, \ldots, Z_n$ is a realization of a Markov chain with the transition probabilities
\[
p(Z_k \mid Z_{k-1}, \lambda) \propto \lambda^{Z_k} f(\lambda)^{-Z_{k-1}},
\]
and hence the likelihood function is
\[
L_n(\lambda) = \prod_{k=1}^{n} p(Z_k \mid Z_{k-1}, \lambda) \propto f(\lambda)^{-\sum_{k=1}^{n} Z_{k-1}}\, \lambda^{\sum_{k=1}^{n} Z_k}.
\]
Therefore
\begin{align*}
\frac{d \log L_n(\lambda)}{d\lambda} &= \frac{d}{d\lambda} \left[ \left( -\sum_{k=1}^{n} Z_{k-1} \right) \log f(\lambda) + \left( \sum_{k=1}^{n} Z_k \right) \log \lambda \right] \\
&= \left( -\sum_{k=1}^{n} Z_{k-1} \right) \frac{f'(\lambda)}{f(\lambda)} + \left( \sum_{k=1}^{n} Z_k \right) \frac{1}{\lambda} \\
&= \frac{1}{\lambda} \left[ \left( -\sum_{k=1}^{n} Z_{k-1} \right) \frac{\lambda f'(\lambda)}{f(\lambda)} + \sum_{k=1}^{n} Z_k \right].
\end{align*}
Hence
\[
\frac{d \log L_n(\lambda)}{d\lambda} = \frac{1}{\lambda} \left[ \sum_{k=1}^{n} Z_k - \theta \sum_{k=1}^{n} Z_{k-1} \right] = 0
\]
provided
\[
\theta = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}};
\]
the MLE of $\theta$ is therefore $\hat{\theta}_n = \sum_{k=1}^{n} Z_k \big/ \sum_{k=1}^{n} Z_{k-1}$.
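The MLE derived above can be evaluated on a simulated path. The sketch below is illustrative: the Poisson offspring law, the Knuth sampler, and all parameter values are choices made here, not part of the notes.

```python
import math
import random

def mle_theta(Z):
    """MLE of the offspring mean from generation sizes Z_0, ..., Z_n:
    theta_hat_n = sum_{k=1}^n Z_k / sum_{k=1}^n Z_{k-1}."""
    return sum(Z[1:]) / sum(Z[:-1])

def simulate_bgw(theta, n, seed=0):
    """Simulate a BGW process with Poisson(theta) offspring and Z_0 = 1,
    stopping early on extinction."""
    rng = random.Random(seed)
    def poisson(lam):
        # Knuth's multiplication method; adequate for small lam
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    Z = [1]
    for _ in range(n):
        Z.append(sum(poisson(theta) for _ in range(Z[-1])))
        if Z[-1] == 0:
            break
    return Z

print(mle_theta([1, 2, 4, 8]))              # 2.0 on this deterministic path
print(mle_theta(simulate_bgw(2.0, 12, seed=3)))
```

On the set of non-extinction the simulated estimate stabilizes near the true $\theta$ as the number of generations grows, in line with the consistency result below.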
Note that $\sigma^2 = \lambda \frac{d\theta}{d\lambda}$ and hence $\frac{d\theta}{d\lambda} = \frac{\sigma^2}{\lambda} > 0$. Therefore $\theta(\cdot)$ is a strictly increasing function of $\lambda$ and we can reparametrize the problem through $\theta$. Observe that
\begin{align*}
\frac{d \log L_n(\lambda)}{d\theta} &= \frac{d \log L_n(\lambda)}{d\lambda} \frac{d\lambda}{d\theta} = \frac{d \log L_n(\lambda)}{d\lambda} \frac{\lambda}{\sigma^2} \\
&= \frac{1}{\sigma^2} \left[ \sum_{k=1}^{n} Z_k - \theta \sum_{k=1}^{n} Z_{k-1} \right] = \frac{1}{\sigma^2} \sum_{k=1}^{n} (Z_k - \theta Z_{k-1}) \\
&= \sum_{k=1}^{n} u_k(\theta), \qquad u_k(\theta) = \frac{Z_k - \theta Z_{k-1}}{\sigma^2}.
\end{align*}
Note that $E[u_k(\theta) \mid Z_{k-1}] = 0$ since $E(Z_k \mid Z_{k-1}) = \theta Z_{k-1}$, and the conditional information is given by
\[
I_n(\theta) = \sum_{k=1}^{n} E[u_k^2(\theta) \mid \mathcal{F}_{k-1}] = -\sum_{k=1}^{n} E\left[ \frac{d u_k(\theta)}{d\theta} \,\Big|\, \mathcal{F}_{k-1} \right] = \sum_{k=1}^{n} \frac{Z_{k-1}}{\sigma^2} = \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1}.
\]
Note that $\zeta_n(\theta) \equiv E(I_n(\theta)) = \frac{1}{\sigma^2} \sum_{k=1}^{n} \theta^{k-1} = \frac{1}{\sigma^2} \frac{\theta^n - 1}{\theta - 1}$. It is easy to see that $\left\{ \frac{Z_n}{\theta^n}, \mathcal{F}_n, n \geq 0 \right\}$ is a martingale, since
\[
E\left( \frac{Z_n}{\theta^n} \,\Big|\, \mathcal{F}_{n-1} \right) = \frac{Z_{n-1}}{\theta^{n-1}},
\]
and it is a nonnegative martingale. Hence
\[
W_n \equiv \frac{Z_n}{\theta^n} \xrightarrow{a.s.} W \ \text{(say)} \quad \text{as } n \to \infty,
\]
where $W \geq 0$ a.s. We now apply the Toeplitz lemma (Loeve (1963)), viz.,
\[
x_n \to x \;\Rightarrow\; \frac{1}{\sigma_n} \sum_{k=0}^{n} a_k x_k \to x \quad \text{if } \sigma_n = \sum_{k=0}^{n} a_k \uparrow \infty \text{ as } n \to \infty.
\]
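The Toeplitz lemma can be checked numerically. In the sketch below, the geometric weights $a_k = 2^k$ and the sequence $x_k = 1 + 1/(k+1) \to 1$ are hypothetical choices made for the demonstration only.

```python
def toeplitz_average(a, x):
    """Weighted average (1/sigma_n) sum a_k x_k with sigma_n = sum a_k;
    if x_k -> x and sigma_n -> infinity, the average also tends to x."""
    return sum(ak * xk for ak, xk in zip(a, x)) / sum(a)

n = 60
a = [2.0 ** k for k in range(n)]              # weights with sigma_n -> infinity
x = [1.0 + 1.0 / (k + 1) for k in range(n)]   # x_k -> 1
print(toeplitz_average(a, x))                 # close to the limit 1
```

With rapidly growing weights the average is dominated by the late (already converged) terms, which is exactly the mechanism used in the consistency argument that follows.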
Note that
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}} = \frac{\sum_{k=0}^{n} Z_k - Z_0}{\sum_{k=0}^{n-1} Z_k} = \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} - \frac{1}{\sum_{k=0}^{n-1} Z_k} \simeq \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} \quad \text{on } [W > 0],
\]
and hence, applying the Toeplitz lemma with $a_k = \theta^k$ and $x_k = Z_k/\theta^k \to W$,
\[
\frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n-1} Z_k} = \frac{\sum_{k=0}^{n} Z_k}{\sum_{k=0}^{n} \theta^k} \cdot \frac{\sum_{k=0}^{n} \theta^k}{\sum_{k=0}^{n-1} \theta^k} \cdot \frac{\sum_{k=0}^{n-1} \theta^k}{\sum_{k=0}^{n-1} Z_k} \to W \cdot \lim_{n \to \infty} \frac{\theta^{n+1} - 1}{\theta^n - 1} \cdot \frac{1}{W} = \theta \quad \text{whenever } W > 0.
\]
Therefore
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}} \to \theta \ \text{a.s. on the set } [W > 0].
\]
This proves the strong consistency of the estimator on the set $[W > 0]$. Strong consistency might not hold on the set $[W = 0]$, which might have positive probability. Furthermore,
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} = \frac{\sum_{k=1}^{n} Z_{k-1}}{\sum_{k=1}^{n} \theta^{k-1}} = \left\{ \sum_{k=1}^{n} \theta^{k-1} \cdot \frac{Z_{k-1}}{\theta^{k-1}} \right\} \frac{1}{\sum_{k=1}^{n} \theta^{k-1}} \to W \ \text{a.s. as } n \to \infty.
\]
Note that
\[
(I_n(\theta))^{-\frac{1}{2}} \frac{d \log L_n(\theta)}{d\theta} = \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1} \right)^{-\frac{1}{2}} \sum_{k=1}^{n} \frac{Z_k - \theta Z_{k-1}}{\sigma^2} = \frac{\sum_{k=1}^{n} u_k(\theta)}{\left\{ \sum_{k=1}^{n} E(u_k^2(\theta) \mid \mathcal{F}_{k-1}) \right\}^{\frac{1}{2}}} \xrightarrow{\mathcal{L}} Z \sim N(0, 1)
\]
as $n \to \infty$, by the martingale central limit theorem, while
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} \to W \ \text{a.s. as } n \to \infty.
\]
Note also that
\[
(I_n(\theta))^{-\frac{1}{2}} \frac{d \log L_n(\theta)}{d\theta} = \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} Z_{k-1} \right)^{\frac{1}{2}} \frac{\sum_{k=1}^{n} (Z_k - \theta Z_{k-1})}{\sum_{k=1}^{n} Z_{k-1}} = (I_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta).
\]
Hence
\[
(I_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta) \xrightarrow{\mathcal{L}} N(0, 1) \quad \text{as } n \to \infty \quad \text{(random norming)}
\]
and
\[
(\zeta_n(\theta))^{\frac{1}{2}} (\hat{\theta}_n - \theta) \xrightarrow{\mathcal{L}} W^{-\frac{1}{2}} N(0, 1) \quad \text{as } n \to \infty \quad \text{(nonrandom norming)},
\]
where $\zeta_n(\theta) = E[I_n(\theta)]$, and $W$ and the standard normal limit are independent. In other words, the asymptotic distribution of the maximum likelihood estimator is not normal. Such models are called non-ergodic models.
Special case: Suppose that $\theta > 1$ and
\[
(\star) \qquad p_j = P(Z_1 = j \mid Z_0 = 1) = \frac{1}{\theta}\left(1 - \frac{1}{\theta}\right)^{j-1}, \quad j = 1, 2, \ldots;
\]
that is, the offspring distribution is geometric on the positive integers. Then $E(Z_1) = \theta$ and the probability of extinction is $q = 0$. Furthermore,
\[
\frac{I_n(\theta)}{\zeta_n(\theta)} = \frac{\sigma^{-2} \sum_{i=0}^{n-1} Z_i}{\sigma^{-2} \sum_{i=0}^{n-1} \theta^i} = \frac{\sum_{i=0}^{n-1} Z_i}{\sum_{i=0}^{n-1} \theta^i} \xrightarrow{a.s.} W \quad \text{as } n \to \infty,
\]
where $W$ is standard exponential. In fact, $\phi(s) = E[e^{-sW}]$ satisfies the equation
\[
(\star\star) \qquad \phi(\theta s) = \frac{\frac{1}{\theta} \phi(s)}{1 - (1 - \frac{1}{\theta}) \phi(s)}.
\]
A solution of the equation $(\star\star)$ is $\phi(s) = \frac{\lambda}{\lambda + s}$ with $\lambda > 0$. Since $E[W] = 1$, it follows that $\lambda = 1$ and $W$ is exponential with mean $1$.
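The claim that $\phi(s) = 1/(1 + s)$ solves $(\star\star)$ can be checked numerically; the evaluation points and the value of $\theta$ below are arbitrary illustrative choices.

```python
def phi(s):
    """Laplace transform of a standard exponential: E[exp(-s W)] = 1/(1+s)."""
    return 1.0 / (1.0 + s)

def rhs(s, theta):
    """Right-hand side of (**): (1/theta) phi(s) / (1 - (1 - 1/theta) phi(s))."""
    return (phi(s) / theta) / (1.0 - (1.0 - 1.0 / theta) * phi(s))

theta = 2.5
checks = [abs(phi(theta * s) - rhs(s, theta)) for s in (0.1, 1.0, 7.3)]
print(max(checks))   # zero up to floating-point rounding
```

Algebraically, the right-hand side simplifies to $1/(\theta s + 1) = \phi(\theta s)$, so the check succeeds for every $s$ and $\theta$.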
Bayesian estimation: Suppose the offspring distribution is Poisson with mean $\theta$. Further assume that the parameter $\theta$ has a prior density which is Gamma with parameters $\alpha$ and $\beta$, that is,
\[
p(\theta) = \frac{e^{-\theta \beta}\, \theta^{\alpha - 1}\, \beta^{\alpha}}{\Gamma(\alpha)}, \quad 0 < \theta < \infty,
\]
and $p(\theta) = 0$ otherwise, where $\alpha > 0$ and $\beta > 0$ are known. We have seen that the likelihood function $L_n(\theta)$ is proportional to
\[
\exp\left( -\theta \sum_{k=1}^{n} Z_{k-1} \right) \theta^{\sum_{k=1}^{n} Z_k}.
\]
Hence the posterior density of $\theta$, given $(Z_0, \ldots, Z_n)$, is proportional to
\[
\exp\left( -\theta \left( \beta + \sum_{k=1}^{n} Z_{k-1} \right) \right) \theta^{\alpha + \sum_{k=1}^{n} Z_k - 1}.
\]
Therefore the posterior density of the parameter $\theta$ is again Gamma, with parameters $\alpha + \sum_{k=1}^{n} Z_k$ and $\beta + \sum_{k=1}^{n} Z_{k-1}$. The mean of the posterior density is the Bayes estimator of the parameter $\theta$ under the quadratic loss function. It is given by
\[
\tilde{\theta}_n = \frac{\alpha + \sum_{k=1}^{n} Z_k}{\beta + \sum_{k=1}^{n} Z_{k-1}}.
\]
It can be checked that $\tilde{\theta}_n$ is asymptotically equivalent to the MLE
\[
\hat{\theta}_n = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}}
\]
on the set of non-extinction, that is, on the set $[W > 0]$.
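The conjugate Gamma update above amounts to two running sums. A minimal sketch (the function name and the toy path are illustrative):

```python
def bayes_estimate(Z, alpha, beta):
    """Posterior mean of theta for Poisson offspring with a Gamma(alpha, beta)
    prior: the posterior is Gamma(alpha + sum Z_k, beta + sum Z_{k-1})."""
    a_post = alpha + sum(Z[1:])       # alpha + sum_{k=1}^n Z_k
    b_post = beta + sum(Z[:-1])       # beta + sum_{k=1}^n Z_{k-1}
    return a_post / b_post

print(bayes_estimate([1, 2, 4], alpha=1.0, beta=1.0))   # (1+6)/(1+3) = 1.75
```

As the sums grow on the set of non-extinction, the fixed $\alpha$ and $\beta$ become negligible and the estimate merges with the MLE, as stated above.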
Least squares approach: Let us again consider the BGW branching process discussed earlier. We have seen that
\[
E(Z_{n+1} \mid Z_n) = \theta Z_n \qquad \text{and} \qquad \mathrm{Var}(Z_{n+1} \mid Z_n) = \sigma^2 Z_n.
\]
Let $U_{n+1}$ be defined by the relation
\[
(*) \qquad Z_{n+1} = \theta Z_n + Z_n^{1/2} U_{n+1}, \quad n \geq 0, \; Z_0 = 1.
\]
Check that (i) $E(U_k) = 0, k \geq 1$; (ii) $\mathrm{Var}(U_k) = \sigma^2, k \geq 1$; and (iii) $E(U_k U_j) = 0$ for $1 \leq j \leq k - 1$, and $E(U_k Z_{k-1}) = 0, k \geq 1$.

The relation (*) is an autoregressive-type model for the process $\{Z_k, k \geq 0\}$. Since the error terms in (*) satisfy the classical assumptions in the theory of least squares, we may consider the least squares approach for the estimation of the parameter $\theta$. This is done by minimizing the error sum of squares $\sum_{k=1}^{n} U_k^2$ with respect to $\theta$, which gives the estimator
\[
\theta_n^* = \frac{\sum_{k=1}^{n} Z_k}{\sum_{k=1}^{n} Z_{k-1}},
\]
the same as the MLE when the offspring distribution is a power series distribution. The variance $\sigma^2$ can be estimated by the residual sum of squares, namely,
\[
\sigma^{2*} = \frac{1}{n} \sum_{k=1}^{n} \frac{(Z_k - \theta_n^* Z_{k-1})^2}{Z_{k-1}}.
\]
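A minimal sketch of the least squares estimators $\theta_n^*$ and $\sigma^{2*}$ (the function name and the toy path are illustrative choices):

```python
def cls_estimates(Z):
    """Least squares estimate theta* = sum Z_k / sum Z_{k-1} and the
    residual-based variance estimate sigma2* from (Z_0, ..., Z_n)."""
    n = len(Z) - 1
    theta_star = sum(Z[1:]) / sum(Z[:-1])
    sigma2_star = sum((Z[k] - theta_star * Z[k - 1]) ** 2 / Z[k - 1]
                      for k in range(1, n + 1)) / n
    return theta_star, sigma2_star

print(cls_estimates([1, 2, 4, 8]))   # exact doubling: theta* = 2, residuals 0
```

On the exactly doubling toy path every residual vanishes, so the variance estimate is zero, which makes the structure of the two formulas easy to check.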
Lecture 7

Estimation by conditional least squares

Let $\{X_n, n \geq 1\}$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, P_\theta)$, $\theta = (\theta_1, \ldots, \theta_p) \in \Theta \subset R^p$, $\Theta$ open. Consider
\[
Q_n(\theta) = \sum_{k=1}^{n} [X_k - E_\theta(X_k \mid \mathcal{F}_{k-1})]^2.
\]
We estimate $\theta$ by minimizing $Q_n(\theta)$ over $\Theta$. We assume that $Q_n(\theta)$ has partial derivatives with respect to $\theta_i, 1 \leq i \leq p$.

Assume that $E_\theta(X_n \mid \mathcal{F}_{n-1})$ is a.s. twice continuously differentiable with respect to $\theta$ in some neighbourhood $S$ of the true parameter, say $\bar{\theta} = (\bar{\theta}_1, \ldots, \bar{\theta}_p) \in \Theta$. Applying the Taylor series expansion, we have
\[
(\star) \qquad Q_n(\theta) = Q_n(\bar{\theta}) + (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + \frac{1}{2} (\theta - \bar{\theta})' \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \theta^\star} (\theta - \bar{\theta}),
\]
where $\|\bar{\theta} - \theta^\star\| \leq \|\theta - \bar{\theta}\|$. Hence
\[
(37) \qquad Q_n(\theta) = Q_n(\bar{\theta}) + (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + \frac{1}{2} (\theta - \bar{\theta})' V_n (\theta - \bar{\theta}) + \frac{1}{2} (\theta - \bar{\theta})' T_n(\theta^\star) (\theta - \bar{\theta}),
\]
where
\[
(38) \qquad T_n(\theta^\star) = \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \theta^\star} - V_n \qquad \text{and} \qquad V_n = \left. \frac{\partial^2 Q_n(\theta)}{\partial \theta^2} \right|_{\theta = \bar{\theta}}.
\]
Theorem 1 (Klimko and Nelson (1978)) (Consistency): Suppose that

(i)
\[
(39) \qquad \lim_{n \to \infty} \lim_{\delta \downarrow 0} \sup_{\|\theta^\star - \bar{\theta}\| \leq \delta} \frac{1}{n\delta} |T_n(\theta^\star)_{ij}| < \infty, \quad 1 \leq i, j \leq p;
\]
(ii)
\[
(40) \qquad (2n)^{-1} V_n \xrightarrow{a.s.} V,
\]
where $V$ is a positive definite (symmetric) $p \times p$ matrix of constants; and

(iii)
\[
(41) \qquad \frac{1}{n} \left. \frac{\partial Q_n(\theta)}{\partial \theta_i} \right|_{\theta = \bar{\theta}} \xrightarrow{a.s.} 0, \quad 1 \leq i \leq p.
\]
Then there exists a sequence of estimators $\hat{\theta}_n$ such that
\[
\hat{\theta}_n \xrightarrow{a.s.} \bar{\theta},
\]
and for any $\varepsilon > 0$ there exist an event $E$ with $P(E) > 1 - \varepsilon$ and an $n_0$ such that on $E$, for $n > n_0$, $\hat{\theta}_n$ satisfies the equation
\[
(42) \qquad \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n} = 0
\]
and $Q_n$ attains a local minimum at $\hat{\theta}_n$.

Proof: Given $\varepsilon > 0$ and conditions (39)-(41), applying Egorov's theorem, we can find an event $E$ with $P(E) > 1 - \varepsilon$, constants $0 < \delta^\star < \delta$, $M > 0$, and an $n_0$ such that on $E$, for any $n > n_0$ and $\theta \in N_{\delta^\star}$ (an open sphere with center $\bar{\theta}$ and radius $\delta^\star$),

(a) $\left| (\theta - \bar{\theta})' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \right| < n\delta^3$;

(b) the minimum eigenvalue of $(2n)^{-1} V_n$ is greater than some $\Delta > 0$; and

(c) $\frac{1}{2} \left| (\theta - \bar{\theta})' T_n(\theta^\star) (\theta - \bar{\theta}) \right| < n M \delta^3$.

Using (37) for $\theta$ on the boundary of $N_{\delta^\star}$, we have
\[
Q_n(\theta) \geq Q_n(\bar{\theta}) + n(-\delta^3 + \delta^2 \Delta - M \delta^3) = Q_n(\bar{\theta}) + n\delta^2(-\delta + \Delta - M\delta).
\]
Since $\Delta - \delta - M\delta$ can be made positive by choosing $\delta$ sufficiently small, $Q_n(\theta)$ must attain a minimum at some $\hat{\theta}_n = (\hat{\theta}_{n1}, \ldots, \hat{\theta}_{np})$ in $N_{\delta^\star}$, at which point the least squares equation (42) must be satisfied on $E$ for any $n > n_0$.

Replace $\varepsilon$ by $\varepsilon_k = 2^{-k}$ and $\delta$ by $\delta_k = \frac{1}{k}$, $k \geq 1$, to determine a sequence of events $E_k$ and an increasing sequence $n_k$ such that the equation (42) has a solution on $E_k$ for any $n > n_k$. For $n_k < n \leq n_{k+1}$, define $\hat{\theta}_n$ on $E_k$ to be a solution of (42) within $\delta_k$ of $\bar{\theta}$ at which $Q_n$ attains a relative minimum, and define $\hat{\theta}_n$ to be zero off $E_k$. Then
\[
\hat{\theta}_n \to \bar{\theta} \quad \text{on } \liminf_{k \to \infty} E_k,
\]
but
\[
1 - P\left( \liminf_{k \to \infty} E_k \right) = P\left( \limsup_{k \to \infty} E_k^c \right) = \lim_{k \to \infty} P\left( \cup_{j=k}^{\infty} E_j^c \right) \leq \lim_{k \to \infty} \sum_{j=k}^{\infty} 2^{-j} = 0.
\]
Asymptotic normality

Asymptotic normality of the estimator $\hat{\theta}_n$ can be obtained if the linear term in (37) has an asymptotically multivariate normal distribution. This can be verified by the Cramer-Wold technique and an appropriate central limit theorem for martingales. Note that
\[
(43) \qquad n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} = -2 n^{-\frac{1}{2}} \sum_{k=1}^{n} \left[ \sum_{i=1}^{p} \lambda_i \left. \frac{\partial E_\theta(X_k \mid \mathcal{F}_{k-1})}{\partial \theta_i} \right|_{\theta = \bar{\theta}} \right] (X_k - E_{\bar{\theta}}(X_k \mid \mathcal{F}_{k-1})),
\]
where $\lambda = (\lambda_1, \ldots, \lambda_p) \in R^p$ is an arbitrary nonzero vector. Furthermore,
\[
\sum_{i=1}^{p} \lambda_i \left. \frac{\partial E_\theta(X_k \mid \mathcal{F}_{k-1})}{\partial \theta_i} \right|_{\theta = \bar{\theta}}
\]
is an $\mathcal{F}_{k-1}$-measurable function. Hence, from (43), it follows that
\[
\left\{ n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}}, \mathcal{F}_n, n \geq 1 \right\}
\]
is a martingale. If
\[
\frac{1}{2} n^{-\frac{1}{2}} \lambda' \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, \lambda' W \lambda)
\]
for any $\lambda \neq 0$, where $W$ is a $p \times p$ covariance matrix, then
\[
(44) \qquad \frac{1}{2} n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, W) \quad \text{as } n \to \infty.
\]
Theorem 2: Suppose the conditions of Theorem 1 hold. In addition, suppose that
\[
\lim_{n \to \infty} \lim_{\delta \downarrow 0} \sup_{\|\theta^\star - \bar{\theta}\| \leq \delta} \frac{1}{n\delta} |T_n(\theta^\star)_{ij}| = 0, \quad 1 \leq i, j \leq p,
\]
and
\[
(45) \qquad \frac{1}{2} n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} \xrightarrow{\mathcal{L}} N(0, W)
\]
as $n \to \infty$, where $V$ is as defined in (40). Then
\[
n^{1/2} (\hat{\theta}_n - \bar{\theta}) \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1})
\]
as $n \to \infty$.
Proof: Let $\hat{\theta}_n$ be as given by Theorem 1. Note that $\hat{\theta}_n$ satisfies (42). Expanding
\[
n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n}
\]
in a Taylor expansion about $\bar{\theta}$, we have, by (38),
\[
0 = n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_n} = n^{-\frac{1}{2}} \left. \frac{\partial Q_n(\theta)}{\partial \theta} \right|_{\theta = \bar{\theta}} + n^{-1} \left( V_n + T_n(\theta^\star) \right) n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta}).
\]
Since $n^{-1}(V_n + T_n(\theta^\star)) \xrightarrow{a.s.} 2V$ as $n \to \infty$ by (40) and the condition above, it follows that $n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta})$ has the same asymptotic distribution as
\[
-(2V)^{-1} n^{-\frac{1}{2}} \left. \frac{\partial Q_n}{\partial \theta} \right|_{\theta = \bar{\theta}}.
\]
This proves that
\[
n^{\frac{1}{2}} (\hat{\theta}_n - \bar{\theta}) \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1})
\]
in view of (45).
Example (BGW process with immigration): Consider a subcritical BGW process $\{Z_n, n = 0, 1, \ldots\}$ with immigration. Suppose that the process has an initial distribution for $Z_0$ with $E Z_0^2 < \infty$. Let $m$ and $\lambda$ be the means of the offspring distribution and the immigration distribution respectively. Assume that these distributions have finite variances. The problem is to estimate $\theta = (m, \lambda)$ based on $Z_0, \ldots, Z_n$. Note that the $(n+1)$-th generation is obtained from the independent reproduction of each of the individuals in the $n$-th generation plus an independent immigration input. Thus,
\[
E_\theta(Z_{n+1} \mid \mathcal{F}_n) = m Z_n + \lambda.
\]
Let
\[
Q_n(\theta) = \sum_{k=1}^{n} [Z_k - E_\theta(Z_k \mid \mathcal{F}_{k-1})]^2 = \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda]^2.
\]
Then
\[
\frac{\partial Q_n(\theta)}{\partial m} = 2 \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda](-Z_{k-1})
\]
and
\[
\frac{\partial Q_n(\theta)}{\partial \lambda} = 2 \sum_{k=1}^{n} [Z_k - m Z_{k-1} - \lambda](-1).
\]
Equating these to zero, we obtain
\[
\hat{m}_n = \frac{n \sum_{i=1}^{n} Z_{i-1} Z_i - \left( \sum_{i=1}^{n} Z_{i-1} \right) \left( \sum_{i=1}^{n} Z_i \right)}{n \sum_{i=1}^{n} Z_{i-1}^2 - \left( \sum_{i=1}^{n} Z_{i-1} \right)^2}
\]
and
\[
\hat{\lambda}_n = \frac{1}{n} \left[ \sum_{i=1}^{n} Z_i - \hat{m}_n \sum_{i=1}^{n} Z_{i-1} \right].
\]
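Solving the two normal equations above is ordinary least squares regression of $Z_k$ on $Z_{k-1}$. A minimal sketch follows; the exactly linear toy path $Z_k = 2 Z_{k-1} + 3$ is constructed here so the estimates are recovered without error.

```python
def cls_immigration(Z):
    """Conditional least squares estimates of (m, lambda) for a BGW process
    with immigration, from the normal equations for Z_k on Z_{k-1}."""
    n = len(Z) - 1
    x, y = Z[:-1], Z[1:]                 # (Z_{k-1}, Z_k) pairs, k = 1..n
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    m_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    lam_hat = (sy - m_hat * sx) / n
    return m_hat, lam_hat

print(cls_immigration([1, 5, 13, 29, 61]))   # recovers (2.0, 3.0) exactly
```

On noisy simulated data the same two formulas give the strongly consistent estimators whose limits are discussed next.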
It can be shown that the process $\{Z_n\}$ is a Markov process with a stationary distribution. If $Z_0$ has this stationary distribution, then the process $\{Z_n\}$ is stationary and ergodic, the ergodic theorem can be applied, and we have
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i \xrightarrow{a.s.} E(Z_0) = \lambda (1 - m)^{-1} \equiv r_1 \ \text{(say)},
\]
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i^2 \xrightarrow{a.s.} E(Z_0^2) = c^2 (1 - m^2)^{-1} + r_1^2 \equiv r_2 \ \text{(say)},
\]
and
\[
\frac{1}{n} \sum_{i=1}^{n} Z_i Z_{i-1} \xrightarrow{a.s.} m c^2 (1 - m^2)^{-1} + r_1^2,
\]
where $c^2 = b^2 + \sigma^2 \lambda (1 - m)^{-1}$, and $\sigma^2$ and $b^2$ are the variances of the offspring and immigration distributions. Note that $r_1$ and $r_2$ are the first and second moments of $Z_0$.

In fact, the above results hold even if the initial distribution is a general distribution$^{(\star)}$, and we get that
\[
\hat{m}_n \xrightarrow{a.s.} m, \qquad \hat{\lambda}_n \xrightarrow{a.s.} \lambda.
\]
($\star$: Follows from Billingsley (1961), Statistical Inference for Markov Processes; Revesz (1968), Laws of Large Numbers.)

Suppose the offspring and the immigration distributions have finite third moments. Then $r_3 = E(Z_0^3) < \infty$. It can be shown that
\[
n^{\frac{1}{2}} \begin{pmatrix} \hat{m}_n - m \\ \hat{\lambda}_n - \lambda \end{pmatrix} \xrightarrow{\mathcal{L}} N(0, V^{-1} W V^{-1}),
\]
where
\[
V^{-1} = c^{-2}(1 - m^2) \begin{pmatrix} 1 & -r_1 \\ -r_1 & r_2 \end{pmatrix}
\]
and
\[
W = \begin{pmatrix} \sigma_1^2 r_3 + \sigma_2^2 r_2 & \sigma_1^2 r_2 + \sigma_2^2 r_1 \\ \sigma_1^2 r_2 + \sigma_2^2 r_1 & \sigma_1^2 r_1 + \sigma_2^2 \end{pmatrix}.
\]
Here $\sigma_1^2$ and $\sigma_2^2$ are defined by the relation
\[
\mathrm{Var}(Z_1 \mid Z_0) = \sigma_1^2 Z_0 + \sigma_2^2.
\]
(Ref.: Hall and Heyde (1980), pp. 180-181.)
Method of moments

This method does not generally lead to an estimator with any "optimal property", but it is easy to implement. We illustrate the method through two examples.

Example 1: Let $Z_0 = 1, Z_1, Z_2, \ldots$ be a supercritical BGW branching process. Let $\theta = E Z_1 > 1$ and $0 < \mathrm{Var}\, Z_1 = \sigma^2 < \infty$. Suppose the problem is to estimate $\theta$ and $\sigma^2$ on the basis of a single realization $\{Z_k, 0 \leq k \leq n + 1\}$. Since
\[
Z_{k+1} = X_{k1} + \cdots + X_{k Z_k},
\]
where, conditional on $Z_k$, the $X_{ki}, 1 \leq i \leq Z_k$, are i.i.d. random variables each with the distribution of $Z_1$, we have
\[
E(Z_{k+1} \mid Z_k) = \theta Z_k \ \text{a.s.}, \quad \text{i.e.,} \quad E\left( \frac{Z_{k+1}}{Z_k} \,\Big|\, Z_k \right) = \theta,
\]
and
\[
E((Z_{k+1} - \theta Z_k)^2 \mid Z_k) = \sigma^2 Z_k \ \text{a.s.}, \quad \text{i.e.,} \quad E\left( \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} \,\Big|\, Z_k \right) = \sigma^2 \ \text{a.s.}
\]
Suppose that $P(Z_1 = 0) = 0$. Note that
\[
\left\{ \frac{Z_n}{\theta^n}, \mathcal{F}_n, n \geq 0 \right\}
\]
is a nonnegative martingale and
\[
\frac{Z_n}{\theta^n} \xrightarrow{a.s.} W \ \text{(say)} \quad \text{as } n \to \infty.
\]
It is known that $W$ is non-degenerate and positive a.s. (Harris (1963)), and hence
\[
\hat{\theta}_n = \frac{Z_{n+1}}{Z_n} \to \theta \ \text{a.s.} \quad \text{as } n \to \infty.
\]
Let $\tilde{\theta}_n = \frac{1}{n} \sum_{j=0}^{n-1} Z_{j+1} Z_j^{-1}$. Then $\tilde{\theta}_n$ can be considered a moment estimator for $\theta$. In fact, $\tilde{\theta}_n \to \theta$ a.s. as $n \to \infty$. However, $\hat{\theta}_n$ is a better estimator than $\tilde{\theta}_n$ as far as the rate of convergence to $\theta$ is concerned (Heyde and Leslie (1971), Bull. Austral. Math. Soc. 5, 145-155). An estimator by the method of moments for $\sigma^2$ is
\[
\hat{\sigma}_n^2 = \frac{1}{n} \sum_{k=0}^{n} \frac{(Z_{k+1} - \hat{\theta}_n Z_k)^2}{Z_k}.
\]
It is clear that
\[
E\left( \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \,\Big|\, Z_0, Z_1, \ldots, Z_k \right) = 0,
\]
and hence
\[
\left\{ M_n = \sum_{k=0}^{n} \left[ \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \right], \mathcal{F}_{n+1}, n \geq 0 \right\}
\]
forms a martingale, where $\mathcal{F}_k = \sigma(Z_0, Z_1, \ldots, Z_k)$. An application of the SLLN proves that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \left[ \frac{(Z_{k+1} - \theta Z_k)^2}{Z_k} - \sigma^2 \right] = 0 \ \text{a.s.}
\]
One can prove that "$\theta$" in the above equation can be replaced by "$\hat{\theta}_n$" by applying the result
\[
\hat{\theta}_n - \theta = \sigma \zeta(n) (2 Z_n^{-1} \log n)^{\frac{1}{2}},
\]
where $\limsup_n \zeta(n) = 1$ a.s. and $\liminf_n \zeta(n) = -1$ a.s. (Heyde (1974), Advances in Appl. Prob. 3, 421-433).
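A minimal sketch of the three estimators of this example: the ratio estimator $\hat{\theta}_n = Z_{n+1}/Z_n$ (often called the Lotka-Nagaev estimator), the averaged-ratio moment estimator $\tilde{\theta}_n$, and the residual-based estimator of $\sigma^2$. Function names, the exact summation ranges, and the toy path are illustrative choices.

```python
def lotka_nagaev(Z):
    """Ratio estimator theta_hat_n = Z_{n+1} / Z_n."""
    return Z[-1] / Z[-2]

def ratio_average(Z):
    """Moment estimator theta_tilde_n = (1/n) sum_j Z_{j+1} / Z_j."""
    return sum(Z[j + 1] / Z[j] for j in range(len(Z) - 1)) / (len(Z) - 1)

def sigma2_moment(Z, theta_hat):
    """Moment estimator of sigma^2 from squared normalized residuals."""
    n = len(Z) - 1
    return sum((Z[k + 1] - theta_hat * Z[k]) ** 2 / Z[k]
               for k in range(n)) / n

Z = [1, 2, 4, 8]                       # deterministic doubling path
print(lotka_nagaev(Z), ratio_average(Z), sigma2_moment(Z, 2.0))
```

On the deterministic doubling path both mean estimators return exactly $2$ and the variance estimate is $0$, which makes each formula easy to verify by hand.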
Example 2: Consider a stochastic process $\{X_n\}$ governed by the model
\[
X_n = \varepsilon_n + \alpha X_{n-1} \varepsilon_{n-1}, \quad n \geq 1,
\]
where $\{\varepsilon_i, i \geq 1\}$ are i.i.d. with $E\varepsilon_0 = 0$, $\sigma^2 = E\varepsilon_0^2 < \infty$ and $E\varepsilon_0^3 = 0$. We assume that $\alpha^2 \sigma^2 < 1$. Note that, for $k$ large,
\[
a = E X_k = \alpha \sigma^2
\]
and
\[
b = E(X_k X_{k-1}) = \alpha [E\varepsilon_0^3 + 2\alpha \sigma^4] = 2\alpha^2 \sigma^4.
\]
Let
\[
a_n = \frac{1}{n} \sum_{k=1}^{n} X_k \qquad \text{and} \qquad b_n = \frac{1}{n} \sum_{k=1}^{n} X_k X_{k-1}.
\]
It can be shown that
\[
(\star) \qquad a_n \to a \ \text{a.s.}, \qquad b_n \to b \ \text{a.s.},
\]
and one can estimate $\alpha$ and $\sigma^2$ by solving the moment equations. In fact,
\[
n^{\frac{1}{2}} (a_n - a) \xrightarrow{\mathcal{L}} N(0, \mathrm{Var}(X_0) + 2\,\mathrm{cov}(X_0, X_1)),
\]
and if $E\varepsilon_0^6 < \infty$, then
\[
n^{\frac{1}{2}} (b_n - b) \xrightarrow{\mathcal{L}} N(0, \mathrm{Var}(X_0 X_1) + 2\,\mathrm{cov}(X_0 X_1, X_1 X_2) + 2\,\mathrm{cov}(X_0 X_1, X_2 X_3)).
\]
Remarks: If the process $\{X_k, -\infty < k < \infty\}$ is stationary, then
\[
X_n = \varepsilon_n + \alpha \varepsilon_{n-1}^2 + \sum_{k=2}^{\infty} \alpha^k \varepsilon_{n-k}^2 \prod_{j=1}^{k-1} \varepsilon_{n-j},
\]
and the results given in $(\star)$ can be proved.
Lecture 8

Likelihood ratios in abstract spaces

Let $\mathcal{X}$ be the sample space and $\Theta = \{0, 1\}$ the parameter space. Let $P_0$ and $P_1$ be probability measures defined on a measurable space $(\mathcal{X}, \mathcal{B})$. A fundamental problem is to test the null hypothesis
\[
H_0: \theta = 0 \quad \text{against} \quad H_1: \theta = 1.
\]
In the Neyman-Pearson formulation, we choose a critical region $W \subset \mathcal{X}$ such that if the observed $x \in W$, we reject $H_0$, and if $x \notin W$, we do not reject $H_0$ (accept $H_0$). The performance of the test is determined by
\[
\alpha = \text{level of significance of the test} = P_0(W)
\]
and the power $\gamma = P_1(W)$. The Neyman-Pearson lemma gives a method to find a test which maximizes $\gamma$ for a given $\alpha$; it gives a test based on the likelihood ratio.

In a general abstract space, there is no measure analogous to the Lebesgue measure on $R^n$ and hence the concept of likelihood cannot be formulated directly. We let the Radon-Nikodym derivative play the role of the likelihood ratio. The basic problem is therefore to find a method for calculating the Radon-Nikodym derivative whenever it exists.

Given $P_0$ and $P_1$, there exist a measurable set $H \subset \mathcal{X}$ with $P_0(H) = 0$ and a non-negative function $f$, integrable with respect to $P_0$, such that for any measurable set $E \subset \mathcal{X}$,
\[
(\star) \qquad P_1(E) = \int_E f(x)\, P_0(dx) + P_1(E \cap H).
\]
This result is known as the Lebesgue decomposition. The function $f$ is the Radon-Nikodym derivative and will be denoted by $\frac{dP_1}{dP_0}(x)$. Note that $\frac{dP_1}{dP_0}(x)$ is the Radon-Nikodym derivative of the absolutely continuous component of $P_1$ with respect to $P_0$.

Recall that if the set $H$ has $P_1(H) = 1$, then the measures $P_0$ and $P_1$ are singular with respect to each other. If this happens, then the critical region $W = H$ allows perfect (probability one) discrimination between $H_0$ and $H_1$: the test gives the correct result, with both the first and second kind of errors equal to zero.

If $P_0$ and $P_1$ are both absolutely continuous with respect to each other, then they are said to be equivalent. In such a case $P_1(H) = 0$ and $f > 0$ with $P_0$-probability one.

It is possible that $P_0$ and $P_1$ are neither singular nor equivalent. However, if $P_0$ and $P_1$ are Gaussian measures, then they are either equivalent or singular with respect to each other (Hajek (1958), Feldman (1958)).
Let $\mathcal{X} = R^\infty$, write $x^{(n)} = (x_1, \ldots, x_n)$, and let $g_{ni}(x^{(n)})$ be the joint p.d.f. of $(X_1, \ldots, X_n)$ with respect to the Lebesgue measure on $R^n$ under $P_i$, for $i = 0, 1$. Let $\mathcal{B}$ be the $\sigma$-algebra generated by all the cylinder sets with finite dimensional base and $\mathcal{B}_n$ the Borel $\sigma$-algebra in $R^n$. For $x = (x_1, x_2, \ldots) \in \mathcal{X} = R^\infty$, let
\[
f_n(x) = \frac{g_{n1}(x^{(n)})}{g_{n0}(x^{(n)})}.
\]
Suppose $f_n(x)$ is defined a.e. $[P_0]$. Let $H$ be the set defined above in $(\star)$. Then

(i) $f_n(x) \xrightarrow{a.s.} f(x)$ $[P_0]$;

(ii) $f_n(x) \xrightarrow{p} f(x)$ $[P_1]$ on $H^c$; and

(iii) $f_n(x) \to +\infty$ $[P_1]$ on $H$.

(Ref.: Grenander (1950).)

Proof: (i) Suppose $H$ is such that $P_1(H) = 0$, so that $P_0$ and $P_1$ are absolutely continuous with respect to each other. Then the sequence $\{f_n\}$ forms a martingale, and by the martingale convergence theorem,
\[
f_n(x) \xrightarrow{a.s.} f(x) \ [P_0] \quad \text{as } n \to \infty,
\]
where $f(x) = \frac{dP_1}{dP_0}(x)$ as in $(\star)$. For the case $0 < P_1(H) \leq 1$ and for proofs of (ii) and (iii), see Grenander (1950), pp. 108-110.
Neyman-Pearson lemma: Suppose $P_1$ is absolutely continuous with respect to $P_0$. Let $f(x) = \frac{dP_1}{dP_0}(x)$. If the critical region $W$ is of the form
\[
W = \{x : f(x) \geq c\} \subset \mathcal{X},
\]
then $W$ is the "best" critical region of its size: no other critical region at the same level of significance has greater power.

Proof: Let $V \subset \mathcal{X}$ be such that $P_0(V) = P_0(W)$. Note that
\[
(46) \qquad P_0(W \cap V^c) = P_0(W) - P_0(W \cap V) = P_0(V) - P_0(W \cap V) = P_0(V \cap W^c).
\]
Hence
\begin{align*}
P_1(W \cap V^c) &= \int_{W \cap V^c} f(x)\, P_0(dx) \\
&\geq c \int_{W \cap V^c} P_0(dx) \quad \text{since } W = \{x : f(x) \geq c\} \\
&= c\, P_0(W \cap V^c) = c\, P_0(V \cap W^c) \quad \text{(by (46))}. \qquad (47)
\end{align*}
However,
\begin{align*}
P_1(V \cap W^c) &= \int_{V \cap W^c} f(x)\, P_0(dx) \\
&\leq c \int_{V \cap W^c} P_0(dx) \quad \text{since } W^c = \{x : f(x) < c\} \\
&= c\, P_0(V \cap W^c). \qquad (48)
\end{align*}
Combining (47) and (48), we get that
\[
(49) \qquad P_1(W \cap V^c) \geq P_1(V \cap W^c).
\]
Adding $P_1(W \cap V)$ to both sides, we get that
\[
P_1(W) \geq P_1(V),
\]
which completes the proof.
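On a finite sample space the lemma can be verified by brute force: among all critical regions of a given size, a likelihood-ratio region attains the maximum power. The two distributions below are arbitrary illustrative choices, not part of the notes.

```python
from itertools import combinations

P0 = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # null distribution
P1 = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # alternative distribution
points = list(P0)

def size(W):  return sum(P0[x] for x in W)   # level of significance P0(W)
def power(W): return sum(P1[x] for x in W)   # power P1(W)

# All critical regions with P0-size 0.3 (up to rounding)
regions = [set(W) for r in range(len(points) + 1)
           for W in combinations(points, r)
           if abs(size(W) - 0.3) < 1e-9]

best = max(regions, key=power)
print(best, power(best))   # the likelihood-ratio region {2, 3}
```

The winning region $\{2, 3\}$ is exactly $\{x : f(x) \geq c\}$ for the ratio $f = P_1/P_0$ with a suitable $c$, in agreement with the lemma.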
We have the following theorem for the best Bayesian test.

Theorem: If $P_1$ is absolutely continuous with respect to $P_0$, and if the a priori probabilities of the two hypotheses $H_0$ and $H_1$ are $\pi_0$ and $\pi_1$ respectively, then the "best" test, in the sense of minimizing the probability of an error, is given by the critical region
\[
W = \left[ x : f(x) > \frac{\pi_0}{\pi_1} \right].
\]
Proof: The probability of the test leading to the wrong result is
\[
\alpha = \pi_0\, P_0(\text{reject } H_0) + \pi_1\, P_1(\text{reject } H_1) = \pi_0 P_0(W) + \pi_1 P_1(W^c)
\]
when $W$ is the critical region. Hence
\[
\alpha = \pi_0 \int_W P_0(dx) + \pi_1 \int_{W^c} P_1(dx) = \pi_0 \int_W P_0(dx) + \pi_1 \int_{W^c} f(x)\, P_0(dx) = \pi_1 + \int_W (\pi_0 - \pi_1 f(x))\, P_0(dx),
\]
since
\[
\int_{W^c} f(x)\, P_0(dx) = 1 - \int_W f(x)\, P_0(dx).
\]
To minimize $\alpha$, we should choose $W$ in such a way that the integral is as small as possible. This can be done by choosing $W$ to be the set where the integrand is negative. Note that the integrand is negative when
\[
\pi_0 - \pi_1 f(x) < 0, \quad \text{that is,} \quad f(x) > \frac{\pi_0}{\pi_1}.
\]
Remarks: The best critical region need not be unique.
Lecture 9

Representation of a second order stochastic process

Let $\{X(t), t \in T\}$ be a stochastic process with $E[X(t)]^2 < \infty$ for all $t \in T$. Let $m(t) = E[X(t)]$ and $r(s, t) = \mathrm{cov}(X(s), X(t))$. The fundamental problem is how to represent a stochastic process, possibly with a complicated dependence structure, as a linear combination of "simple" elements. Here "simple" means orthogonal (uncorrelated).

Mercer's theorem: Consider a symmetric non-negative definite continuous function $r(s, t)$ on $[a, b] \times [a, b]$ and the integral equation
\[
(\star) \qquad \lambda \phi(t) = \int_a^b r(s, t)\, \phi(s)\, ds.
\]
The eigenvalues $\lambda_1, \lambda_2, \ldots$ and the associated normalized eigenfunctions $\phi_1, \phi_2, \ldots$ satisfy
\[
r(s, t) = \sum_{i=1}^{\infty} \lambda_i\, \phi_i(s)\, \phi_i(t)
\]
in the $L^2$-sense as well as with absolute and uniform convergence. Note that the $\phi_v$ are orthonormal.

Remarks: Note that $r(\cdot, t) \in L^2([a, b])$. Hence, for fixed $t$,
\[
r(\cdot, t) = \sum_{v=1}^{\infty} \rho_v(t)\, \phi_v(\cdot),
\]
where
\[
\rho_v(t) = \int_a^b r(s, t)\, \phi_v(s)\, ds = \lambda_v \phi_v(t).
\]
Hence $r(s, t) = \sum_{i=1}^{\infty} \lambda_i \phi_i(s) \phi_i(t)$.

Karhunen-Loeve expansion (Karhunen (1947), Loeve (1946)): Let $\{X(t), t \in T = [a, b]\}$ be a second order process continuous in the mean on $[a, b]$, that is,
\[
E|X(t + h) - X(t)|^2 \to 0 \quad \text{as } h \to 0.
\]
Define $\lambda_v$ and $\phi_v$ for $v \geq 1$ as above through the covariance function $r(s, t)$ of the process $\{X(t), t \in T\}$, and introduce the variables
\[
Z_v = \int_a^b X(t)\, \phi_v(t)\, dt.
\]
Note that the $Z_v$ are uncorrelated. Form the expansion
\[
Z(t) = \sum_{v=1}^{\infty} \phi_v(t)\, Z_v.
\]
Then the expansion holds in the $L^2$-mean and $P(Z(t) = X(t)) = 1$ for each $t \in T$.
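The eigensystem in $(\star)$ can be approximated by discretizing the kernel on a grid. For the Brownian-motion covariance $r(s, t) = \min(s, t)$ on $[0, 1]$ the eigenvalues are known in closed form, $\lambda_v = 1/((v - \tfrac{1}{2})^2 \pi^2)$, which gives a check on the numerics; the grid size below is an arbitrary choice.

```python
import numpy as np

n = 500
t = (np.arange(n) + 0.5) / n                    # midpoint grid on [0, 1]
K = np.minimum.outer(t, t)                      # Brownian covariance min(s, t)
eig = np.sort(np.linalg.eigvalsh(K / n))[::-1]  # discretized integral operator

# Closed-form eigenvalues lambda_v = 1 / ((v - 1/2)^2 pi^2), v = 1, 2, 3
analytic = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)
print(eig[:3])
print(analytic)
```

The leading numerical eigenvalues agree with the closed form to well under one percent, so the same discretization can be trusted for kernels without a known spectrum.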
Applications:

Example (test for the mean value function of a Gaussian process $\{X(t), t \in [0, 1]\}$ with known covariance function $r(s, t)$): We want to test the hypothesis
\[
H_0: E[X(t)] = m_0(t) \quad \text{against the alternative} \quad H_1: E[X(t)] = m_1(t).
\]
We take as observables the coordinates of the process
\[
Z_v = \int_0^1 X(t)\, \phi_v(t)\, dt,
\]
where the $\phi_v(t)$ are as defined by $(\star)$. The random variables $Z_v$ are independent normal random variables. In fact, under $H_i$, $Z_v \sim N(a_{iv}, \lambda_v)$, where
\[
a_{iv} = \int_0^1 m_i(t)\, \phi_v(t)\, dt.
\]
It is clear that $E_i(Z_v) = a_{iv}$. Let
\[
Y(t) = X(t) - m_i(t), \quad 0 \leq t \leq 1.
\]
Note that $E[Y(t) Y(s)] = r(t, s)$ and
\begin{align*}
\mathrm{Var}(Z_v) &= \int_0^1 \int_0^1 E[Y(t) Y(s)]\, \phi_v(t)\, \phi_v(s)\, dt\, ds = \int_0^1 \int_0^1 r(t, s)\, \phi_v(t)\, \phi_v(s)\, dt\, ds \\
&= \int_0^1 \lambda_v\, \phi_v(t)\, \phi_v(t)\, dt \quad \text{(by } (\star)\text{)} = \lambda_v \int_0^1 \phi_v^2(t)\, dt = \lambda_v.
\end{align*}
Suppose that $\lambda_v \neq 0$ for all $v$; in other words, assume that the covariance function $r(s, t)$ is positive definite. Then
\[
p_n(z) = \frac{\prod_{v=1}^{n} (2\pi\lambda_v)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\lambda_v}(z_v - a_{1v})^2 \right\}}{\prod_{v=1}^{n} (2\pi\lambda_v)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2\lambda_v}(z_v - a_{0v})^2 \right\}} = \exp\{q_n(z)\},
\]
where
\[
q_n(z) = \sum_{v=1}^{n} \left[ z_v \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right) - \frac{a_{1v}^2 - a_{0v}^2}{2\lambda_v} \right] = \sum_{v=1}^{n} \zeta_v \ \text{(say)}.
\]
Suppose that
\[
\sum_{v=1}^{\infty} \frac{(a_{1v} - a_{0v})^2}{\lambda_v} < \infty.
\]
Then
\[
E_0(\zeta_v) = -\frac{(a_{1v} - a_{0v})^2}{2\lambda_v}, \qquad E_1(\zeta_v) = \frac{(a_{1v} - a_{0v})^2}{2\lambda_v}, \qquad \mathrm{Var}(\zeta_v) = \frac{(a_{1v} - a_{0v})^2}{\lambda_v}.
\]
Note that $\sum_{v=1}^{\infty} |E_i(\zeta_v)| < \infty$ and $\sum_{v=1}^{\infty} \mathrm{Var}(\zeta_v) < \infty$, and $\{\zeta_v, v \geq 1\}$ are independent random variables. Hence the series $\sum_{v=1}^{\infty} \zeta_v$ converges a.s. under $P_0$ and $P_1$, and the Radon-Nikodym derivative $p$ of $P_1$ with respect to $P_0$ is the limit of $p_n$. The most powerful test for testing $P_0$ versus $P_1$ is given by
\[
\{z : p(z) \geq c\},
\]
or equivalently, by
\[
\{z : q(z) \geq c^\star\}.
\]
Let
\[
f_n(t) = \sum_{v=1}^{n} \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right) \phi_v(t).
\]
Then
\[
q_n(Z) = \int_0^1 f_n(t) \left\{ X(t) - \frac{m_0(t) + m_1(t)}{2} \right\} dt.
\]
Under the additional condition $\sum_{v=1}^{\infty} \left( \frac{a_{1v} - a_{0v}}{\lambda_v} \right)^2 < \infty$, it can be shown that $f_n \to f$ in the $L^2$-mean and the test can be written in the form $\{z : q(z) \geq c^\star\}$, where
\[
q(z) = \int_0^1 f(t) \left\{ X(t) - \frac{m_0(t) + m_1(t)}{2} \right\} dt \qquad \text{and} \qquad \int_0^1 r(s, t)\, f(t)\, dt = m_1(s) - m_0(s).
\]
Example: Consider a nonhomogeneous Poisson process $N(t)$ on $[0, 1]$ with positive and continuous intensity $\lambda(t)$. Then
\[
P(N(t) = k) = \frac{\left( \int_0^t \lambda(u)\, du \right)^k e^{-\int_0^t \lambda(u)\, du}}{k!}, \quad k = 0, 1, 2, \ldots,
\]
and
\[
P(N(t) - N(s) = k) = \frac{\left( \int_s^t \lambda(u)\, du \right)^k e^{-\int_s^t \lambda(u)\, du}}{k!}, \quad k = 0, 1, 2, \ldots
\]
As the observables here, we take the numbers of events in the intervals
\[
I_v = \left[ \frac{v}{n}, \frac{v+1}{n} \right), \quad v = 0, 1, \ldots, n - 1.
\]
Let us test $H_0: \lambda(t) = \lambda_0(t)$ against $H_1: \lambda(t) = \lambda_1(t)$. The likelihood ratio is given by
\[
p_n = \prod_{v=0}^{n-1} \frac{\left( \int_{v/n}^{(v+1)/n} \lambda_1(u)\, du \right)^{f_v} \exp\left( -\int_{v/n}^{(v+1)/n} \lambda_1(u)\, du \right)}{\left( \int_{v/n}^{(v+1)/n} \lambda_0(u)\, du \right)^{f_v} \exp\left( -\int_{v/n}^{(v+1)/n} \lambda_0(u)\, du \right)},
\]
where $f_v$ is the number of events in the interval $\left[ \frac{v}{n}, \frac{v+1}{n} \right)$. This follows from the fact that the Poisson process has independent increments. As $n \to \infty$, the sequence $\{p_n\}$ converges a.s. $[P_0]$ and $[P_1]$ to the Radon-Nikodym derivative of $P_1$ with respect to $P_0$, namely
\[
p = \left[ \prod_{k=1}^{N} \frac{\lambda_1(t_k)}{\lambda_0(t_k)} \right] \exp\left\{ -\int_0^1 (\lambda_1(t) - \lambda_0(t))\, dt \right\},
\]
where $N$ is the number of events that occurred in the interval $[0, 1]$ and $t_1, \ldots, t_N$ are the corresponding time points of occurrence. The most powerful test for testing $H_0$ versus $H_1$ is given by the critical region
\[
\left[ \prod_{k=1}^{N} \frac{\lambda_1(t_k)}{\lambda_0(t_k)} > c \right].
\]
Remarks: In both the examples described above, the observables are independent random variables.
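The limit $p$ above is easy to evaluate for given intensities. In the minimal sketch below, the helper assumes the caller supplies the two integrated intensities over $[0, 1]$; all names and numerical values are illustrative.

```python
import math

def poisson_lr(event_times, lam0, lam1, Lam0_total, Lam1_total):
    """Radon-Nikodym derivative dP1/dP0 for a nonhomogeneous Poisson process
    observed on [0, 1]: the product of intensity ratios at the event times,
    times exp(-(integral of lam1 - lam0) over [0, 1])."""
    ratio = 1.0
    for t in event_times:
        ratio *= lam1(t) / lam0(t)
    return ratio * math.exp(-(Lam1_total - Lam0_total))

# Constant intensities lam0 = 1, lam1 = 2; three observed events
p = poisson_lr([0.2, 0.5, 0.9], lambda t: 1.0, lambda t: 2.0, 1.0, 2.0)
print(p)   # 2^3 * exp(-1)
```

Rejecting when the product of ratios (equivalently, $p$) exceeds a threshold reproduces the most powerful test stated above.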
Lecture 10

The following theorem, due to Kakutani (1948), gives a necessary and sufficient condition for the equivalence of two product measures.

Theorem: Consider two product measures $P_0 = P_0^{(1)} \times P_0^{(2)} \times \cdots$ and $P_1 = P_1^{(1)} \times P_1^{(2)} \times \cdots$ defined on a product space $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots$ with the associated product $\sigma$-algebra. Suppose that the probability measure $P_0^{(n)}$ is equivalent to $P_1^{(n)}$ for all $n \geq 1$. Let
\[
\rho(P_0^{(n)}, P_1^{(n)}) = \int_{\mathcal{X}_n} \sqrt{f_n(x_n)}\, P_0^{(n)}(dx_n), \qquad \text{where } f_n(x_n) = \frac{dP_1^{(n)}}{dP_0^{(n)}}(x_n).
\]
(Note that $0 \leq \rho(P_0^{(n)}, P_1^{(n)}) \leq 1$, and $\rho = 1$ if and only if $f_n = 1$ a.s., which holds if and only if $P_1^{(n)} = P_0^{(n)}$.) Then $P_0$ and $P_1$ are equivalent if and only if
\[
\prod_{n=1}^{\infty} \rho(P_0^{(n)}, P_1^{(n)}) > 0.
\]
Remarks: The quantity $\rho(P_0^{(n)}, P_1^{(n)})$ is the Hellinger integral (affinity) between the probability measures $P_0^{(n)}$ and $P_1^{(n)}$:
\[
\rho(P_0^{(n)}, P_1^{(n)}) = \int_{\mathcal{X}_n} \sqrt{\frac{dP_1^{(n)}(x_n)}{dP_0^{(n)}(x_n)}}\, dP_0^{(n)}(x_n) = \int_{\mathcal{X}_n} \sqrt{dP_1^{(n)}(x_n)\, dP_0^{(n)}(x_n)}.
\]
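Kakutani's criterion can be explored numerically by accumulating $\log \rho_n$. Below, two hypothetical Gaussian coordinate sequences illustrate the equivalent case (square-summable mean differences) and the singular case (a constant mean difference); the affinity formula used is the Gaussian one derived later in this lecture.

```python
import math

def kakutani_product(rho, n_terms=10_000):
    """Partial product rho(1) * ... * rho(n_terms) of Hellinger affinities;
    P0 and P1 are equivalent iff the infinite product stays positive."""
    log_prod = sum(math.log(rho(n)) for n in range(1, n_terms + 1))
    return math.exp(log_prod)

# Gaussian coordinates N(a_i(n), 1): rho(n) = exp(-(a_1(n) - a_0(n))^2 / 8)
equiv = kakutani_product(lambda n: math.exp(-(1.0 / n) ** 2 / 8))  # sum (1/n)^2 < inf
sing  = kakutani_product(lambda n: math.exp(-1.0 / 8))             # constant difference
print(equiv, sing)   # first stays bounded away from 0, second underflows to 0
```

In the first case the partial products converge to a positive limit (the measures are equivalent); in the second the products decay geometrically to zero (the measures are singular).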
Proof: Let
fn(xn) =dP
(n)1 (xn)
dP(n)0 (xn)
and ρn = ρ(P(n)0 , P
(n)1 ).
Then
ΠNn=1ρn =
∫X
√RN(x)P0(dx) whereRN(x) = ΠN
n=1fn(xn)
since P0 is a product measure on the space X . By the martingale convergence
theorem,
RN(x)a.s.→ f(x) (say) asN → ∞
with respect to the probability measure P0. Hence, by the Fatou’s Lemma,
0 ≤ EP0( limN→∞
√RN(x)) ≤ lim inf
N→∞EP0(
√Rn(x)) = liminf
N→∞ΠN
n=1ρn
which implies that
0 ≤ EP0(√f(x)) ≤ Π∞
n=1ρn.
55
If this infinite product is zero, then f(x) = 0 a.s. [P0], so that P1 and P0 are singular with respect to each other. Suppose the infinite product is positive. Let M > N and consider

(E_{P0}|R_N − R_M|)^2 ≤ E_{P0}[(R_N^{1/2} − R_M^{1/2})^2] E_{P0}[(R_N^{1/2} + R_M^{1/2})^2]

by the Cauchy-Schwarz inequality. But

E_{P0}[(R_N^{1/2} − R_M^{1/2})^2] = E_{P0}[(1 − Π_{n=N+1}^{M} √(fn))^2 R_N]
= E_{P0}[R_N + R_M − 2 Π_{n=N+1}^{M} √(fn) R_N]
= 2(1 − Π_{n=N+1}^{M} ρn) → 0

as M and N → ∞. Furthermore,

E_{P0}[(R_N^{1/2} + R_M^{1/2})^2] ≤ 2 E_{P0}(R_N + R_M) = 4

since (x + y)^2 ≤ 2(x^2 + y^2). Hence {R_N^{1/2}, N ≥ 1} is a Cauchy sequence in L2(X, P0). But the L2-space is complete, so R_N^{1/2} converges in L2; hence R_N converges in L1, which implies that

∫_X f(x) P0(dx) = 1.

Hence P1 is absolutely continuous with respect to P0.
Fatou’s Lemma: Let (Ω,F , P ) be a probability space. Suppose fn ≥ g and fn → f
a.s as n → ∞. Further suppose that E(fn) < ∞ and Eg is finite. Then E(f) ≤lim infn→∞
E(fn).
Remarks: Note that the lemma holds if fn ≥ 0 and fn → f a.s. as n→ ∞.
Example (Grenander, p. 269): Let {X(t), t ∈ T}, T = [a, b], −∞ < a < b < ∞, be a Gaussian process with continuous covariance function r(s, t) and continuous mean function mi(t), i = 0, 1, under the probability measures P0 and P1 respectively. Let λv be the eigenvalues with corresponding eigenfunctions φv such that

λv φv(s) = ∫_a^b r(s, t) φv(t) dt.

Here we choose the φv to be orthonormal. Let

Zv = ∫_a^b X(t) φv(t) dt.
Then {Zv, v ≥ 1} are independent random variables. Let P_i^(v) be the probability measure of Zv under Pi. Then Zv ∼ N(aiv, λv) under Pi, where aiv = ∫_a^b mi(t) φv(t) dt. Let

ρ(P0^(v), P1^(v)) = ∫_{−∞}^{∞} √(P0^(v)(dx) P1^(v)(dx)) = ∫_{−∞}^{∞} √(dP1^(v)(x)/dP0^(v)(x)) dP0^(v)(x)

= (1/√(2πλv)) ∫_{−∞}^{∞} exp{−(1/4)[(x − a0v)^2/λv + (x − a1v)^2/λv]} dx

= exp{−(a0v − a1v)^2/(8λv)}.

Hence, by Kakutani's theorem, P0 and P1 are equivalent if and only if

Π_{v=1}^{∞} ρ(P0^(v), P1^(v)) > 0,

or equivalently

Σ_{v=1}^{∞} (a0v − a1v)^2/λv < ∞.
Hence we have the following result.

Theorem: Let Pi be the probability measure generated by a Gaussian process {X(t), t ∈ T}, T = [a, b], −∞ < a < b < ∞, with a continuous covariance function r(s, t) and a continuous mean function mi(t) for i = 0, 1. Then the Gaussian probability measures P0 and P1 with continuous mean functions m0(·) and m1(·) and the common continuous covariance function r(·, ·) are equivalent if and only if

Σ_{v=1}^{∞} (a0v − a1v)^2/λv < ∞.
Remarks: If the Gaussian measures P0 and P1 are equivalent, then the Radon-Nikodym derivative is given by

lim_{n→∞} fn(x) = lim_{n→∞} [Π_{v=1}^{n} (1/√(2πλv)) exp{−(zv − a1v)^2/(2λv)}] / [Π_{v=1}^{n} (1/√(2πλv)) exp{−(zv − a0v)^2/(2λv)}]

= exp{Σ_{v=1}^{∞} ((a1v − a0v)/λv)(zv − cv)}

where cv = (a1v + a0v)/2.
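As a numerical sanity check (an illustrative sketch, not part of the original notes), the closed form ρ = exp{−(a0v − a1v)^2/(8λv)} for the Hellinger affinity of two normal laws with common variance can be verified by direct numerical integration; the parameter values below are arbitrary choices.

```python
import numpy as np

def hellinger_affinity(a0, a1, lam, grid=200001, span=30.0):
    """Numerically evaluate rho = integral of sqrt(p0(x) p1(x)) dx for the
    normal densities N(a0, lam) and N(a1, lam) via a Riemann sum on a wide grid."""
    x = np.linspace(min(a0, a1) - span, max(a0, a1) + span, grid)
    dx = x[1] - x[0]
    p0 = np.exp(-(x - a0) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)
    p1 = np.exp(-(x - a1) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)
    return float(np.sum(np.sqrt(p0 * p1)) * dx)

a0, a1, lam = 0.3, 1.1, 0.5          # arbitrary illustrative values
numeric = hellinger_affinity(a0, a1, lam)
closed_form = float(np.exp(-(a0 - a1) ** 2 / (8.0 * lam)))
print(numeric, closed_form)          # the two values agree closely
```

The rapid decay of the integrand makes the Riemann sum essentially exact here.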
Example (Test for the covariance function of a Gaussian process; Sagdar (1974)): Let {X(t), t ∈ [a, b]} be a zero-mean Gaussian process under the probability measures P1 and P2 with covariance functions r1(s, t) and r2(s, t) respectively. We want to test the hypothesis

H0 : r(s, t) = r1(s, t) against H1 : r(s, t) = r2(s, t).
Consider the integral equation

λ ∫_a^b r2(s, t) φ(t) dt = ∫_a^b r1(s, t) φ(t) dt.

Let {λk} and {φk} be the sequences of nonzero eigenvalues and corresponding eigenfunctions satisfying the above integral equation. Consider the integral equation

r2(s, t) − r1(s, t) = ∫_a^b r1(s, u) c(u, t) du.

Then

∫_a^b [r2(s, t) − r1(s, t)] φk(s) ds = ∫_a^b φk(s) [∫_a^b r1(s, u) c(u, t) du] ds
= ∫_a^b [∫_a^b r1(s, u) φk(s) ds] c(u, t) du
= λk ∫_a^b [∫_a^b r2(s, u) φk(s) ds] c(u, t) du.

Let

gk(u) = ∫_a^b r2(s, u) φk(s) ds.

Hence

(1 − λk) ∫_a^b r2(s, t) φk(s) ds = λk ∫_a^b gk(u) c(u, t) du,

which implies that

(1 − λk) gk(t) = λk ∫_a^b gk(u) c(u, t) du.
Hence gk(t) is an eigenfunction of the kernel c(u, t). Now

∫_a^b φk(s) gk(s) ds = ∫_a^b φk(s) [∫_a^b r2(s, t) φk(t) dt] ds = ∫_a^b ∫_a^b φk(s) φk(t) r2(s, t) ds dt = ak ≠ 0

for some constant ak and, for k ≠ j,

∫_a^b φj(s) gk(s) ds = ∫_a^b φj(s) [∫_a^b r2(s, t) φk(t) dt] ds = ∫_a^b ∫_a^b r2(s, t) φj(s) φk(t) dt ds = 0.
Let us normalize the bi-orthogonal system {φk} and {gk} so that ak = 1, k ≥ 1. Then

c(s, t) = Σ_{k=1}^{∞} ((1 − λk)/λk) φk(s) gk(t).
Define

Zk = ∫_a^b X(t) φk(t) dt.

Then the Zk, k ≥ 1, are independent with Zk ∼ N(0, λk) under P1 and independent with Zk ∼ N(0, 1) under P2. Hence the likelihood ratio, given Z1, . . . , Zn, is

Ln = (λ1 · · · λn)^{1/2} exp{(1/2) Σ_{k=1}^{n} ((1 − λk)/λk) Zk^2}.

Suppose that

Σ_{k=1}^{∞} (1 − λk)^2/λk < ∞.

Then the probability measures P1 and P2 are equivalent and the best test for H0 against H1 is of the form

Σ_{k=1}^{∞} ((1 − λk)/λk) Zk^2 ≥ u.
Let

η(t) = ∫_a^b c(s, t) X(s) ds.

Suppose there exists a solution ζ(s) such that

∫_a^b r2(s, t) ζ(s) ds = η(t).

Then

∫_a^b X(s) ζ(s) ds = Σ_{k=1}^{∞} ((1 − λk)/λk) Zk^2

and the best critical region for testing the hypothesis H0 against H1 is given by

∫_a^b X(s) ζ(s) ds ≥ u.
The following result due to Baxter (1956) gives sufficient conditions for checking the singularity of two Gaussian measures.

Theorem: Let {X(t), t ∈ [0, 1]} be a Gaussian process whose mean function m(t) has a bounded derivative. Let the covariance function r(s, t) of the process be continuous with uniformly bounded second partial derivatives for s ≠ t. Let

f(t) = D⁻(t) − D⁺(t)

where

D⁻(t) = lim_{s↑t} (r(t, t) − r(s, t))/(t − s) and D⁺(t) = lim_{s↓t} (r(t, t) − r(s, t))/(t − s).

Then

lim_{n→∞} Σ_{k=1}^{2^n} [X(k/2^n) − X((k−1)/2^n)]^2 = ∫_0^1 f(t) dt a.s.
As a consequence of the above theorem, we have the following result.

Corollary: Let {X(t), t ∈ [0, 1]} be a Gaussian process under the probability measures P0 and P1 for which the conditions of the theorem given above hold. Define f0 and f1 as before. If

∫_0^1 f0(t) dt ≠ ∫_0^1 f1(t) dt,

then the probability measures P0 and P1 are singular with respect to each other.
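Baxter's theorem can be illustrated by simulation (a sketch with assumed parameters, not part of the notes): for the standard Wiener process, r(s, t) = min(s, t), so D⁻(t) = 1, D⁺(t) = 0 and f(t) ≡ 1, and the dyadic sum of squared increments over [0, 1] should be close to ∫_0^1 f(t) dt = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def dyadic_quadratic_variation(n):
    """Sum of squared increments of a simulated Brownian path over the
    dyadic grid k/2^n, k = 1, ..., 2^n, of [0, 1]."""
    m = 2 ** n
    increments = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
    return float(np.sum(increments ** 2))

qv = dyadic_quadratic_variation(20)
print(qv)  # close to 1 = ∫_0^1 f(t) dt for the Wiener process
```

Since the increments on the dyadic grid are i.i.d. N(0, 2^{-n}), only their values (not the whole path) need to be simulated here.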
Lecture 11
Stochastic Integrals and Stochastic Differential Equations
Let {W(t), t ≥ 0} be the standard Wiener process, that is, {W(t), t ≥ 0} is a Gaussian process with (i) W(0) = 0, (ii) W(t) − W(s) ∼ N(0, |t − s|), and (iii) the increments W(t2) − W(t1) and W(t4) − W(t3) independent whenever 0 ≤ t1 < t2 ≤ t3 < t4 < ∞.
Remarks : (i) A Wiener process has a version which has continuous sample paths almost
surely.
(ii) A Wiener process has unbounded variation on any finite interval almost surely.
(iii) The sample paths of a Wiener process are nowhere differentiable almost surely.
Let C[0, T] be the space of continuous functions on [0, T] with the topology generated by the uniform metric. The Wiener process {W(t), 0 ≤ t ≤ T} generates a probability measure on C[0, T]; let us denote it by P_W^T.
Theorem (Doob (1953)): Let {ξ(t), t ≥ 0} be a stochastic process defined on a probability space (Ω, F, P) with continuous sample paths almost surely and let {Ft} be a family of sub-σ-algebras of F such that Ft ⊂ Fs if t ≤ s. Suppose that

(i) for all t ≥ 0, ξ(t) is Ft-measurable;

(ii) E[ξ(t + h) − ξ(t)|Ft] = 0 a.s. for all t ≥ 0, h ≥ 0, that is, {ξ(t), Ft, t ≥ 0} is a martingale; and

(iii) E[(ξ(t + h) − ξ(t))^2|Ft] = h a.s. for all t ≥ 0 and h ≥ 0.

Then {ξ(t), t ≥ 0} is a standard Wiener process.
Stochastic integral

Let (Ω, F, P) be a probability space. We want to define a stochastic integral

∫_0^T f(t) dW(t)

for a suitable class of random functions {f(t), t ≥ 0} with respect to the Wiener process {W(t), t ≥ 0}. The integral cannot be defined in the Lebesgue-Stieltjes sense since the Wiener process {W(t), t ≥ 0} is of unbounded variation a.s. on any finite interval [0, T]. Let {Ft} be a family of sub-σ-algebras of F satisfying

(i) t1 < t2 ⇒ Ft1 ⊂ Ft2,

(ii) W(t) is Ft-measurable, and

(iii) W(t + s) − W(t) is independent of Ft for any t ≥ 0 and for every s ≥ 0.
Let H[0, T] be the class of all random functions {f(t), 0 ≤ t ≤ T} such that f(t) is Ft-measurable for 0 ≤ t ≤ T and

∫_0^T f^2(t) dt < ∞ a.s.

Case (i): Suppose f ∈ H[0, T] and f is a step function, that is, there exists a partition

0 = t0 < t1 < · · · < tm = T

such that

f(t) = f(ti) for ti ≤ t < ti+1, 0 ≤ i ≤ m − 1, and f(t) = f(tm−1) for tm−1 ≤ t ≤ tm.

Then define

∫_0^T f(t) dW(t) = Σ_{k=0}^{m−1} f(tk)[W(tk+1) − W(tk)].
Case (ii): Consider the class of f ∈ H[0, T] for which

∫_0^T E(f^2(t)) dt < ∞.

It can be shown that any such f can be approximated by a sequence of step functions fn ∈ H[0, T] such that

lim_{n→∞} E[∫_0^T |f(t) − fn(t)|^2 dt] = 0

(Liptser and Shiryayev (1977), Statistics of Random Processes). We define

∫_0^T f(t) dW(t) = lim_{n→∞} ∫_0^T fn(t) dW(t).

Here the limit is taken in the sense of quadratic mean. One can show that the limit is independent of the choice of the sequence of step functions.
Case (iii): Let f ∈ H[0, T]. Then there exists a sequence gn ∈ H[0, T] such that

∫_0^T E(gn^2(t)) dt < ∞

and

∫_0^T [gn(t) − f(t)]^2 dt → 0 in probability as n → ∞.

Define

∫_0^T f(t) dW(t) = lim_{n→∞} ∫_0^T gn(t) dW(t)

where the limit is in the sense of convergence in probability. It can be proved that the limit is independent of the choice of {gn}.
Properties:

(i) Suppose f1, f2 ∈ H[0, T] and α1 and α2 are random variables such that α1 f1 + α2 f2 ∈ H[0, T]. Then

∫_0^T [α1 f1(t) + α2 f2(t)] dW(t) = α1 ∫_0^T f1(t) dW(t) + α2 ∫_0^T f2(t) dW(t).

(ii) Let f ∈ H[0, T] for which ∫_0^T E(f^2(t)) dt < ∞. Then

E[∫_0^T f(t) dW(t)] = 0 and E[∫_0^T f(t) dW(t)]^2 = ∫_0^T E[f^2(t)] dt.

(iii) Let f ∈ H[0, T]. Then, for any ε > 0 and δ > 0,

P{|∫_0^T f(t) dW(t)| > ε} ≤ P{∫_0^T f^2(t) dt > δ} + δ/ε^2.

(iv) Let f ∈ H[0, T] for which ∫_0^T E(f^2(t)) dt < ∞. Then

E[∫_α^β f(t) dW(t) | Fα] = 0

and

E[(∫_α^β f(t) dW(t))^2 | Fα] = ∫_α^β E(f^2(t) | Fα) dt

whenever 0 ≤ α < β ≤ T.
Here ∫_α^β f(t) dW(t) is defined to be ∫_0^T χ_{[α,β]}(t) f(t) dW(t), where χ_{[α,β]}(t) is the indicator function of the interval [α, β]. For f ∈ H[0, T], define

I(t) = ∫_0^t f(s) dW(s).

Then {I(t), Ft, t ≥ 0} is a martingale and has continuous sample paths a.s.
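The step-function definition and the isometry in Property (ii) can be checked by Monte Carlo simulation (an illustrative sketch; the step function f and all sample sizes below are arbitrary choices): the simulated integrals should have mean ≈ 0 and variance ≈ ∫_0^T E f^2(t) dt.

```python
import numpy as np

rng = np.random.default_rng(1)

# deterministic step function f on a uniform grid of [0, 1], so f ∈ H[0, 1]
m, reps = 100, 20000
t = np.linspace(0.0, 1.0, m + 1)
f_vals = np.sin(2.0 * np.pi * t[:-1])        # f(t_k) on [t_k, t_{k+1})
dW = rng.normal(0.0, np.sqrt(1.0 / m), size=(reps, m))

# the defining sum: sum_k f(t_k) [W(t_{k+1}) - W(t_k)], one value per path
I = np.sum(f_vals * dW, axis=1)

mean_I = float(I.mean())
var_I = float(I.var())
isometry = float(np.sum(f_vals ** 2) / m)    # Riemann sum of ∫ f^2(t) dt
print(mean_I, var_I, isometry)               # mean ≈ 0, var ≈ isometry value
```

For a deterministic step integrand the isometry holds exactly, so the only error here is Monte Carlo noise.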
Stochastic differential
Suppose the process {ζ(t), 0 ≤ t ≤ T} satisfies the equation

ζ(t2) − ζ(t1) = ∫_{t1}^{t2} a(t) dt + ∫_{t1}^{t2} b(t) dW(t), 0 ≤ t1 ≤ t2 ≤ T,

where ∫_0^T |a(t)| dt < ∞ a.s. and ∫_0^T b^2(t) dt < ∞ a.s. Then the process ζ(t) is said to have the stochastic differential

dζ(t) = a(t) dt + b(t) dW(t), 0 ≤ t ≤ T.
Suppose f ∈ H[0, T] and ζ is a random variable such that P[0 ≤ ζ ≤ T] = 1. Then we define

∫_0^ζ f(t) dW(t) = I(ζ)

where I(t) is as defined above. If the random variables ζ1 and ζ2 are such that P[0 ≤ ζ1 ≤ ζ2 ≤ T] = 1, then define

∫_{ζ1}^{ζ2} f(t) dW(t) = ∫_0^{ζ2} f(t) dW(t) − ∫_0^{ζ1} f(t) dW(t).

Most often we choose the random variables ζ to be stopping times. A random variable ζ is a stopping time with respect to the family {Ft, t ≥ 0} if [ζ ≤ t] ∈ Ft for every t ≥ 0. Examples of stopping times are

ζ1 = inf{t ≥ 0 : W(t) ≥ a} for a fixed constant a,

and

ζ2 = inf{t ≥ 0 : ∫_0^t f(u) dW(u) ≥ a} for a fixed constant a.
Theorem: Suppose f ∈ H[0, T] for every T > 0 and

∫_0^∞ f^2(s) ds = ∞ a.s.

Let

τt = inf{u ≥ 0 : ∫_0^u f^2(s) ds ≥ t}.

Then

ζt = ∫_0^{τt} f(s) dW(s), t ≥ 0,

is a Wiener process.

Central Limit Theorem (CLT): Suppose f ∈ H[0, T] for every T > 0 and

(1/T) ∫_0^T f^2(s) ds → σ^2 in probability as T → ∞.

Then

(1/√T) ∫_0^T f(s) dW(s) → N(0, σ^2) in law as T → ∞.
Stochastic Differential Equations

Theorem (Existence of a solution): Suppose there exists a constant K such that

(i) |a(t, x) − a(t, y)| + |σ(t, x) − σ(t, y)| ≤ K|x − y|, x, y ∈ R,

(ii) |a(t, x)|^2 + |σ(t, x)|^2 ≤ K^2(1 + |x|^2), and

(iii) η(0) is independent of the Wiener process {W(t), t ≥ 0} with Eη^2(0) < ∞.

Then there exists a solution {η(t), 0 ≤ t ≤ T} satisfying the SDE

(i) dη(t) = a(t, η(t)) dt + σ(t, η(t)) dW(t), 0 ≤ t ≤ T,

(ii) η(t) is continuous a.s. on [0, T] with η(t) = η(0) for t = 0,

(iii) sup_{0 ≤ t ≤ T} Eη^2(t) < ∞, and

(iv) η(t) is unique in the sense that if η1(t) and η2(t) are two such processes satisfying (i), (ii) and (iii), then

P{sup_{0 ≤ t ≤ T} |η1(t) − η2(t)| = 0} = 1.

Remarks: The coefficient a(·, ·) is called the drift coefficient and the coefficient σ(·, ·) is called the diffusion coefficient. The problem in statistical inference for diffusion processes is the estimation of these coefficients given the process {η(t), 0 ≤ t ≤ T}.
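Numerically, a solution under the Lipschitz and growth conditions above is usually approximated by the Euler-Maruyama scheme; the sketch below (with an Ornstein-Uhlenbeck-type drift chosen only for illustration) is not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(2)

def euler_maruyama(a, sigma, x0, T, n):
    """Euler-Maruyama approximation of dX = a(t, X)dt + sigma(t, X)dW on [0, T];
    the scheme converges to the unique solution under the Lipschitz and growth
    conditions of the existence theorem."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    t = 0.0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))   # Wiener increment over [t, t + dt]
        x[k + 1] = x[k] + a(t, x[k]) * dt + sigma(t, x[k]) * dw
        t += dt
    return x

# illustrative choice: drift a(t, x) = -x and unit diffusion coefficient
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 1.0, 5.0, 5000)
print(path[-1])
```

The discretization step T/n controls the (strong) approximation error of the scheme.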
Absolute continuity of measures generated by diffusion processes

Let (Ω, F, P) be a probability space and {Ft, 0 ≤ t ≤ 1} be a nondecreasing family of σ-algebras contained in F. Suppose {Wt, 0 ≤ t ≤ 1} is a standard Wiener process such that Wt is Ft-measurable. For instance, one can choose Ft = σ{Ws : 0 ≤ s ≤ t}. Let C[0, 1] be the space of continuous functions on [0, 1] endowed with the sup-norm. Let {ξt, 0 ≤ t ≤ 1} be a stochastic process defined on (Ω, F, P) such that ξt is Ft-measurable and ξt is continuous a.s. on [0, 1]. Let µξ denote the probability measure generated by {ξt, 0 ≤ t ≤ 1} on C[0, 1] and µW denote the probability measure generated by the Wiener process on C[0, 1].
Let B be the Borel σ-algebra on C[0, 1] and Bt = σ{x : xs, s ≤ t}. Let τ be the σ-algebra of sets in [0, 1] × C[0, 1] that are independent of the future, that is, whose t-sections are Bt-measurable for every 0 ≤ t ≤ 1.

Definition: A continuous process {ξt, Ft, 0 ≤ t ≤ 1} defined on (Ω, F, P) is called a process of diffusion type if there exists a τ-measurable function αt(x) such that

P{∫_0^1 |αt(ξ)| dt < ∞} = 1

and, for each 0 ≤ t ≤ 1,

dξt = αt(ξ) dt + dWt, ξ0 = 0.
Theorem: If a process is of diffusion type, then

P{∫_0^1 α_t^2(ξ) dt < ∞} = 1

if and only if

µξ << µW,

and in such a case

dµξ/dµW = exp{∫_0^1 αt(ξ) dξt − (1/2) ∫_0^1 α_t^2(ξ) dt} a.s. [P].
Proof: See Liptser and Shiryayev(1977). The proof depends on the Girsanov theorem
stated below.
Theorem (Girsanov): Let {Wt, Ft, t ≥ 0} be a standard Wiener process on a probability space (Ω, F, P). Let the process {Yt, Ft, t ≥ 0} be such that

P{∫_0^1 Y_t^2 dt < ∞} = 1.

Let

φ = exp{∫_0^1 Yt dWt − (1/2) ∫_0^1 Y_t^2 dt}.

If E_P φ = 1, then the process {ξt, Ft, P̃}, where

ξt = −∫_0^t Ys ds + Wt, 0 ≤ t ≤ 1,

and the probability measure P̃ is defined by

dP̃/dP = φ,

is a Wiener process relative to the probability space (Ω, F, P̃).
Heuristics for computation of the Radon-Nikodym derivative for diffusion processes

Consider the stochastic differential equations

dXt = a(Xt) dt + σ(Xt) dWt, 0 ≤ t ≤ 1, under H0 (measure µ0)

and

dXt = σ(Xt) dWt, 0 ≤ t ≤ 1, under H1 (measure µ1).

Let 0 = t0 < t1 < · · · < tn < tn+1 = 1 be a subdivision of [0, 1], and discretize the above stochastic differential equations. Then

X(tk+1) − X(tk) − a(X(tk))(tk+1 − tk) ≃ N(0, σ^2(X(tk))(tk+1 − tk))

and these increments X(tk+1) − X(tk), k = 0, 1, . . . , n, can be considered independent. Hence the log-likelihood ratio fn can be written in the form

fn = −(1/2) Σ_{k=0}^{n} [X(tk+1) − X(tk) − a(X(tk))(tk+1 − tk)]^2 / [σ^2(X(tk))(tk+1 − tk)]
+ (1/2) Σ_{k=0}^{n} [X(tk+1) − X(tk)]^2 / [σ^2(X(tk))(tk+1 − tk)]

= Σ_{k=0}^{n} [X(tk+1) − X(tk)] a(X(tk))/σ^2(X(tk)) − (1/2) Σ_{k=0}^{n} a^2(X(tk))(tk+1 − tk)/σ^2(X(tk))

≃ ∫_0^1 (a(X(t))/σ^2(X(t))) dX(t) − (1/2) ∫_0^1 (a^2(X(t))/σ^2(X(t))) dt,

and

dµ0/dµ1 ≃ exp{∫_0^1 (a(X(t))/σ^2(X(t))) dX(t) − (1/2) ∫_0^1 (a^2(X(t))/σ^2(X(t))) dt}.
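The algebraic step from the difference of quadratic forms to the expanded sum can be checked numerically (a sketch; σ ≡ 1 and the hypothetical drift a(x) = −0.8x are both arbitrary choices). On any simulated path the two forms of fn agree term by term.

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate X under H1: dX = sigma dW with sigma = 1 (illustrative choice)
n, dt = 2000, 1.0 / 2000
a = lambda x: -0.8 * x          # hypothetical drift under H0
sigma2 = 1.0
x = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
dx = np.diff(x)
xk = x[:-1]

# form 1: the difference of the two quadratic forms
f1 = (-0.5 * np.sum((dx - a(xk) * dt) ** 2 / (sigma2 * dt))
      + 0.5 * np.sum(dx ** 2 / (sigma2 * dt)))
# form 2: the expanded form that converges to the Ito-integral expression
f2 = np.sum(dx * a(xk) / sigma2) - 0.5 * np.sum(a(xk) ** 2 * dt / sigma2)
print(f1, f2)  # identical up to floating-point rounding
```

The agreement is exact algebra, not an approximation; the Ito-integral form only enters when the mesh of the subdivision tends to zero.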
Lecture 12
Ito’s Lemma: Let F (t, x) be a continuous function on [0, T ] × R with continuous
derivatives ∂F∂t(t, x), ∂F
∂x(t, x), ∂2F
∂x2 (t, x) and Y (t), 0 ≤ t ≤ T be a stochastic process
satisfying the stochastic differential equation (SDE)
dY (t) = a(t) dt+ b(t) dW (t), Y (0) = η, 0 ≤ t ≤ T.
Then the random process Z(t) = F (t, Y (t)) satisfies the SDE
dZ(t) = [∂F
∂t(t, Y (t)) +
∂F
∂y(t, Y (t))a(t) +
1
2
∂2F
∂y2(t, Y (t))b2(t))]dt
+∂F
∂y(t, Y (t))b(t)dW (t), Z(0) = F (0, η), 0 ≤ t ≤ T.
Heuristics : Note that
Z(t+ h)− Z(t) = F (t+ h, Y (t+ h))− F (t, Y (t))
≃ (t+ h− t)∂F
∂t(t, Y (t))
+(Y (t+ h)− Y (t)))∂F
∂y(t, Y (t))
+1
2(t+ h− t)2
∂2F
∂t2(t, Y (t))
+1
2(Y (t+ h)− Y (t))2
∂2F
∂y2(t, Y (t))
+(t+ h− t)(Y (t+ h)− Y (t))∂2F
∂t∂y(t, Y (t))
≃ h∂F
∂t+ (Y (t+ h)− Y (t))
∂F
∂y
+1
2h2∂2F
∂t2+
1
2(Y (t+ h)− Y (t))2
∂2F
∂y2
+h(Y (t+ h)− Y (t))∂2F
∂t∂y.
Note that
Y (t+ h)− Y (t) ≃ a(t)h+ b(t)[W (t+ h)−W (t)].
Hence

Z(t + h) − Z(t) ≃ h ∂F/∂t + {a(t) h + b(t)(W(t + h) − W(t))} ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a(t) h + b(t)(W(t + h) − W(t))}² ∂²F/∂y²
+ h {a(t) h + b(t)(W(t + h) − W(t))} ∂²F/∂t∂y

= h {∂F/∂t + a(t) ∂F/∂y} + b(t)(W(t + h) − W(t)) ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a²(t) h² + b²(t)(W(t + h) − W(t))² + 2 a(t) b(t) h (W(t + h) − W(t))} ∂²F/∂y²
+ h {a(t) h + b(t)(W(t + h) − W(t))} ∂²F/∂t∂y

≃ h {∂F/∂t + a(t) ∂F/∂y} + b(t)(W(t + h) − W(t)) ∂F/∂y
+ (1/2) h² ∂²F/∂t² + (1/2){a²(t) h² + b²(t) h + Op(|h|^{3/2})} ∂²F/∂y²
+ h {a(t) h + b(t) Op(|h|^{1/2})} ∂²F/∂t∂y

since E(W(t + h) − W(t))² = |h| and E|W(t + h) − W(t)| ≃ |h|^{1/2}. Hence

[Z(t + h) − Z(t)]/h ≃ ∂F/∂t + a(t) ∂F/∂y + b(t) [(W(t + h) − W(t))/h] ∂F/∂y + (1/2) b²(t) ∂²F/∂y² + Op(|h|^{1/2}).

Therefore

dZ(t) ≃ [∂F/∂t + a(t) ∂F/∂y + (1/2) b²(t) ∂²F/∂y²] dt + b(t) (∂F/∂y) dW(t).
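A quick numerical check of Ito's lemma (illustrative, with F(t, y) = y² and Y = W, i.e. a ≡ 0, b ≡ 1): the lemma gives d(W²) = dt + 2W dW, so W²(T) − T should match the Ito sum 2 Σ W(t_k)(W(t_{k+1}) − W(t_k)) up to the fluctuation of the discrete quadratic variation around T.

```python
import numpy as np

rng = np.random.default_rng(4)

n, T = 200000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)     # Wiener increments on a fine grid
W = np.concatenate([[0.0], np.cumsum(dW)])

lhs = W[-1] ** 2 - T                     # W^2(T) - T, from Ito's lemma
rhs = 2.0 * np.sum(W[:-1] * dW)          # the Ito sum 2 * sum W(t_k) dW_k
print(lhs, rhs)                          # the two agree up to O(n^{-1/2})
```

The exact discrete identity is W²(T) = Σ(2W_k ΔW_k + (ΔW_k)²), so the gap between the two sides is Σ(ΔW_k)² − T, which vanishes as the mesh shrinks.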
We now consider sufficient conditions under which the solution of an SDE is an ergodic process.

Theorem (Maruyama and Tanaka (1957)): Consider the SDE

dX(t) = a(X(t)) dt + b(X(t)) dW(t), X(0) = X0, t ≥ 0.

Define

φ(x) = 2 ∫_0^x (a(y)/b²(y)) dy.

Suppose

g = ∫_{−∞}^{∞} (e^{φ(x)}/b²(x)) dx < ∞.

Define

µ(x) = (1/g) ∫_{−∞}^{x} (e^{φ(y)}/b²(y)) dy, −∞ < x < ∞.

Then the process is ergodic with stationary distribution having distribution function µ(·) and the strong law of large numbers holds; that is, if f is a function such that

∫_{−∞}^{∞} |f(x)| µ(dx) < ∞,

then

(1/T) ∫_0^T f(X(t)) dt → ∫_{−∞}^{∞} f(x) µ(dx) a.s. as T → ∞.

(For a proof, see Gikhman and Skorokhod, Stochastic Differential Equations.)
(For proof, see Gikhman and Skorokhod: Stochastic Differential Equations).
Example: Suppose

dX(t) = −θX(t) dt + dW(t), X(0) = X0, 0 ≤ t ≤ T,

where θ ∈ Θ ⊂ R. Let F(t, x) = e^{θt} x. Then

∂F/∂t = θ e^{θt} x, ∂F/∂x = e^{θt}, ∂²F/∂x² = 0.

Hence, by Ito's lemma,

dF(t, X(t)) = [θ e^{θt} X(t) + e^{θt}(−θX(t))] dt + e^{θt} dW(t) = e^{θt} dW(t).

Therefore

d(e^{θt} X(t)) = e^{θt} dW(t),

which implies that

e^{θt} X(t) − X(0) = ∫_0^t e^{θs} dW(s),

or equivalently

(⋆) X(t) = ∫_0^t e^{−θ(t−s)} dW(s) + X(0) e^{−θt}.

If θ > 0, then the process X(t) is ergodic by the above theorem (Maruyama and Tanaka, "Some properties of one-dimensional diffusion processes", Mem. Fac. Sci. Kyushu Univ. 11 (1957) 117-141) and the ergodic theorem holds; that is, for any measurable function f integrable with respect to the stationary measure µ,

lim_{T→∞} (1/T) ∫_0^T f(X(t)) dt = ∫_{−∞}^{∞} f(x) µ(dx) a.s.
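For the example above with drift a(x) = −θx and b ≡ 1, φ(x) = −θx², so the stationary law µ is N(0, 1/(2θ)). The following simulation (an Euler-discretized sketch with arbitrary parameter values) checks the ergodic theorem for f(x) = x².

```python
import numpy as np

rng = np.random.default_rng(5)

theta, T, n = 1.5, 400.0, 400_000      # illustrative values, theta > 0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
x, acc = 0.0, 0.0
for k in range(n):
    acc += x * x * dt                  # accumulate ∫ X^2(t) dt
    x += -theta * x * dt + dW[k]       # Euler step for dX = -theta X dt + dW
time_avg = acc / T
print(time_avg, 1.0 / (2.0 * theta))   # time average ≈ stationary E[X^2]
```

The time average converges at rate O(T^{-1/2}), so moderate discrepancies at finite T are expected.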
Suppose the process {X(t), 0 ≤ t ≤ T} is observed. Let Pθ be the probability measure generated by the process on C[0, T] and PW be the measure generated by the Wiener process. Then

LT(θ) ≡ dPθ/dPW = exp{−θ ∫_0^T X(t) dX(t) − (θ²/2) ∫_0^T X²(t) dt}.

Note that the MLE θT of θ is given by

θT = −∫_0^T X(t) dX(t) / ∫_0^T X²(t) dt = −[X²(T) − X²(0) − T] / [2 ∫_0^T X²(t) dt].

(It can be shown that ∫_0^T X(t) dX(t) = [X²(T) − X²(0) − T]/2 by applying Ito's lemma to the function F(t, x) = x².) Note that
VT(θ) = ∂ log LT(θ)/∂θ = −∫_0^T X(t) dX(t) − θ ∫_0^T X²(t) dt
= −∫_0^T X(t)[dX(t) + θX(t) dt]
= −∫_0^T X(t) dW(t)

is the score function and {Vt(θ), Ft, 0 ≤ t ≤ T} is a zero-mean martingale. Let θ be the true parameter. Then

θT − θ = −∫_0^T X(t) dX(t)/∫_0^T X²(t) dt − θ
= [−∫_0^T X(t) dX(t) − θ ∫_0^T X²(t) dt]/∫_0^T X²(t) dt
= −∫_0^T X(t) dW(t)/∫_0^T X²(t) dt.

Hence

θT − θ = VT(θ)/IT(θ), where IT(θ) = ∫_0^T X²(t) dt.

In other words,

VT(θ) = IT(θ)(θT − θ).
Suppose the process is ergodic (θ > 0). Then

(1/T) ∫_0^T X²(t) dt → ∫_{−∞}^{∞} x² µ(dx) = σ² (say) a.s.,

where µ is the stationary distribution of the process, which is the normal distribution with mean zero and variance (2θ)^{-1}. Hence, by the CLT for stochastic integrals, it follows that

(1/√T) ∫_0^T X(t) dW(t) → N(0, σ²) in law, σ² = (2θ)^{-1},

which implies that

√T (θT − θ) → N(0, 2θ) in law as T → ∞.
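The closed-form MLE and its consistency can be illustrated by simulation (a sketch; the Euler discretization and all parameter values are arbitrary choices, and the stochastic integral ∫ X dX is evaluated through the identity [X²(T) − X²(0) − T]/2):

```python
import numpy as np

rng = np.random.default_rng(6)

theta, T, n = 1.0, 500.0, 500_000      # true parameter and illustrative grid
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):                     # Euler path of dX = -theta X dt + dW
    x[k + 1] = x[k] - theta * x[k] * dt + dW[k]

int_x2 = float(np.sum(x[:-1] ** 2) * dt)
theta_hat = -(x[-1] ** 2 - x[0] ** 2 - T) / (2.0 * int_x2)
print(theta_hat)                       # close to the true value theta = 1
```

By the asymptotic normality above, the estimation error is of order √(2θ/T), roughly 0.06 for these values.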
Suppose θ < 0. Note that

{e^{θt} X(t) − X(0), Ft, t ≥ 0}

is a zero-mean martingale which is L²-bounded. Hence, by the martingale convergence theorem, we note that

e^{θt} X(t) − X(0) → Z a.s. as t → ∞

for some random variable Z < ∞ a.s., and

e^{2θt} X²(t) → (Z + X(0))² a.s. as t → ∞.

Applying an integral version of the Toeplitz lemma, we have

(⋆⋆) e^{2θt} It(θ) = e^{2θt} ∫_0^t X²(s) ds → −(1/(2θ))(Z + X0)² a.s. as t → ∞.

Hence It(θ) → ∞ a.s. as t → ∞. By the martingale central limit theorem, it follows that

IT(θ)^{1/2} (θT − θ) → N(0, 1) in law as T → ∞.

Note that

(θT − θ) e^{−θT} = (e^{2θT} ∫_0^T X²(s) ds)^{-1} (−e^{θT} ∫_0^T X(s) dW(s)).

Check that Z ∼ N(0, −1/(2θ)) from (⋆) and that

(−2θ)^{-1} e^{−θT}(θT − θ) → N(0, 1) in law as T → ∞

from (⋆⋆), since e^{2θt} E(It(θ)) → −(1/(2θ)) E(Z + X(0))² as t → ∞.

If θ = 0, then

L(θT − 0) = L(−∫_0^T W(t) dW(t) / ∫_0^T W²(t) dt) = L(−[W²(T) − T] / [2 ∫_0^T W²(s) ds]).

In this case, the random variable θT does not have an asymptotically normal distribution.
Remarks on the structure of continuous parameter martingales: Let (Ω, F, P) be a probability space and {Ft, t ≥ 0} be a right-continuous nondecreasing family of sub-σ-algebras of F such that F0 is complete with respect to the probability measure P. Suppose {Vt, Ft, t ≥ 0} is a square integrable martingale with mean zero and the process {Vt, t ≥ 0} has right-continuous sample paths almost surely. Then Vt is Ft-measurable, E[Vt] = 0, E[V²t] < ∞ and E[Vt|Fs] = Vs a.s. for 0 ≤ s ≤ t. It is known that there exists a right-continuous increasing process {It, t ≥ 0} such that It is Ft-measurable and

E[(Vt − Vs)²|Fs] = E(It − Is|Fs) a.s., 0 ≤ s ≤ t (∗)

(cf. Meyer (1962)). The process {It, t ≥ 0} is the continuous analogue of the conditional variance In = Σ_{j=1}^{n} E(X²j|Fj−1) for a discrete parameter square integrable martingale Sn = Σ_{j=1}^{n} Xj, n ≥ 1. In analogy with the definition of In, one can formally define

It = ∫_0^t E([dVs]²|Fs),

and this can be used as a check for computing It. Suppose there exists a process {ζt, t ≥ 0} such that ζt is Ft-measurable and

It = ∫_0^t ζ²u du a.s. (∗∗)

Theorem 1 (SLLN): If {Vt, Ft, t ≥ 0} satisfies (∗) and the condition (∗∗) holds, then

Vt/It → 0 a.s. on [It → ∞].

Theorem 2 (Kunita and Watanabe (1967)): If {Vt, Ft, t ≥ 0} satisfies (∗) and has continuous sample paths almost surely, then there exists a standard Wiener process {Wt, t ≥ 0} such that

Vt = W_{It} a.s., t ≥ 0.

Theorem 3: Suppose the conditions stated in Theorems 1 and 2 hold and there exists a function mt ↑ ∞ as t → ∞ such that

It/mt → η² in probability,

where P(η² > 0) > 0. Then

Vt I_t^{-1/2} → N(0, 1) in law

as t → ∞, and the convergence holds with respect to any probability measure µ on (Ω, F) which is absolutely continuous with respect to the conditional probability measure PB(·) = P(·|B), where B = [η² > 0].
Lecture 13
Estimation from discrete sampling
Let us again consider the process

dX(t) = θX(t) dt + dW(t), t ≥ 0, X0 = 0.

We have seen that the MLE of θ is given by

θT = ∫_0^T X(t) dX(t) / ∫_0^T X²(t) dt

when a continuous sample path {X(t), 0 ≤ t ≤ T} is available. Suppose the process is observed at the time points 0 = t0 < t1 < . . . < tN = T, say. In order to estimate the parameter θ, one can either consider the likelihood function of the Markov chain {X_{ti}, 0 ≤ i ≤ N} and then estimate by the maximum likelihood method, provided the transition function can be computed explicitly, or discretize the likelihood estimator from the continuous-sample version, or apply other methods of estimation such as conditional least squares.

Le Breton (1976): Suppose we approximate

∫_0^T X(t) dX(t) by Σ_{i=1}^{N} (X(ti) − X(ti−1)) X(ti−1)

and

∫_0^T X²(t) dt by Σ_{i=1}^{N} X²(ti)(ti − ti−1).

Then the estimator θT can be approximated by

θ̃_{N,T} = Σ_{i=1}^{N} (X(ti) − X(ti−1)) X(ti−1) / Σ_{i=1}^{N} X²(ti)(ti − ti−1).

Let

δN = max_{1 ≤ i ≤ N} |ti − ti−1|.
Theorem (Le Breton (1975)): Suppose δN → 0 as N → ∞. Then

(i) θ̃_{N,T} → θT in probability as N → ∞,

(ii) δ_N^{-1/2} (θ̃_{N,T} − θT) = Op(1).
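The discretized estimator can be illustrated on a simulated path (a sketch; the model dX = θX dt + dW with θ = −0.5 and all grid sizes are arbitrary choices). Computing θ̃ from the full fine grid and from a 100-fold coarser subsample of the same path shows the two discretizations staying close, in line with the theorem.

```python
import numpy as np

rng = np.random.default_rng(7)

theta, T, n_fine = -0.5, 100.0, 400_000
dt = T / n_fine
dW = rng.normal(0.0, np.sqrt(dt), n_fine)
x = np.empty(n_fine + 1)
x[0] = 0.0
for k in range(n_fine):                   # Euler path of dX = theta X dt + dW
    x[k + 1] = x[k] + theta * x[k] * dt + dW[k]

def theta_tilde(path, delta):
    """Discretized estimator: sum (X(ti)-X(ti-1)) X(ti-1) / sum X(ti)^2 delta."""
    dx = np.diff(path)
    return float(np.sum(dx * path[:-1]) / np.sum(path[1:] ** 2 * delta))

fine = theta_tilde(x, dt)                 # near the continuous-record MLE
coarse = theta_tilde(x[::100], 100 * dt)  # same estimator, coarser sampling
print(fine, coarse)
```

The gap between the two values is Op(δ^{1/2}) in the coarse mesh δ, as stated in the theorem.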
In general, suppose we consider the SDE

dXt = a(θ, Xt) dt + dWt, X0 = x0, t ≥ 0.

Consider

Σ_{i=1}^{N} [X(ti) − X(ti−1) − a(θ, X(ti−1))(ti − ti−1)]²

and choose θ minimizing this expression. It can be shown that the estimator so obtained is consistent if

T/N → 0 as N → ∞ (Dorogovcev (1976))

and asymptotically normal if

T/√N → 0 as N → ∞ (Prakasa Rao (1983)).

Kasonga (1988) suggested the following approach. Let Uk(θ, t) be the solution of the ordinary differential equation

dxt/dt = a(θ, xt) on [tk−1, tk], x_{tk−1} = X_{tk−1}.

Let Q(θ) = Σ_{k=1}^{N} |X(tk) − Uk(θ, tk)|² with

Uk(θ, t) = X(tk−1) + ∫_{tk−1}^{t} a(θ, Uk(θ, s)) ds for tk−1 ≤ t ≤ tk.

Choose θ to minimize Q(θ). Let θ⋆_{N,T} be such an estimator.

Theorem (Kasonga (1988)): Suppose that, for every θ1 ≠ θ2,

p-lim_{N→∞} (1/N) Σ_{k=1}^{N} |Uk(θ1, tk) − Uk(θ2, tk)|² > 0

and δN = max_{1 ≤ i ≤ N} |ti − ti−1| → 0 as N → ∞. Then θ⋆_{N,T} → θ in probability as N → ∞ and T → ∞, where θ is the true parameter.
Remarks: Consider the SDE

dXt = θXt dt + σ dWt, t ≥ 0, X0 = 0.

It is known that

lim_{N→∞} Σ_{i=1}^{2^N} [W(it/2^N) − W((i−1)t/2^N)]² = t a.s. (Doob (1953), p. 395).

Applying this result, it can be shown that

lim_{N→∞} Σ_{i=1}^{2^N} [X(it/2^N) − X((i−1)t/2^N)]² = σ²t a.s. (Basawa and Prakasa Rao (1980), p. 242).
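The quadratic-variation limit gives a simple consistent estimator of σ² from high-frequency data, sketched below with arbitrary parameter values (θ = 0.7, σ = 2, t = 1) on an Euler-simulated path:

```python
import numpy as np

rng = np.random.default_rng(8)

theta, sigma, t, N = 0.7, 2.0, 1.0, 18   # illustrative values
m = 2 ** N
dt = t / m
dW = rng.normal(0.0, np.sqrt(dt), m)
x = np.empty(m + 1)
x[0] = 0.0
for k in range(m):                        # Euler path of dX = theta X dt + sigma dW
    x[k + 1] = x[k] + theta * x[k] * dt + sigma * dW[k]

qv = float(np.sum(np.diff(x) ** 2))       # realized quadratic variation on [0, t]
print(qv, sigma ** 2 * t)                 # realized QV ≈ sigma^2 t
```

The drift contributes only terms of order dt to the squared increments, so it is asymptotically negligible in the sum.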
Parametric estimation for linear SDE

Consider the SDE

dX_t = θ X_t dt + G dW_t, t ≥ 0, X_0 = 0,

where {X_t} is an n-dimensional vector-valued process, θ ∈ Θ, Θ a subset of the space of square matrices of order n × n, G ∈ ζ where ζ is a subset of the space of nonsingular matrices of order n × n, and {W_t, t ≥ 0} is an n-dimensional stochastic process with independent standard Wiener processes as its components. Let µ^T_{θ,G} be the probability measure induced by the process {X_t, 0 ≤ t ≤ T} on the space C([0, T], R^n) of continuous functions from [0, T] to R^n. Using Girsanov's theorem, it can be shown that

dµ^T_{θ,G}/dµ^T_{0,G} = exp{∫_0^T ⟨θX_t, (GG′)^{-1} dX_t⟩ − (1/2) ∫_0^T ⟨θX_t, (GG′)^{-1} θX_t⟩ dt}

where ⟨·, ·⟩ denotes the inner product in R^n and M′ denotes the transpose of the matrix M. Maximization of the Radon-Nikodym derivative given above with respect to the parameter θ leads to a system of linear equations which can be solved to obtain the MLE θ_{T,G}. Furthermore, for any (θ, G) ∈ Θ × ζ and for 0 ≤ t ≤ T,

lim_{N→∞} Σ_{i=1}^{2^N} (X_{it2^{-N}} − X_{(i−1)t2^{-N}})(X_{it2^{-N}} − X_{(i−1)t2^{-N}})′ = GG′ t a.s.

(Ref: Basawa and Prakasa Rao (1980), p. 212).

Remarks: If the true value θ0 of the parameter θ is a real stable matrix, that is, the eigenvalues of θ0 have negative real parts, then the MLE θ_{T,G} is consistent and asymptotically normal. In fact,

T^{1/2}(θ_{T,G} − θ0) → K_{θ0} in law as T → ∞,

where K_{θ0} = ((K^{θ0}_{ij})) is a Gaussian matrix with mean zero and covariances given by

E_{θ0}(K^{θ0}_{ij} K^{θ0}_{kl}) = (GG′)_{ik} (Q^{-1}_{θ0})_{jl}

and Q_{θ0} is a positive definite matrix satisfying the relation

θ0 Q_{θ0} + Q_{θ0} θ0′ = −GG′.
Remarks: Most of the results discussed above can be extended to stochastic differential equations of the type

dX_t = θ A(t, X) dt + G dW_t, t ≥ 0, X_0 = 0.
Sequential estimation for linear SDE

Consider an SDE of the form

dξ(t) = λ A(t, ξ) dt + dWt, t ≥ 0, ξ0 = 0,

where the unknown parameter is λ, −∞ < λ < ∞, and A(t, ξ) is Ft-measurable for every t ≥ 0. Further suppose that, for every x(·) ∈ C[0, ∞) with x(0) = 0, there exists ε = ε(x) > 0 such that

∫_0^{ε(x)} A²(t, x) dt < ∞,

and that, for every λ and for every t ≥ 0,

Pλ{∫_0^t A²(s, ξ) ds < ∞} = 1. (∗)

Here Pλ is the probability measure generated by the process {ξt, t ≥ 0} when λ is the true parameter. The measures Pλ and P0 are equivalent under the condition (∗). Note that the probability measure P0 is the Wiener measure. Let P^t_λ denote the probability measure generated by the process {ξ(u), 0 ≤ u ≤ t} on the space C[0, t]. Observe that

dP^t_λ/dP^t_0 = exp{λ ∫_0^t A(s, ξ) dξs − (1/2) λ² ∫_0^t A²(s, ξ) ds}.

It is now easy to check that the MLE of the parameter λ, given the observations {ξ(s), 0 ≤ s ≤ T}, is

λT(ξ) = ∫_0^T A(s, ξ) dξs / ∫_0^T A²(s, ξ) ds.
Observe that

Eλ[λT(ξ)] = Eλ[∫_0^T A(s, ξ) dξs / ∫_0^T A²(s, ξ) ds]
= Eλ[(λ ∫_0^T A²(s, ξ) ds + ∫_0^T A(s, ξ) dWs) / ∫_0^T A²(s, ξ) ds]
= λ + Eλ[∫_0^T A(s, ξ) dWs / ∫_0^T A²(s, ξ) ds].

Suppose that

Pλ{∫_0^∞ A²(s, ξ) ds = ∞} = 1, −∞ < λ < ∞.

For any H ≥ 0, define

τ(H) = inf{t ≥ 0 : ∫_0^t A²(s, ξ) ds = H}.

Define

λ(H) = λ_{τ(H)} = ∫_0^{τ(H)} A(s, ξ) dξs / ∫_0^{τ(H)} A²(s, ξ) ds = (1/H) ∫_0^{τ(H)} A(s, ξ) dξs.

The estimator λ(H) is called a sequential maximum likelihood estimator of the parameter λ. Note that

λ(H) = (1/H) ∫_0^{τ(H)} A(s, ξ) dξs
= (1/H){λ ∫_0^{τ(H)} A²(s, ξ) ds + ∫_0^{τ(H)} A(s, ξ) dWs}
= λ + (1/H) ∫_0^{τ(H)} A(s, ξ) dWs.

Hence the distribution of the estimator λ(H) is N(λ, 1/H) from the properties of stochastic integrals with respect to a standard Wiener process.
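The exact N(λ, 1/H) law of the sequential estimator can be checked by simulation (a sketch with A(t, ξ) = ξ(t), i.e. dξ = λξ dt + dW, and arbitrary values λ = −0.5, H = 4; the Euler time step introduces a small overshoot of the stopping boundary):

```python
import numpy as np

rng = np.random.default_rng(9)

lam, H, dt, reps = -0.5, 4.0, 0.01, 500
estimates = []
for _ in range(reps):
    x, info, score = 0.0, 0.0, 0.0
    while info < H:                      # observe until ∫ xi^2 ds reaches H
        dw = rng.normal(0.0, np.sqrt(dt))
        dx = lam * x * dt + dw           # Euler increment of d(xi) = lam xi dt + dW
        score += x * dx                  # accumulate ∫ xi d(xi)
        info += x * x * dt               # accumulate the observed information
        x += dx
    estimates.append(score / H)          # sequential MLE lambda(H)
est = np.array(estimates)
print(est.mean(), est.var())             # mean ≈ lam, variance ≈ 1/H
```

The empirical mean and variance of the replicated estimates should approach λ and 1/H respectively, up to Monte Carlo noise and the discretization overshoot.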
Cramer-Rao inequality

Let us consider a sequential plan (τ(ξ), λτ(ξ)) for estimating a function h(λ) such that

Eλ[λτ(ξ)] = h(λ).

Note that τ(ξ) is the stopping time of the sequential plan (τ(ξ), λτ(ξ)). Suppose that h(λ) is differentiable and that differentiation with respect to λ under the expectation operator is permissible in the above equation. Further suppose that

Eλ ∫_0^{τ(ξ)} A²(t, ξ) dt < ∞.

Theorem (Cramer-Rao inequality): Under the conditions stated above,

Varλ(λτ(ξ)) ≥ [h′(λ)]² / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].
Proof: Let Pλ be the probability measure generated by the process {ξ(s), 0 ≤ s ≤ t} corresponding to the parameter λ and P^{τ(ξ)}_λ be the probability measure generated by the process {ξ(t), 0 ≤ t ≤ τ(ξ)}. Applying Sudakov's lemma (cf. Basawa and Prakasa Rao (1980)), it can be shown that dP^{τ(ξ)}_λ / dP^{τ(ξ)}_{λ0} exists and

dP^{τ(ξ)}_λ / dP^{τ(ξ)}_{λ0} = exp{(λ − λ0) ∫_0^{τ(ξ)} A(t, ξ) dξ(t) − (1/2)(λ² − λ0²) ∫_0^{τ(ξ)} A²(t, ξ) dt}.

Note that Eλ[λτ(ξ)] = h(λ), and hence

∫ λτ(ξ) dPλ = h(λ),

which can also be written in the form

∫ λτ(ξ) (dPλ/dPλ0) dPλ0 = h(λ).

Differentiating under the integral sign with respect to λ, we get that

∫ λτ(ξ) (d/dλ)(dPλ/dPλ0) dPλ0 = h′(λ).

Hence

∫ λτ(ξ) (dPλ/dPλ0)(∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt) dPλ0 = h′(λ).

Therefore

Eλ[λτ(ξ)(∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt)] = h′(λ).
Observe that

∫_0^T A(t, ξ) dξ(t) = λ ∫_0^T A²(t, ξ) dt + ∫_0^T A(t, ξ) dWt

and hence

Eλ[∫_0^{τ(ξ)} A(t, ξ) dξ(t) − λ ∫_0^{τ(ξ)} A²(t, ξ) dt] = Eλ[∫_0^{τ(ξ)} A(t, ξ) dWt] = 0.

The above relations imply that

Eλ[(λτ(ξ) − h(λ))(∫_0^{τ(ξ)} A(t, ξ) dWt)] = h′(λ).

Applying the Cauchy-Schwarz inequality, we have

[h′(λ)]² ≤ Varλ(λτ(ξ)) Eλ[(∫_0^{τ(ξ)} A(t, ξ) dWt)²] = Varλ(λτ(ξ)) Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

Hence

Varλ(λτ(ξ)) ≥ [h′(λ)]² / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

In particular, if h(λ) ≡ λ, then

Varλ(λτ(ξ)) ≥ 1 / Eλ[∫_0^{τ(ξ)} A²(t, ξ) dt].

Definition: A sequential plan (τ(ξ), λτ(ξ)) is said to be efficient if the variance of the corresponding estimator λτ(ξ) attains the Cramer-Rao lower bound.

Observe that, for the sequential plan defined by the stopping time τ(H),

Varλ(λτ(H)) = 1/H,

which is the Cramer-Rao lower bound for the variance of unbiased estimators of λ. Hence the estimator λτ(H) is an efficient estimator of the parameter λ.
the estimator λτ(H) is an efficient estimator for estimating the parameter λ.
MLE of the drift parameter of a diffusion process

Suppose (Ω, F, P) is a probability space and {Xt, t ≥ 0} is a stochastic process defined on it satisfying the SDE

dXt = a(t, Xt, θ) dt + dWt, X0 = 0, t ≥ 0, θ ∈ Θ ⊂ R.

The problem is to estimate the parameter θ based on the observations {Xs, 0 ≤ s ≤ T}. We assume that

(A0) Pθ1 ≠ Pθ2 whenever θ1 ≠ θ2 ∈ Θ (identifiability condition), and

(A1) Pθ(∫_0^T a²(t, Xt, θ) dt < ∞) = 1, θ ∈ Θ, T ≥ 0.

Let P^T_θ denote the probability measure generated by the process {Xs, 0 ≤ s ≤ T} and P^T_W denote the probability measure generated by the standard Wiener process {Ws, 0 ≤ s ≤ T}. Then

dP^T_θ/dP^T_W = exp{∫_0^T a(t, Xt, θ) dXt − (1/2) ∫_0^T a²(t, Xt, θ) dt}.
A maximum likelihood estimator (MLE) θT(X^T) maximizes the likelihood function LT(θ) = dP^T_θ/dP^T_W. If Θ is compact and LT(θ) is continuous in θ, then there exists a measurable MLE (cf. Schmetterer (1974), Prakasa Rao (1987)). We assume the existence of a measurable MLE in the following discussion. Let

F(t, x, θ) = ∫_0^x a(t, y, θ) dy. (1)

(A2)(i) Suppose the function a(t, x, θ) is continuous in x and the function F(t, x, θ) is jointly continuous in (t, x) with partial derivatives Fx, Ft and Fxx.

Observe that Fx = a and Fxx = ax. Applying Ito's lemma, we have

dF(t, Xt, θ) = [Ft(t, Xt, θ) + (1/2) ax(t, Xt, θ)] dt + a(t, Xt, θ) dXt.

Hence

∫_0^T a(t, Xt, θ) dXt = F(T, XT, θ) − ∫_0^T f(t, Xt, θ) dt

where

f(t, x, θ) = Ft(t, x, θ) + (1/2) ax(t, x, θ).

Therefore

ℓT(θ) = log LT(θ) = F(T, XT, θ) − ∫_0^T [f(t, Xt, θ) + (1/2) a²(t, Xt, θ)] dt. (2)

(A2)(ii) Suppose the function ℓT(θ) = log LT(θ) is twice differentiable in θ.

Observe that

ℓ′T(θ) = ∫_0^T a′(t, Xt, θ)(dXt − a(t, Xt, θ) dt) = ∫_0^T a′(t, Xt, θ) dW^θ_t

where

W^θ_t = Xt − ∫_0^t a(s, Xs, θ) ds
is a Wiener process under the parameter θ (here and below, ′ denotes differentiation with respect to θ). Similarly,

ℓ″T(θ) = ∫_0^T a″ dXt − ∫_0^T (a a″ + (a′)²) dt
= ∫_0^T a″ (dXt − a dt) − ∫_0^T (a′)² dt
= ∫_0^T a″ dW^θ_t − ∫_0^T (a′)² dt.

(A2)(iii) Suppose that the function ℓ″T(θ) is continuous in a neighbourhood Vθ of θ for every θ ∈ Θ and

Eθ[∫_0^T (a′(t, Xt, θ))² dt] < ∞, Eθ[∫_0^T (a″(t, Xt, θ))² dt] < ∞.

Further suppose that

(A3) for every θ, there exists a neighbourhood Vθ of θ in Θ such that

Pθ(∫_0^∞ (a(t, Xt, θ′) − a(t, Xt, θ))² dt = ∞) = 1

for every θ′ ∈ Vθ − {θ}.

Let

IT(θ) = ∫_0^T (a′(t, Xt, θ))² dt

and

YT(θ) = ∫_0^T (a″(t, Xt, θ))² dt.

(A4) Suppose that there exists a function mt ↑ ∞ such that

IT(θ)/mT → η²(θ) and YT(θ)/mT → ζ²(θ)

in probability under the Pθ-measure as T → ∞, where Pθ(η²(θ) > 0) > 0.

Theorem: Under the conditions (A0)-(A4) stated above, there exists a solution of the likelihood equation ℓ′T(θ) = 0 which is strongly consistent as T → ∞. Furthermore,

(IT(θ))^{1/2}(θT − θ) → N(0, 1) in law

as T → ∞ conditionally with respect to any probability measure µ << P^A_θ where P^A_θ(·) = Pθ(·|A) and A = [η²(θ) > 0].
Proof: For a detailed proof, see Prakasa Rao (1999), p. 16. We sketch it here. Let δ > 0 be such that θ and θ + δ belong to Θ. Then

ℓT(θ + δ) − ℓT(θ) = [∫_0^T a(t, Xt, θ + δ) dXt − (1/2) ∫_0^T a²(t, Xt, θ + δ) dt]
− [∫_0^T a(t, Xt, θ) dXt − (1/2) ∫_0^T a²(t, Xt, θ) dt]
= ∫_0^T [a(t, Xt, θ + δ) − a(t, Xt, θ)] dXt − (1/2) ∫_0^T [a²(t, Xt, θ + δ) − a²(t, Xt, θ)] dt
= ∫_0^T A^{θ+δ}_t dXt − (1/2) ∫_0^T [a²(t, Xt, θ + δ) − a²(t, Xt, θ)] dt

where

A^{θ+δ}_t = a(t, Xt, θ + δ) − a(t, Xt, θ).

It is easy to check that

ℓT(θ + δ) − ℓT(θ) = ∫_0^T A^{θ+δ}_t dW^θ_t − (1/2) ∫_0^T (A^{θ+δ}_t)² dt.

Let

KT = ∫_0^T (A^{θ+δ}_t)² dt.

Then

[ℓT(θ + δ) − ℓT(θ)]/KT = ∫_0^T A^{θ+δ}_t dW^θ_t / ∫_0^T (A^{θ+δ}_t)² dt − 1/2.

Applying Lepingle's strong law of large numbers (cf. Prakasa Rao (1999)), it follows that

∫_0^T A^{θ+δ}_t dW^θ_t / ∫_0^T (A^{θ+δ}_t)² dt → 0 a.s.

as T → ∞, since

∫_0^T (A^{θ+δ}_t)² dt → ∞ a.s.

as T → ∞ by the condition (A3). Hence, for every θ and δ and for almost every ω ∈ Ω, there exists T0 depending on θ, δ and ω such that for every T ≥ T0,

ℓT(θ + δ) < ℓT(θ). (3)

Similarly we can show that

ℓT(θ − δ) < ℓT(θ) (4)
83
for sufficiently large T. Since the function ℓT (θ) is continuous on the closed interval
[θ− δ, θ+ δ], it has a local maximum and the maximum is attained at some point θT in
the open interval (θ−δ, θ+δ) in view of inequalities (3) and (4). Furthermore ℓ′T (θT ) = 0.
This proves that
θTa.s.→ θ
as T → ∞ under Pθ -measure. This proves the existence and strong consistency of a
maximum likelihood estimator.
Applying Taylor's expansion to the function $\ell'_T(\theta)$ at $\theta_T$, we get
\[
\ell'_T(\theta) = \ell'_T(\theta_T) + (\theta - \theta_T)\,\ell''_T(\theta_T^*)
\]
where $|\theta_T^* - \theta| \le |\theta_T - \theta|$. Hence
\[
\frac{\ell'_T(\theta)}{\sqrt{I_T(\theta)}} = \frac{(\theta - \theta_T)\,\ell''_T(\theta_T^*)}{\sqrt{I_T(\theta)}} \simeq \frac{(\theta - \theta_T)\,\ell''_T(\theta)}{\sqrt{I_T(\theta)}}
\]
as $T \to \infty$, since $\theta_T^* \xrightarrow{a.s.} \theta$ and $I_T(\theta) \xrightarrow{a.s.} \infty$ as $T \to \infty$ and $\ell''_T(\theta)$ is continuous. Let $\mathcal{F}_T$ be the sub-$\sigma$-algebra generated by the process $\{X_s, 0 \le s \le T\}$. Note that the process $\{\ell'_T(\theta), \mathcal{F}_T, T \ge 0\}$ is a martingale and, by the earlier remarks,
\[
\frac{\ell'_T(\theta)}{\sqrt{I_T(\theta)}} \xrightarrow{\mathcal{L}} N(0, 1)
\]
as $T \to \infty$ under the $P_\theta^A$-measure. Hence
\[
\frac{(\theta - \theta_T)\,\ell''_T(\theta)}{\sqrt{I_T(\theta)}} \xrightarrow{\mathcal{L}} N(0, 1).
\]
Observe that
\[
\frac{\ell''_T(\theta)}{I_T(\theta)} = \frac{\int_0^T a''(t, X_t, \theta)\,dW_t^\theta - \int_0^T (a'(t, X_t, \theta))^2\,dt}{I_T(\theta)} \xrightarrow{a.s.} -1
\]
as $T \to \infty$ under the $P_\theta^A$-measure (cf. Feigin (1976)). In particular,
\[
\sqrt{I_T(\theta)}\,(\theta_T - \theta) \xrightarrow{\mathcal{L}} N(0, 1)
\]
as $T \to \infty$ under the $P_\theta^A$-measure. This result proves the asymptotic normality of the MLE under random norming.
Example: Consider the SDE
\[
dX_t = \theta t X_t\,dt + dW_t, \quad X_0 = 0,\; t \ge 0.
\]
Check that
\[
X_t = e^{\theta t^2/2}\int_0^t e^{-\theta s^2/2}\,dW_s, \quad t \ge 0,
\]
and that the MLE is strongly consistent and asymptotically normal after random normalization.
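For this example, the drift is $a(t, x, \theta) = \theta t x$ with $a' = t x$, so the likelihood equation $\ell'_T(\theta) = 0$ has the explicit solution $\hat\theta_T = \int_0^T t X_t\,dX_t \big/ \int_0^T t^2 X_t^2\,dt$. The following sketch simulates the SDE by the Euler-Maruyama scheme and evaluates this estimator with left-point Riemann-Ito sums; the parameter value, horizon, step size and seed are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_and_estimate(theta=0.5, T=5.0, n=100_000):
    """Euler-Maruyama simulation of dX_t = theta * t * X_t dt + dW_t, X_0 = 0,
    followed by the explicit MLE
        theta_hat = (int_0^T t X_t dX_t) / (int_0^T t^2 X_t^2 dt),
    with both integrals approximated by left-point Riemann-Ito sums."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    X = np.zeros(n + 1)
    dW = rng.normal(0.0, np.sqrt(dt), n)
    for i in range(n):
        X[i + 1] = X[i] + theta * t[i] * X[i] * dt + dW[i]
    dX = np.diff(X)
    num = np.sum(t[:-1] * X[:-1] * dX)         # int_0^T t X_t dX_t
    den = np.sum((t[:-1] * X[:-1]) ** 2) * dt  # int_0^T t^2 X_t^2 dt
    return num / den

est = simulate_and_estimate()
print(est)  # close to the true value 0.5
```

The random norming in the theorem appears here through the denominator $\int_0^T t^2 X_t^2\,dt = I_T(\theta)$, which grows rapidly along each sample path and makes the estimator very precise.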
Remarks: For the vector parameter case, see Prakasa Rao (1999), p. 20. In order to find "efficient" estimators, as in the classical problems of estimation in the finite-dimensional case, we now obtain an analogue of the Cramer-Rao lower bound and discuss the concept of locally asymptotically normal (LAN) families of distributions.
Cramer-Rao lower bound

Consider the SDE
\[
dX(t) = a(\theta, t, X)\,dt + dW_t, \quad X(0) = X_0,\; t \ge 0,\; \theta \in \Theta \subset \mathbb{R}.
\]
Suppose that
\[
P_\theta\Big(\int_0^T a^2(\theta, t, X)\,dt < \infty\Big) = 1
\]
and $a(\theta, t, x)$ is differentiable with respect to $\theta$. Let
\[
I(\theta_1, \theta_2) = E_{\theta_1}\Big[\int_0^T a_\theta^2(\theta_2, t, X)\,dt\Big].
\]
Here $a_\theta$ denotes the partial derivative of the function $a(\theta, t, x)$ with respect to $\theta$.

Theorem: Suppose that $I(\theta_1, \theta_2) > 0$ for all $\theta_1, \theta_2 \in \Theta$ and $I(\theta, \theta)$ is continuous in $\theta$. Let $\theta_T^*$ be any estimator of the parameter $\theta$, based on the observation $X^T = \{X_s, 0 \le s \le T\}$, such that $E_\theta(\theta_T^* - \theta)^2$ is bounded over compact subsets of $\Theta$. Let $b(\theta) = E_\theta(\theta_T^* - \theta)$. Then $b(\theta)$ is differentiable almost everywhere and
\[
E_\theta(\theta_T^* - \theta)^2 \ge \frac{(1 + b'(\theta))^2}{I(\theta, \theta)} + b^2(\theta)
\]
where $b'(\theta)$ denotes the derivative of $b(\theta)$ with respect to $\theta$ whenever it exists.

For a proof, see Prakasa Rao (1999), p. 28.
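In particular, for an unbiased estimator the bias function $b(\theta)$ vanishes identically, so $b'(\theta) = 0$ and the bound reduces to the familiar Cramer-Rao form
\[
E_\theta(\theta_T^* - \theta)^2 \;\ge\; \frac{1}{I(\theta, \theta)} = \Big(E_\theta\Big[\int_0^T a_\theta^2(\theta, t, X)\,dt\Big]\Big)^{-1},
\]
so the quantity $I(\theta, \theta)$ plays the role of the Fisher information for continuous observation over $[0, T]$.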
Local Asymptotic Normality (LAN):

Let $(\Omega, \mathcal{F}, P)$ be a probability space and, for $\epsilon \in (0, 1]$, let $\mathcal{F}^{(\epsilon)} = \{\mathcal{F}_t^{(\epsilon)}, 0 \le t \le 1\}$ be a filtration, that is, a nondecreasing family of sub-$\sigma$-algebras contained in $\mathcal{F}$. Let $X^\epsilon = \{X^\epsilon(t), 0 \le t \le T_\epsilon\}$ be a diffusion process satisfying the SDE
\[
dX^\epsilon(t) = a_\epsilon(\theta, t, X^\epsilon)\,dt + dW_\epsilon(t), \quad X^\epsilon(0) = \eta_\epsilon,\; 0 \le t \le T_\epsilon
\]
where $\eta_\epsilon$ is an $\mathcal{F}_0^{(\epsilon)}$-measurable random variable and $\theta \in \Theta$, an open set in $\mathbb{R}$. Let $P_\theta^{(\epsilon)}$ be the probability measure generated by the process $X^\epsilon$. Suppose that
\[
P_\theta^{(\epsilon)}\Big(\int_0^{T_\epsilon} a_\epsilon^2(\theta, t, X^\epsilon)\,dt < \infty\Big) = 1, \quad \theta \in \Theta,\; 0 < \epsilon \le 1.
\]
Let $\theta_0 \in \Theta$ and suppose further that $\phi_\epsilon(\theta_0) \to 0$ as $\epsilon \to 0$. It can be shown that the measures $P_{\theta_0 + \phi_\epsilon(\theta_0)u}^{(\epsilon)}$ and $P_{\theta_0}^{(\epsilon)}$ are absolutely continuous with respect to each other in a neighbourhood of $\theta_0$. Let
\[
Z_\epsilon(u) = \frac{dP_{\theta_0 + \phi_\epsilon(\theta_0)u}^{(\epsilon)}}{dP_{\theta_0}^{(\epsilon)}}(X^\epsilon).
\]

Definition: A family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ is said to be locally asymptotically normal (LAN) at $\theta_0 \in \Theta$ if
\[
\log Z_\epsilon(u) = u\,\Delta_\epsilon(\theta_0, X^\epsilon) - \frac{1}{2}u^2 + \psi_\epsilon(\theta_0, u, X^\epsilon)
\]
where
\[
\Delta_\epsilon(\theta_0, X^\epsilon) \xrightarrow{\mathcal{L}} N(0, 1) \qquad\text{and}\qquad \psi_\epsilon(\theta_0, u, X^\epsilon) \xrightarrow{p} 0
\]
as $\epsilon \to 0$ under the $P_{\theta_0}^{(\epsilon)}$-measure.

Remarks: The function $\phi_\epsilon(\theta_0)$ is called the normalization. Local asymptotic normality of the family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ implies that the likelihood ratio process
\[
\frac{dP_\theta^{(\epsilon)}}{dP_{\theta_0}^{(\epsilon)}}(X^\epsilon)
\]
has the properties of the process
\[
Z(u) = \exp\Big\{u\zeta - \frac{1}{2}u^2\Big\}, \quad -\infty < u < \infty,
\]
where $\zeta$ is $N(0, 1)$, whenever $\theta$ is close to $\theta_0$ and $\epsilon$ is small. Typically, the normalization is $\phi_\epsilon(\theta) = (I_\epsilon(\theta))^{-1/2}$ where $I_\epsilon(\theta)$ is the Fisher information. Under some conditions, it
Hajek-LeCam inequality

Suppose the family of probability measures $\{P_\theta^{(\epsilon)}, \theta \in \Theta\}$ is LAN with normalizing function $\phi_\epsilon(\theta)$. Let $\ell(\cdot)$ be a symmetric function, continuous at zero, such that the set $\{x : \ell(x) < c\}$ is convex for all $c > 0$. Further suppose that, for any $h > 0$, $\ell(x) < e^{h x^2}$ for $|x|$ large. Then, for every $\gamma \in (0, 1)$,
\[
\liminf_{\epsilon \to 0}\; \inf_{\theta_\epsilon^*}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\ell\Big(\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big)\Big] \ge E[\ell(\xi)] \tag{*}
\]
where the random variable $\xi$ has the standard normal distribution. For a proof of this result, see Kutoyants (1984). If $\ell(x) = x^2$, then the inequality (*) reduces to
\[
\liminf_{\epsilon \to 0}\; \inf_{\theta_\epsilon^*}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big]^2 \ge 1.
\]

Definition: An estimator $\theta_\epsilon^*$ is said to be asymptotically efficient if
\[
\lim_{\epsilon \to 0}\; \sup_{|\theta - y| < \phi_\epsilon^\gamma(\theta)} E_y\Big[\frac{\theta_\epsilon^* - y}{\phi_\epsilon(\theta)}\Big]^2 = 1.
\]

Example: Consider the SDE
\[
dX(t) = -\theta X(t)\,dt + dW_t, \quad X(0) = 0,\; \theta \in (\alpha, \beta),\; \alpha > 0.
\]
Then the family of probability measures $\{P_\theta^T, \theta \in \Theta\}$ is LAN with the normalizing function $\phi_T(\theta) = \sqrt{2\theta}\,T^{-1/2}$ as $T \to \infty$. If instead $\theta \in (\alpha, \beta)$ with $\beta < 0$, then the family of probability measures $\{P_\theta^T, \theta \in \Theta\}$ is LAN as $T \to \infty$ with the normalizing function $\phi_T(\theta) = 2\theta e^{\theta T}$.
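As a heuristic check on the first normalization (not a proof): for $\theta > 0$ the process is ergodic with stationary variance $1/(2\theta)$, and here $a_\theta(\theta, t, X) = -X(t)$, so treating the stationary approximation as exact gives
\[
I_T(\theta) = E_\theta\Big[\int_0^T X^2(t)\,dt\Big] \approx \frac{T}{2\theta}, \qquad \phi_T(\theta) = (I_T(\theta))^{-1/2} \approx \sqrt{2\theta}\,T^{-1/2},
\]
which agrees with the normalizing function stated above.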
Parametric estimation for diffusion type processes from sampled data

Consider the SDE
\[
dX_t = a(X_t, \theta)\,dt + \sigma(X_t)\,dW_t, \quad t \ge 0.
\]
We now describe some methods of estimation of the parameter when the process $\{X_t\}$ is sampled at discrete time points at equal time intervals. For a detailed exposition, see Prakasa Rao (1999).
Estimation based on discretization by the Euler method

Suppose the drift and diffusion coefficients are constant over the interval $[t, t + \Delta t)$. Then
\[
X_{t+\Delta t} - X_t = a(X_t, \theta)\,\Delta t + \sigma(X_t)\,(W_{t+\Delta t} - W_t).
\]
This discretized process is considered as a local approximation to the original process. Note that $\sigma(X_t)(W_{t+\Delta t} - W_t)$ has a normal distribution with mean zero and variance $\sigma^2(X_t)\Delta t$, and the transition density function of the discretized process is
\[
p(X_{t+\Delta t} \mid X_t = x_t) = (2\pi\sigma^2(x_t)\Delta t)^{-1/2}\exp\Big\{-\frac{(X_{t+\Delta t} - x_t - a(x_t, \theta)\Delta t)^2}{2\sigma^2(x_t)\Delta t}\Big\}.
\]
Suppose we observe the process $\{X_t, t \ge 0\}$ at the points $t + i\Delta t$, $0 \le i \le N$. Let $Z_i = X_{t+i\Delta t}$. Then the joint probability density function of the random vector $(Z_0, \ldots, Z_N)$ is
\[
p(z_0, \ldots, z_N) = \prod_{i=1}^N p(z_i \mid z_{i-1})\,p(z_0)
\]
and the parameters $\theta$ and $\sigma$ can be estimated by the method of maximum likelihood.
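A minimal sketch of the Euler pseudo-likelihood method follows, under assumptions not in the text: an Ornstein-Uhlenbeck drift $a(x, \theta) = -\theta x$, a known constant $\sigma$, data generated by the same Euler scheme, and a crude grid search in place of a numerical optimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model for illustration: Ornstein-Uhlenbeck drift a(x, theta) = -theta * x.
def drift(x, theta):
    return -theta * x

def euler_neg_loglik(theta, sigma, X, dt):
    """Negative log pseudo-likelihood built from the Euler transition
    density N(x_i + a(x_i, theta) * dt, sigma**2 * dt)."""
    resid = X[1:] - (X[:-1] + drift(X[:-1], theta) * dt)
    var = sigma ** 2 * dt
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid ** 2 / var)

# Simulate sampled data with the same Euler scheme.
theta_true, sigma, dt, n = 1.0, 0.5, 0.01, 50_000
X = np.zeros(n + 1)
shocks = rng.normal(0.0, sigma * np.sqrt(dt), n)
for i in range(n):
    X[i + 1] = X[i] + drift(X[i], theta_true) * dt + shocks[i]

# Crude grid search for the maximizer; a real fit would use an optimizer.
grid = np.linspace(0.1, 3.0, 291)
theta_hat = grid[np.argmin([euler_neg_loglik(th, sigma, X, dt) for th in grid])]
print(theta_hat)  # close to theta_true
```

The same `euler_neg_loglik` function works for any differentiable drift; only the `drift` function and the parameter grid need to change.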
Estimation based on the local linearization method of Shoji-Ozaki

Consider the SDE
\[
dX_t = a(X_t)\,dt + \sigma\,dW_t, \quad t \ge 0.
\]
Suppose the diffusion parameter $\sigma$ is a constant and the drift function $a(\cdot)$ is possibly nonlinear and differentiable. We try to approximate the above SDE by a linear SDE. Consider the ordinary differential equation
\[
\frac{dx_t}{dt} = a(x_t).
\]
Suppose the function $x_t$ is twice differentiable with respect to $t$. Then
\[
\frac{d^2 x_t}{dt^2} = a'(x_t)\,\frac{dx_t}{dt}.
\]
Suppose $a'(x)$ is constant over the interval $[t, t + \Delta t)$ and let $u \in [t, t + \Delta t)$. Then
\[
\frac{dx_s}{ds}\Big|_{s=u} = \frac{dx_t}{dt}\,e^{a'(x_t)(u - t)}
\]
and
\[
x_{t+\Delta t} = x_t + \frac{a(x_t)}{a'(x_t)}\big[e^{a'(x_t)\Delta t} - 1\big].
\]
Suppose we approximate the drift function $a(x)$ by a linear function $Lx$ on $[t, t + \Delta t)$. Then we have the SDE
\[
dX_t = L X_t\,dt + \sigma\,dW_t
\]
where $L$ is a constant on the interval $[t, t + \Delta t)$. Applying Ito's lemma, we get
\[
X_{t+\Delta t} = X_t e^{L\Delta t} + \sigma\int_t^{t+\Delta t} e^{L(t+\Delta t-u)}\,dW_u. \tag{*}
\]
Let us choose $L$ such that the conditional mean $E[X_{t+\Delta t} \mid X_t]$ coincides with the mean of the process given by (*). Hence
\[
X_t e^{L\Delta t} = X_t + \frac{a(X_t)}{a'(X_t)}\big[e^{a'(X_t)\Delta t} - 1\big]
\]
or
\[
L = \frac{1}{\Delta t}\log\Big[1 + \frac{a(X_t)}{X_t\,a'(X_t)}\big(e^{a'(X_t)\Delta t} - 1\big)\Big].
\]
Observe that the constant $L$ depends on $t$; denote it by $L_t$. The process discretized by the local linearization method is then
\[
X_{t+\Delta t} = X_t e^{L_t\Delta t} + \sigma\int_t^{t+\Delta t} e^{L_t(t+\Delta t-u)}\,dW_u.
\]
Since the random variable
\[
\int_t^{t+\Delta t} e^{L_t(t+\Delta t-u)}\,dW_u
\]
has the normal distribution with mean zero and variance $(e^{2L_t\Delta t} - 1)/(2L_t)$, we can now write the transition density function of the discretized observations $Y_i = X_{t+i\Delta t}$ given $Y_{i-1}$ for $0 \le i \le N$ and compute the likelihood function. The MLEs of $\theta$ and $\sigma$ can now be obtained.
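The one-step Gaussian transition implied by the local linearization can be sketched as follows; the cubic drift $a(x) = -x^3$ and the numerical values are illustrative assumptions, not from the text.

```python
import numpy as np

# Hypothetical nonlinear drift for illustration: a(x) = -x**3.
def a(x):  return -x ** 3
def a1(x): return -3 * x ** 2   # derivative a'(x)

def shoji_ozaki_step(x, sigma, dt):
    """One-step transition N(mean, var) of the Shoji-Ozaki local linearization:
        L_t  = (1/dt) * log[1 + a(x)/(x a'(x)) * (e^{a'(x) dt} - 1)],
        mean = x * e^{L_t dt},
        var  = sigma^2 * (e^{2 L_t dt} - 1) / (2 L_t)."""
    L = np.log(1.0 + a(x) / (x * a1(x)) * np.expm1(a1(x) * dt)) / dt
    mean = x * np.exp(L * dt)
    var = sigma ** 2 * np.expm1(2 * L * dt) / (2 * L)
    return mean, var

mean, var = shoji_ozaki_step(x=1.0, sigma=0.5, dt=0.01)
print(mean, var)  # mean slightly below 1, var close to sigma**2 * dt
```

Chaining these Gaussian transitions over the observation times gives the likelihood of the sampled path, exactly as in the Euler method but with a drift-adapted mean and variance. Note that `np.expm1` is used to evaluate $e^z - 1$ stably for small $z$.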
Estimation via martingale estimating functions

Consider the SDE
\[
dX_t = a(X_t, \theta)\,dt + \sigma(X_t, \theta)\,dW_t, \quad X(0) = X_0,\; t \ge 0,\; \theta \in \Theta \subset \mathbb{R}.
\]
(i) Let us first consider the case when the diffusion function $\sigma(x, \theta)$ does not depend on the parameter $\theta$. This is the case in all the earlier discussions on estimation of the drift parameter $\theta$. If the process $\{X_s, 0 \le s \le t\}$ is observed completely, then the log-likelihood function based on the observations is
\[
\ell_t(\theta) = \log L_t(\theta) = \int_0^t \frac{a(X_s, \theta)}{\sigma^2(X_s)}\,dX_s - \frac{1}{2}\int_0^t \frac{a^2(X_s, \theta)}{\sigma^2(X_s)}\,ds.
\]
Suppose now the process is observed at times $i\Delta$, $0 \le i \le n$. We approximate the integrals in the above expression by Riemann-type sums to obtain an approximate log-likelihood function. It is given by
\[
\ell_n(\theta) = \sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \frac{1}{2}\sum_{i=1}^n \frac{a^2(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,\Delta.
\]
Suppose the function $a(x, \theta)$ is differentiable with respect to $\theta$. Then
\[
\ell'_n(\theta) = \sum_{i=1}^n \frac{a'(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \Delta\sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta})}\,a'(X_{(i-1)\Delta}, \theta).
\]
The process $\{\ell'_n(\theta)\}$ is a zero-mean martingale with respect to the filtration $\{\mathcal{F}_i\}$, where $\mathcal{F}_i$ is generated by the set $\{X_0, X_\Delta, \ldots, X_{i\Delta}\}$. Solving the equation
\[
\ell'_n(\theta) = 0,
\]
which is called a martingale estimating equation, we can estimate the parameter $\theta$.

(ii) Let us now consider the case when the diffusion coefficient $\sigma(x, \theta)$ depends on $\theta$. Consider the analogue of the function $\ell'_n(\theta)$ given by
\[
J_n(\theta) = \sum_{i=1}^n \frac{a'(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta}, \theta)}\,(X_{i\Delta} - X_{(i-1)\Delta}) - \Delta\sum_{i=1}^n \frac{a(X_{(i-1)\Delta}, \theta)}{\sigma^2(X_{(i-1)\Delta}, \theta)}\,a'(X_{(i-1)\Delta}, \theta).
\]
This process is not a martingale with respect to the filtration $\{\mathcal{F}_i\}$. Define
\[
G_n(\theta) = J_n(\theta) - \sum_{i=1}^n E_\theta[J_i(\theta) - J_{i-1}(\theta) \mid \mathcal{F}_{i-1}].
\]
The process $\{G_n(\theta)\}$ is a zero-mean martingale with respect to the filtration $\{\mathcal{F}_i\}$, where $\mathcal{F}_i$ is generated by the set $\{X_0, X_\Delta, \ldots, X_{i\Delta}\}$. Solving the martingale estimating equation
\[
G_n(\theta) = 0,
\]
we can estimate the parameter $\theta$ whether or not the function $\sigma$ depends on $\theta$.
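A minimal sketch of case (i), under assumptions chosen for illustration (a linear drift $a(x, \theta) = -\theta x$, $\sigma \equiv 1$, and data generated by an Euler scheme): the martingale estimating equation $\ell'_n(\theta) = 0$ is solved by bisection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model for illustration: a(x, theta) = -theta * x, sigma(x) = 1.
def a(x, theta):  return -theta * x
def a1(x, theta): return -x           # derivative of a with respect to theta
sig2 = lambda x: np.ones_like(x)      # sigma^2(x)

def score(theta, X, dt):
    """Discretized score l'_n(theta): the martingale estimating function."""
    x, dx = X[:-1], np.diff(X)
    return (np.sum(a1(x, theta) / sig2(x) * dx)
            - dt * np.sum(a(x, theta) * a1(x, theta) / sig2(x)))

# Simulate sampled data.
theta_true, dt, n = 1.0, 0.01, 50_000
X = np.zeros(n + 1); X[0] = 1.0
for i in range(n):
    X[i + 1] = X[i] + a(X[i], theta_true) * dt + np.sqrt(dt) * rng.normal()

# Solve score(theta) = 0 by bisection; for this linear drift the root
# is explicitly theta_hat = -sum(x * dx) / (dt * sum(x**2)).
lo, hi = 0.01, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if score(lo, X, dt) * score(mid, X, dt) <= 0:
        hi = mid
    else:
        lo = mid
theta_hat = 0.5 * (lo + hi)
print(theta_hat)  # close to theta_true
```

For nonlinear drifts there is no closed form, but the same root-finding step applies unchanged; only `a` and `a1` need to be redefined.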
The following list of references contains bibliographic details of some books and some important review papers published in the area of "Statistical Inference for Stochastic Processes", including several that are not cited in the text.
References:
Aalen, O.O. (1975) Statistical Inference for a family of Counting Processes, Ph.D. Thesis,
University of California, Berkeley.
Aubry, C. (1997) Estimation parametrique par la methode de la distance minimale pour
des processus de Poisson et de diffusion. Ph.D. Thesis, Universite du Maine, Le
Mans.
Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993) Statistical Models Based on Counting Processes, Springer, New York.
Arato, M. (1982) Linear Stochastic Systems with Constant Coefficients: A Statistical Approach, Lecture Notes in Control and Information Sciences, 45, Springer, Berlin.
Bar-Shalom, Y. (1971) On the asymptotic properties of maximum likelihood estimate
obtained from dependent observations, J. Roy. Statist. Soc. Ser. B, 33, 72-77.
Basawa, I.V. and Prabhu, N.U. (1994) Statistical Inference in Stochastic Processes, Spe-
cial Issue of Journal of Statistical Planning and Inference, 39, No. 2, pp. 135-352.
Basawa, I.V. and Prakasa Rao, B.L.S. (1980) Statistical Inference for Stochastic Processes, Academic Press, London.
Basawa, I.V. and Prakasa Rao, B.L.S. (1980) Asymptotic inference for stochastic pro-
cesses, Stoch. Proc. Appl., 10, 221-254.
Basawa, I.V. and Scott, D.J. (1983) Asymptotic Optimal Inference for Non-ergodic Mod-
els, Lecture Notes in Statistics, 17, Springer, Heidelberg.
Baxter, G. (1956) A strong limit theorem for Gaussian processes, Proc. Amer. Math.
Soc., 7, 522-527.
Bhat, B.R. (1974) On the method of maximum likelihood for dependent observations,
J. Roy. Statist. Soc. Ser. B, 36, 48-53.
Bhat, B.R. (1996) Tests based on estimating functions, In Stochastic Processes and
Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New Age International,
New Delhi, pp. 20-38.
Billingsley, P. (1961) Statistical Inference for Markov processes, University of Chicago
Press, Chicago.
Bishwal, J.P.N. (2000) Asymptotic Theory of Estimation of the Drift Parameter in Dif-
fusion Processes, Ph.D. Thesis, Sambalpur University, Sambalpur, India.
Borwanker, J.D. Kallianpur, G. and Prakasa Rao, B.L.S. (1971) The Bernstein-von Mises
theorem for Markov Processes, Ann. Math. Statist., 42, 1241-1253.
Bose, A. and Politis, D.N. (1996) A review of the bootstrap for dependent samples, In
Stochastic Processes and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R.
Bhat, New Age International, New Delhi, pp. 73-89.
Brody, E. (1971) An elementary proof of the Gaussian dichotomy theorem, Z. Wahrs. ,
20, 217-226.
Brown, B.M. (1971) Martingale central limit theorems, Ann. Math. Statist., 42, 59-66.
Bosq, D. (1998) Nonparametric Statistics for Stochastic Processes, Lecture Notes in
Statistics, 110, Springer, New York.
Bosq, D. (2012) Statistique Mathematique et Statistique des Processus, hermes-science
publications, Lavoisier, Cachan.
Brillinger, D. (1975) Statistical inference for stationary point processes, In Stochastic
Processes and Related Topics, Ed. M.L. Puri, Academic Press, New York, pp. 55-
99.
Cox, D.R. and Lewis, P.A.W. (1966) The Statistical Analysis of Series of Events, Methuen
and Barnes and Nobel, New York.
Cressie, N. (1991) Statistics of Spatial Data, Wiley, New York.
Dalalyan, A. (2001) Estimation non-parametrique pour les processus de diffusion er-
godiques, Ph.D. Thesis, Universite du Maine, Le Mans.
Dewan, Isha and Prakasa Rao, B.L.S. (2001) Associated sequences and related inference
problems, In Handbook of Statistics: Stochastic Processes;Theory and Methods, 19,
Ed. D.N.Shanbhag and C.R.Rao , Elsevier Science B.V., Amsterdam, pp. 693-731.
Dion, J.-P. (1974) Estimation des probabilites initiales et de la moyenne d’un processus
de Galton-Watson, Ph.D. Thesis, Universite de Montreal, Montreal.
Dion, J.-P. and Keiding, N. (1978) Statistical inference in branching processes, In Branch-
ing Processes, Ed. A. Joffe and P. Ney, Marcel Dekker, New York, pp. 105-140.
Doob, J.L. (1953) Stochastic Processes, Wiley, New York.
Dorogovcev, A. Ja. (1976) The consistency of an estimate of a parameter of stochastic
differential equation, Theory Prob. Math. Statist., 10, 73-82.
Foutz, R. (1974) Studies in Large sample Theory, Ph.D. Thesis, Ohio State University,
Columbus.
Feigin, P.D. (1975) Maximum likelihood estimation for continuous time stochastic pro-
cesses - A Martingale approach, Ph.D. Thesis, Australian National University, Can-
berra.
Feigin, P.D. (1976) Maximum likelihood estimation for continuous time stochastic pro-
cesses, Adv. Appl. Prob., 8, 712-736.
Feldman, J. (1958) Equivalence and perpendicularity of Gaussian processes, Pacific J.
Math., 8 , 699-708, correction, ibid. 9, 1295-1296.
Feller, W. (1971) An Introduction to Probability Theory and its Applications, Vol.II, 2nd
ed., Wiley, New York.
Fleisher, I. and Kooharian, A. (1958) On the statistical treatment of stochastic processes,
Ann. Math. Statist., 29, 544-549.
Fleming, T.R. and Harrington, D.P. (1991) Counting Processes and Survival analysis,
Wiley, New York.
Gallant, A.R. and Tauchen, G. (1996) A Unified Theory of Estimation and Inference for
Nonlinear Dynamic Models, Basil Blackwell, Oxford.
Girsanov, I.V. (1960) On transforming a certain class of stochastic processes by abso-
lutely continuous substitution of measures, Theory Prob. Appl., 5, 285-301.
Grenander, U. (1950) Stochastic processes and statistical inference, Arkiv. fur Mathe-
matik, 1, 195-227.
Grenander, U. (1968) Eight lectures on statistical inference in stochastic processes, Tech.
Report No.2, Division of Appl. Math., Brown University, Providence, Rhode Island.
Grenander, U. (1981) Abstract Inference, Wiley, New York.
Guttorp, P. (1991) Statistical Inference for Branching Processes, Wiley, New York.
Hall, P. and Heyde, C.C. (1980) Martingale Limit Theory and its Application, Academic
Press, London.
Hajek, J. (1958) On a property of normal distributions of any stochastic process, Czech. Math. J., 8, 610-618.
Heyde, C.C. (1974) On estimating the variance of the offspring distribution in a simple
branching process, Adv. Appl. Probab., 6, 421-433.
Heyde, C.C. (1997) Quasi-Likelihood and its Applications: A General Approach to Op-
timal Parameter estimation, Springer, New York.
Ibragimov, I.A. (1963) A central limit theorem for a class of dependent random variables, Theory Prob. Appl., 8, 83-89.
Jacobsen, M. (1982) Statistical Analysis of Counting Processes, Lecture Notes in Statis-
tics No. 12, Springer, New York.
Kalman, R.E. (1960) A new approach to linear filtering and prediction theory, J. Basic
Engg., 82, 35-45.
Kalman, R.E. and Bucy, R.S. (1961) New results in linear filtering and prediction theory,
J. Basic. Engg., 83, 95-108.
Kakutani, S. (1948) On the equivalence of infinite product measures, Ann. Math., 49, 214-224.
Karhunen, K. (1947) Uber lineare methoden in der wahrscheinlichkeitsrechnung, Ann.
Acad. Sci. Finn. a1, 37, 1-79.
Karr, A. (1991) Point Processes and their Statistical Inference, Marcel Dekker, New
York.
Kasonga, R. (1988) The consistency of a nonlinear least squares estimator for diffusion
processes, Stoch. Proc. Appl. , 30, 263-275.
Klimko, L.A. and Nelson, P.I. (1978) On conditional least squares estimation for stochas-
tic processes, Ann. Statist., 6, 629-642.
Krickeberg, K. (1980) Statistical Problems on Point Processes, Banach Center Publica-
tions No. 6, pp.197-223.
Krickeberg, K. (1982) Processus ponctuels en statistique, In Lecture Notes Math. , 929,
Springer, Berlin, pp. 205-313.
Kuchler, U. and Sorensen, M. (1997) Exponential Families of Stochastic Processes, Springer, New York.
Kunita, H. and Watanabe, S. (1967) On square integrable martingales, Nagoya Math.
J., 30, 209-245.
Kutoyants, Yu. A. (1984) Parameter Estimation for Stochastic Processes, Translated
and Ed. B.L.S. Prakasa Rao, Heldermann, Berlin.
Kutoyants, Yu. A. (1994) Identification of Dynamical Systems with Small Noise, Kluwer,
Dordrecht.
Kutoyants, Yu. A. (1998) Statistical Inference for Spatial Poisson Processes, Lecture
Notes in Statistics, 134,Springer, New York.
Kutoyants, Yu. A. (2004) Statistical Inference for Ergodic Diffusion Processes, Springer,
London.
Le Breton, A. (1976) On continuous and discrete sampling for parameter estimation in
diffusion type process, In Mathematical Programming Studies, 5, 124-144.
Lewis, P.A.W. (1972) Stochastic Point Processes: Statistical Analysis, Theory and Ap-
plications, Wiley, New York.
Linkov, Y.N. (2001) Asymptotic Methods in the Statistics of Stochastic Processes, Amer-
ican Mathematical Society, Providence, Rhode Island.
Liptser, R.S. and Shiryayev, A.N. (1977) Statistics of Random Processes: General The-
ory, Springer, New York.
Liptser, R.S. and Shiryayev, A.N. (1978) Statistics of Random Processes: Applications,
Springer, New York.
Loeve, M. (1946) Fonctions aleatoires de second ordre, C.R. Acad. Sci. Paris, 222.
Loeve, M. (1977) Probability Theory I, 4th ed., Springer, Berlin.
Maruyama, G. and Tanaka, H. (1957) Some properties of one-dimensional diffusion pro-
cesses, Mem. Fac. Kyushu Univ., 11, 117-141.
Meyer, P. (1962) A decomposition theorem for supermartingales, Illinois J. Math., 6,
193-205.
Naik-Nimbalkar, U.V. (1996) In Stochastic Processes and Statistical Inference, Ed. B.L.S.
Prakasa Rao and B.R. Bhat, New Age International, New Delhi, pp.52-72.
Negri, I. (1998) Efficacite globale de la fonction de repartition empirique dans le cas d’un
processus de diffusion ergodique, Ph.D. Thesis, Universite du Maine, Le Mans.
Norman, M.F. (1971) Statistical inference with dependent observations: extensions of classical procedures, J. Mathematical Psychology, 8, 444-451.
Novikov, A.A. and Shiryayev, A.N. (1994) Statistics and control of random processes,
Proceedings of Steklov Institute of Mathematics, 202, Amer. Math. Soc. Provi-
dence, Rhode Island, USA.
Prabhu, N.U. (1988) Statistical Inference from Stochastic Processes, Contemporary Mathematics, 80, American Mathematical Society, Providence, Rhode Island.
Prabhu, N.U. and Basawa, I.V. (1991) Statistical Inference in Stochastic Processes, Marcel Dekker, New York.
Prakasa Rao, B.L.S. (1972) Maximum likelihood estimation for Markov processes, Ann.
Inst. Statist. Math, 24, 333-345.
Prakasa Rao, B.L.S. (1974) Statistical inference for stochastic processes, Tech. Report
CRM-465, Centre de Recherches Mathematiques, Universite de Montreal.
Prakasa Rao, B.L.S. (1983) Asymptotic theory for nonlinear least squares estimator for
diffusion processes, Math. Oper. Stat. Series Statistik, 14, 195-209.
Prakasa Rao, B.L.S. (1987) Asymptotic Theory of Statistical Inference, Wiley, New York.
Prakasa Rao, B.L.S. (1988) Statistical inference from sampled data for stochastic pro-
cesses, In Contemporary Mathematics, 80, American Mathematical Society, Provi-
dence, Rhode Island, pp. 249-284.
Prakasa Rao, B.L.S. (1990) Nonparametric density estimation for stochastic processes
from sampled data, Publ. Inst. Stat. Univ. de Paris., 35, 51-83.
Prakasa Rao, B.L.S. (1991) Asymptotic theory of weighted maximum likelihood estima-
tion for growth models, In Statistical Inference in Stochastic Processes, Ed. N.U.
Prabhu and I.V. Basawa, Marcel Dekker, New York, pp. 183-208.
Prakasa Rao, B.L.S. (1996) Optimal asymptotic tests of composite hypotheses for con-
tinuous time stochastic processes, Sankhya Ser. A, 58, 8-24.
Prakasa Rao, B.L.S. (1996) Nonparametric approach to time series analysis, In Stochastic
Processes and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New
Age International, New Delhi, pp.73-89.
Prakasa Rao, B.L.S. (1999) Statistical Inference for Diffusion Type Processes, Kendall’s
Library of Statistics No.8, Arnold, London and Oxford University Press, New York.
Prakasa Rao, B.L.S. (1999) Semimartingales and their Statistical Inference, Chapman and Hall/CRC Press, Boca Raton, Florida.
Prakasa Rao, B.L.S. (2001) Statistical inference for stochastic partial differential equa-
tions, In Selected Proceedings of the Symposium on Inference for Stochastic Pro-
cesses, Ed. I.V. Basawa, C.C.Heyde, and R.L.Taylor, IMS Monograph Series, 37,
pp.47-70.
Prakasa Rao, B.L.S. (2001) Nonparametric inference for parabolic stochastic partial
differential equations, Random Operators and Stochastic Equations, 9, 329-338.
Prakasa Rao, B.L.S. (2002) Nonparametric inference for a class of stochastic partial
differential equations based on discrete observations, Sankhya Ser.A, 64, 1-15.
Prakasa Rao, B.L.S. (2002) On some problems of estimation for some stochastic partial
differential equations, In Uncertainty and Optimality, Ed. J.C.Misra (2002) World
Scientific, Singapore, pp. 71-154.
Prakasa Rao, B.L.S. (2003) Parametric estimation for linear stochastic differential equa-
tions driven by fractional Brownian motion, Random Operators and Stochastic Equa-
tions, 11, 229-242.
Prakasa Rao, B.L.S. (2004) Self-similar processes, fractional Brownian motion and sta-
tistical inference, In Festschrift for Herman Rubin , Ed. A. Das Gupta, Institute of
Mathematical Statistics, Lecture Notes and Monograph Series, 45, 98-125.
Prakasa Rao, B.L.S. (2009) Conditional independence, conditional mixing and condi-
tional association, Ann. Inst. Statist. Math., 61, pp. 441-460.
Prakasa Rao, B.L.S. (2010) Statistical Inference for Fractional Diffusion Processes, Wi-
ley, London.
Prakasa Rao, B.L.S. (2012) Associated Sequences, Demimartingales and Nonparametric
Inference, Birkhauser, Springer, Basel.
Prakasa Rao, B.L.S. and Bhat, B.R. (1996) Stochastic Processes and Statistical Inference,
New Age International, New Delhi.
Prakasa Rao, B.L.S. and Prasad, M.S. (1976) Maximum likelihood estimation for de-
pendent random variables, J. Indian Statist. Assoc., 14, 75-79.
Prasad, M. S. (1971) Some Contribution to the theory of Maximum likelihood Estimation
for Dependent Random Variables, Ph.D. Thesis, Indian Institute of Technology,
Kanpur.
Rajarshi, M.B. (1996) Resampling methods for stochastic processes, In Stochastic Pro-
cesses and Statistical Inference, Ed. B.L.S. Prakasa Rao and B.R. Bhat, New Age
International, New Delhi, pp.90-120.
Rao, M.M. (2000) Stochastic Processes: Inference Theory, Kluwer, Dordrecht.
Renyi, A. (1963) On stable sequences of events, Sankhya Series A, 25, 293-302.
Revesz, P. (1968) The Laws of Large Numbers, Academic Press, New York.
Ripley, B.D. (1988) Statistical Inference for Spatial Point Processes, Cambridge University Press, Cambridge, UK.
Sagdar, D. (1974) On an approximate test of hypotheses about the correlation function
of a Gaussian random process, Theor. Probab. Math. Stat., 2, 231-238.
Sarma, Y.R. (1976) Sur les tests et sur l’estimation de parametres pour certains processus
stochastiques stationnaires, Publ. Inst. Statist. Univ. Paris, 17, 1-124.
Sagirow, P. (1970) Stochastic Methods in the Dynamics of Satellites, CISM Courses and
Lectures, 57, Springer, Berlin.
Schmetterer, L. (1974) Introduction to Mathematical Statistics, Springer, Berlin.
Silvey, S.D. (1961) A note on the maximum likelihood in the case of dependent obser-
vations, J. Roy. Statist. Soc. ser. B, 23, 444-452.
Striebel, C.T. (1959) Densities for stochastic processes, Ann. Math. Statist.,30, 559-567.
Swensen , A. (1980) Asymptotic Inference for a Class of Stochastic Processes, Ph.D.
Thesis, University of California, Berkeley.
Wald, A. (1948) Asymptotic properties of the maximum likelihood estimate of an unknown parameter of a discrete stochastic process, Ann. Math. Statist., 19, 40-46.
Winnicki, J. (1988) Estimation theory for the branching process with immigration, In
Contemporary Mathematics, 80, pp. 301-322.
Woerner, J. (2001) Statistical Analysis for Discretely Observed Levy Process, Ph.D. The-
sis, Albert-Ludwig-Universitat, Freiburg.
Yanev, N.M. (1975) On the statistics of branching processes, Theory Prob. Appl., 20,
612-622.