Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
Second-order residual analysis of spatio-temporal point processesand applications in model evaluation
Jiancang Zhuang
Institute of Statistical Mathematics, Tokyo, Japan.
Summary.This paper gives first-order residual analysis for spatio-temporal point processes similar to the residual analy-
sis developed by Baddeley et al. (2005) for spatial point process, and also proposes principles for second-order residual
analysis based on the viewpoint of martingales. Examples are given for both first- and second-order residuals. In par-
ticular, residual analysis can be used as a powerful tool in model improvement. Taking a spatio-temporal epidemic-type
aftershock sequence (ETAS) model for earthquake occurrences as the baseline model, second-order residual analysis
can be useful to identify many features of the data not implied in the baseline model, providing us with clues of how to
formulate better models.
1. Introduction and motivations
Temporal, spatial and spatio-temporal point processes have been increasingly widely used in many fields, in-
cluding epidemiology, biology, environmental sciences and geosciences. Among the associated statistical inference
techniques, such as model specification, parameter estimation, model selection, testing goodness-of-fit and model
evaluation, the tools used for testing goodness-of-fit and model evaluation are quite under-developed. This is one
motivation for this article.
Model selection procedures can be used in testing goodness-of-fit and model evaluation. Given several explicit
models that are fitted to the same dataset, we can use some model selection criterion such as Akaike’s information
criterion (AIC, see, e.g., Akaike, 1974) and cross validation (e.g., Stone, 1977) to find the best model among
them. To find a model better than the current best model, one can always try several possible versions of new
2 J. Zhuang
models, fit them to the same dataset and use the model selection procedures again to see whether one of the new
models becomes the best performing model. However, the above procedures are not always easy to implement.
Formulating a model and fitting it to the dataset may involve heavy programming and computational tasks. Finally,
model selection procedures only give us some quantities that indicate the overall fit of each model. It is hard to
deduce from these quantities whether a model, even if it is not the best one, has some better properties than other
models ranked higher by the model selection procedure. It is very helpful if the model improvement process can
be simplified. Residual analysis developed in this article can be used for this purpose.
To help motivate this work, we begin with a description of the dataset used in this study. The developments
here are associated with seeking answers to a series of problems in modelling the phenomena associated with
earthquake clusters. Although earthquake data come from the field of geophysics, similar problems also appear in
the epidemiological modelling of contagious diseases, in biology, in ecology and in environmental sciences.
The epicentres of earthquakes are not homogeneously distributed on the surface of the earth. In the globe,
earthquakes mainly occur in the subduction zone between plate boundaries. Locally, earthquakes accumulate along
active faults or in volcanic regions. Their depths range from several to 700 kilometers. Although an earthquake
can be as big as M7, resulting in huge disasters, most earthquakes are so small that they can be detected only by
sensitive seismometres.
Seismicity is clustered in both space and time. The overlapping of earthquake clusters with one another and
also with the background seismicity, complicates our analysis. For the purpose of long-term earthquake prediction,
i.e., evaluating the risk of the occurrence of a powerful earthquake in about a 10-year time scale, a good estimate
of the background seismicity rate is necessary. On the other hand, for short-term prediction (in a scale of an hour
or a day), a good understanding of earthquake clusters is necessary.
The earthquake catalogue consist of a list (ti, xi, si) and other associated information, where ti, xi and si
record, respectively, the occurrence time, the epicentre location and magnitude of the ith event. Figure 1 shows
second-order residual analysis of point processes 3
the shallow earthquakes (with depths less than 100 km) in the Japanese Meteorological Agency (JMA) catalogue
used in this analysis. The time span of this catalogue is 01/01/1926 to 31/12/1999. In this article, we select
the data in the polygon with vertices (134.0E, 31.9N), (137.9E, 33.0N), (143.1E, 33.2N), (144.9E, 35.2N),
(147.8E, 41.3N), (137.8E, 44.2N), (137.4E, 40.2N), (135.1E, 38.0N) and (130.6E, 35.4N). The time
period from the 10000th day after 01/01/1926 to 31/12/1991 is used as the target range in which to estimate the
parameters through the method of maximum likelihood.
It is easy to see from Figure 1 that earthquakes are clustered. Typically, the spatio-temporal ETAS (epidemic
type aftershock sequence) model is used to describe the behavior of earthquake clustering (Kagan, 1991; Rathbun,
1994; Musmeci and Vere-Jones, 1992; Ogata, 1988, 1998, 2004; Ogata et al 2003; Zhuang et al., 2002, 2004;
Console and Murru, 2002; Console et al., 2003; Helmstetter and Sornette, 2003a, 2003b; Helmstetter et al., 2003).
In this model, seismicity is classified into two components, the background and the cluster. Background seismicity
is modelled as a Poisson process that is temporally stationary but not spatially homogeneous. Once an event
occurs, no matter if it is a background event or if it is generated (triggered) by another previous event, it produces
(triggers) its own children according to certain rules. Such a model is a continuous-type branching processes with
immigration (background). This model can by defined completely by the conditional intensity function (hazard
rate conditional on a given history Ft up to current time t, see Appendix A or Daley and Vere-Jones, 2003, Chapter
7, for more details) as
λ(t, x, s) = lim∆t↓0, ∆x↓0,∆s↓0
PrN((t, t + ∆t] × (x, x + ∆x] × (s, s + ∆s]) ≥ 1|Ft
∆t ∆x∆s.
The conditional intensity function of the ETAS model used in this paper takes the form given by Ogata (1998),
i.e.,
λ(t, x, s) = γ(s)
[
u(x) +∑
i: ti<t
κ(si) g(t − ti) f(x − xi | si)
]
(1)
4 J. Zhuang
where
κ(s) = A exp[αs]; (2)
γ(s) = β exp[−βs] H(s);
f(x | s) =q − 1
πCeαs
(
1 +‖x‖2
Ceαs
)−q
, (3)
and
g(t) =p − 1
c
(
1 +t
c
)−p
H(t), p > 1,
H being the Heaviside function. In the above, the magnitude distribution γ(s) is based on the Gutenberg-Richter
law (Gutenberg and Richter, 1956), the expected number of children κ(s) is based on Yamanaka and Shimazaki
(1990), and the time density g(t) is based on the modified Omori formula (Omori, 1898; Utsu, 1969), all being
empirical laws in seismicity studies. The background rate and the parameters A, α, β, c, p and C in the model
can be estimated by an iterative algorithm (Zhuang et al., 2002, 2004, see Appendix C).
However, there are many questions about the above model formulation. For example:
1 Is the background process stationary?
2 Are background events and triggered events different, for example, in magnitude distribution or in triggering
offspring?
3 Does the magnitude distribution of triggered events depend on the magnitudes of their parent events?
4 Is it reasonable to apply the same exponential function eαs in both κ(s) and f(x|s)?
In previous studies, residual analysis has been carried out by transforming the point process into a standard
Poisson process (Ogata, 1988). Schoenberg (2004) uses the thinned residuals to analyse the goodness-of-fit of the
ETAS model to Californian earthquake data. Baddeley et al. (2004) have made more remarkable and general
second-order residual analysis of point processes 5
developments. But their residual analysis methods are all of the first order and far from being sufficient for solving
problems where second-order properties such as clustering and inhibition are concerned. To answer these, it is
necessary for us to generalise the concepts of residual analysis to higher orders.
Zhuang et al. (2004) developed a stochastic reconstruction method to test the above hypotheses associated with
earthquake clusters, using the ETAS model as the reference model. Their method is based purely on intuition
rather than on a strict theoretical basis. As we show in this article, their method can be validated by using the
tools of residual analysis. Providing a theoretical basis for the stochastic reconstruction method that can also be
applied to a wider range of point-process models is another motivation of this article.
In the ensuing sections, we first review the first-order residuals developed by Baddeley et al. (2005) and then
propose principles for second-order residuals. The uses and powers of these residuals are illustrated via some simple
examples and also by solving the questions raised in the formulation of the ETAS model for the purpose of model
improvement.
2. First-order residuals and examples
Baddelay et al. (2005) give the first-order residuals for a spatial point process X = x1, x2, · · · based on the
Nguyen-Zessin (1979) formula
E
[
∑
xi∈X∩D
h(xi; X \ xi)
]
= E
[∫
D
h(x; X)λp(x; X)µ(dx)
]
,
for any measurable set D, where λp is the Papangelou conditional intensity. Thus, they define innovations with
respect to (D, h) to be
I(D, h, λp) =∑
xi∈X∩D
h(xi; X \ xi) −
∫
D
h(x; X)λp(x; X)µ(dx),
and residuals corresponding to I(D, h, λp)
R(D, h, λp) =∑
xi∈X∩D
h(xi; X \ xi) −
∫
D
h(x; X)λp(x; X)µ(dx),
6 J. Zhuang
where λp is the fitted conditional intensity. If h also depends on the parameters θ in λp, h is obtained by substituting
the estimated parameters θ in h.
The conditional intensity for temporal or spatio-temporal point processes defined in Appendix A is simpler than
the Papangelou conditional intensity for spatial point processes. In the former case, the occurrence of an event at a
certain time or a spatio-temporal location only depends on the events occurring at earlier times; while in the latter
case, the occurrence of an event at a particular location depends on all the other events that occur elsewhere. In
principle, we can simply obtain the conditional intensity defined in (1) and Appendix A through taking expectation
of the Papangelou conditional intensity, which is defined as conditional on the σ-algebra generated by all the events
not at the current time and location, over a simpler σ-algebra of the observation history up to current time, to
define first-order residuals for a spatio-temporal point process. However, it is hard to generalise first-order residuals
to second-order residuals in this way, which is the main concern of this article. We make use of the evolutionary
features of the process through the viewpoints of martingale theory.
Let N be a simple spatio-temporal point process in a interval [0, T ] and a d-dimensional region X ⊂ Rd,
admitting a conditional intensity λ(t, x) (see Delay and Vere-Jones, 1988, Chapter 13, or Appendix A for definition
of conditional intensity). According to the martingale property of the conditional intensity, for any predictable
process (measurable with respect to the σ-algebra generated by sets of the form (s, t] × B × E where E ∈ Fs and
B ∈ X ) h(t, x) ≥ 0, almost everywhere (a.e.),
E
[∫
D
h(t, x)N(dt × dx)
]
= E
[∫
D
h(t, x)λ(t, x)µ(dt × dx)
]
, ∀D ∈ T⊗
X (4)
where µ is the Lebesgue measure ℓ × ℓd (see also, Bremaud, 1981, Chap. 2; Karr, 1991, Chap. 5).
Using the terminology adopted in Baddeley et al. (2005) for spatial point processes, define first-order innovations
with respect to a predictable function h(t, x) ≥ 0, a.e., and a measurable set D ⊂ T × X by
I(D, h, λ) =
∫
D
[h(t, x)N(dt × dx) − h(t, x)λ(t, x)µ(dt × dx)] . (5)
second-order residual analysis of point processes 7
Due to (4), E[I(D, h, λ)] = 0. The first-order residuals are defined by replacing the true model in (5) by the fitted
model, i.e.,
R(D, h, λ) =
∫
D
[
h(t, x)N(dt × dx) − h(t, x) λ(t, x)µ(dt × dx)]
.
If the fitted model is close enough to the true model, then R(D, h, λ) ≈ 0.
The following six examples give different kinds of first-order innovations. Because the residuals corresponding
to each of these innovations can be easily defined, we give only the forms of innovations.
Example 1. Raw innovations. Let h(t, x) = 1, then the corresponding innovation is
I(D, h, λ) = N(D) −
∫
D
λ(t, x)µ(dt × dx),
which is called the raw innovation. One important property of raw innovations/residuals in applications is that,
for a realization of the process (ti, xi) : i ∈ N,
τi =
∫ ti
0
∫
X
λ(t, x) ℓd(dx) ℓ(dt)
is a standard Poisson process (Meyer, 1971; Papangelou, 1972; Ogata, 1988; Vere-Jones and Schoenberg, 2004).
Example 2. Reciprocal-lambda innovations (Baddeley et al. 2005; Schoenberg, 2004) If h(t, x) = 1/λ(t, x),
I(D, h, λ) =
∫
D
N(dt × dx)
λ(t, x)− µ(D).
This is an analogue of the Stoyan-Grabarnik (1991) weights for Gibbs point processes. The residual analysis on
the space-time ETAS model given by Schoenberg (2004) was essentially based on the reciprocal-lambda innova-
tions/residuals.
Example 3. Pearson innovations (Baddeley et al., 2005). If h(t, x) = 1/√
λ(t, x),
I(D, h, λ) =
∫
D
N(dt × dx)/√
λ(t, x) −
∫
D
√
λ(t, x) µ(dt × dx).
8 J. Zhuang
Example 4. Score innovations (Baddeley et al., 2005). The score innovations are defined by
I
(
D,∂ log λ(t, x)
∂θ, λ
)
=
∫
D
∂ log λ(t, x)
∂θN(dt × dx) −
∫
D
∂λ(t, x)
∂θµ(dt × dx),
where θ is any of the regular parameters in the model. The equation on the score residuals
R
(
D,∂ log λ(t, x)
∂θ, λ
)
= 0
is the condition for maximizing the log-likelihood function in D, i.e.,
log L =
∫
D
log λ(t, x)N(dt × dx) −
∫
D
λ(t, x)µ(dt × dx).
Example 5. Weighted score innovations and localized maximum likelihood estimates. The innovation defined
by
I
(
D, w(t, x; t0, x0)∂
∂θlog λ(t, x), λ
)
=
∫
D
w(t, x; t0, x0)∂ log λ(t, x)
∂θ[ N(dt × dx) − λ(t, x)µ(dt × dx)]
is called the weighted score innovation, where w(t, x; t0, x0) is a kernel function centered at a given (t0, x0). The
equality on the weighted score residual
R
(
D, w(t, x; t0, x0)∂ log λ(t, x)
∂θ, λ
)
= 0,
is the condition for maximizing the local log-likelihood
WLL =
∫
D
w(t, x; t0, x0) log λ(t, x)N(dt × dx) −
∫
D
w(t, x; t0, x0)λ(t, x)µ(dt × dx).
Example 6. Testing stationarity of the background process in the ETAS model. Define the background inno-
vation for the ETAS model by
I(D, ϕ, λ) =∑
i
ϕ(ti, xi, si) ID(ti, xi, si)) −
∫∫∫
D
u(x) γ(s) dt dxds, (6)
second-order residual analysis of point processes 9
where ϕ(t, x, s) = u(x)γ(s)/λ(t, x, s), λ(t, x, s) and u(x) are given in (1), D is a measurable subset of T ×X×M and
ID is the index function of D. To test the stationarity of the background process, choose D(t) = (0, t) × Bx × Bs,
Bx ⊂ X and Bs ⊂ S. From E [I(D, ϕ, λ)] = 0,
I(D, ϕ, λ)(t) =∑
i
ϕ(ti, xi, si) ID(t)(ti, xi, si) ≈ t
∫
Bx
u(x)µ(dx)
∫
Bs
γ(s)µ(ds), (7)
which is a linear function with respect to t. Applications of such tests and geophysical interpretations can be found
in Zhuang et al. (2005) and Hainzl and Ogata (2005).
3. Second-order residual analysis
Given a second-order F -predictable function H(t, x; t′, x′), i.e., a process that can be generated from processes
of the form h1(t, x)h2(t′, x′), where h1 and h2 are first-order predictable processes, by linear combinations and
monotone limits (see details in Appendix B), the second-order innovation with respect to H(t, x; t′, x′) for D ∈
(T⊗
X )⊗
(T⊗
X ) is
I2(D, H, λ) =
∫∫
D\ diag2 D
H(t, x; t′, x′)N(dt′ × dx′)N(dt × dx)
−
∫∫
D
H(t, x; t′, x′)λ(t, x)λ(t′, x′)µ(dt′ × dx′)µ(dt × dx), (8)
where diag2 D = (t, x, t, x) ∈ D. As shown in Appendix B, E[I2(D, H, λ)] = 0. The second-order residual can be
defined by R2(D, H, λ) = I2(D, H, λ).
10 J. Zhuang
Example 7. Variance of first-order innovations. For a first-order predictable process h(t, x) and any B ∈
T⊗
X ,
Var
[∫
B
h(t)N(dt)
]
= E
[
(∫
B
h(t)N(dt)
)2]
−
[
E
(∫
B
h(t)N(dt)
)]2
= E
[∫∫
B×B
h(t)h(s)N(dt)N(ds)
]
−
[
E
(∫
B
h(t)N(dt)
)]2
= E
[∫∫
B×B
h(t)h(s)λ(t)λ(s)µ(dt)µ(ds)
]
+ E
[∫
B
h2(t)λ(t)µ(dt)
]
−
[
E
(∫
B
h(t)λ(t)µ(dt)
)]2
= E
[∫
B
h2(t)λ(t)µ(dt)
]
,
or,
Var
[∫
B
h(t)N(dt)
]
= E
[∫
B
h2(t)N(dt)
]
.
The above formula leads to an approximate estimate of the standard error for first-order residuals. In Example 6, if
the ETAS model is close enough to the true one, the standard error of the background residual R(D, ϕ, λ)(t) deviat-
ing from the expected linear function of t given in (7) can be approximated by[∑
i ϕ2(ti, xi, si) ID(t)(ti, xi, si)]1/2
.
4. Applications of second-order residual to earthquake data: model improvement
Now we have tools to solve the problems associated with the earthquake clusters listed in the introduction. In this
section, before testing these hypotheses, we briefly recall the basic ideas of the stochastic reconstruction method
proposed by Zhuang et al. (2004).
Once the conditional intensity function in (1) is estimated, it provides us with a good way to evaluate the
probability of whether an event is likely to be a background event or be triggered by others (Kagan and Knoppoff,
1981; Zhuang et al 2002). Consider the contribution of the background seismicity rate relative to total seismicity
rate at the occurrence of the ith event,
ϕi =u(xi) γ(si)
λ(ti, xi, si), (9)
second-order residual analysis of point processes 11
If we remove the ith event with probability 1 − ϕi for all the events in the process, we can realize a process with
the occurrence rate of µ(x)γ(s) (see, also, Ogata, 1981; Karr, 1991, Chap. 5 for justification). Thus it is natural to
regard ϕi as the probability that the ith event is a background event. Similarly,
ρij =γ(sj) g(tj − ti) f(xj − xi | si)
λ(tj , xj , sj)(10)
can be regarded as the probability that the jth event is directly produced by the ith event. That is to say, given i
fixed, if we keep each event j with probability ρij , we can obtain a Poisson process of intensity γ(s) g(t−ti) f(x−xi |
si), which is the process consisting of the children of the ith event in the catalogue. Based on the above ideas, Zhuang
et al. (2004) tested various hypotheses associated with the clustering features of earthquakes by building empirical
functions with the events weighted by the estimated probabilities ϕi or ρij according to the fitted conditional
intensity of the ETAS model. This reconstruction method was purely based on intuition. In the coming sections,
we show how it works.
For simplification, in the following sections, the notation of Riemann integrals is used, and, without confusion,
we suppose that the integrals are over the whole range to which the data correspond .
4.1. Improving individual functions
Formulating a point-process model with a conditional intensity function for a particular dataset is mostly to begin
with empirical results and the researchers’ experience. Once an explicit form for the model is given, the first
question is whether this form is suitable or whether it can be improved in some way. In this subsection, we will
use κ(s) in (1) as a simple example to show how to improve the individual function in the model formulation.
Consider the following form for the conditional intensity of the ETAS model
λ1(t, x, s) = γ(s)
[
u(x) +∑
i: ti<t
K(si) g(t − ti) f(x − xi|si)
]
,
where K(s) is assumed to be a more suitable function than κ(s) for the mean number of children from an event
of magnitude s. Even though such K(s) is unknown, we will show below that we can still construct a so-called
12 J. Zhuang
“ratio-unbiased” (a ratio of two unbiased estimates) estimate of K(s) based on the properties of second-order
innovations/residuals.
Let
Hij =K(si) g(tj − ti) f(xj − xi | si) γ(sj)
λ1(tj , xj , sj).
Then from (8)
E
∑
i, j
I(si ∈ [s0 − δ, s0 + δ])Hij
= E
∫∫∫∫∫∫
I(s ∈ [s0 − δ, s0 + δ])K(s) g(t′ − t) f(x′ − x|s) γ(s′)λ1(t, x, s) dt′ dx′ ds′ dt dxds
=
∫
K(s) γ(s) I(s ∈ [s0 − δ, s0 + δ]) ds × E
[∫∫
λ1(t, x) dt dx
]
≈ 2 δ K(s0) γ(s0) × E
[∫∫
λ1(t, x) dt dx
]
, (11)
where
λ1(t, x) = u(x) +∑
i: ti<t
K(si) g(t − ti) f(x − xi|si).
Using the first-order innovation formula,
E
∑
i, j
I(si ∈ [s0 − δ, s0 + δ])
= E
∫∫∫
I(s ∈ [s0 − δ, s0 + δ])λ1(t, x, s) dt dxds
=
∫
γ(s) I(s ∈ [s0 − δ, s0 + δ]) ds × E
[∫∫
λ1(t, x) dt dx
]
≈ 2 δ γ(s0) × E
[∫∫
λ1(t, x) dt dx
]
. (12)
Dividing (11) by (12) yields
K(s0) ≈
∑
i, j
Hij I(si ∈ [s0 − δ, s0 + δ])
∑
i
I(si ∈ [s0 − δ, s0 + δ]). (13)
second-order residual analysis of point processes 13
Although K(s) is unknown, we can set K(0)(s) = κ(s) and thus H(0)ij = ρij for the first step, where κ(s) is from
the fitted conditional intensity of the ETAS model and ρij is the estimated ρij defined in (10). We can then use
(13) to get an updated K(s), i.e.,
K(1)(s0) =
∑
i, j
ρij I(si ∈ [s0 − δ, s0 + δ])
∑
i
I(si ∈ [s0 − δ, s0 + δ]). (14)
Since∑
j ρij can be interpreted as an estimate of the number of children from the ith event, the right side of (14)
can be regarded as the average number of children produced by a mother event of magnitude around s0. This
procedure can be iterated until convergence.
In this application, the second-order innovation plays a role in guaranteeing that the estimate of the quantity
in (11) is unbiasedly proportional to K(s) γ(s). The estimates from fitting the ETAS model to the earthquake data
are only used as initial values in (13). Such a procedure is similar in spirit to the expectation-maximization (EM)
algorithm, with the expectation step carried out by the properties of second-order innovations/residuals and the
maximization step replaced by the non-parametric approach of the “ratio-unbiased” estimate in (13).
Example 8. Average numbers of children produced by earthquakes in the JMA catalogue. We apply the above
reconstruction procedures to the JMA catalogue and the reconstructed results are shown in Figure 2. It is clear
that the differences between K(1)(s) and κ(s) are negligible, which implies that it is not necessary to improve κ(s)
at this stage. We will continue discussing this figure in Example 9.
4.2. Difference between background events and triggered events
The ETAS model assumes that there is no distinction between background events and triggered events. Once an
event occurs, its magnitude is from the common unique magnitude distribution and it triggers offspring in the same
manner as others. It is interesting and important to test whether background events are different from triggered
events. In this section, we will test whether they are different in the behavior of triggering children.
14 J. Zhuang
Consider a more complicated model admitting a conditional intensity
λ1(t, x, s) = λ1(t, x, s, ω) I(ω = 0) + λ1(t, x, s, ω) I(ω = 1), (15)
where
λ1(t, x, s, ω) =
u(x) γ0(s), if ω = 0,
γ1(s)∑
ti<t ξ(t, x; ti, xi, si, ωi), if ω = 1.(16)
and
ξ(t, x; ti, xi, si, ωi) =
κ0(si) g0(t − ti) f0(x − xi | si), if ωi = 0;κ1(si) g1(t − ti) f1(x − xi | si), if ω = 1.
(17)
That is to say, an event is a background event if ω = 0 or a triggered event if ω = 1. In this mixed-type model,
a background event (t, x, s) has a magnitude s from a p.d.f γ0(s) and triggers on average κ0(s) children, whose
magnitude, time and location distributions are characterised by γ1, g0 and f0, respectively. For a triggered event
(t, x, s), the magnitude is from the density γ1(s) and it triggers κ1(s) children on average, with the children’s
magnitude, time and location densities being γ1, g1 and f1, respectively.
To construct a “ratio-unbiased” estimate of κ0, let
H(t, x, s; t′, x′, s′) =γ1(s
′) ξ(t′, x′; t, x, s, ω)λ1(t, x, s, ω) I(ω = 0)
λ1(t′, x′, s′)λ1(t, x, s)
=γ1(s
′)κ0(s) g0(t′ − t) f0(x
′ − x | s)u(x) γ0(s)
λ1(t′, x′, s′)λ1(t, x, s)
and
h(t, x, s) =λ1(t, x, s, ω) I(ω = 0)
λ1(t, x, s)=
u(x) γ0(s)
λ1(t, x, s).
Then
E
∑
i, j
H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])
= E
[∫∫∫∫∫∫
H(t, x, s; t′, x′, s′) I(si ∈ [s0 − δ, s0 + δ]) dt′ dx′ ds′ dt dxds
]
=
∫
κ0(s) γ0(s) I(si ∈ [s0 − δ, s0 + δ]) ds ×
∫∫
u(x) dt dx
≈ 2δ κ0(s0) γ0(s0) ×
∫∫
u(x) dt dx (18)
second-order residual analysis of point processes 15
and
E
[
∑
i
h(ti, xi, si) I(si ∈ [s0 − δ, s0 + δ])
]
= E
[∫∫∫
h(t, x, s) I(si ∈ [s0 − δ, s0 + δ]) dt dxds
]
=
∫
γ0(s) I(si ∈ [s0 − δ, s0 + δ]) ds ×
∫∫
u(x) dt dx
≈ 2δ γ0(s0) ×
∫∫
u(x) dt dx. (19)
A “ratio-unbiased” estimate of κ0(s) given by (18) and (19) is
κ0(s0) =
∑
i, j H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])∑
i h(ti, xi, si) I(si ∈ [s0 − δ, s0 + δ]).
If, at the initial step, we let γ(0)0 = γ
(0)1 = γ, κ
(0)0 = κ
(0)1 = κ, f
(0)0 = f
(0)1 = f and g
(0)0 = g
(0)1 = g, where γ, κ, f
and g are from the fitted conditional intensity of the ETAS model, we can obtain
κ(1)0 (s0) =
∑
i, j ρij ϕi I(si ∈ [s0 − δ, s0 + δ])∑
i ϕi I(si ∈ [s0 − δ, s0 + δ]), (20)
where ϕi and ρij are the estimates of ϕi and ρij defined in (9) and(10), respectively.
Similarly, we can find that
κ1(s0) =
∑
i, j H ′(ti, xi, si; tj, xj , sj) I(si ∈ [s0 − δ, s0 + δ])∑
i(1 − h(ti, xi, si)) I(si ∈ [s0 − δ, s0 + δ]).
where
H ′(t, x, s; t′, x′, s′) =γ1(s
′) ξ(t′, x′; t, x, s, ω)λ1(t, x, s, ω) I(ω = 1)
λ1(t′, x′, s′)λ1(t, x, s)
=γ1(s
′)κ1(s) g1(t′ − t) f1(x
′ − x | s)
λ1(t′, x′, s′)
[
1 −u(x) γ0(s)
λ1(t, x, s)
]
and
κ(1)1 (s0) =
∑
i, j ρij (1 − ϕi) I(si ∈ [s0 − δ, s0 + δ])∑
i(1 − ϕi) I(si ∈ [s0 − δ, s0 + δ]). (21)
The other functions in equations (15) to (17) such as g1, g2, f1 and f2 can be reconstructed in similar ways.
16 J. Zhuang
Example 9. Differences in average numbers of children triggered by the background events and the triggered
events in the JMA catalogue. Applying the above procedure to the JMA catalogue, the triggering abilities (average
numbers of children) from the background events and the triggered events for each magnitude are plotted in
Figure 2. It is found that background events and triggered events generate children according to different positive
exponential laws. However, a background event triggers fewer children on average than a triggered event of the
same magnitude. Such a feature plays an important role in evaluating the probability associated with foreshocks
(background events of which each has at least one larger descendant) (see a detailed discussion in Zhuang and
Ogata, 2006).
4.3. Spatial scaling factor and triggering abilities
As argued in the introduction, immediate questions regarding the use of (3) include: 1 Is the scaling factor Ceαs
necessary, i.e., can it be replaced by a constant D0? 2 Should the scaling factor Ceαs have the same exponent α
as in the triggering abilities of (2)? 3 Is the scaling factor of the form of an exponential law?
Suppose that the true conditional intensity is
λ1(t, x, s) = γ(s)
[
u(x) +∑
i: ti<t
κ(si) g(t − ti) f∗(x − xi|si)
]
,
where
f∗(x | s) = f∗(x, σ(s)) =π(q − 1)
σ(s)
[
1 +||x||2
σ(s)
]−q
,
is the p.d.f. for the locations of offspring with σ(s) being a function of s. Consider
H(t, x, s; t′, x′, s′) =γ(s′)κ(s) g(t′ − t)f∗(x′ − x, σ(s))
λ1(t′, x′, s′)
∂
∂σ(s)log f∗(x′ − x, σ(s)),
second-order residual analysis of point processes 17
and then, for any positive number δ,
E
∑
i, j
H(ti, xi, si; tj , xj , sj) I(si ∈ [s0 − δ, s0 + δ])
= E
[∫∫∫∫∫∫
γ(s′)κ(s) g(t′ − t)∂f(x′ − x, σ(s))
∂σ(s)I(s ∈ [s0 − δ, s0 + δ])λ1(t, x, s) dt′ dx′ ds′ dt dxds
]
=
∫
κ(s) I(s ∈ [s0 − δ, s0 + δ]) γ(s) ds × E[λ1(t, x)] ×∂
∂σ(s)
∫
f∗(x′ − x, σ(s)) dx′
= 0.
Since f∗ integrates to one. Thus, for all s0, if δ is small enough,
∑
i, j
H(ti, xi, si; tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ]) ≈ 0.
Further, the above equation gives
∑
i, j
γ(sj)κ(si) g(tj − ti)f∗(xj − xi, σ(si))
λ1(tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ])
∂
∂σ(s)log f∗(xj − xi, σ(s))
∣
∣
∣
∣
σ(s)=σ(si)
≈ 0.
In order to update σ(s), we convert the above equation into
∑
i, j
γ(sj)κ(si) g(tj − ti)f∗(xj − xi, σ
(0)(si))
λ1(tj , xj , sj)I(si ∈ [s0 − δ, s0 + δ])
∂
∂σ(s)log f∗(xj − xi, σ(s))
∣
∣
∣
∣
σ(s)=σ(1)(si)
≈ 0.
(22)
Taking σ(0)(s) = Ceαs in the above equation, i.e., f∗(x, σ(0)(s)) = f , where f is the estimated f defined in (3),
and (22) gives
∑
i, j
ρij I(si ∈ [s0 − δ, s0 + δ])∂
∂σ(s)log f∗(xj − xi, σ(s))
∣
∣
∣
∣
σ(s)=σ(1)(s0)
= 0, (23)
The solution σ(1)(s0) for each s0 in (23) is the updated version from σ(s0) (see Zhuang et al. 2004 for how to solve
this equation iteratively). Such an estimate can be regarded as maximizing the following pseudo-log-likelihood
PLL(σ(s0)) =∑
i, j
ρij log f∗(xj − xi, σ(si)) I(si ∈ [s0 − δ, s0 + δ])
where δ is a small positive number.
18 J. Zhuang
Example 10. The scaling law for the locations of clusters in the JMA catalogue. Applying the above procedure
to the JMA catalogue by using the original model and a model equipped with
f1(x|s) =q − 1
πCeα′sexp
(
1 +||x||2
Ceα′s
)−q
, (24)
we obtain the results shown in Figure 3. For comparison, the same procedure is also applied to simulated catalogues
generated from the model. It can be seen that the scaling law for the spatial locations of offspring is still an
exponential law, but not the same as the one for triggering abilities. This conclusion results in a new revision of
the formulation of the space-time ETAS model in the form of (24) (See more detailed discussion in Ogata and
Zhuang, 2006; Zhuang et al., 2005).
5. Conclusions
The idea of residual analysis for spatio-temporal point processes can be generalised through the viewpoint of
martingale theory. For the first-order innovations/residuals, as shown in the examples, it can be almost said that
each martingale associated with the conditional intensity corresponds to a specific kind of residual. Extending the
idea to higher-order cases, the second-order residual analysis is especially useful for improving model formulation
when second order properties such as clustering or inhibition are of interest. In this article, we use the spatio-
temporal ETAS model with a non-homogeneous background rate as the initial model for describing the clustering
features in earthquake occurrences. First- and second-order residual analysis helps us to find a proper direction in
which to improve on the model formulation.
Acknowledgements
This work was carried out while the author was supported by a post-doctoral programme P04039 funded by the
Japan Society for Promotion of Science. The author thanks Prof. Yosihiko Ogata and Prof. David Vere-Jones for
introducing him to the interesting research field of point processes. Prof. Adrian Baddeley kindly sent him his
second-order residual analysis of point processes 19
manuscript on residual analysis for spatial point processes, which stimulated this study. Discussions with Daryl
Daley, Estate Khmaladze, Frederic Schoenberg and David Vere-Jones proved very helpful as did the encouragement
and comments from the editor, Robin Henderson, as well as two anonymous referees.
References
Akaike, H. (1974), A new look at the statistical model identification. IEEE Transaction on Automatic Control,
AC-19, pp. 716-723.
Baddeley A., Turner R., Møller J. and Hazelton M. (2005) Residual analysis for spatial point processes. Journal
of the Royal Statistical Society, Series B, 67(5), 617-666. doi:10.1111/j.1467-9868.2005.00519.x.
Bremaud P. (1981), Point Processes and Queues: Martingale Dynamics, New York: Springer-Verlag.
Console, R., and M. Murru (2001), A simple and testable model for earthquake clustering, J. Geophys. Res., 106,
8699-8711.
Console, R., M. Murru, and A. M. Lombardi (2003), Refining earthquake clustering models. J. Geophys. Res.,
108(B10), 2468, doi:10.1029/2002JB002130.
Daley, D. and D. Vere-Jones (2003), An Introduction to Theory of Point Processes. Springer, New York.
Gutenberg B., Richter C. F., (1956). Earthquake Magnitude, Intensity, Energy, and Acceleration. Bull. Seism.
Soc. Am., 46, 105.
Hainzl, S. and Y. Ogata (2005), Detecting fluid signals in seismicity data through statistical earthquake modeling.
Journal of Geophysical Research, 110, B5, B05S07, doi:10.1029/2004JB003247.
Helmstetter A. and Sornette D. (2003) Foreshocks explained by cascades of triggered seismicity, J. Geophys. Res.,
108 (B10),2457 10.1029/2003JB002409.
20 J. Zhuang
Helmstetter A. and Sornette D. (2003) Predictability in the ETAS Model of Interacting Triggered Seismicity, J.
Geophys. Res., 108, 2482, 10.1029/2003JB002485.
Helmstetter A., Sornette D. and Grasso J.-R. (2003) Mainshocks are Aftershocks of Conditional Foreshocks:
How do foreshock statistical properties emerge from aftershock laws, J. Geophys. Res., 108 (B1), 2046,
doi:10.1029/2002JB001991.
Kagan, Y. (1991), Likelihood analysis of earthquake catalogues. Geophysical Journal International, 106, B7,
135-148.
Kagan, Y. Y., and L. Knopoff, (1981). Stochastic synthesis of earthquake catalogs. Journal of Geophysical
Research, 86, 2853-2862.
Karr, A.F. (1991). Point processes and their Statistical Inference, second edition. New York: Dekker.
Meyer P. (1971) Demonstration simplifiee d’un theoreme de Knight. In Seminaire de Probabilites, V, University
of Strasbourg, Lecture Notes in Math, 191, 191-195.
Musmeci, F. and D. Vere-Jones (1992). A space-time clustering model for historical earthquakes. Ann. Inst. Stat. Math.,
44, 1-11.
Nguyen X. X. and Zessin H. (1979). Integral and Differential Characterizations of the Gibbs process. Mathema-
tische Nachrichten. 88: 105-115.
Ogata Y. (1981). On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory,
IT-27 (1): 23-31.
Ogata, Y. (1988), Statistical model for earthquake occurrences and residual analysis for point processes. J. Am.
Stat. Assoc., 83, 9-27.
second-order residual analysis of point processes 21
Ogata, Y. (1998), Space-time point-process models for earthquake occurrences, Ann. Inst. Stat. Math., 50,
379-402.
Ogata, Y. (2004), Space-time model for regional seismicity and detection of crustal stress changes, J. Geo-
phys. Res., 109, B03308, doi:10.1029/2003JB002621.
Ogata, Y., K. Katsura and M. Tanemura (2003). Modelling heterogeneous space-time occurrences of earthquake
and its residual analysis, Appl. Stat. (J. Roy. Statist. Soc, Ser. C), 52, 499-509.
Ogata, Y. and J. Zhuang (2006), Space-time ETAS models and an improved extension, Tectonophysics, . Tectono-
physics. 413, 13-23.
Omori F., (1894). On after-shocks of earthquakes. J. Coll. Sci. Imp. Univ. Tokyo, 7, 111 – 200.
Papangelou F. (1972). Integrability of expected increments point processes and a related random change of scale.
Trans. Mer. Math. Soc.,165, 483-506.
Rathbun, S. L. (1993), Modelling marked spatio-temporal point patterns. Bull. Int. Stat. Inst., 55, Book 2,
379-396.
Schoenberg F. P. (2004). Multidimensional residual analysis of point process models for earthquake occurrences.
Journal of the American Statistical Association, 98, 789-795, doi: 10.1198/016214503000000710.
Stone, M. (1977), Asymptotics for and against cross-validation. Biometrika, 64, 29-35.
Stoyan D. and Grabarnik P. (1991). Second-order characteristics for stochastic structures connected with Gibbs
point processes. Mathematische Nachritchten, 151, 95-100.
Utsu T. (1969) Aftershock and earthquake statistics (I): Some para meters which characterize an aftershock
sequence and their interrelations. Journal o f the Faculty of Science, Hokkaido University, Series VII (Geo-
physics), 3: 129–19 5.
22 J. Zhuang
Vere-Jones D. and Schoenberg F. P. (2004). Rescaling marked point processes. Australian & New Zealand Journal
of Statistics, 46 (1): 133-143.
Yamanaka, Y. and K. Shimazaki (1990), Scaling relationship between the number of aftershocks and the size of
the main shock, J. Phys. Earth, 38, 305-324.
Zhuang J., Ogata Y. and Vere-Jones D. (2002). Stochastic declustering of space-time earthquake occurrences.
Journal of the American Statistical Association, 97: 369-380.
Zhuang J., Ogata Y., Vere-Jones D. (2004). Analyzing earthquake clustering features by using stochastic recon-
struction. Journal of Geophysical Research, 109, No. B5, B05301, doi:10.1029/2003JB002879.
Zhuang J., Chang C.-P., Ogata Y. and Chen Y.-I. (2005). A study on the background and clustering seismic-
ity in the Taiwan region by using a point-process model. Journal of Geophysical Research, 110: B05S18,
doi:10.1029/2004JB003157.
Zhuang J. and Ogata Y. (2006) Properties of the probability distribution associated with the largest event
in an earthquake cluster and their implications to foreshocks. Accepted by Physics Review, E. (e-print:
http://bemlar.ism.ac.jp/zhuang/publication.htm)
A. Point processes and conditional intensities
Consider a space-time point process consisting of events occurring at times ti in the interval [0, T ] and at
corresponding locations xi in a region X ⊂ Rd. Assume that the marginal temporal point process is orderly; i.e.,
the probability that more than one event occurs in [t, t + δ) × B is o(δ) for all t ≥ 0 and bounded B ⊂ X . Such
space-time point processes can be defined through random counting measures N on [0,∞) × X . Here, N(C × B)
is the number of events falling in a region B ∈ X and at times in a set C ∈ T , where X and T are the Borel
σ-algebras of subsets of X and [0,∞), respectively. Let Φ be the collection of locally bounded counting measures
second-order residual analysis of point processes 23
N on [0,∞)×X and A be the σ-algebra on Φ generated by N((r, t]×B) : B ∈ X , 0 ≤ r ≤ t. Then a space-time
point process N is a measurable mapping of a probability space (Ω,F , P ) onto (Φ,A).
For any measurable B ∈ X , there exists an F -compensator A(t, B) such that N([0, t) × B) − A(t, B) is an
F -martingale. Let ℓ and ℓd denote Lebesgue measure in R and Rd, respectively. Suppose that, for each x ∈ X ,
there exists an integrable, non-negative, F -adapted process λ(t, x) such that, with probability 1, for all t ∈ R+
and B ∈ X ,∫
B
∫ t
0
λ(s, x) ℓ(ds) ℓd(dx) = A(t, B).
The conditions and justification for the existence of such a process can be found, for example, in Vere-Jones and
Schoenberg (2004). If it exists, λ(t, x) is called an F -conditional intensity, satisfying
∫
B
λ(t, x)ℓd(dx) = lim∆t↓0
E[N(t + ∆t, B) − N(t, B) | Ft]
∆t.
B. Second-order predictability and second-order residuals
Call the sub-σ-algebra of (T⊗
F)⊗
(T⊗
F) generated by sets of the form (s, t] × U × (s′, t′] × V , the F⊗
F-
predictable σ-algebra, denoted by ΨFN
F , where U ∈ Fs, 0 ≤ s ≤ t < ∞, V ∈ Fs, 0 ≤ s′ ≤ t′ < ∞ and T is the
Borel σ-algebra on R+. A process H(t, ω, t′, ω′) is F
⊗
F -predictable when it is ΨFN
F -measurable; that is, for
any A ∈ T , (t, ω, t′, ω) : H(t, ω, t′, ω′) ∈ A ∈ ΨFN
F . In other words, all the F⊗
F -predictable processes can
be generated from processes of the form
I(a,b](t) I(a′,b′](t′) IU (ω) IV (ω′), U ∈ Fa, V ∈ Fa′ ,
by linear combinations and monotone limits. We call the process h(t, t′, ω) = H(t, ω, t′, ω) second-order predictable
if H(t, ω, t′, ω′) is F⊗
F -predictable.
Given a F⊗
F -predictable process H(t, ω, t′, ω′), it is easy to verify that the process can be decomposed as
H(t, ω, t′, ω′) = H0(t, ω, t′, ω′) + H−(t, ω, t′, ω′) + H+(t, ω, t′, ω′) (25)
24 J. Zhuang
where H0(t, ω, t′, ω′) = H(t, ω, t′, ω′) It(t′), and H−(t, ω, t′, ω′) = H(t, ω, t′, ω′) I(t,∞](t
′) is F -predictable with
respect to t′ and ω′ when t and ω are fixed, and H+(t, ω, t′, ω′) = H(t, ω, t′, ω′) I(t′,∞](t) is F -predictable with
respect to t and ω when t′ and ω′ are fixed.
Lemma 1. Using the notation of H− and H+ in (25) for an F⊗
F-predictable process H(t, ω, t′, ω′), given an
F-adapted point process, N(t, ω) with F-conditional intensity λ(t, ω), define
F (t, ω, t′, ω′) ≡ E
[
∫
(t′, ∞)
H−(t, ω, s, ω′)N(ds, ω′)
∣
∣
∣
∣
∣
Ft′
]
(26)
and
G(t, ω, t′, ω′) ≡ E
[
∫
(t,∞)
H+(s, ω, t′, ω′)N(ds, ω)
∣
∣
∣
∣
∣
Ft
]
.
Then
F (t, ω, t′, ω′) = E
[
∫
(t′,∞)
H−(t, ω, s, ω′)λ(s, ω′) ds
∣
∣
∣
∣
∣
Ft′
]
(27)
and
G(t, ω, t′, ω′) = E
[
∫
(t,∞)
H+(s, ω, t′, ω′)λ(s, ω) ds
∣
∣
∣
∣
∣
Ft
]
, (28)
assuming that the integrals in the right sides of (27) and (28) exist. Moreover, F (t, ω, t′, ω′) and G(t, ω, t′, ω′) are
both F⊗
F-predictable processes.
Proof: For F in (26), we need only consider the case when t′ > t. For a fixed t and any u > t′, H−(t, ω, u, ω′)I(t′,+∞)(u)
is F -predictable with respect to u. Based on the martingale property associated with the conditional intensity,
J(u) =
∫
(0,u]
H−(t, ω, s, ω′) I(t′,+∞)(s) [N(ds, ω′) − λ(s, ω′)µ(ds)]
is also a zero-mean martingale when t and t′ are fixed, i.e.,
E[J(u) | Ft′ ] = J(t′) = 0, u > t.
second-order residual analysis of point processes 25
i.e.,
E
[
∫
(t′, u]
H−(t, ω, s, ω′)N(ds, ω′)
∣
∣
∣
∣
∣
Ft′
]
= E
[
∫
(t′, u]
H−(t, ω, s, ω′)λ(s, ω′)µ(ds)
∣
∣
∣
∣
∣
Ft′
]
.
Since the conditional intensity measure is diffuse, the integral in the right-hand side of the above equation is
continuous with respect to t′. Thus, if u → +∞, for fixed t and ω, F (t, ω, t′, ω′) is left-continuous and adopted to
Ft′ . From Bremaud (1980, Theorem T5, Chapter I.3), F (t, ω, t′, ω′) is F -predictable with respect to (t′, ω′).
Moreover, if H−(t, ω, t′, ω′) takes a form of I(a,b] IU (ω) I(a′, b′](t′) IV (ω′), U ∈ Fa, V ∈ Fa′ , then F takes the
form I(a,b] IU (ω) f(t′, ω′) where f(t′, ω′) is F -predictable. That is to say, F is F⊗
F -predictable.
The related conclusions for G can similarly be proved. 2
Proposition 2. Assume that a second-order F-predictable process H(t, ω, t′, ω) can be similarly decomposed
into H−(t, ω, t′, ω), H+(t, ω, t′, ω) and H0(t, ω, t′, ω) as in (25). Then
E
[∫∫
D
H−(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
[∫∫
D
H−(t, ω, t′, ω, )λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)
]
(29)
if the integral on the right-hand side exists, where D is a measurable subset of T × T .
Proof: Notice
I ≡ E
[∫∫
D
H−(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
∫
dom D
E
[∫ +∞
0
H−(t, ω, t′, ω) ID(t)(t′)N(dt′, ω)
∣
∣
∣
∣
Ft
]
N(dt, ω)
where domD = t : ∃t′ such that (t, t′) ∈ D and D(t) = t′ : (t, t′) ∈ D. From Lemma 1,
F (t, ω, t′, ω′) ≡ E
[∫ +∞
0
H−(t, ω, s, ω′) ID(t′)(s)N(ds, ω′)
∣
∣
∣
∣
Ft′
]
is F⊗
F -predictable, implying that
F (t, ω) ≡ F (t, ω, t, ω) = E
[∫ +∞
0
H−(t, ω, t′, ω) ID(t)(t′)N(dt′, ω)
∣
∣
∣
∣
Ft
]
26 J. Zhuang
is F -predictable.
Thus,
I = E
[∫
domD
F (t, ω)N(dt, ω)
]
= E
[∫
domD
F (t, ω)λ(t, ω)µ(dt)
]
.
On the other hand, from Lemma 1, the right-hand side of (29)
E
∫∫
D
H−(t, ω, t′, ω)λ(t′, ω)λ(t, ω)µ(dt′)µ(dt)
= E
∫
dom D
E
[
∫
D(t)
H−(t, ω, t′, ω)λ(t′, ω)µ(dt′)
∣
∣
∣
∣
∣
Ft
]
λ(t, ω)µ(dt)
= E
[∫
dom D
F (t, ω)λ(t, ω)µ(dt)
]
.
2
Similarly, (29) holds if we replace H− by H+, i.e.,
E
[∫∫
D
H+(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
[∫∫
D
H+(t, ω, t′, ω, )λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)
]
. (30)
When t = t′, H0(t, ω) ≡ H0(t, ω, t′, ω) is a F -predictable process and N(dt, ω)N(dt′, ω) = N(dt, ω); when
t 6= t′, H0(t, ω, t′, ω) = 0. Thus
E
[∫∫
D
H0(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
[∫
diag D
H0(t, ω, t, ω)N(dt, ω)
]
= E
[∫
diag D
H0(t, ω, t, ω)λ(dt, ω)µ(dt)
]
, (31)
where diag D = t : (t, t) ∈ D.
Combining (29), (30) and (31), we have
Theorem 3. For any second-order predictable process H(t, ω, t′, ω), given a point process N(ω) with conditional
intensity λ(t, ω), the equalities
E
[∫∫
D
H(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
[∫∫
D
H(t, ω, t′, ω)λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)
]
+ E
[∫
diag D
H0(t, ω, t, ω)λ(dt, ω)µ(dt)
]
,
second-order residual analysis of point processes 27
and
E
[
∫∫
D\diag2 D
H(t, ω, t′, ω)N(dt, ω)N(dt′, ω)
]
= E
[∫∫
D
H(t, ω, t′, ω)λ(t, ω)λ(t′, ω)µ(dt′)µ(dt)
]
,
hold if the integrals on the right-hand side exist, where diag2 D = (t, t) ∈ D.
C. Estimation procedures for the ETAS model (Zhuang et al., 2002, 2004)
(a) Set up the initial background seismicity rate, for example, let u(x) = 1.
(b) Set
λ(t, x) = γ(s)
[
ν u(x) +∑
i:ti<t
κ(si) g(t − ti) f(x − xi | si)
]
,
and estimate ν and the other model parameters by maximizing the log-likelihood function
log L =∑
(ti,xi,si)∈D
log λ(ti, xi, si) −
∫∫∫
D
λ(t, x, s) dt dxds,
where D is a specified spatio-temporal region of interests.
(c) For each event i, set
ϕi = ν u(xi) γ(si)/λ(ti, xi, si). (32)
(d) Get a better estimate of the background rate by using the weighted kernel estimates
u(x) ∝∑
i
ϕi Z(x − xi, hi) (33)
where Z represent the gaussian kernel function, the bandwidth hi is the distance to the npth closest events
to i or is a threshold bandwidth if this distance is less than the threshold bandwidth.
(e) Replace the background rate by this better one, and return to Step 2. Repeat until the results converge.
28 J. Zhuang
130 135 140 145 150
3035
4045 (a)
Longitude
Latit
ude
0 5000 10000 15000 20000 25000
3035
4045 (b)
Time in days
Latit
ude
Fig. 1. Occurrence locations and times of shallow earthquakes in the central Japan area. (a) Epicentral locations. The polygonrepresents the target region. (b) Epicentral latitudes against occurrence times. Black and gray circles are target events andcomplementary events, respectively. Different sizes of circles show the magnitude of earthquakes from 4.2 to 8.1.
second-order residual analysis of point processes 29
Magnitude
Trig
gerin
g ab
ilitie
s
5 6 7 8
0.05
0.1
0.2
0.5
12
510
2050
All eventsTriggeredBackgroundTheoretical
Fig. 2. Reconstructed triggering abilities (average number of children triggered by events of same magnitude) for all the eventsK
(1)(s), background events κ(1)0 (s) and triggered events κ
(1)1 (s), as in (14), (20) and (21), respectively (Zhuang et al., 2004).
The solid line represents the initial κ(s) = K(0)(s) = κ
(0)0 (s) = κ
(0)1 (s) obtained by MLE from fitting model (1) to the JMA
catalogue. Magnitude 4.2 on the horizontal axes corresponds to s = 0.
30 J. Zhuang
(a)
Magnitude
Sca
ling
Fac
tor
0.00
10.
010.
1
5 6 7 8
(b)
MagnitudeS
calin
g F
acto
r0.
001
0.01
0.1
5 6 7 8
(c)
Magnitude
Sca
ling
Fac
tor
0.00
10.
010.
1
5 6 7 8
(d)
Magnitude
Sca
ling
Fac
tor
0.00
10.
010.
1
5 6 7 8
Fig. 3. Reconstructed results of σ(1)(s) against the corresponding magnitudes (Magnitude 4.2 corresponds to s = 0) of parent
earthquakes (Ogata and Zhuang, 2006). Circles indicate the values of σ(1)(s). (a) shows the reconstructed results from the
original model for the JMA catalogue. (b) is the same as (a), but the catalogue is a simulation of the original model. (c)Reconstructed results from the model equipped with (24). (d) is the same as (c), but the catalogue is a simulation of the modelequipped with (24). In (c) and (d), the solid lines represent Ce
αs and the dashed lines represent Ceα′s.