Unsupervised Approaches for Post-Processing in Computationally Efficient Waveform-Similarity-Based Earthquake Detection
Karianne Bergen (1), Clara Yoon (2), Ossian O’Reilly (2), Gregory Beroza (2)
(1) Institute for Computational and Mathematical Engineering, Stanford University; (2) Department of Geophysics, Stanford University. Email: kbergen@stanford.edu
Introduction
• Fingerprint and Similarity Thresholding (FAST) promises to allow large-scale blind search for similar waveforms in long-duration continuous seismic data [1].
• Waveform similarity search applied to datasets of months to years of data will identify significantly more low-magnitude events than traditional methods for earthquake detection.
• New approaches for processing the output from similarity-based detection are required: manual inspection is infeasible for large data volumes.
• We explore data mining techniques for improved detection post-processing.
FAST: Method Overview
FAST is inspired by the Waveprint [2] algorithm for identifying audio clips, adapted to continuous seismic waveform data.
[Figure: FAST Algorithmic Pipeline]
Data: continuous time series data
Preprocessing: spectrogram (after bandpass filtering)
Feature Extraction: Spectral Image → Haar Transform → Top coefficients (most discriminative) → Binary Fingerprint
Database Generation & Search: fast approximate similarity search using MinHash and Locality Sensitive Hashing
Detection Results
Post-Processing: identifying events, combining over network, removing false positives, clustering waveforms
Example panels (window #1267): log10(|spectral image|), time (s) vs. frequency (Hz); log10(|Haar transform|), wavelet transform x index vs. y index; sign of top wavelet coefficients (−1, 0, 1); binary fingerprints (0, 1), fingerprint x index vs. y index.
• Database search returns a list of “candidate pairs”; post-processing is necessary to eliminate non-earthquakes (false positives, correlated noise)
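The database search step can be illustrated with a minimal MinHash and Locality Sensitive Hashing sketch. This is not the FAST implementation: the function names, the linear hash family, and the parameters `n_tables` and `rows_per_table` are assumptions chosen for illustration, and each fingerprint is represented as a non-empty set of active bit indices.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1, used by the illustrative hash family


def minhash_signature(active_bits, hash_params):
    """MinHash signature of a non-empty set of active fingerprint bit indices."""
    return tuple(min((a * x + b) % PRIME for x in active_bits)
                 for a, b in hash_params)


def lsh_candidate_pairs(fingerprints, n_tables=20, rows_per_table=4, seed=0):
    """Return {(i, j): fraction of hash tables in which windows i and j collide}."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(n_tables * rows_per_table)]
    signatures = [minhash_signature(fp, params) for fp in fingerprints]

    votes = defaultdict(int)
    for t in range(n_tables):
        lo = t * rows_per_table
        buckets = defaultdict(list)
        for idx, sig in enumerate(signatures):
            # Windows whose band of the signature matches share a bucket.
            buckets[sig[lo:lo + rows_per_table]].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                votes[(i, j)] += 1
    return {pair: count / n_tables for pair, count in votes.items()}
```

Only windows that share a bucket in at least one table become candidate pairs, so the search avoids comparing every window with every other window. The collision fraction serves here as an assumed stand-in for the similarity value attached to each pair.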
Event Identification and Network Detection
How do we identify earthquakes from waveform pairs returned by FAST?
[Figure: schematic of candidate-pair detections between event 1 and event 2, with similarity values 0.988, 0.975, and 0.970]
• Output of FAST (single channel): sparse matrix of (candidate) pairs of similar waveforms
• Single event pairs often result in multiple detections: time-adjacent windows overlap
• Multiple (sequential) detections of a single event pair appear along a diagonal line (fixed inter-event time ∆t) in the similarity matrix
• Link all detections for each event pair for improved thresholding
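The linking step above can be sketched as follows, assuming candidate pairs arrive as `(i, j, similarity)` tuples of window indices. The function name, the `max_gap` parameter, and the choice to report both summed and peak similarity are illustrative assumptions, not details taken from FAST.

```python
from collections import defaultdict


def link_diagonal_detections(pairs, max_gap=1):
    """Merge candidate pairs (i, j, sim) that share a diagonal (dt = j - i)
    and are adjacent in time into a single event-pair detection."""
    by_dt = defaultdict(list)
    for i, j, sim in pairs:
        by_dt[j - i].append((i, sim))

    events = []
    for dt, hits in by_dt.items():
        hits.sort()
        current = None
        for i, sim in hits:
            if current is not None and i - current["i_end"] <= max_gap:
                # Same diagonal and time-adjacent: extend the current event pair.
                current["i_end"] = i
                current["n"] += 1
                current["sum_sim"] += sim
                current["peak_sim"] = max(current["peak_sim"], sim)
            else:
                current = {"dt": dt, "i_start": i, "i_end": i, "n": 1,
                           "sum_sim": sim, "peak_sim": sim}
                events.append(current)
    return events
```

Thresholding on the summed similarity of a linked group, rather than on each detection separately, favors event pairs that are detected in several consecutive overlapping windows.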
How do we combine single-station detection results from FAST over a network of seismic stations?
• Network detection can improve detection sensitivity
• Limited move-out (multiple channels at a single station or nearby stations): sum single-channel similarity matrices → network similarity matrix
• Challenge: move-out varies between stations and is unknown a priori in blind search
• Inter-event time is uniform across the network for a given event pair
• Pseudo-association: group detections by inter-event time (diagonal) across multiple stations
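A minimal sketch of pseudo-association, assuming each station contributes event-pair detections as `(t1, dt, similarity)` tuples, where `t1` is the time of the first event and `dt` is the inter-event time. The tolerances `dt_tol` and `t_tol`, the `min_stations` cutoff, and the greedy grouping are illustrative assumptions.

```python
def associate_by_interevent_time(station_events, dt_tol=2, t_tol=30, min_stations=2):
    """Group single-station detections whose inter-event time dt agrees
    (within dt_tol) and whose first-event time t1 is close (within t_tol,
    which absorbs the unknown move-out between stations)."""
    groups = []
    for station, events in station_events.items():
        for t1, dt, sim in events:
            for g in groups:
                if abs(dt - g["dt"]) <= dt_tol and abs(t1 - g["t1"]) <= t_tol:
                    g["stations"].add(station)
                    g["total_sim"] += sim
                    break
            else:
                # No existing group matches: start a new one.
                groups.append({"dt": dt, "t1": t1,
                               "stations": {station}, "total_sim": sim})
    return [g for g in groups if len(g["stations"]) >= min_stations]
```

Because the inter-event time is the same at every station, the grouping key does not require knowing the move-out; only the loose `t_tol` window depends on the network geometry.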
Data set: Iquique foreshocks, 2014-03-21
[Figure: Waveforms of event pair recorded across multiple stations. Stations: PSGCX, PB11, PB08, PB01, PATCX. Time axes: time (s) from 83158 and time (s) from 84075, each 0 to 60. CC values shown: 0.627, 0.792, 0.814, 0.775, 0.829.]
[Figure: Summed Network Similarity. Axes: time index 1 (83160 to 83220) vs. time index 2 (84080 to 84140); color scale 0 to >0.7. Labels: PB01, PB08, PATCX, 2 stations, PSGCX, PB11.]
Similarity matrix: event pair detected across multiple stations appears along same diagonal, but with minimal temporal overlap
Clustering Waveforms
Clustering is a set of techniques for identifying groups of similar waveforms within the full set of detections returned by FAST, which can be used to:
• Organize detection results for easier interpretation (i.e. find interesting structure/patterns in the data),
• Identify new template waveforms for template matching or subspace detection, and
• Remove additional false alarms (e.g. outliers, non-earthquake clusters)
Application: Guy-Greenbrier Fault, central Arkansas
• FAST detects 746 new earthquakes that were not identified by template matching in one month of data (July 2010) at station WHAR [3]
• Similarity matrix for new detections has a block-like structure: apply spectral clustering to identify 8 broad waveform clusters
[Figure: 3-channel event similarities (normalized CC), event index 1 vs. event index 2, shown in original order and reordered by cluster, with clusters 1 to 8 marked]
[Figure: Representative waveforms (three-component) from each cluster. Channels: WHAR.HHE, WHAR.HHN, WHAR.HHZ; clusters 1 to 8; time (s), 0.0 to 4.0]
• Reclustering within large clusters can identify representative waveforms or small clusters, e.g. cluster 8
• e.g. Hierarchical clustering (complete-linkage) identifies representative waveforms within clusters
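Reclustering with complete linkage can be sketched in plain Python, assuming the pairwise similarities (e.g. normalized CC) are available as a symmetric list-of-lists matrix. The function names, the stopping `threshold`, and the rule for picking a representative are illustrative assumptions rather than the procedure used on the poster.

```python
def complete_linkage_clusters(sim, threshold):
    """Agglomerative clustering on a symmetric similarity matrix.

    Two clusters are merged only while the minimum pairwise similarity
    between their members (complete linkage) stays at or above threshold.
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        return min(sim[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best, best_pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s >= best:
                    best, best_pair = s, (x, y)
        if best_pair is None:
            break  # no remaining pair of clusters is similar enough
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def representative(cluster, sim):
    """Member with the highest summed similarity to the rest of its cluster."""
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster if j != i))
```

Complete linkage guarantees that every pair of waveforms inside a cluster is at least `threshold` similar, which suits the search for tight groups from which a template waveform can be drawn.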
(Right) Clustering can aid in visualization and interpretation of a large number of new detections: cluster membership of new FAST detections plotted over time. Injection began at well #1, closest to the Guy-Greenbrier Fault, on 7 July 2010 (at 518400 s in figure).
[Figure axes: time (s) from 2010-07-01 00:00:00.00, 0 to 2.5×10^6; similarity (maximum normalized CC), 0 to 1.0]
Feature Extraction
“Good” feature extraction can reduce false detections
• Binary fingerprints act as proxies for waveforms in efficient similarity search
• Fingerprints must be discriminative: (dis)similar waveforms should have (dis)similar fingerprints
• False detections preferred to missed detections, but too many hurt performance
How are “most discriminative” Haar coefficients selected?
• Top magnitude coefficients (often used for efficient compression)
• Most atypical coefficients, as measured by:
  • Z-score (mean, standard deviation), or
  • Median Absolute Deviation (MAD) across data set
• MAD-based Haar coefficient selection demonstrates the best performance in low SNR settings and is most efficient.
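The MAD-based selection can be sketched as below, assuming the Haar coefficients of each window are stored as a row of floats. The function names and the two-bits-per-coefficient sign encoding are assumptions made for illustration; the poster does not spell out the encoding.

```python
from statistics import median


def mad_standardize(coeff_rows):
    """Standardize each coefficient index across all windows:
    (value - median) / MAD, with 0.0 where the MAD is zero."""
    n_coeff = len(coeff_rows[0])
    med = [median(row[k] for row in coeff_rows) for k in range(n_coeff)]
    mad = [median(abs(row[k] - med[k]) for row in coeff_rows)
           for k in range(n_coeff)]
    return [[(row[k] - med[k]) / mad[k] if mad[k] > 0 else 0.0
             for k in range(n_coeff)]
            for row in coeff_rows]


def binary_fingerprint(std_coeffs, top_k):
    """Keep the top_k most atypical coefficients of one window and encode
    the sign of each kept coefficient with two bits (assumed encoding)."""
    keep = sorted(range(len(std_coeffs)),
                  key=lambda k: abs(std_coeffs[k]), reverse=True)[:top_k]
    bits = [0] * (2 * len(std_coeffs))
    for k in keep:
        if std_coeffs[k] > 0:
            bits[2 * k] = 1      # "positive" bit
        elif std_coeffs[k] < 0:
            bits[2 * k + 1] = 1  # "negative" bit
    return bits
```

A coefficient that is large in every window (for example, one dominated by persistent noise) has a large magnitude but a standardized value near zero, so it is skipped in favor of coefficients that are unusual for that particular window.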
[Figure: panels for noise sample 1 and noise sample 2 under Top Magnitude, Top Z-score, and Top MAD coefficient selection]
Synthetic Test
Comparison of the performance of Haar coefficient-selection methods on a synthetic test. The MAD-based coefficient selection best separates the repeated waveforms from the noise.
(Right) Test data (a): 12 pairs of repeated waveforms (SNR 1.25-5) planted at known times in 3 hrs of noise (bandpass 1-10 Hz). Detection results from FAST shown for (b) top magnitude, (c) top Z-score, and (d) top MAD Haar coefficients. Location of true repeated events indicated by orange vertical lines, and the detection statistic (similarity value) is plotted in blue. Top 400 coefficients selected in results pictured, but results hold for top 100-800 coefficients.
[Figure: efficiency of binary representations. x-axis: % bits in binary fingerprint (cumulative), 0 to 1; y-axis: frequency of coefficient activation (normalized), 0 to 1. Curves: top 400 Haar coefficients in magnitude; top 400 standardized Haar coefficients (Z-score); top 400 standardized Haar coefficients (MAD); ideal line for perfectly efficient representation.]
[Figure: synthetic test, panels (a) to (d). x-axis: time (s), 0 to 10000. Panel (a): amplitude, −80 to 80; panels (b) to (d): similarity value, 0 to 1.0.]
(Left) Efficiency of binary representations (ordered from least to most efficient): top magnitude (blue), top Z-score (orange) and top MAD (purple), with Gini index of 0.73, 0.28, and 0.11, respectively.
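A Gini index of this kind can be computed from how often each fingerprint bit is activated across the data set. The sketch below uses the standard Gini formula on sorted counts; the poster does not state its exact definition, so the formula and the function name are assumptions.

```python
def gini_index(activation_counts):
    """Gini index of per-bit activation counts.

    0 means every bit is activated equally often (perfectly efficient use
    of the fingerprint); values near 1 mean a few bits carry most activations.
    """
    x = sorted(activation_counts)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((rank + 1) * value for rank, value in enumerate(x))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

Under this definition a lower value corresponds to a curve closer to the ideal line in the efficiency plot, which matches the ordering reported for the three selection methods.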
Alternate Feature Extraction Approaches (on-going work)
• Time-domain features: bag-of-waveforms, wavelets, random projections
• Data-driven features: spectral hashing, shift-invariant sparse coding, nonnegative matrix factorization (NMF)-based features
References
[1] Yoon, C., et al. (2015). “Earthquake detection through computationally efficient similarity search.” Science Advances, 1(11).
[2] Baluja, S., and Covell, M. (2008). “Waveprint: Efficient wavelet-based audio fingerprinting.” Pattern Recognition, 41(11).
[3] Yoon, C., et al. (2015). AGU Fall Meeting Abstract S13B-2850.
Read more about FAST (doi:10.1126/sciadv.1501057)