
Unsupervised Approaches for Post-Processing in Computationally Efficient Waveform-Similarity-Based Earthquake Detection

Karianne Bergen (1), Clara Yoon (2), Ossian O’Reilly (2), Gregory Beroza (2)

(1) Institute for Computational and Mathematical Engineering, Stanford University
(2) Department of Geophysics, Stanford University
Email: kbergen@stanford.edu

Introduction

- Fingerprint and Similarity Thresholding (FAST) promises to allow large-scale blind search for similar waveforms in long-duration continuous seismic data [1].
- Waveform similarity search applied to datasets of months to years of data will identify significantly more low-magnitude events than traditional methods for earthquake detection.
- New approaches for processing the output from similarity-based detection are required: manual inspection is infeasible for large data volumes.
- We explore data mining techniques for improved detection post-processing.

FAST: Method Overview

FAST is inspired by the Waveprint [2] algorithm for identifying audio clips, adapted to continuous seismic waveform data.

[Figure: FAST Algorithmic Pipeline]

Data: continuous time series data
Preprocessing: spectrogram (after bandpass filtering)
Feature Extraction: Spectral Image → Haar Transform → Top coefficients (most discriminative) → Binary Fingerprint
Database Generation & Search: fast approximate similarity search using MinHash and Locality Sensitive Hashing
Detection Results
Post-Processing: identifying events, combining over network, removing false positives, clustering waveforms

Example panels for window #1267: log10(|spectral image|) against time (s); log10(|Haar transform|) and sign of top wavelet coefficients against wavelet transform x index; binary fingerprints against fingerprint x index.

- Database search returns a list of “candidate pairs”; post-processing is necessary to eliminate non-earthquakes (false positives, correlated noise).
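The database search step relies on MinHash and locality-sensitive hashing to produce the candidate pairs. The sketch below illustrates that idea on fingerprints stored as sets of active bit indices. It is a minimal illustration, not the FAST implementation: the function names, the number of hash functions, the band size and the toy fingerprints are all assumptions made for this example.

```python
import random

def jaccard(a, b):
    """Exact Jaccard similarity between two sets of active fingerprint bits."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def minhash_signature(active_bits, n_bits, n_hashes=64, seed=0):
    """MinHash signature of a non-empty set of active bits.

    Each hash function is a random permutation of the bit indices; the
    signature entry is the smallest permuted index among the active bits.
    Using the same seed for every fingerprint keeps the permutations shared.
    """
    rng = random.Random(seed)
    signature = []
    for _ in range(n_hashes):
        perm = list(range(n_bits))
        rng.shuffle(perm)
        signature.append(min(perm[i] for i in active_bits))
    return signature

def lsh_candidate_pairs(signatures, band_size=4):
    """Windows whose signatures agree on at least one full band become candidates."""
    buckets = {}
    for window_id, sig in signatures.items():
        for start in range(0, len(sig), band_size):
            key = (start, tuple(sig[start:start + band_size]))
            buckets.setdefault(key, set()).add(window_id)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        pairs.update((u, v) for i, u in enumerate(ordered) for v in ordered[i + 1:])
    return pairs

# Toy fingerprints (hypothetical): "a" and "d" are identical, "c" shares no bits.
fingerprints = {"a": set(range(0, 20)), "c": set(range(40, 60)), "d": set(range(0, 20))}
signatures = {k: minhash_signature(v, n_bits=64) for k, v in fingerprints.items()}
candidates = lsh_candidate_pairs(signatures)
```

Identical fingerprints always collide in every band, while fingerprints with no bits in common never share a MinHash value, so only intermediate similarities are resolved probabilistically. Band size and number of hashes trade missed detections against false candidates.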

Event Identification and Network Detection

How do we identify earthquakes from waveform pairs returned by FAST?

[Figure: example event pair (event 1, event 2), with labeled similarity values 0.988 and 0.975.]

- Output of FAST (single channel): sparse matrix of (candidate) pairs of similar waveforms
- Single event pairs often result in multiple detections: time-adjacent windows overlap
- Multiple (sequential) detections of a single event pair appear along a diagonal line (fixed inter-event time ∆t) in the similarity matrix
- Link all detections for each event pair for improved thresholding
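Linking detections along a diagonal can be sketched as grouping candidate pairs by their inter-event time ∆t and merging runs of neighbouring windows. This is a minimal illustration under assumed conventions: pairs are given as (i, j, similarity) with window indices i < j, and `max_gap`, the function names and the summary fields are hypothetical choices for this example.

```python
from collections import defaultdict

def _summarize(dt, run):
    """Collapse one run of adjacent detections into a single event-pair record."""
    sims = [sim for _, sim in run]
    return {"dt": dt, "start": run[0][0], "end": run[-1][0],
            "n_windows": len(run), "peak": max(sims), "total": sum(sims)}

def link_diagonal_detections(pairs, max_gap=2):
    """Link candidate pairs lying on the same diagonal of the similarity matrix.

    Pairs with equal dt = j - i share a diagonal; consecutive hits whose first
    window indices differ by at most `max_gap` are treated as one event pair.
    """
    by_dt = defaultdict(list)
    for i, j, sim in pairs:
        by_dt[j - i].append((i, sim))

    linked = []
    for dt, hits in sorted(by_dt.items()):
        hits.sort()
        run = [hits[0]]
        for hit in hits[1:]:
            if hit[0] - run[-1][0] <= max_gap:
                run.append(hit)
            else:
                linked.append(_summarize(dt, run))
                run = [hit]
        linked.append(_summarize(dt, run))
    return linked

# Made-up candidate pairs: three adjacent windows on one diagonal, plus two isolated hits.
pairs = [(100, 500, 0.6), (101, 501, 0.9), (102, 502, 0.7),
         (300, 700, 0.5), (100, 650, 0.4)]
events = link_diagonal_detections(pairs)
```

Thresholding on the summed similarity (`total`) or on `n_windows` rather than on individual pair values is one way the linked detections could be used; which statistic works best is left open here.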

How do we combine single-station detection results from FAST over a network of seismic stations?

- Network detection can improve detection sensitivity
- Limited move-out (multiple channels at single station or nearby stations): sum single-channel similarity matrices → network similarity matrix
- Challenge: move-out varies between stations and is unknown a priori in blind search
- Inter-event time is uniform across the network for a given event pair
- Pseudo-association: group detections by inter-event time (diagonal) across multiple stations
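Pseudo-association can be sketched as follows: detections from each station are binned by inter-event time, and hits in the same bin whose first-event times fall within an allowed move-out of one another are merged into one network detection. This is only an illustrative sketch; the input layout, `dt_bin`, `max_moveout`, `min_stations` and the numerical values are assumptions, not parameters from the poster.

```python
from collections import defaultdict

def pseudo_associate(station_events, dt_bin=1.0, max_moveout=30.0, min_stations=2):
    """Group single-station detections that share an inter-event time.

    station_events maps a station name to a list of (dt, t1, score) tuples,
    where dt is the inter-event time and t1 the time of the first event (s).
    """
    by_dt = defaultdict(list)
    for station, events in station_events.items():
        for dt, t1, score in events:
            by_dt[round(dt / dt_bin)].append((t1, station, score))

    network = []
    for dt_key, hits in sorted(by_dt.items()):
        hits.sort()
        group = [hits[0]]
        # The trailing None flushes the final group.
        for hit in hits[1:] + [None]:
            if hit is not None and hit[0] - group[-1][0] <= max_moveout:
                group.append(hit)
                continue
            stations = sorted({name for _, name, _ in group})
            if len(stations) >= min_stations:
                network.append({"dt": dt_key * dt_bin, "t1": group[0][0],
                                "stations": stations,
                                "score": sum(s for _, _, s in group)})
            if hit is not None:
                group = [hit]
    return network

# Made-up detections: one event pair seen at three stations with different move-out.
station_events = {
    "PB01": [(917.0, 83160.0, 0.8)],
    "PB08": [(917.0, 83165.0, 0.7), (400.0, 1000.0, 0.9)],
    "PB11": [(917.0, 83171.0, 0.6)],
}
detections = pseudo_associate(station_events)
```

Because only the inter-event time has to agree, the arrival-time differences between stations never need to be known; `max_moveout` merely keeps unrelated event pairs that happen to share a ∆t from being merged.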

Data set: Iquique foreshocks, 2014-03-21

[Figure: Waveforms of an event pair recorded across multiple stations. Time axes: time (s) from 83158 and time (s) from 84075. Per-station cross-correlation values: CC = 0.627, 0.792, 0.814, 0.775, 0.829.]

[Figure: Summed network similarity matrix, plotted by time index. Legend entries include PB01, PB08, PSGCX, PB11 and "2 stations". An event pair detected across multiple stations appears along the same diagonal, but with minimal temporal overlap.]

Clustering Waveforms

Clustering is a set of techniques for identifying groups of similar waveforms within the full set of detections returned by FAST, which can be used to:
- Organize detection results for easier interpretation (i.e. find interesting structure/patterns in the data),
- Identify new template waveforms for template matching or subspace detection, and
- Remove additional false alarms (e.g. outliers, non-earthquake clusters)

Application: Guy-Greenbrier Fault, central Arkansas

- FAST detects 746 new earthquakes that were not identified by template matching in one month of data (July 2010) at station WHAR [3]
- Similarity matrix for new detections has a block-like structure: apply spectral clustering to identify 8 broad waveform clusters

[Figure: 3-channel event similarities (normalized CC), shown by event index (original order) and by event index reordered by cluster.]

Representative waveforms (three-component) from each cluster

[Figure: representative waveforms for clusters 1-8 on channels WHAR.HHE, WHAR.HHN and WHAR.HHZ; time axis 0.0 to 4.0 s.]

- Reclustering within large clusters can identify representative waveforms or small clusters, e.g. cluster 8
- For example, hierarchical clustering (complete-linkage) identifies representative waveforms within clusters
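Complete-linkage reclustering can be sketched directly on a matrix of pairwise waveform similarities. The code below is a small illustration in plain Python, not the procedure used for the Guy-Greenbrier detections: the stopping threshold `min_similarity`, the rule for picking a representative and the toy matrix are assumptions for this example.

```python
def complete_linkage(similarity, min_similarity):
    """Agglomerative clustering with complete linkage on a similarity matrix.

    The linkage between two clusters is the similarity of their least similar
    pair of members; merging stops once the best linkage drops below
    `min_similarity`.
    """
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(similarity[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        if best[0] < min_similarity:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def representative(cluster, similarity):
    """Member with the highest total similarity to its own cluster."""
    return max(cluster, key=lambda i: sum(similarity[i][j] for j in cluster))

# Toy normalized-CC matrix (made up): events 0-2 and events 3-4 form two groups.
S = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.85, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
clusters = complete_linkage(S, min_similarity=0.5)   # [[0, 1, 2], [3, 4]]
template = representative(clusters[0], S)            # event 1
```

Complete linkage guarantees that every pair inside a cluster is at least `min_similarity` similar, which is a convenient property when the representative is later reused as a template.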

(Right) Clustering can aid in visualization and interpretation of a large number of new detections: cluster membership of new FAST detections plotted over time. Injection began at well #1, closest to the Guy-Greenbrier Fault, on 7 July 2010 (at 518400 s in figure).

[Figure: cluster membership and similarity of new detections against time (s) from 2010-07-01 00:00:00.00; time axis from 0 to 2.5×10^6 s.]

Feature Extraction

“Good” feature extraction can reduce false detections
- Binary fingerprints act as proxies for waveforms in efficient similarity search
- Fingerprints must be discriminative: (dis)similar waveforms should have (dis)similar fingerprints
- False detections are preferred to missed detections, but too many hurt performance

How are “most discriminative” Haar coefficients selected?

- Top magnitude coefficients (often used for efficient compression)
- Most atypical coefficients, as measured by:
  - Z-score (mean, standard deviation), or
  - Median Absolute Deviation (MAD) across data set
- MAD-based Haar coefficient selection demonstrates the best performance in low SNR settings and is most efficient.
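The difference between top-magnitude and MAD-based selection can be sketched on a small table of Haar coefficients (one row per window). This is an illustration only: the function names, the two-bit sign encoding (one bit for a positive, one for a negative selected coefficient) and the toy values are assumptions for this example.

```python
import statistics

def mad_standardize(coeff_rows):
    """Standardize each coefficient by its median and MAD across all windows."""
    n_coeffs = len(coeff_rows[0])
    medians, mads = [], []
    for k in range(n_coeffs):
        column = [row[k] for row in coeff_rows]
        med = statistics.median(column)
        medians.append(med)
        mads.append(statistics.median([abs(x - med) for x in column]))
    return [[(row[k] - medians[k]) / mads[k] if mads[k] > 0 else 0.0
             for k in range(n_coeffs)] for row in coeff_rows]

def top_k_fingerprint(row, k):
    """Keep the k largest-magnitude entries of `row` and encode their signs.

    Coefficient i maps to bits 2*i (positive) and 2*i + 1 (negative);
    unselected coefficients leave both bits at zero.
    """
    order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k]
    bits = [0] * (2 * len(row))
    for i in order:
        if row[i] > 0:
            bits[2 * i] = 1
        elif row[i] < 0:
            bits[2 * i + 1] = 1
    return bits

# Toy Haar coefficients (made up): coefficient 0 is always large, while
# coefficient 2 is near zero except in the last window.
rows = [
    [100.0, 1.0, 0.0, 2.0],
    [101.0, 2.0, 0.1, 1.0],
    [99.0, 3.0, -0.1, 3.0],
    [100.5, 2.5, 0.2, 2.5],
    [100.0, 2.0, 5.0, 2.0],
]
by_magnitude = top_k_fingerprint(rows[4], k=1)                # selects coefficient 0
by_mad = top_k_fingerprint(mad_standardize(rows)[4], k=1)     # selects coefficient 2
```

In the toy data the top-magnitude rule picks a coefficient that is large in every window and therefore carries no information about this window, whereas MAD standardization picks the coefficient that deviates from its typical value.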

[Figure: panels for Top Magnitude, Top Z-score and Top MAD coefficient selection, shown for noise sample 1 and noise sample 2.]

Synthetic Test

Comparison of the performance of Haar coefficient-selection methods on a synthetic test. The MAD-based coefficient selection best separates the repeated waveforms from the noise.

(Right) Test data (a): 12 pairs of repeated waveforms (SNR 1.25-5) planted at known times in 3 hrs of noise (bandpass 1-10 Hz). Detection results from FAST shown for (b) top magnitude, (c) top Z-score, and (d) top MAD Haar coefficients. Location of true repeated events indicated by orange vertical lines, and the detection statistic (similarity value) is plotted in blue. Top 400 coefficients selected in results pictured, but results hold for top 100-800 coefficients.

[Figure: cumulative curves against % bits in binary fingerprint (cumulative, 0 to 1) for the top 400 Haar coefficients in magnitude, the top 400 standardized Haar coefficients (Z-score) and the top 400 standardized Haar coefficients (MAD), with the ideal line for a perfectly efficient representation.]

[Figure: similarity value against time (s), 0 to 10000 s.]

(Left) Efficiency of binary representations (ordered from least to most efficient): top magnitude (blue), top Z-score (orange) and top MAD (purple), with Gini index of 0.73, 0.28, and 0.11, respectively.
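One way to obtain a Gini index for a binary representation is to count how often each fingerprint bit is set across all windows and measure how unevenly that usage is distributed. The poster does not spell out the exact computation, so the sketch below is an assumption: it applies the standard Gini coefficient to per-bit usage counts, and the function names are hypothetical.

```python
def bit_usage(fingerprints):
    """Number of windows in which each fingerprint bit is set."""
    return [sum(column) for column in zip(*fingerprints)]

def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means every bit is used equally often; values near 1 mean a few bits
    account for almost all of the usage.
    """
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Toy binary fingerprints (made up), one row per window.
fingerprints = [[1, 0, 1], [1, 0, 0]]
usage = bit_usage(fingerprints)   # [2, 0, 1]
inequality = gini(usage)
```

Under this reading, a lower Gini index corresponds to a more efficient representation, which matches the ordering in the caption above.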

Alternate Feature Extraction Approaches (on-going work)

- Time-domain features: bag-of-waveforms, wavelets, random projections
- Data-driven features: spectral hashing, shift-invariant sparse coding, nonnegative matrix factorization (NMF)-based features

References

[1] Yoon, C., et al. (2015). “Earthquake detection through computationally efficient similarity search.” Science Advances, 1(11).

[2] Baluja, S., and Covell, M. (2008). “Waveprint: Efficient wavelet-based audio fingerprinting.” Pattern Recognition, 41(11).

[3] Yoon, C., et al. (2015). AGU Fall Meeting, Abstract S13B-2850.

Read more about FAST: doi:10.1126/sciadv.1501057
