Network analysis and statistical issues Lucio Baggio An introductive seminar to ICRR’s GW group



Topics of this presentation

• Gravitational wave burst networks: from the single detector to a worldwide network
• IGEC (International Gravitational Event Collaboration): long-term search with four detectors; directional search and statistical issues
• Setting confidence intervals: from raw data to probability statements; likelihood/Bayesian vs frequentist methods
• False discovery probability: multiple tests and large surveys change the overall confidence of the first detection
• Miscellaneous topics: the LIGO-AURIGA white paper on joint data analysis; problems with non-aligned or different detectors; coherent data analysis


Network analysis is unavoidable, as far as background estimation is concerned


Gravitational wave burst events

For fast (~1-10 ms) GW signals, the impulse response of the optimal filter for the signal amplitude is an exponentially damped oscillation.

Even at very low amplitudes, signals from astrophysical sources are expected to be rare.

A candidate event in the gravitational wave channel is any single extreme value found within a time window of roughly constant length.

Background events therefore follow the extreme-value distribution of an (almost) Gaussian stochastic process.
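The extreme-value picture above can be illustrated with a toy background model. This is a minimal sketch, assuming (as a simplification) that the filtered detector output is white, unit-variance Gaussian noise and that one candidate is kept per fixed-length window when its largest |x| exceeds a threshold; the window length, the thresholds and the `candidate_events` helper are hypothetical, not the AURIGA pipeline.

```python
import random


def candidate_events(samples, window, threshold):
    """Keep the largest |x| of each fixed-length window if it exceeds threshold.

    Returns a list of (sample_index, amplitude) pairs.
    """
    events = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        # position of the extreme value inside this window
        i_max = max(range(window), key=lambda i: abs(chunk[i]))
        amplitude = abs(chunk[i_max])
        if amplitude > threshold:
            events.append((start + i_max, amplitude))
    return events


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(200_000)]  # toy "filtered output"
for thr in (3.0, 4.0, 5.0):
    n = len(candidate_events(noise, window=100, threshold=thr))
    print(f"threshold {thr}: {n} background candidates in 2000 windows")
```

Raising the threshold by one standard deviation cuts the toy background by more than an order of magnitude, which is the Gaussian extreme-value tail; real detectors add non-modeled outliers on top of it.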


The background in practice (1)

[Figure: amplitude distribution of events, AURIGA, Jun 12-21, 1997; curves: events vetoed by the χ² test, Gaussian simulation]

L. Baggio et al., "χ² testing of optimal filters for gravitational wave signals: an experimental implementation", Phys. Rev. D 61, 102001 (2000).


The background in practice (2)

[Figure: amplitude distribution of events, AURIGA, Nov. 13-14, 2004: cumulative event rate above threshold (false alarm rate [hour⁻¹]), showing vetoed glitches, epoch vetoes (50% of time), and the events remaining after vetoing]


The background in practice (3)

[Figure: cumulative power distribution of events, TAMA, Nov. 13-14, 2004, from the presentation at the 9th Gravitational Wave Data Analysis Workshop (December 15-18, 2004, Annecy, France): event rate [events/sec] (10⁻⁶ to 1) vs. event power threshold P_th (1 to 10⁶), for DT9, DT8, DT6 and DT9 before veto, compared with Gaussian noise]


The background in practice (4)

Environmental monitoring
• Try to eliminate locally all possible false signals
• Detectors for many possible sources (seismic, acoustic, electromagnetic, muon)
• Also trend (slowly varying) information (tilts, temperature, weather)
• Matched-filter techniques for "known" signals

This can only decrease the background (there is no confidence for non-matched signals); it cannot increase the (unknown) confidence of the remaining signals.

Non-coherent methods
• coincidences among detectors (also non-GW: e.g., optical, γ-ray, X-ray, neutrino)

Coherent methods
• correlations
• maximum likelihood (e.g., weighted average)

Two good reasons for multiple detector analysis

1. the rate of background candidates can be estimated reliably

2. the background rate of the network can be less than that of the single detector


M-fold coincidence search

A coincidence is defined as the detection, in several detectors, of triggers whose estimated times of arrival are so close that their time windows t_w have a common overlap. The windows are defined by the estimated timing errors.

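[Figure: events at times t(1), t(2), t(3) in the 1st, 2nd and 3rd detector, each with a timing error box of width 2t_w(k); a common overlap of all boxes marks a coincidence. Lower panel: the same event series after delays dt2, dt3 are applied to the 2nd and 3rd detector]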

An ideal "off-source" measurement of the background cannot be performed (there is no way to shield the detector). The surrogate solution is to repeat the coincidence search after suitable delays dt_k (longer than the timing errors) have been applied to the event series. The coincidences due to real signals then disappear, and only background coincidences are left.
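The coincidence definition and the time-delay surrogate can be sketched in a few lines. This is a brute-force illustration, assuming short event lists and a fixed half-width t_w(k) per detector; `coincidences` and `shifted_background` are hypothetical helper names, and a real search over years of triggers would use a sorted sweep rather than `itertools.product`.

```python
from itertools import product


def coincidences(series, half_widths):
    """M-fold coincidences: one event per detector whose error boxes share an overlap.

    series[k]      -- event times of detector k
    half_widths[k] -- half-width t_w of detector k's timing error box
    """
    found = []
    for combo in product(*series):  # every choice of one event per detector
        left = max(t - w for t, w in zip(combo, half_widths))
        right = min(t + w for t, w in zip(combo, half_widths))
        if left < right:  # all boxes overlap somewhere
            found.append(combo)
    return found


def shifted_background(series, half_widths, delays):
    """Coincidence count after delaying detector k by delays[k] (delays >> t_w)."""
    shifted = [[t + d for t in s] for s, d in zip(series, delays)]
    return len(coincidences(shifted, half_widths))


det = [[10.00, 55.20, 80.00],   # 1st detector
       [10.05, 40.00, 80.30],   # 2nd detector
       [9.98, 70.00]]           # 3rd detector
tw = [0.1, 0.1, 0.1]
print(coincidences(det, tw))                          # on-source search
print(shifted_background(det, tw, [0.0, 5.0, 10.0]))  # delayed: real signals vanish
```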


M-fold coincidence search (2)

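The expected rate of accidental coincidences is given by

$$C(t) = \prod_{k=1}^{M} b_k(t)\,\cdot\,\sum_{h=1}^{M}\prod_{k\neq h} 2\,t_w^{(k)}$$

where $b_k(t)$ is the background event rate of detector k and $t_w^{(k)}$ the half-width of its time error box. C(t) depends on the choice of the time error boxes:
• equal and constant: $C(t) = M\,(2t_w)^{M-1}\prod_{k=1}^{M} b_k(t)$
• varying with detector: the general formula above
• varying with event: Monte Carlo (resampled statistics by shifted times)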
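The rate formula is easy to evaluate and to check against its equal-window special case. A minimal sketch, assuming stationary background rates b_k in events per second and half-widths t_w(k) in seconds; the trigger rate in the usage lines is a made-up illustrative number, not an IGEC value.

```python
import math

SECONDS_PER_YEAR = 3.156e7


def expected_coincidence_rate(rates, half_widths):
    """C = prod_k b_k * sum_h prod_{k != h} 2 t_w^(k)  [coincidences per second]."""
    M = len(rates)
    geometric = sum(
        math.prod(2.0 * half_widths[k] for k in range(M) if k != h)
        for h in range(M)
    )
    return math.prod(rates) * geometric


def expected_rate_equal_windows(rates, t_w):
    """Special case of equal, constant boxes: C = M (2 t_w)^(M-1) prod_k b_k."""
    M = len(rates)
    return M * (2.0 * t_w) ** (M - 1) * math.prod(rates)


# Hypothetical example: 100 triggers/hour per detector, t_w = 0.1 s
b = 100.0 / 3600.0
for M in (2, 3, 4):
    rate = expected_coincidence_rate([b] * M, [0.1] * M)
    print(f"{M}-fold: {rate * SECONDS_PER_YEAR:.3g} accidental coincidences per year")
```

Each extra detector multiplies the rate by roughly b·2t_w, which is why adding a third detector reduces the accidental rate so dramatically.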

[Figure: predicted mean false alarm rate [yr⁻¹] (10⁻⁶ to 10) vs. common search threshold [Hz⁻¹] (2×10⁻²¹ to 10⁻²⁰), for the AL-AU pair and the AL-AU-NA triple]

From IGEC 1997-2000: example of predicted mean false alarm rates. Notice the dramatic improvement when adding a third detector: the occurrence of a 3-fold coincidence would inevitably be interpreted as a gravitational wave signal.

In practice, when no signal is detected in coincidence, the upper limit is determined by the total observation time.
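For the zero-count case, the classical Poisson upper limit depends only on the exposure. A minimal sketch, assuming negligible accidental background, so that observing zero coincidences gives P(0 | μ) = e^(−μ) with μ = rate × T; the function name is hypothetical.

```python
import math


def zero_count_rate_limit(t_obs_years, cl=0.90):
    """Upper limit on the signal rate [1/yr] when no coincidence is observed.

    Solves exp(-rate * T) = 1 - cl, i.e. rate = -ln(1 - cl) / T, assuming the
    accidental background is negligible.
    """
    return -math.log(1.0 - cl) / t_obs_years


for t in (0.1, 0.5, 1.0, 2.0):
    print(f"T = {t:4.1f} yr -> rate < {zero_count_rate_limit(t):.2f} /yr at 90% CL")
```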


International networks of GW detectors

Interferometers

Operative:

GEO600 – (Germany/UK)

LIGO Hanford 2km – (USA)

LIGO Hanford 4km – (USA)

LIGO Livingston 4km – (USA)

TAMA300 – (Japan)

Upcoming:

VIRGO – (Italy/France)

CLIO – (Japan)

Resonant bars

ALLEGRO – (USA)

AURIGA – (Italy)

EXPLORER – (CERN, Geneva)

NAUTILUS – (Italy)

[Map of detector sites: LIGO; GEO600; Virgo, AURIGA, NAUTILUS; TAMA300, CLIO100; EXPLORER]


International networks of GW detectors

15 years of worldwide networks

1989 – 2 bars, 3 months: E. Amaldi et al., Astron. Astrophys. 216, 325 (1989).

1991 – 2 bars, 120 days: P. Astone et al., Phys. Rev. D 59, 122001 (1999).

1995-1996 – 2 detectors, 6 months: P. Astone et al., Astropart. Phys. 10, 83 (1999).

1989 – 2 interferometers, 2 days: D. Nicholson et al., Phys. Lett. A 218, 175 (1996).

1997-2000 – 2, 3, 4 resonant detectors, resp. 2 years, 6 months, 1 month: P. Astone et al., Phys. Rev. D 68, 022001 (2003).

2001 – 2 detectors, 11 days: TAMA300-LISM collaboration, Phys. Rev. D 70, 042003 (2004).

2001 – 2 detectors, 90 days: P. Astone et al., Class. Quant. Grav. 19, 5449 (2002).

2002 – 3 detectors, 17 days: LIGO collaboration, B. Abbott et al., Phys. Rev. D 69, 102001 (2004).

1969 – Argonne National Laboratory and University of Maryland: J. Weber, Phys. Rev. Lett. 22, 1320–1324 (1969); 1973-1974: Phys. Rev. D 14, 893-906 (1976).

GW detected? If NOT, why?


The International Gravitational Event Collaboration


http://igec.lnl.infn.it

LSU group: ALLEGRO (LSU), http://gravity.phys.lsu.edu
Louisiana State University, Baton Rouge, Louisiana

AURIGA group: AURIGA (INFN-LNL), http://www.auriga.lnl.infn.it
INFN of Padova, Trento, Ferrara, Firenze, LNL; Universities of Padova, Trento, Ferrara, Firenze; IFN-CNR, Trento, Italia

NIOBE group: NIOBE (UWA), http://www.gravity.pd.uwa.edu.au
University of Western Australia, Perth, Australia

ROG group: EXPLORER (CERN), NAUTILUS (INFN-LNF), http://www.roma1.infn.it/rog/rogmain.html
INFN of Roma and LNF; Universities of Roma, L'Aquila; CNR IFSI and IESS, Roma, Italia

The International Gravitational Event Collaboration


The IGEC protocol

[Diagram: detector → data acquisition → raw data → data analysis (> start_anl) → candidate events (evt), each characterized by time of arrival, amplitude and SNR]

IGEC data come from the different data analyses applied to the individual detector outputs.

IGEC members are only asked to follow a few general guidelines, so that the parameters of the candidate events and the detector status at any time are characterized consistently.

Further data conditioning and background estimation are performed in a coordinated way.


Exchanged periods of observation 1997-2000

[Figure: fraction of time in monthly bins for ALLEGRO, AURIGA, NAUTILUS, EXPLORER and NIOBE, for different ranges of the exchange threshold (in units of 10⁻²¹ Hz⁻¹)]

Burst GW model:

$$h(t) = H_0\,\delta(t - t_0)$$

where H_0 is the Fourier amplitude of the burst GW and t_0 its arrival time.


The exchanged data

[Figure: event amplitudes (Hz⁻¹·10⁻²¹, 0 to 10) and times of arrival vs. time (hours, 0 to 60), with the minimum detectable amplitude (aka exchange threshold) and the gaps in the data]


M-fold coincidence search (revised)

A coincidence is defined when |t_i − t_j| < Δt_ij ~ 0.1 s for all pairs 1 ≤ i, j ≤ M.

Coincidence windows Δt_ij depend on the timing errors σ_i, which are
• non-Gaussian at low SNR!
• strongly dependent on SNR!

The windows are therefore set as

$$\Delta t_{ij} = k\sqrt{\sigma_i^2 + \sigma_j^2}$$

and the Chebyshev inequality guarantees < 5% false dismissal for k = 4.5.

To make things even worse, we would like the sequence of event times to be described by a (possibly non-homogeneous) Poisson point process, which means rare and independent triggers, but this was not the case.
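The window rule and the Chebyshev bound can be checked numerically. A sketch, assuming independent, unbiased timing errors with finite variance; to mimic non-Gaussian errors it draws them from a Laplace distribution (our choice, not the measured AURIGA error distribution). Function names and the σ values are hypothetical.

```python
import math
import random


def coincidence_window(sigma_i, sigma_j, k=4.5):
    """Pairwise window dt_ij = k * sqrt(sigma_i**2 + sigma_j**2)."""
    return k * math.sqrt(sigma_i ** 2 + sigma_j ** 2)


def chebyshev_false_dismissal_bound(k):
    """P(|t_i - t_j| >= k * sigma) <= 1 / k**2 for any finite-variance error law."""
    return 1.0 / k ** 2


def is_coincident(t_i, t_j, sigma_i, sigma_j, k=4.5):
    return abs(t_i - t_j) < coincidence_window(sigma_i, sigma_j, k)


def laplace(rng, sigma):
    """Laplace-distributed error with standard deviation sigma (scale sigma/sqrt 2)."""
    x = rng.expovariate(math.sqrt(2.0) / sigma)
    return x if rng.random() < 0.5 else -x


def simulated_false_dismissal(sigma_i, sigma_j, k=4.5, trials=20_000, seed=3):
    """Fraction of true (simultaneous) signals missed because of timing errors."""
    rng = random.Random(seed)
    missed = sum(
        not is_coincident(laplace(rng, sigma_i), laplace(rng, sigma_j), sigma_i, sigma_j, k)
        for _ in range(trials)
    )
    return missed / trials


print(coincidence_window(0.01, 0.02))          # about 0.1 s
print(chebyshev_false_dismissal_bound(4.5))    # about 0.049 < 5%
print(simulated_false_dismissal(0.01, 0.02))   # well below the bound here
```

The bound is conservative by design: it holds whatever the shape of the error distribution, which is what makes it usable when timing errors are non-Gaussian and SNR-dependent.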


Timing error uncertainty (AURIGA, for δ-like bursts)


Auto- and cross-correlation of time series (clustering)

Auto-correlation of time of arrival on timescales ~100s

No cross-correlation

AL = ALLEGRO

AU = AURIGA

EX = EXPLORER

NA = NAUTILUS

NI = NIOBE

x-axis: seconds

y-axis: counts


Amplitude distributions of exchanged events

[Figure: relative counts (10⁻⁵ to 1) vs. AMP/THR (1 to 10) for NIOBE, NAUTILUS, AURIGA, ALLEGRO and EXPLORER, with amplitudes normalized to each detector's threshold for the trigger search]

Typical trigger search thresholds: SNR 3 for ALLEGRO and NIOBE; SNR 5 for AURIGA, EXPLORER and NAUTILUS. The amplitude range is much wider than the expected extreme-value distribution: non-modeled outliers dominate at high SNR.


False alarm reduction by amplitude selection

[Figure: four panels of event amplitude vs. time, showing the events that survive a minimum-amplitude selection A]

With a small increase of the minimum amplitude, the false alarm rate drops dramatically.

Corollary: selected events naturally have consistent amplitudes.


Sensitivity modulation for directional search

[Figure: event amplitude (Hz⁻¹·10⁻²¹, 0 to 10) and directional sensitivity sin²θ_GC toward the Galactic Center (0 to 1) vs. time (hours, 0 to 60)]


A small digression: different antenna patterns and the relevance of signal polarization


Introduction

• At any given time, the antenna pattern is

$$F(\theta,\phi,\psi) = A(\theta,\phi)\cos\left(2\psi + \Phi(\theta,\phi)\right)$$

It is a sinusoidal function of the polarization angle ψ, i.e. any gravitational wave detector is a linear polarizer; it depends on declination and right ascension (θ, φ) through the magnitude A and the phase Φ.

• In order to reconstruct the wave amplitude h, any measured amplitude has to be divided by F(θ,φ,ψ).

• We will characterize the directional sensitivity of a detector pair by the product of their antenna patterns F1 and F2:

F1F2 is inversely proportional to the square of the minimum detectable wave amplitude, h², in a cross-correlation search;

F1F2 is an "extension" of the "AND" logic of the IGEC 2-fold coincidence.

This has been used extensively by IGEC: the first step is a data selection obtained by putting a threshold on F⁻¹ for each detector.


Linearly polarized signals

For a linearly polarized signal, ψ does not vary with time. The product of the antenna patterns as a function of ψ is given by:

$$F_1(\psi)\,F_2(\psi) = \frac{A_1A_2}{2}\left[\cos(4\psi + \Phi_1 + \Phi_2) + \cos(\Phi_1 - \Phi_2)\right]$$

The relative phase Φ1 − Φ2 between the detectors affects the sensitivity of the pair.
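The product-to-sum form can be verified numerically, and averaging it over a uniformly distributed ψ makes the role of Φ1 − Φ2 explicit: only the term (A1A2/2)cos(Φ1 − Φ2) survives. A sketch with hypothetical values of A and Φ, using the convention F(ψ) = A cos(2ψ + Φ) (an assumption about the sign of the phase); the uniform-ψ average is our illustration, not an IGEC prescription.

```python
import math


def antenna(A, Phi, psi):
    """Antenna pattern for polarization angle psi: F = A cos(2 psi + Phi)."""
    return A * math.cos(2.0 * psi + Phi)


def pair_product(A1, Phi1, A2, Phi2, psi):
    """Closed form (A1 A2 / 2) [cos(4 psi + Phi1 + Phi2) + cos(Phi1 - Phi2)]."""
    return 0.5 * A1 * A2 * (math.cos(4.0 * psi + Phi1 + Phi2) + math.cos(Phi1 - Phi2))


def mean_over_psi(A1, Phi1, A2, Phi2, n=3600):
    """Average of F1 * F2 over n equally spaced psi in [0, pi)."""
    grid = (math.pi * i / n for i in range(n))
    return sum(antenna(A1, Phi1, p) * antenna(A2, Phi2, p) for p in grid) / n


A1, Phi1, A2, Phi2 = 0.8, 0.3, 0.6, 1.9   # hypothetical detector pair
print(mean_over_psi(A1, Phi1, A2, Phi2), 0.5 * A1 * A2 * math.cos(Phi1 - Phi2))
```

For Φ1 − Φ2 = π/2 the average vanishes: the two detectors then respond to orthogonal linear polarizations.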


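AURIGA-TAMA sky coverage: (1) linearly polarized signal

$$F_1(\psi)\,F_2(\psi) = \frac{A_1A_2}{2}\left[\cos(4\psi + \Phi_1 + \Phi_2) + \cos(\Phi_1 - \Phi_2)\right]$$

[Sky maps for AURIGA (F1²), TAMA (F2²) and the AURIGA × TAMA product F1F2]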


Circularly polarized signals

If:
• the signal is circularly polarized, h₊(t) = h(t) cos(2πf0 t), h×(t) = h(t) sin(2πf0 t);
• the amplitude h(t) varies on timescales longer than 1/f0;

Then:
• the measured amplitude is simply A h(t), therefore it depends only on the magnitude of the antenna patterns. In the case of two detectors the pair sensitivity is

$$A_1A_2 = \sqrt{F_{1+}^2 + F_{1\times}^2}\,\sqrt{F_{2+}^2 + F_{2\times}^2}$$

• the effect of the relative phase Φ1 − Φ2 is limited to a spurious time shift, of magnitude |Φ1 − Φ2|/(2πf0), which adds to the light-speed propagation delay (Gursel and Tinto, Phys. Rev. D 40, 12 (1989)).
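Both statements above (amplitude fixed by A, phase turned into a delay) follow from rewriting the response as A h cos(2πf0 t + Φ). A sketch under stated assumptions: constant h over the window, h₊ = h cos(2πf0 t) and h× = h sin(2πf0 t), and the phase defined by F₊ = A cos Φ, F× = −A sin Φ (sign conventions vary); the F values and f0 are hypothetical.

```python
import math


def response(F_plus, F_cross, h, f0, t):
    """Detector output for h_+ = h cos(2 pi f0 t), h_x = h sin(2 pi f0 t)."""
    w = 2.0 * math.pi * f0
    return h * (F_plus * math.cos(w * t) + F_cross * math.sin(w * t))


def magnitude_and_phase(F_plus, F_cross):
    """Write the response as A h cos(2 pi f0 t + Phi): A = sqrt(F+^2 + Fx^2)."""
    return math.hypot(F_plus, F_cross), math.atan2(-F_cross, F_plus)


def spurious_delay(Fp1, Fx1, Fp2, Fx2, f0):
    """delta such that r1(t + delta) / A1 == r2(t) / A2, wrapped to |delta| <= 1/(2 f0)."""
    _, phi1 = magnitude_and_phase(Fp1, Fx1)
    _, phi2 = magnitude_and_phase(Fp2, Fx2)
    dphi = math.remainder(phi2 - phi1, 2.0 * math.pi)  # in [-pi, pi]
    return dphi / (2.0 * math.pi * f0)


f0 = 900.0                                  # hypothetical carrier frequency [Hz]
det1, det2 = (0.3, -0.4), (-0.5, 0.2)       # hypothetical (F_plus, F_cross) pairs
A1, _ = magnitude_and_phase(*det1)
A2, _ = magnitude_and_phase(*det2)
delta = spurious_delay(*det1, *det2, f0)
print(f"A1 = {A1:.3f}, A2 = {A2:.3f}, spurious delay = {delta * 1e6:.1f} us")
```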


AURIGA-TAMA sky coverage: (2) circularly polarized signal

$$A_1 = \sqrt{F_{1+}^2 + F_{1\times}^2}, \qquad A_2 = \sqrt{F_{2+}^2 + F_{2\times}^2}$$

[Sky maps for AURIGA, TAMA and the AURIGA × TAMA product A1A2]


AURIGA-TAMA sky coverage

[Sky maps: AURIGA × TAMA product F1F2 for a linearly polarized signal, and A1A2 for a circularly polarized signal]


IGEC (continued)


Data selection at work

[Figure: event amplitude (Hz⁻¹·10⁻²¹, 0 to 10) vs. time (hours, 0 to 60)]

Duty time is shortened at each detector in order to have an efficiency of at least 50%.

A major false alarm reduction is achieved by excluding low-amplitude events.


Duty cycle cut: single detectors

[Figure: total time during which the exchange threshold was lower than the GW amplitude, vs. amplitude of the burst GW]


Duty cycle cut: network (1)

[Figure: observation time (days, 1 to 10⁴) vs. search threshold (10⁻²¹ to 10⁻²⁰ Hz⁻¹) for single, 2-fold, 3-fold and 4-fold detector coverage; the 4 yr limit is marked]

Galactic Center coverage

Search threshold 6×10⁻²¹ /Hz
Detectors    Time (days)    Fraction of 4 yr
1 or more    894            61%
2 or more    397            27%
3 or more    70             5%
4            7.2            0.5%

Search threshold 3×10⁻²¹ /Hz
Detectors    Time (days)    Fraction of 4 yr
1 or more    359            25%
2 or more    70             5%
3 or more    3              0.2%
4            -              -
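The network coverage in the table can be computed from the single-detector duty-cycle cut: a detector counts as observing in a time bin when its exchange threshold is at or below the search threshold H_t. A sketch assuming the thresholds are available on a common grid of time bins (None for gaps); `network_coverage` and the toy data are hypothetical.

```python
def network_coverage(thresholds, search_threshold, bin_days):
    """Days with at least n detectors whose exchange threshold <= search_threshold.

    thresholds[k][i] -- exchange threshold of detector k in time bin i (None = no data)
    Returns {n: days} for n = 1 .. number of detectors.
    """
    n_det, n_bins = len(thresholds), len(thresholds[0])
    at_least = [0] * (n_det + 1)
    for i in range(n_bins):
        active = sum(
            1 for det in thresholds if det[i] is not None and det[i] <= search_threshold
        )
        for n in range(1, active + 1):
            at_least[n] += 1
    return {n: at_least[n] * bin_days for n in range(1, n_det + 1)}


def fraction_of(days, total_days=4 * 365.25):
    """Fraction of the 4-year period covered."""
    return days / total_days


toy = [[2e-21, 5e-21, None, 1e-21],     # detector A, one value per day
       [4e-21, 7e-21, 2e-21, None]]     # detector B
print(network_coverage(toy, 6e-21, 1.0))
print(network_coverage(toy, 3e-21, 1.0))
print(f"{fraction_of(894):.0%}")  # e.g. the 894 days quoted in the table
```

Lowering the search threshold removes bins from every detector at once, so the multi-detector coverage shrinks much faster than the single-detector one, as the table shows.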


Duty cycle cut: network (2)

[Figure: duty cycle percentage (0 to 100%) vs. date (01-Jan-97 to 29-Dec-99) for 1, 2, 3 and 4 detectors, at search thresholds 6×10⁻²¹ /Hz and 3×10⁻²¹ /Hz]


False dismissal probability

A coincidence can be missed because of…

• data conditioning. The common search threshold H_t guarantees that no GW signal in the selected data is lost because of a poor network setup… however, the efficiency of detection is still undetermined (it depends on the distribution of signal amplitude, direction and polarization).

• time coincidence constraint. The Chebyshev inequality provides a robust (with respect to timing error statistics) and general method to limit the false dismissal conservatively:

$$P\left(|t_i - t_j| \ge k\sqrt{\sigma_i^2 + \sigma_j^2}\right) \le \frac{1}{k^2}$$

i.e. the false dismissal is at most 1/k², while the false alarms grow ∝ k.

• amplitude consistency check: a GW generates events with correlated amplitudes, testing (same as above) |A_i − A_j| against k√(σ_{A_i}² + σ_{A_j}²).

Best choice for 1997-2000 data: false dismissal in time coincidence less than 5% (30%); no amplitude consistency test.

When optimizing the (partial) efficiency of detection versus false alarms, we are led to maximize the ratio between the fraction of GW coincidences found and the fluctuations of the accidental background.


Resampling statistics by time shifts

[Figure: four panels of event amplitude (Hz⁻¹·10⁻²¹, 0 to 10) vs. time (hours, 0 to 60)]

We can approximately resample the stochastic process by time shifts.

In the resampled data the GW sources are switched off, along with any correlated noise.

Ergodicity holds at least up to timescales of the order of one hour.

The samples are independent as long as the shift is longer than the maximum time window for the coincidence search (a few seconds).
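A sketch of the resampling recipe for one detector pair, assuming (hypothetically) uniformly scattered, independent trigger times and a fixed pairwise window; it compares the mean accidental count over many shifts with the expectation N1·N2·2Δt/T for independent series. The circular shift and all numbers are our choices for illustration.

```python
import bisect
import random


def count_pairs(t1, t2_sorted, window):
    """Number of pairs (a, b) with |a - b| < window; t2_sorted must be sorted."""
    total = 0
    for a in t1:
        total += (bisect.bisect_left(t2_sorted, a + window)
                  - bisect.bisect_right(t2_sorted, a - window))
    return total


def resampled_counts(t1, t2, window, t_span, shifts):
    """Accidental 2-fold coincidence counts after circular time shifts of detector 2."""
    counts = []
    for d in shifts:
        shifted = sorted((t + d) % t_span for t in t2)  # keep the same observation span
        counts.append(count_pairs(t1, shifted, window))
    return counts


rng = random.Random(11)
T, rate, window = 1.0e5, 0.01, 1.0           # hypothetical: 1e5 s, 0.01 Hz, |dt| < 1 s
t1 = sorted(rng.uniform(0.0, T) for _ in range(int(rate * T)))
t2 = sorted(rng.uniform(0.0, T) for _ in range(int(rate * T)))
shifts = [10.0 * s for s in range(1, 201)]   # every shift >> coincidence window
counts = resampled_counts(t1, t2, window, T, shifts)
mean = sum(counts) / len(counts)
expected = len(t1) * len(t2) * 2.0 * window / T
print(f"mean accidental count {mean:.2f}, expected {expected:.2f}")
```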


Poisson statistics

For each pair of detectors and amplitude selection, the resampled statistics allow us to test the Poisson hypothesis for accidental coincidences.

Example: EX-NA background (one-tail χ² p-level 0.71)

Since a fairly large number of tests is performed over all two-fold combinations, the overall agreement of the histogram of p-levels with a uniform distribution has the last word on the goodness of fit: verified.
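The slides do not spell out which χ² statistic was used, so as an assumption this sketch uses the index-of-dispersion test, one standard χ² test of the Poisson hypothesis on a set of resampled counts, and computes its one-tail p-level with a self-contained incomplete-gamma series (adequate for the moderate χ² values met here). Function names are hypothetical.

```python
import math
import random


def chi2_sf(x, dof):
    """One-tail chi-square probability P(chi2 > x), via the series for the
    regularized lower incomplete gamma function (fine for moderate x)."""
    if x <= 0.0:
        return 1.0
    a, y = 0.5 * dof, 0.5 * x
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= y / (a + n)
        total += term
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def dispersion_test(counts):
    """D = sum (n_i - mean)^2 / mean is ~ chi2 with len(counts) - 1 dof if Poisson."""
    mean = sum(counts) / len(counts)
    d = sum((c - mean) ** 2 for c in counts) / mean
    return d, chi2_sf(d, len(counts) - 1)


def poisson_draw(rng, lam):
    """Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(5)
poisson_like = [poisson_draw(rng, 20.0) for _ in range(100)]
clustered = [5, 35] * 50                      # over-dispersed, e.g. clustered triggers
print(dispersion_test(poisson_like))           # p-level should look uniform across runs
print(dispersion_test(clustered))              # p-level essentially 0
```

Repeating such a test for every detector pair and amplitude selection yields the set of p-levels whose histogram should be flat if the Poisson hypothesis holds.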


Setting (frequentist) confidence intervals


Unified vs flip-flop approach (1)

[Diagram, flip-flop method: experimental data x → hypothesis testing (CL) → physical results: either null (upper limit μ_up(CL)) or claim (estimation with error bars, k_CL)]


Unified vs flip-flop approach (2)

[Diagram, unified approach: experimental data x → confidence belt → physical results: estimation with confidence interval [μ_min(CL), μ_max(CL)]]


Setting confidence intervals

IGEC approach is

Frequentist in that it computes the confidence level or coverage as the probability that the confidence interval contains the true value

Unified in that it prescribes how to set a confidence interval automatically leading to a gw detection claim or an upper limit

however, different from Feldman & Cousins (F&C)

References

• G. J. Feldman and R. D. Cousins, Phys. Rev. D 57 (1998) 3873
• B. Roe and M. Woodroofe, Phys. Rev. D 63 (2001) 013009
• F. Porter, Nucl. Inst. Meth. A368 (1986), http://www.cithep.caltech.edu/~fcp/statistics/
• Particle Data Group: http://pdg.lbl.gov/2002/statrpp.pdf


A few basics: confidence belts and coverage

[Figure: pdf(x; μ) of the experimental data x for three values of the physical unknown μ, each with its coverage (0 to 1)]


A few basics (2)

[Figure: confidence interval I_x in the plane of experimental data x and physical unknown μ]

For each outcome x one should be able to determine a confidence interval I_x.

For each possible μ, the measurements which lead to a confidence interval consistent with the true value have probability C(μ), i.e. 1 − C(μ) is the false dismissal probability:

$$C(\mu) = \sum_{x\,:\,\mu \in I_x} \mathrm{pdf}(x;\mu)$$


Freedom of choice of confidence belt

Fixed frequentist coverage, C(μ) = CL: I_x can be chosen arbitrarily within this "horizontal" constraint. Feldman & Cousins (1998) and variations (Giunti 1999, Roe & Woodroofe 1999, ...).

Maximization of "likelihood":

$$\text{"CL"} = \frac{\int_{I_x} \Lambda(\mu;x)\,d\mu}{\int \Lambda(\mu;x)\,d\mu} \qquad (\text{usually} \neq C(\mu))$$

I_x can be chosen arbitrarily within this "vertical" constraint.

Roe & Woodroofe [2000]: a Bayesian-inspired frequentist approach.

Non-unified approaches: decision threshold, i.e. fine tuning of the false discovery probability (on a scale from "GW enthusiastic" to "fanatic skeptical"); other requirements...


Confidence level, likelihood, maybe probability?

The term "CL" is often found associated with equations like

$$\text{"CL"} = \frac{\int_{I_x} \Lambda(\mu;x)\,d\mu}{\int \Lambda(\mu;x)\,d\mu} \qquad \text{(likelihood integral)}$$

$$\text{"CL"} = \frac{\Lambda(\mu_1;x)}{\Lambda(\mu_2;x)} \qquad \text{(likelihood ratio, hypothesis testing)}$$

$$\text{"CL"} = \frac{\Lambda(\mu_{limit};x)}{\max_{\mu}\Lambda(\mu;x)} \qquad \text{(likelihood ratio relative to the maximum)}$$

where usually "CL" ≠ C(μ). In general, the bounds obtained as a solution to these equations have a coverage (or confidence level) different from "CL".


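Confidence intervals from likelihood integral

• Let N̄_c = N + N_b, where N is the expected number of GW coincidences and N_b the expected number of accidental coincidences over the observation time T_obs.

• Poisson pdf:

$$f(N_c;\bar N_c) = \frac{\bar N_c^{\,N_c}}{N_c!}\,e^{-\bar N_c}$$

• Likelihood:

$$\Lambda(N;N_c) = f(N_c;\,N + N_b)$$

• I fixed, solve for N_inf, N_sup:

$$\int_{N_{inf}}^{N_{sup}} \Lambda(N;N_c)\,dN = I\int_0^\infty \Lambda(N;N_c)\,dN, \qquad \Lambda(N_{inf};N_c) = \Lambda(N_{sup};N_c), \qquad 0 \le N_{inf} \le N_{sup}$$

• Compute the coverage

$$C(N) = \sum_{N_c\,:\,N \in [N_{inf},\,N_{sup}]} f(N_c;\,N + N_b)$$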
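The recipe can be coded directly on a grid in N. A sketch, assuming N_b > 0, a grid step of 0.01 and a truncated N_c range (both must be adjusted to the problem); the interval is taken as the highest-likelihood region holding a fraction I of the area, which gives equal likelihood at both ends except when the maximum sits at N = 0, where N_inf = 0. Function names are hypothetical, and this is not claimed to reproduce the IGEC code.

```python
import math
from functools import lru_cache


def poisson_pmf(n, mu):
    """f(n; mu) = mu^n e^-mu / n!  (mu > 0)."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


@lru_cache(maxsize=None)
def likelihood_interval(n_c, n_b, integral=0.90, step=0.01):
    """[N_inf, N_sup] holding `integral` of the likelihood area, highest values first."""
    n_max = n_c + 10.0 * math.sqrt(n_c + 1.0) + 10.0   # generous upper cut-off
    grid = [i * step for i in range(int(n_max / step) + 1)]
    like = [poisson_pmf(n_c, n + n_b) for n in grid]
    target = integral * sum(like)
    acc, chosen = 0.0, []
    for i in sorted(range(len(grid)), key=like.__getitem__, reverse=True):
        chosen.append(i)
        acc += like[i]
        if acc >= target:
            break
    # the likelihood is unimodal in N, so the chosen points form one interval
    return grid[min(chosen)], grid[max(chosen)]


def coverage(n_true, n_b, integral=0.90, n_c_max=60):
    """C(N) = sum of f(N_c; N + N_b) over the N_c whose interval contains N."""
    total = 0.0
    for n_c in range(n_c_max + 1):
        lo, hi = likelihood_interval(n_c, n_b, integral)
        if lo <= n_true <= hi:
            total += poisson_pmf(n_c, n_true + n_b)
    return total


print(likelihood_interval(0, 7.0))    # maximum at N = 0
print(likelihood_interval(20, 7.0))   # two-sided interval around N ~ 13
for n in (0.5, 3.0, 8.0):
    print(n, round(coverage(n, 7.0), 3))
```

Scanning `coverage` over N for fixed N_b and integral gives curves like the ones discussed next, and their minimum is the conservative coverage that can be quoted.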


Likelihood integral

Example: Poisson background N_b = 7.0 (N̄_c = N + N_b)

[Figure: confidence intervals on N (0 to 10) vs. coincidence counts N_c (0 to 25), for likelihood integrals of 50%, 85%, 95%, 99% and 99.9%]


Dependence of the coverage on the background

[Figure: coverage for a likelihood integral of 0.90, for N_b = 0.01, 0.1, 1.0, 3.0, 7.0, 10]


From likelihood integral to coverage

Plot of the likelihood integral vs. the minimum (conservative) coverage min_N C(N), for sample values of the background counts N_b spanning the range N_b = 0.01-10.


IGEC results (and what we learned from experience)


Setting confidence intervals on IGEC results

Example: confidence interval with coverage 95%

[Figure: confidence interval on N_gw (0 to 18) vs. search threshold H_t (1 to 100, in units of 10⁻²¹/Hz)]

“upper limit”: true value outside with coverage 95%

GOAL: estimate the number (rate) of GW detected with amplitude ≥ H_t


Uninterpreted upper limits

…on the RATE of BURST GW from the GALACTIC CENTER DIRECTION whose measured amplitude is greater than the search threshold

No model is assumed for the sources, apart from being a random time series.

[Figure: upper limits on the rate (year⁻¹, 1 to 1000) vs. search threshold (Hz⁻¹, 10⁻²¹ to 10⁻¹⁹), for ensured minimum coverage 0.60, 0.80, 0.90 and 0.95: the true rate value is under the curves with a probability equal to the coverage]


Upper limits after amplitude selection

[Figure: confidence intervals on N_gw (0 to 18) vs. search threshold (1 to 100, in units of 10⁻²¹/Hz)]

Systematic search on thresholds: many trials!

All upper limits but one.

Overall false alarm probability (at least one detection in case NO GW are in the data): 33%

NULL HYPOTHESIS WELL IN AGREEMENT WITH THE OBSERVATIONS


Multiple configurations/selection/grouping within IGEC analysis


Resampling statistics of accidental claims

[Figure: histogram of counts (0 to 500) vs. number of false alarms (0 to 5), from resampled event time series]

Coverage    "Claims" expected    found
0.90        0.866 (0.555)        1
0.95        0.404 (0.326)        1

Resampling: blind analysis!


False discovery rate: setting the probability of false claim of detection


Why FDR?

When should I care about multiple-test procedures?

• All-sky surveys: many source directions and polarizations are tried

• Template banks

• Wide-open-eyes searches: many analysis pipelines are tried altogether, with different amplitude thresholds, signal durations, and so on

• Periodic updates of results: every new science run is a chance for a “discovery”. “Maybe the next one is the good one.”

• Many graphical representations or aggregations of the data: “If I change the binning, maybe the signal shows up better…”


Preliminary (1): hypothesis testing

• Null (H0) true (background, noise): U retained (can't reject); B rejected, i.e. false discoveries (false positives), Type I error α = εb; total m0
• Alternative true (signal): T retained (inefficiency), Type II error β = 1 − εs; S rejected, i.e. detected signals (true positives); total m1
• Total: m − R retained; R = S + B reported signal candidates (reject null = accept alternative); total m


Preliminary (2): p-level

Assume you have a model for the noise that affects the measurement x. You derive a test statistic t(x) from x; F(t) is the cumulative distribution of t when x is sampled from noise only (off-signal).

The p-level associated with t(x) is the probability, under the noise model, of a value of t larger than t(x):

p = 1 − F(t(x)) = P(t > t(x))

• Example: χ² test. p is the "one-tail" χ² probability associated with n counts (assuming d degrees of freedom).

• The (cumulative) distribution of p is always linearly rising in case of agreement of the noise with the model: P(p) = p, i.e. dP/dp = 1.

Usually, the alternative hypothesis is not known. However, for our purposes it is sufficient to assume that the signal can be distinguished from the noise, i.e. dP/dp ≠ 1. Typically, the measured values of p are biased toward 0 when a signal is present.

[Figure: pdf of the p-level on [0, 1]: flat for the background, enhanced toward 0 for the signal]


Usual multiple testing procedures

For each hypothesis test, the condition {p < α ⇒ reject null} leads to false positives with probability α.

In case of multiple tests (which need not use the same test statistic, nor test the same null hypothesis), let p = {p1, p2, … pm} be the set of p-levels; m is the trial factor. We select “discoveries” using a threshold T(p): {pj < T(p) ⇒ reject null}.

• Uncorrected testing: T(p) = α

– The probability that at least one rejection is wrong is (independent tests, all nulls true)

P(B > 0) = 1 − (1 − α)^m ≈ mα (for mα ≪ 1)

hence a false discovery is guaranteed for m large enough.

• Fixed total 1st type errors (Bonferroni): T(p) = α/m

– Controls the familywise error rate in the most stringent manner:

P(B > 0) ≤ α

– This makes mistakes rare…

– … but in the end the efficiency becomes negligible (2nd type errors dominate)!!
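The two familywise probabilities above are easy to tabulate. A sketch assuming m independent tests with all null hypotheses true; function names are hypothetical.

```python
def p_any_false_uncorrected(alpha, m):
    """P(B > 0) = 1 - (1 - alpha)^m when every test uses threshold alpha."""
    return 1.0 - (1.0 - alpha) ** m


def p_any_false_bonferroni(alpha, m):
    """P(B > 0) = 1 - (1 - alpha/m)^m <= alpha with the Bonferroni threshold alpha/m."""
    return 1.0 - (1.0 - alpha / m) ** m


for m in (1, 10, 100, 1000):
    print(f"m = {m:5d}: uncorrected {p_any_false_uncorrected(0.05, m):.3f}, "
          f"Bonferroni {p_any_false_bonferroni(0.05, m):.4f}")
```

With α = 0.05 and m = 100 the uncorrected procedure almost certainly yields a false discovery, while Bonferroni keeps P(B > 0) just under 5% at the price of a per-test threshold of 5×10⁻⁴.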


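Controlling false discovery fraction

We desire to control (= bound) the ratio of false discoveries over the total number of claims: B/R = B/(B + S) ≤ q. The level T(p) is then chosen accordingly.

Let us take a simple case where signals are easily separable (e.g. high SNR): the signal p-levels S pile up near p = 0, while the background p-levels are uniform, so that on average

$$B = m_0\,T(p) \le m\,T(p), \qquad \mathrm{FDR} = \frac{B}{R} \le \frac{m\,T(p)}{R} \le q \;\Rightarrow\; T(p) = \frac{q}{m}\,R$$

[Figure: pdf of p (signals S near p = 0 above a flat background of level m0) and cumulative counts R vs. p; the threshold T(p) is where the cumulative counts meet the line of slope m/q]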


Benjamini & Hochberg FDR control procedure

Among the procedures that accomplish this task, one simple recipe was proposed by Benjamini & Hochberg (JRSS-B (1995) 57:289-300):

• choose your desired FDR q (don't ask too much!);

• define c(m) = 1 if the p-values are independent or positively correlated; otherwise c(m) = Σ_{j=1}^{m} 1/j;

• compute the p-values {p1, p2, … pm} for a set of tests, and sort them in increasing order;

• determine the threshold T(p) = p_k by finding the largest index k such that p_k ≤ (q/m) k/c(m), i.e. p_j > (q/m) j/c(m) for every j > k;

• reject H0 for every test with p_j ≤ T(p).

[Figure: sorted p-values vs. index j (1 to m), with the rejection line reaching q/c(m) at j = m]
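The step-up rule can be written compactly. A sketch, assuming the p-values are already computed; `dependent=True` switches on the harmonic factor c(m) = Σ 1/j described above, and the example p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues, q, dependent=False):
    """Indices (into pvalues) of the null hypotheses rejected at FDR level q."""
    m = len(pvalues)
    c_m = sum(1.0 / j for j in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])  # increasing p-values
    k = 0
    for rank, i in enumerate(order, start=1):
        # k = largest rank with p_(rank) <= (q / m) * rank / c(m)
        if pvalues[i] <= q * rank / (m * c_m):
            k = rank
    return sorted(order[:k])  # reject every hypothesis with p <= T(p) = p_(k)


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))                  # independent tests
print(benjamini_hochberg(p, q=0.05, dependent=True))  # arbitrary dependence
print([i for i, x in enumerate(p) if x < 0.05 / len(p)])  # Bonferroni, for comparison
```

Here FDR control at q = 0.05 rejects two hypotheses where Bonferroni at the same nominal level rejects only one; the dependence-safe variant is as strict as Bonferroni on this example.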


LIGO-AURIGA: coincidence vs correlation


LIGO-AURIGA MoU

A working group for the joint burst search in LIGO and AURIGA has been formed, with the purpose to:

» develop methodologies for bar/interferometer searches, to be tested on real data

» perform a time-coincidence, trigger-based search on a 2-week coincidence period (Dec 24, 2003 – Jan 9, 2004)

» explore coherent methods

[Figure: 'best' single-sided PSD]

Simulations and methodological studies are in progress.


White paper on joint analysis

Two methods will be explored in parallel:

Method 1:
• IGEC style, but with a new definition of a consistent amplitude estimator, in order to face the radically different spectral densities of the two kinds of detectors (interferometers and bars).
• To fully exploit the IGEC philosophy, as the detectors are not parallel, polarization effects should be taken into account (multiple trials on a polarization grid).

Method 2:
• No assumptions are made on direction or waveform.
• A CorrPower search (see poster) is applied to the LIGO interferometers around the times of the AURIGA triggers.
• The efficiency for classes of waveforms and source populations is evaluated through Monte Carlo simulation, LIGO-style (see talks by Zweizig, Yakushin, Klimenko).
• The accidental rate (background) is obtained with unphysical time shifts between the data streams.


Summary of non-directional “IGEC style” coincidence search

[Diagram: detector 1 AND detector 2 AND detector 3, with threshold H_S]

Detectors: PARALLEL, BARS
S_hh: SIMILAR FREQUENCY RANGE
Search: NON DIRECTIONAL
Template: BURST = δ(t)

The coincidence search is performed on a subset of the data such that:
• the efficiency is at least 50% above the threshold H_S;
• a significant false alarm reduction is accomplished.

The number of detectors in coincidence considered is self-adapting.

This strategy can be made directional.


Cross-correlation search (naïve)

Detectors: PARALLEL
S_hh: SAME FREQUENCY RANGE NEEDED
Search: NON DIRECTIONAL
Template: NO

Selection based on data quality can be implemented before cross-correlating.

The efficiency is to be determined a posteriori using Monte Carlo.

The information usually included in the cross-correlation takes into account the statistical properties of the data streams, but not geometrical ones such as those related to the antenna patterns.

[Diagram: detector 1 and detector 2 are cross-correlated (detector 1 * detector 2), followed by a threshold crossing]

Threshold crossing after correlation:

$$\sum_{j=1}^{n} w_1 x^{(1)}_j\, w_2 x^{(2)}_j > T$$
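A toy version of the naïve cross-correlation search, assuming two parallel detectors with identical, white, unit-variance noise (so w1 = w2 = 1), non-overlapping blocks of n samples, and a threshold T set at 5 standard deviations of the noise-only statistic; the injected sinusoidal burst and all parameters are hypothetical.

```python
import math
import random


def block_correlation(y1, y2, n):
    """Zero-lag cross-correlation sum_j y1_j * y2_j over consecutive blocks of n samples.

    y1, y2 are assumed to be the already weighted (e.g. whitened) streams.
    """
    return [sum(y1[s + j] * y2[s + j] for j in range(n))
            for s in range(0, min(len(y1), len(y2)) - n + 1, n)]


def threshold_crossings(stats, T):
    """Indices of blocks whose correlation exceeds the threshold T."""
    return [i for i, c in enumerate(stats) if c > T]


rng = random.Random(7)
n, n_blocks, burst_block = 100, 200, 57
y1 = [rng.gauss(0.0, 1.0) for _ in range(n * n_blocks)]
y2 = [rng.gauss(0.0, 1.0) for _ in range(n * n_blocks)]
for j in range(n):  # common burst, same waveform in both (parallel, identical) detectors
    s = 2.0 * math.sin(2.0 * math.pi * 0.1 * j)
    y1[burst_block * n + j] += s
    y2[burst_block * n + j] += s

stats = block_correlation(y1, y2, n)
T = 5.0 * math.sqrt(n)  # noise-only std of the sum is sqrt(n) for unit-variance streams
print(threshold_crossings(stats, T))
```

No template is needed, but the statistic only builds up when the two streams carry the same waveform in the same band, which is why the method asks for parallel detectors with the same frequency range.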


Comparison between “IGEC style” and cross-correlation

The IGEC-style search was designed for template searches. The template guarantees consistent estimators of signal amplitude and arrival time. A bank of templates may be required to cover different classes of signals. In any case, in a burst search we don't know how well the template fits the signal.

A template-less IGEC-style search can be easily implemented for detectors with equal bandwidths, since a consistent amplitude estimator can then be defined (Karhunen-Loève, power…).

Cross-correlation among identical detectors is the most used method to cope with the lack of templates.

Cross-correlation is in general not efficient with non-overlapping frequency bandwidths, even for wide-band signals.

Some more work is needed to extend IGEC to template-less searches among (spectrally) different detectors. Hint: the amplitude estimators should have spectral weights common to all detectors, to be consistent without a template. The trade-off will be between efficiency loss and network gain (sky coverage and false alarm rate).

[Table: applicability of IGEC-style and cross-correlation searches, for template and template-less searches, with detectors of identical noise spectra (S_h1 = S_h2) and of different spectra]