7/23/2019 Lecture 11
Detection Theory
Detection theory
As the last topic of the course, we will briefly consider detection theory.
The methods are based on estimation theory and attempt to answer questions such as:
Is a signal of a specific model present in our time series? E.g., detection of a noisy sinusoid; beep or no beep?
Is the transmitted pulse present in the radar signal at time t?
Does the mean level of a signal change at time t?
After calculating the mean change in pixel values of subsequent frames in a video, is there something moving in the scene?
Detection theory
The area is closely related to hypothesis testing, which is widely used, e.g., in medicine: is the response in patients due to the new drug or due to random fluctuations?
In our case, the hypotheses could be

H1 : x[n] = A cos(2πf0n + φ) + w[n]
H0 : x[n] = w[n]

This example corresponds to the detection of a noisy sinusoid.
The hypothesis H1 corresponds to the case where the sinusoid is present and is called the alternative hypothesis.
The hypothesis H0 corresponds to the case where the measurements consist of noise only and is called the null hypothesis.
Introductory Example
The Neyman-Pearson approach is the classical way of solving detection problems in an optimal manner.
It relies on the so-called Neyman-Pearson theorem.
Before stating the theorem, consider a simplistic detection problem, where we observe one sample x[0] from one of two densities: N(0, 1) or N(1, 1).
The task is to choose the correct density in an optimal manner.
Introductory Example
[Figure: the likelihoods p(x; H0) and p(x; H1) plotted as functions of x[0].]
Our hypotheses are now

H1 : A = 1,
H0 : A = 0.

An obvious approach for deciding between the densities would be to choose the one which is higher for the particular observed x[0].
Introductory Example
More specifically, study the likelihoods and choose the more likely one.
The likelihoods are

H1 : p(x[0] | A = 1) = (1/√(2π)) exp(−(x[0] − 1)²/2),
H0 : p(x[0] | A = 0) = (1/√(2π)) exp(−x[0]²/2).

Now, one should select H1 if

p(x[0] | A = 1) > p(x[0] | A = 0).
Introductory Example
Let's state this in terms of x[0]:

p(x[0] | A = 1) > p(x[0] | A = 0)
⟺ p(x[0] | A = 1) / p(x[0] | A = 0) > 1
⟺ [(1/√(2π)) exp(−(x[0] − 1)²/2)] / [(1/√(2π)) exp(−x[0]²/2)] > 1
⟺ exp(−((x[0] − 1)² − x[0]²)/2) > 1
Introductory Example
Taking the logarithm of both sides,

x[0]² − (x[0] − 1)² > 0 ⟺ 2x[0] − 1 > 0 ⟺ x[0] > 1/2.

In other words, choose H1 if x[0] > 0.5 and H0 otherwise.
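The whole comparison can be sketched in a few lines; the slides' examples use Matlab, but here is an equivalent sketch in Python with SciPy (the library choice is mine, not the slides'):

```python
from scipy.stats import norm

def decide(x0):
    # Compare the likelihoods of the observed sample under the two hypotheses
    p_h1 = norm.pdf(x0, loc=1, scale=1)  # H1: x[0] ~ N(1, 1)
    p_h0 = norm.pdf(x0, loc=0, scale=1)  # H0: x[0] ~ N(0, 1)
    return "H1" if p_h1 > p_h0 else "H0"

# Equivalent to thresholding x[0] at 0.5:
print(decide(0.7))  # H1
print(decide(0.2))  # H0
```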
Introductory Example
Note that it is also possible to study the posterior probability ratio p(H1 | x)/p(H0 | x) instead of the above likelihood ratio p(x | H1)/p(x | H0).
However, using Bayes' rule, this MAP test turns out to be

p(x | H1) / p(x | H0) > p(H0) / p(H1),

i.e., the only effect of using posterior probabilities is on the threshold of the LRT.
Error Types
It might be that the detection problem is not symmetric, and some errors are more costly than others.
For example, when detecting a disease, a missed detection is more costly than a false alarm.
The tradeoff between misses and false alarms can be adjusted using the threshold of the LRT.
Error Types
The figure below illustrates the probabilities of the two kinds of errors. The red area on the left corresponds to the probability of choosing H1 while H0 holds (false alarm). The blue area is the probability of choosing H0 while H1 holds (missed detection).

[Figure: the two likelihoods over x[0] with the error probabilities shaded.]
Error Types
It can be seen that either error probability can be made arbitrarily small by adjusting the detection threshold.

[Figure: the two likelihoods with two different threshold choices.]

Left: large threshold; small probability of false alarm (red), but many misses (blue).
Right: small threshold; only a few missed detections (blue), but a huge number of false alarms (red).
Error Types
The probability of false alarm for the threshold γ = 1.5 is

PFA = P(x[0] > γ | A = 0) = ∫_{1.5}^{∞} (1/√(2π)) exp(−x²/2) dx ≈ 0.0668.

The probability of missed detection is

PM = P(x[0] ≤ γ | A = 1) = ∫_{−∞}^{1.5} (1/√(2π)) exp(−(x − 1)²/2) dx ≈ 0.6915.

An equivalent but more useful term is the complement of PM, the probability of detection:

PD = 1 − PM = ∫_{1.5}^{∞} (1/√(2π)) exp(−(x − 1)²/2) dx ≈ 0.3085.
Neyman-Pearson Theorem
Since PFA and PD depend on each other, we would like to maximize PD subject to a given maximum allowed PFA. Luckily, the following theorem makes this easy.

Neyman-Pearson theorem: For a fixed PFA, the likelihood ratio test maximizes PD with the decision rule

L(x) = p(x; H1) / p(x; H0) > γ,

where the threshold γ is the value for which

∫_{ {x : L(x) > γ} } p(x; H0) dx = PFA.
Neyman-Pearson Theorem
As an example, suppose we want to find the best detector for our introductory example, and we can tolerate 10% false alarms (PFA = 0.1).
According to the theorem, the detection rule is: select H1 if

p(x | A = 1) / p(x | A = 0) > γ.

The only thing to find out now is the threshold γ such that

∫_{γ}^{∞} p(x | A = 0) dx = 0.1.

This can be done with the Matlab function icdf, which evaluates the inverse cumulative distribution function.
Neyman-Pearson Theorem
Unfortunately, icdf solves for the γ for which

∫_{−∞}^{γ} p(x | A = 0) dx = 0.1

instead of

∫_{γ}^{∞} p(x | A = 0) dx = 0.1.

Thus, we have to use the function like this: icdf('norm', 1 - 0.1, 0, 1), which gives γ ≈ 1.2816.
Similarly, we can also calculate the PD with this threshold:

PD = ∫_{1.2816}^{∞} p(x | A = 1) dx ≈ 0.3891.
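The same computation in Python (SciPy's norm.ppf plays the role of Matlab's icdf and likewise returns the lower-tail quantile):

```python
from scipy.stats import norm

P_FA = 0.1
# Solve P(x > gamma | A = 0) = 0.1 via the lower-tail inverse CDF
gamma = norm.ppf(1 - P_FA, loc=0, scale=1)
# Probability of detection: P(x > gamma | A = 1)
P_D = norm.sf(gamma, loc=1, scale=1)

print(round(gamma, 4))  # 1.2816
print(round(P_D, 4))    # 0.3891
```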
Detector for a known waveform
The NP approach applies to all cases where the likelihoods are available.
An important special case is a known waveform s[n] embedded in a WGN sequence w[n]:

H1 : x[n] = s[n] + w[n]
H0 : x[n] = w[n].

An example of a case where the waveform is known is the detection of radar signals, where a pulse s[n] transmitted by us is reflected back after some propagation time.
Detector for a known waveform
For this case the likelihoods are

p(x | H1) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−(x[n] − s[n])²/(2σ²)),
p(x | H0) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp(−x[n]²/(2σ²)).

The likelihood ratio test is easily obtained as

p(x | H1) / p(x | H0) = exp( −(1/(2σ²)) [ Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} x[n]² ] ) > γ.
Detector for a known waveform
This simplifies by taking the logarithm of both sides:

−(1/(2σ²)) [ Σ_{n=0}^{N−1} (x[n] − s[n])² − Σ_{n=0}^{N−1} x[n]² ] > ln γ.

This further simplifies into

(1/σ²) Σ_{n=0}^{N−1} x[n] s[n] − (1/(2σ²)) Σ_{n=0}^{N−1} s[n]² > ln γ.
Detector for a known waveform
Since s[n] is a known waveform (i.e., a constant), we can simplify the procedure by moving it to the right-hand side and combining it with the threshold:

Σ_{n=0}^{N−1} x[n] s[n] > σ² ( ln γ + (1/(2σ²)) Σ_{n=0}^{N−1} s[n]² ).

We can equivalently call the right-hand side our threshold (say γ′) to get the final decision rule

Σ_{n=0}^{N−1} x[n] s[n] > γ′.
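The final rule is just a correlation against the known waveform; a minimal sketch in Python (the waveform, noise level, and threshold below are illustrative choices, not taken from the slides):

```python
import numpy as np

N = 200
n = np.arange(N)
s = np.sin(2 * np.pi * 0.05 * n)        # known waveform (example choice)

def detect(x, s, threshold):
    # Decision rule: sum_n x[n]*s[n] > threshold  ->  decide H1
    return np.dot(x, s) > threshold

threshold = 30.0                          # in practice set from the desired P_FA
rng = np.random.default_rng(0)
x = s + rng.normal(scale=1.0, size=N)     # H1: signal buried in WGN
print(detect(x, s, threshold))
print(detect(np.zeros(N), s, threshold))  # no signal at all -> False
```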
Examples
This leads to some rather obvious results.
The detector for a known DC level A in WGN is

Σ_{n=0}^{N−1} x[n] A > γ′ ⟺ Σ_{n=0}^{N−1} x[n] > γ′/A.

Equally well we can set a new threshold γ″ = γ′/(AN). This way the detection rule becomes: x̄ > γ″. Note that a negative A would invert the inequality.
Examples
The detector for a sinusoid in WGN is

Σ_{n=0}^{N−1} x[n] A cos(2πf0n + φ) > γ′ ⟺ Σ_{n=0}^{N−1} x[n] cos(2πf0n + φ) > γ′/A.

Again, we can divide by A and use a new threshold γ″:

Σ_{n=0}^{N−1} x[n] cos(2πf0n + φ) > γ″.

In other words, we check the correlation with the sinusoid. Note that the amplitude A does not affect the test statistic, only the threshold, which is in any case selected according to the fixed PFA rate.
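The amplitude invariance is easy to see numerically; in this Python sketch (f0, φ, and A are illustrative values), scaling the input only scales the statistic, so only the threshold needs to absorb A:

```python
import numpy as np

f0, phi, N = 0.1, 0.6, 200
n = np.arange(N)
ref = np.cos(2 * np.pi * f0 * n + phi)   # reference sinusoid

def statistic(x):
    # sum_n x[n] * cos(2*pi*f0*n + phi)
    return np.sum(x * ref)

# An A = 3 sinusoid gives exactly 3x the statistic of an A = 1 sinusoid
print(round(statistic(3.0 * ref), 6))   # 300.0 (= 3 * sum(ref**2) = 3 * N/2)
print(round(statistic(1.0 * ref), 6))   # 100.0
```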
Examples
As an example, the picture below shows the detection process with = 0.5.

[Figure: three panels; the noiseless signal, the noisy signal, and the detection result (correlator output).]
Detection of random signals
The problem with the previous approach was that the model was too restrictive: the results depend on how well the phases match.
The model can be relaxed by considering random signals, whose exact form is unknown but whose correlation structure is known. Since the correlation captures the frequency (but not the phase), this is exactly what we want.
In general, the detection of a random signal can be formulated as follows.
Detection of random signals
Suppose s ∼ N(0, Cs) and w ∼ N(0, σ²I). Then the detection problem is the hypothesis test

H0 : x ∼ N(0, σ²I)
H1 : x ∼ N(0, Cs + σ²I).

It can be shown (see Kay-2, p. 145) that the decision rule becomes

Decide H1 if x^T ŝ > γ,

where

ŝ = Cs (Cs + σ²I)^{−1} x.
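This estimator-correlator can be sketched with plain linear algebra; in the Python sketch below, the covariance Cs is a made-up smooth-process example and σ² = 1 is arbitrary:

```python
import numpy as np

N = 50
sigma2 = 1.0
n = np.arange(N)
# Made-up covariance of a smooth, zero-mean random signal (example choice)
Cs = np.exp(-0.1 * np.abs(n[:, None] - n[None, :]))

def estimator_correlator(x, Cs, sigma2):
    # s_hat = Cs (Cs + sigma^2 I)^{-1} x, the linear MMSE estimate of s
    s_hat = Cs @ np.linalg.solve(Cs + sigma2 * np.eye(len(x)), x)
    # Test statistic x^T s_hat, compared against a threshold
    return x @ s_hat

rng = np.random.default_rng(1)
s = np.linalg.cholesky(Cs) @ rng.normal(size=N)                # draw s ~ N(0, Cs)
T1 = estimator_correlator(s + rng.normal(size=N), Cs, sigma2)  # H1: signal + noise
T0 = estimator_correlator(rng.normal(size=N), Cs, sigma2)      # H0: noise only
print(T1, T0)  # the statistic tends to be larger under H1
```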
Detection of random signals
The term ŝ is in fact an estimate of the signal; more specifically, the linear Bayesian MMSE estimator, which assumes a linear form for the estimator (similar to the BLUE).
A particular special case of a random signal is the Bayesian linear model. The Bayesian linear model assumes linearity, x = Hθ + w, together with a prior for the parameters, such as θ ∼ N(0, Cθ).
Consider the following detection problem:

H0 : x = w
H1 : x = Hθ + w.
Detection of random signals
Within the earlier random-signal framework, this is written as

H0 : x ∼ N(0, σ²I)
H1 : x ∼ N(0, Cs + σ²I)

with Cs = H Cθ H^T.
The assumption s ∼ N(0, H Cθ H^T) states that the exact form of the signal is unknown, and we only know its covariance structure.
Detection of random signals
This is helpful in the sinusoidal detection problem: we are not interested in the phase (included in the exact formulation x[n] = A cos(2πf0n + φ)), but only in the frequency (as described by the covariance matrix H Cθ H^T).
Thus, the decision rule becomes:

Decide H1 if x^T ŝ > γ,

where

ŝ = Cs (Cs + σ²I)^{−1} x = H Cθ H^T (H Cθ H^T + σ²I)^{−1} x.
Detection of random signals
Luckily, the decision rule simplifies quite a lot by noticing that part of it is the MMSE estimate of θ:

x^T ŝ = x^T H Cθ H^T (H Cθ H^T + σ²I)^{−1} x = x^T H θ̂,

where θ̂ = Cθ H^T (H Cθ H^T + σ²I)^{−1} x.

An example of applying the linear model is in Kay: Statistical Signal Processing, vol. 2: Detection Theory, pages 155-158.
In the example, a Rayleigh fading sinusoid is studied, which has an unknown amplitude A and phase φ. Only the frequency f0 is assumed to be known.
Detection of random signals
This can be manipulated into a linear model form with two unknowns corresponding to A and φ.
The final result is the decision rule:

| Σ_{n=0}^{N−1} x[n] exp(−2πi f0 n) | > γ.

As an example, the picture below shows the detection process with = 0.5.
Note the simplicity of the Matlab implementation:

h = exp(-2*pi*sqrt(-1)*f0*n);
y = abs(conv(h,x));
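The same two lines translate directly to NumPy, with a check that the statistic really is phase-invariant (f0 = 0.1 and N = 200 are example values; at these values the cross term sums exactly to zero, so the statistic equals N/2 for a unit-amplitude sinusoid):

```python
import numpy as np

f0, N = 0.1, 200
n = np.arange(N)

def statistic(x):
    # |sum_n x[n] * exp(-2*pi*i*f0*n)|: magnitude of the complex correlation
    return np.abs(np.sum(x * np.exp(-2j * np.pi * f0 * n)))

# Two sinusoids with the same frequency but different (unknown) phases
x1 = np.cos(2 * np.pi * f0 * n + 0.0)
x2 = np.cos(2 * np.pi * f0 * n + 1.234)
print(round(statistic(x1), 6))  # 100.0 (= N/2)
print(round(statistic(x2), 6))  # 100.0: the phase does not matter
```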
Detection of random signals
[Figure: three panels; the noiseless signal, the noisy signal, and the detection result.]
Receiver Operating Characteristics
A usual way of illustrating detector performance is the Receiver Operating Characteristics curve (ROC curve).
It describes the relationship between PFA and PD for all possible values of the threshold.
The functional relationship between PFA and PD depends on the problem (and on the selected detector, although we have proven that the LRT is optimal).
Receiver Operating Characteristics
For example, in the DC level example,

PD(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−(x − 1)²/2) dx,
PFA(γ) = ∫_{γ}^{∞} (1/√(2π)) exp(−x²/2) dx.

It is easy to see the relationship:

PD(γ) = ∫_{γ−1}^{∞} (1/√(2π)) exp(−x²/2) dx = PFA(γ − 1).
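With this relation, the ROC curve of the DC example is one line of code, and its AUC can be approximated numerically (Python sketch; for two unit-variance Gaussians with means 0 and 1, the exact AUC is Φ(1/√2) ≈ 0.76):

```python
import numpy as np
from scipy.stats import norm

gammas = np.linspace(-5, 6, 1001)
P_FA = norm.sf(gammas)        # P(x > gamma | A = 0)
P_D = norm.sf(gammas - 1)     # P(x > gamma | A = 1) = P_FA(gamma - 1)

# Area under the ROC curve by the trapezoidal rule (P_FA decreases with gamma)
auc = np.sum(0.5 * (P_D[:-1] + P_D[1:]) * -np.diff(P_FA))
print(round(auc, 3))          # 0.76
```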
Receiver Operating Characteristics
Plotting the ROC curve for all γ results in the following curve.

[Figure: ROC curve; probability of detection PD versus probability of false alarm PFA, running from large thresholds (lower left) to small thresholds (upper right).]
Receiver Operating Characteristics
The higher the ROC curve, the better the performance.
A random guess has a diagonal ROC curve.
In the DC level case, the performance improves as the noise variance σ² decreases. Below are the ROC plots for various values of σ².
Receiver Operating Characteristics
[Figure: ROC curves for variances 0.2, 0.4, 0.6, 0.8, and 1, together with the diagonal random-guess line; smaller variance gives a higher curve.]

This gives rise to a widely used measure of detector performance: the Area Under (ROC) Curve, or AUC, criterion.
Composite hypothesis testing
In the previous examples the parameter values specified the distribution completely; e.g., either A = 1 or A = 0.
Such cases are called simple hypothesis tests.
Often we can't specify the parameters exactly for either case, but only a range of values for each.
An example could be our DC model x[n] = A + w[n] with

H1 : A ≠ 0
H0 : A = 0.
Composite hypothesis testing
The question can be posed in a probabilistic manner as follows:
What is the probability of observing x[n] if H0 held?
If the probability is small (e.g., all x[n] ∈ [0.5, 1.5], and let's say the probability of observing such x[n] under H0 is 1%), then we can conclude that the null hypothesis can be rejected with 99% confidence.
An example
As an example, consider detecting a biased coin in a coin-tossing experiment.
If we get 19 heads out of 20 tosses, it seems rather likely that the coin is biased. How do we pose the question mathematically?
Now the hypotheses are

H1 : coin is biased: p ≠ 0.5
H0 : coin is unbiased: p = 0.5,

where p denotes the probability of heads for our coin.
An example
Additionally, let's say we want 99% confidence for the test.
Thus, we can state the hypothesis test as: "what is the probability of observing at least 19 heads assuming p = 0.5?"
This is given by the binomial distribution:

C(20, 19) · 0.5¹⁹ · 0.5¹ (19 heads) + C(20, 20) · 0.5²⁰ (20 heads) ≈ 0.00002.

Since 0.00002 < 1%, we can reject the null hypothesis and conclude that the coin is biased.
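The tail probability can be checked with scipy.stats.binom (the slides use Matlab, but the computation is the same):

```python
from scipy.stats import binom

# P(at least 19 heads in 20 tosses of a fair coin) = P(X >= 19) = P(X > 18)
p_value = binom.sf(18, 20, 0.5)
print(p_value)           # about 2e-05, i.e. the 0.00002 on the slide
print(p_value < 0.01)    # True -> reject H0 at 99% confidence
```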
An example
Actually, the 99% confidence requirement was a bit loose in this case.
We could have set a 99.998% confidence requirement and still rejected the null hypothesis.
The complement of this upper confidence limit (here 0.00002) is widely used and called the p-value.
More specifically:
The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.