Lecture 11: Detection Theory



    Detection Theory


    Detection theory

As the last topic of the course, we will briefly consider detection theory.

The methods are based on estimation theory and attempt to answer questions such as:

Is a signal of a specific model present in our time series? E.g., detection of a noisy sinusoid: beep or no beep?

Is the transmitted pulse present in the radar signal at time t?

Does the mean level of a signal change at time t?

After calculating the mean change in pixel values of subsequent frames in a video, is there something moving in the scene?


    Detection theory

The area is closely related to hypothesis testing, which is widely used, e.g., in medicine: is the response in patients due to the new drug or due to random fluctuations?

In our case, the hypotheses could be

$$H_1: x[n] = A\cos(2\pi f_0 n + \phi) + w[n]$$
$$H_0: x[n] = w[n]$$

This example corresponds to detection of a noisy sinusoid.

The hypothesis H1 corresponds to the case that the sinusoid is present and is called the alternative hypothesis.

The hypothesis H0 corresponds to the case that the measurements consist of noise only and is called the null hypothesis.


    Introductory Example

The Neyman-Pearson approach is the classical way of solving detection problems in an optimal manner.

It relies on the so-called Neyman-Pearson theorem.

Before stating the theorem, consider a simplistic detection problem, where we observe one sample x[0] from one of two densities: N(0, 1) or N(1, 1).

The task is to choose the correct density in an optimal manner.


    Introductory Example

[Figure: the likelihoods p(x; H0) and p(x; H1) as functions of x[0].]

Our hypotheses are now

$$H_1: A = 1,$$
$$H_0: A = 0.$$

An obvious approach for deciding between the densities would be to choose the one that is higher for the particular observed x[0].


    Introductory Example

More specifically, study the likelihoods and choose the more likely one.

The likelihoods are

$$H_1: \quad p(x[0] \mid A = 1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right),$$

$$H_0: \quad p(x[0] \mid A = 0) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x[0]^2}{2}\right).$$

Now, one should select H1 if

$$p(x[0] \mid A = 1) > p(x[0] \mid A = 0).$$


    Introductory Example

Let's state this in terms of x[0]:

$$p(x[0] \mid A = 1) > p(x[0] \mid A = 0)$$

$$\frac{p(x[0] \mid A = 1)}{p(x[0] \mid A = 0)} > 1$$

$$\frac{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x[0] - 1)^2}{2}\right)}{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x[0]^2}{2}\right)} > 1$$

$$\exp\left(-\frac{(x[0] - 1)^2 - x[0]^2}{2}\right) > 1$$


    Introductory Example

Taking the logarithm of both sides and simplifying,

$$x[0]^2 - (x[0] - 1)^2 > 0$$
$$2x[0] - 1 > 0$$
$$x[0] > \frac{1}{2}.$$

In other words, choose H1 if x[0] > 0.5 and H0 if x[0] < 0.5.
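As a quick sanity check, this decision rule can be written out in a few lines of Matlab. A minimal sketch (the observation value is made up, and normpdf assumes the Statistics Toolbox is available):

x0 = 1.3;                    % example observation (arbitrary value)
p1 = normpdf(x0, 1, 1);      % likelihood under H1: N(1, 1)
p0 = normpdf(x0, 0, 1);      % likelihood under H0: N(0, 1)
if p1 > p0                   % equivalent to x0 > 0.5
    disp('Decide H1')
else
    disp('Decide H0')
end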


    Introductory Example

Note that it is also possible to study the posterior probability ratio p(H1 | x)/p(H0 | x) instead of the above likelihood ratio p(x | H1)/p(x | H0).

However, using the Bayes rule, this MAP test turns out to be

$$\frac{p(\mathbf{x} \mid H_1)}{p(\mathbf{x} \mid H_0)} > \frac{p(H_0)}{p(H_1)},$$

i.e., the only effect of using the posterior probability is on the threshold of the LRT.


    Error Types

It might be that the detection problem is not symmetric and some errors are more costly than others.

For example, when detecting a disease, a missed detection is more costly than a false alarm.

The tradeoff between misses and false alarms can be adjusted using the threshold of the LRT.


    Error Types

The figure below illustrates the probabilities of the two kinds of errors. The red area corresponds to the probability of choosing H1 while H0 would hold (false match). The blue area is the probability of choosing H0 while H1 would hold (missed detection).

[Figure: the two likelihoods over x[0], with the false match probability (red) and the missed detection probability (blue) shaded.]


    Error Types

It can be seen that we can make either probability arbitrarily small by adjusting the detection threshold.

[Figure: the same likelihoods with two different thresholds, illustrating the tradeoff described below.]

Left: large threshold; small probability of false match (red), but a lot of misses (blue).

Right: small threshold; only a few missed detections (blue), but a huge number of false matches (red).


    Error Types

The probability of false alarm for the threshold γ = 1.5 is

$$P_{FA} = P(x[0] > \gamma \mid A = 0) = \int_{1.5}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx \approx 0.0668.$$

The probability of missed detection is

$$P_M = P(x[0] < \gamma \mid A = 1) = \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - 1)^2}{2}\right) dx \approx 0.6915.$$

An equivalent, but more useful term is the complement of PM, the probability of detection:

$$P_D = 1 - P_M = \int_{1.5}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - 1)^2}{2}\right) dx \approx 0.3085.$$
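These numbers are easy to verify numerically. A short Matlab check (normcdf from the Statistics Toolbox):

gamma = 1.5;                      % detection threshold
PFA = 1 - normcdf(gamma, 0, 1);   % P(x[0] > gamma | A = 0), approx. 0.0668
PM  = normcdf(gamma, 1, 1);       % P(x[0] < gamma | A = 1), approx. 0.6915
PD  = 1 - PM;                     % approx. 0.3085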


    Neyman-Pearson Theorem

Since PFA and PD depend on each other, we would like to maximize PD subject to a given maximum allowed PFA. Luckily, the following theorem makes this easy.

Neyman-Pearson Theorem: For a fixed PFA, the likelihood ratio test maximizes PD with the decision rule

$$L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma,$$

where the threshold γ is the value for which

$$\int_{\{\mathbf{x}: L(\mathbf{x}) > \gamma\}} p(\mathbf{x}; H_0)\, d\mathbf{x} = P_{FA}.$$


    Neyman-Pearson Theorem

As an example, suppose we want to find the best detector for our introductory example, and we can tolerate 10% false alarms (PFA = 0.1).

According to the theorem, the detection rule is: select H1 if

$$\frac{p(x \mid A = 1)}{p(x \mid A = 0)} > \gamma.$$

The only thing to find out now is the corresponding threshold γ on x[0] such that

$$\int_{\gamma}^{\infty} p(x \mid A = 0)\, dx = 0.1.$$

This can be done with the Matlab function icdf, which evaluates the inverse cumulative distribution function.


    Neyman-Pearson Theorem

Unfortunately, icdf solves the γ for which

$$\int_{-\infty}^{\gamma} p(x \mid A = 0)\, dx = 0.1$$

instead of

$$\int_{\gamma}^{\infty} p(x \mid A = 0)\, dx = 0.1.$$

Thus, we have to use the function like this: icdf('norm', 1 - 0.1, 0, 1), which gives γ ≈ 1.2816.

Similarly, we can also calculate the PD with this threshold:

$$P_D = \int_{1.2816}^{\infty} p(x \mid A = 1)\, dx \approx 0.3891.$$
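Equivalently, the whole calculation can be done with norminv/normcdf; a small sketch of the same steps:

PFA   = 0.1;
gamma = norminv(1 - PFA, 0, 1);    % same as icdf('norm', 1 - PFA, 0, 1); approx. 1.2816
PD    = 1 - normcdf(gamma, 1, 1);  % approx. 0.3891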


    Detector for a known waveform

The NP approach applies to all cases where the likelihoods are available.

An important special case is that of a known waveform s[n] embedded in a WGN sequence w[n]:

$$H_1: x[n] = s[n] + w[n]$$
$$H_0: x[n] = w[n].$$

An example of a case where the waveform is known could be the detection of radar signals, where a pulse s[n] transmitted by us is reflected back after some propagation time.


    Detector for a known waveform

For this case the likelihoods are

$$p(\mathbf{x} \mid H_1) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x[n] - s[n])^2}{2\sigma^2}\right),$$

$$p(\mathbf{x} \mid H_0) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x[n]^2}{2\sigma^2}\right).$$

The likelihood ratio test is easily obtained as

$$\frac{p(\mathbf{x} \mid H_1)}{p(\mathbf{x} \mid H_0)} = \exp\left(-\frac{1}{2\sigma^2}\left[\sum_{n=0}^{N-1}(x[n] - s[n])^2 - \sum_{n=0}^{N-1} x[n]^2\right]\right) > \gamma.$$


    Detector for a known waveform

This simplifies by taking the logarithm of both sides:

$$-\frac{1}{2\sigma^2}\left[\sum_{n=0}^{N-1}(x[n] - s[n])^2 - \sum_{n=0}^{N-1} x[n]^2\right] > \ln\gamma.$$

This further simplifies into

$$\frac{1}{\sigma^2}\sum_{n=0}^{N-1} x[n]s[n] - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1} s[n]^2 > \ln\gamma.$$


    Detector for a known waveform

Since s[n] is a known waveform (i.e., a constant), we can simplify the procedure by moving it to the right-hand side and combining it with the threshold:

$$\sum_{n=0}^{N-1} x[n]s[n] > \sigma^2\ln\gamma + \frac{1}{2}\sum_{n=0}^{N-1} s[n]^2.$$

We can equivalently call the right-hand side our threshold (say γ') to get the final decision rule

$$\sum_{n=0}^{N-1} x[n]s[n] > \gamma'.$$
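A minimal Matlab sketch of this correlator (replica-correlator) detector; the waveform, noise level and threshold below are illustrative assumptions, not values from the slides:

N = 100;
n = (0:N-1)';
s = cos(2*pi*0.05*n);        % known waveform s[n] (here simply a sinusoid)
x = s + randn(N, 1);         % one realization under H1 with sigma = 1
T = sum(x .* s);             % test statistic: correlation of the data with s[n]
gammap = 10;                 % threshold gamma', in practice chosen from the allowed PFA
decideH1 = (T > gammap);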


    Examples

This leads to some rather obvious results.

The detector for a known DC level A in WGN is

$$\sum_{n=0}^{N-1} x[n]A > \gamma' \quad\Longleftrightarrow\quad A\sum_{n=0}^{N-1} x[n] > \gamma'.$$

Equally well we can set a new threshold and call it γ'' = γ'/(AN). This way the detection rule becomes $\bar{x} > \gamma''$. Note that a negative A would invert the inequality.


    Examples

The detector for a sinusoid in WGN is

$$\sum_{n=0}^{N-1} x[n]\,A\cos(2\pi f_0 n + \phi) > \gamma' \quad\Longleftrightarrow\quad A\sum_{n=0}^{N-1} x[n]\cos(2\pi f_0 n + \phi) > \gamma'.$$

Again we can divide by A to get

$$\sum_{n=0}^{N-1} x[n]\cos(2\pi f_0 n + \phi) > \gamma''.$$

In other words, we check the correlation with the sinusoid. Note that the amplitude A does not affect our statistic, only the threshold, which is anyway selected according to the fixed PFA rate.


    Examples

As an example, the picture below shows the detection process with the threshold γ'' = 0.5.

[Figure: noiseless signal (top), noisy signal (middle), and detection result, i.e., the correlator output compared against the threshold (bottom).]
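A detection trace like the one in the figure can be produced with a running correlation; a self-contained sketch with made-up segment lengths, frequency and noise level:

f0 = 0.01;
n  = (0:99)';
s  = cos(2*pi*f0*n);                                    % known waveform (one window)
x  = [randn(400,1); s + randn(100,1); randn(400,1)];    % sinusoid buried in the middle
y  = conv(x, flipud(s));                                % correlator output at every position
plot(y); hold on;
plot([1 numel(y)], [0.5 0.5], 'r--');                   % the threshold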


    Detection of random signals

The problem with the previous approach was that the model was too restrictive; the results depend on how well the phases match.

The model can be relaxed by considering random signals, whose exact form is unknown but whose correlation structure is known. Since the correlation captures the frequency (but not the phase), this is exactly what we want.

In general, the detection of a random signal can be formulated as follows.


    Detection of random signals

Suppose s ~ N(0, Cs) and w ~ N(0, σ²I). Then the detection problem is the hypothesis test

$$H_0: \mathbf{x} \sim N(0, \sigma^2 I)$$
$$H_1: \mathbf{x} \sim N(0, C_s + \sigma^2 I).$$

It can be shown (see Kay-2, p. 145) that the decision rule becomes: decide H1 if

$$\mathbf{x}^T \hat{\mathbf{s}} > \gamma',$$

where

$$\hat{\mathbf{s}} = C_s (C_s + \sigma^2 I)^{-1}\mathbf{x}.$$

    Detection of random signals

The term ŝ is in fact an estimate of the signal; more specifically, the linear Bayesian MMSE estimator, which assumes linearity of the estimator (similar to the BLUE).

A particular special case of a random signal is the Bayesian linear model. The Bayesian linear model assumes linearity, x = Hθ + w, together with a prior for the parameters, such as θ ~ N(0, σ_θ²I).

Consider the following detection problem:

$$H_0: \mathbf{x} = \mathbf{w}$$
$$H_1: \mathbf{x} = H\boldsymbol{\theta} + \mathbf{w}$$


    Detection of random signals

Within the earlier random-signal framework, this is written as

$$H_0: \mathbf{x} \sim N(0, \sigma^2 I)$$
$$H_1: \mathbf{x} \sim N(0, C_s + \sigma^2 I)$$

with Cs = HCθH^T.

The assumption s ~ N(0, HCθH^T) states that the exact form of the signal is unknown, and we only know its covariance structure.


    Detection of random signals

This is helpful in the sinusoidal detection problem: we are not interested in the phase (included in the exact formulation x[n] = A cos(2πf0n + φ)), but only in the frequency (as described by the covariance matrix HCθH^T).

Thus, the decision rule becomes: decide H1 if

$$\mathbf{x}^T \hat{\mathbf{s}} > \gamma',$$

where

$$\hat{\mathbf{s}} = C_s (C_s + \sigma^2 I)^{-1}\mathbf{x} = H C_\theta H^T (H C_\theta H^T + \sigma^2 I)^{-1}\mathbf{x}.$$


    Detection of random signals

Luckily, the decision rule simplifies quite a lot by noticing that the last part is the MMSE estimate of θ:

$$\mathbf{x}^T \hat{\mathbf{s}} = \mathbf{x}^T H C_\theta H^T (H C_\theta H^T + \sigma^2 I)^{-1}\mathbf{x} = \mathbf{x}^T H \hat{\boldsymbol{\theta}}.$$

An example of applying the linear model is in Kay: Statistical Signal Processing, vol. 2: Detection Theory, pages 155-158.

In the example, a Rayleigh fading sinusoid is studied, which has an unknown amplitude A and phase φ. Only the frequency f0 is assumed to be known.


    Detection of random signals

This can be manipulated into linear model form with two unknowns corresponding to A and φ.

The final result is the decision rule:

$$\left| \sum_{n=0}^{N-1} x[n] \exp(-2\pi i f_0 n) \right| > \gamma'.$$

As an example, the picture below shows the detection process with the threshold set to 0.5.

Note the simplicity of the Matlab implementation:

h = exp(-2*pi*sqrt(-1)*f0*n);   % complex exponential at the known frequency f0
y = abs(conv(h,x));             % magnitude of the correlation for each time shift
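The snippet assumes that f0, the time index vector n and the data x are already defined. A self-contained version with made-up values (note that the unknown phase does not hurt the detector):

f0 = 0.01;
n  = (0:99)';                                    % template length N = 100
x  = [randn(400,1); cos(2*pi*f0*(0:99)' + 1.2) + randn(100,1); randn(400,1)];  % arbitrary phase
h  = exp(-2*pi*sqrt(-1)*f0*n);                   % complex exponential replica
y  = abs(conv(h, x));                            % detection statistic over all positions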


    Detection of random signals

[Figure: noiseless signal (top), noisy signal (middle), and detection result, i.e., the magnitude of the complex correlation (bottom).]


    Receiver Operating Characteristics

A usual way of illustrating detector performance is the Receiver Operating Characteristics curve (ROC curve).

This describes the relationship between PFA and PD for all possible values of the threshold.

The functional relationship between PFA and PD depends on the problem (and the selected detector, although we have proven that the LRT is optimal).


    Receiver Operating Characteristics

For example, in the DC level example,

$$P_D(\gamma) = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - 1)^2}{2}\right) dx,$$

$$P_{FA}(\gamma) = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx.$$

It is easy to see the relationship:

$$P_D(\gamma) = \int_{\gamma - 1}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx = P_{FA}(\gamma - 1).$$
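The ROC curve of this example can be traced numerically; a short Matlab sketch (the threshold grid is arbitrary):

gammas = linspace(-4, 5, 500);         % sweep of thresholds
PFA = 1 - normcdf(gammas, 0, 1);       % false alarm probability for each threshold
PD  = 1 - normcdf(gammas, 1, 1);       % detection probability, equals PFA(gamma - 1)
plot(PFA, PD);
xlabel('Probability of False Alarm, P_{FA}');
ylabel('Probability of Detection, P_D');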


    Receiver Operating Characteristics

Plotting the ROC curve for all γ results in the following curve.

[Figure: ROC curve, probability of detection PD versus probability of false alarm PFA; large thresholds map to the lower left corner, small thresholds to the upper right.]


    Receiver Operating Characteristics

The higher the ROC curve, the better the performance.

A random guess has a diagonal ROC curve.

In the DC level case, the performance increases as the noise variance σ² decreases. Below are the ROC plots for various values of σ².


    Receiver Operating Characteristics

[Figure: ROC curves (PD versus PFA) for noise variances 0.2, 0.4, 0.6, 0.8 and 1, together with the diagonal random-guess line.]

This gives rise to a widely used measure of detector performance: the Area Under (ROC) Curve, or AUC, criterion.
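The AUC can be approximated numerically from such curves, e.g. with trapezoidal integration; a self-contained sketch for one example variance:

sigma2 = 0.4;                                   % example noise variance
gammas = linspace(-6, 7, 1000);
PFA = 1 - normcdf(gammas, 0, sqrt(sigma2));
PD  = 1 - normcdf(gammas, 1, sqrt(sigma2));
AUC = trapz(fliplr(PFA), fliplr(PD));           % integrate PD over PFA (increasing order)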


    Composite hypothesis testing

In the previous examples the parameter values specified the distribution completely; e.g., either A = 1 or A = 0.

Such cases are called simple hypothesis testing.

Often we cannot specify the parameters exactly for either case, but only a range of values for each case.

An example could be our DC model x[n] = A + w[n] with

$$H_1: A \neq 0$$
$$H_0: A = 0$$


    Composite hypothesis testing

The question can be posed in a probabilistic manner as follows:

What is the probability of observing x[n] if H0 would hold?

If the probability is small (e.g., all x[n] ∈ [0.5, 1.5], and let's say the probability of observing such x[n] under H0 is 1%), then we can conclude that the null hypothesis can be rejected with 99% confidence.


    An example

As an example, consider detecting a biased coin in a coin-tossing experiment.

If we get 19 heads out of 20 tosses, it seems rather likely that the coin is biased. How do we pose the question mathematically?

Now the hypotheses are

$$H_1: \text{coin is biased: } p \neq 0.5$$
$$H_0: \text{coin is unbiased: } p = 0.5,$$

where p denotes the probability of a head for our coin.


    An example

Additionally, let's say we want 99% confidence for the test.

Thus, we can state the hypothesis test as: "What is the probability of observing at least 19 heads assuming p = 0.5?"

This is given by the binomial distribution:

$$\underbrace{\binom{20}{19}\, 0.5^{19}\, 0.5^{1}}_{\text{19 heads}} + \underbrace{0.5^{20}}_{\text{20 heads}} \approx 0.00002.$$

Since 0.00002 < 1%, we can reject the null hypothesis and conclude that the coin is biased.
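The same probability is quick to check in Matlab, either directly or via binopdf (Statistics Toolbox):

p_direct = nchoosek(20, 19)*0.5^19*0.5 + 0.5^20;   % approx. 2.0e-5
p_binom  = sum(binopdf(19:20, 20, 0.5));           % same value from the binomial pmf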


    An example

Actually, the 99% confidence requirement was a bit loose in this case.

We could have set a 99.998% confidence requirement and still rejected the null hypothesis.

The quantity behind this upper limit on the confidence (here 1 - 0.99998 = 0.00002) is widely used and called the p-value.

More specifically:

The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
