
A Track Scoring MOP for Perimeter Surveillance Radar Evaluation

Michel Pelletier Sutharsan Sivagnanam

FLIR - Radars Laval, QC, Canada

[email protected] [email protected]

Erik P. Blasch Defence R&D Canada-Valcartier

Québec City, QC, Canada [email protected]

Abstract— For real-world tracking performance evaluation, there are issues associated with measurement-to-track assignment, operating conditions of sensor modeling and uncertainty, and coordination between sensors for accurate track evaluation. In this paper, we demonstrate a real-world application of track performance evaluation for perimeter surveillance radars. To this end, a new track scoring measure of performance (MOP) is introduced, called the Target Coverage Consistency (TCC). It is similar to the percentage-time tracked; however, the score is modulated to account for divergent, redundant and false track times. Through a simple example and real-world data analysis, we demonstrate the usefulness of the scoring methodology for performance assessment in relation to real operational sensors.

Keywords: Performance Evaluation, Perimeter Surveillance, Radar, Detection, Tracking.

I. INTRODUCTION

Perimeter surveillance is an expanding field with both civilian and military applications. The main challenge is to provide continuous and uninterrupted protection against intrusions with low false-alarm rates in widely varying environments and weather conditions. Radar technology is well suited for such applications since it offers all-weather coverage over large areas. In recent years, several commercial systems have become available that can detect intruders (e.g. humans and vehicles) at ranges of up to several kilometers. These systems typically use advanced detection and tracking schemes to perform well in high clutter (e.g. clutter maps, adaptive thresholds, multiple-hypothesis tracking, etc.), as shown in Figure 1. One important challenge for designers and users of such systems is to evaluate their global performance.

Tracking performance evaluation (TPE) metrics are important and many can be found in textbooks [1], survey papers [2], and theoretical applications [3]. A summary of metrics is provided in [4] with variations of the terminology. The field has developed a number of useful metrics and MOPs, of which we restrict our discussion to those of interest for this study. One important class includes the root mean square error (RMSE) metrics to determine track accuracy. In addition to tracker metrics, fidelity metrics such as the correct assignment ratio (CAR) and track purity (TP) focus on the measurement-to-track assignment, or in distributed cases, track-to-track assignment. Furthermore, for operational analyses, the user is focused on utility metrics. Utility metrics are those that capture the system performance to reflect the nature of the problem, such as surveillance (i.e. detect and track all targets with a low false-alarm rate). In [5] and [6], evaluation of algorithms is performed using several metrics and MOPs for track-while-scan systems. Blair [10] describes a set of metrics used to evaluate performance in Single Integrated Air Picture (SIAP) systems.

Figure 1. Surveillance radar measurement collection.

Although no metric or MOP can capture all aspects of tracker performance, even for the case of a single sensor with no identification (ID) or classification information, some approaches try to encompass several aspects of TPE. In [6], several metrics are combined to obtain global performance measures. The Optimal Subpattern Assignment (OSPA) [7] is another example of a global measure, since it incorporates distance, cardinality and purity terms.

In the particular application of perimeter surveillance, there should typically be no or very few targets in the critical zones. The critical operational performance factors mainly concern:

1. Response time (timeliness): How fast is an alarm generated following an intrusion? This is related chiefly to the detection performance and the track initiation time.

2. Track consistency/correctness: How consistently does the track follow the target? Is there always a track, and is it always in the vicinity of the target? Is the right number of tracks reported? How fast does the track terminate when the target is no longer present?

3. False alarm rate (confidence): How often are false tracks created, and how long do they last?

Other important but less critical performance criteria include:

4. Accuracy: How accurate is the reported information, namely track location and velocity?

5. Track ID/purity: Are the targets consistently tracked with the same ID? Are there ID switches?

6. Identification/classification: Is additional information available, such as target physical size, aspect, radar cross-section (RCS), Doppler signature, and salient-feature analysis?

All criteria from the above list provide useful information that is key to improving situation awareness. In this paper, a new MOP is presented called the Target Coverage Consistency (TCC). TCC was devised to help rapidly evaluate different detection and tracking algorithms in the context of perimeter surveillance. It attempts to quantify how well the security system is reporting intruders in the sensor coverage area. It is similar to the percentage-time tracked [5] (denoted as the tracked ratio Rtrk in this paper) and the Track Probability of Detection (TPD) [4]; however, it is also affected by track redundancy (duplication), instability (divergence), and the presence of false tracks. The requirements for the TCC MOP derive from the use of operational radars for perimeter surveillance of critical sites, as shown in Figure 2.

Figure 2. Operational Perimeter Surveillance Radars

Section 2 presents the formal definition of the new TCC MOP. In Section 3, methods to compute the TCC are explained. A simple illustrative example is presented in Section 4, while Section 5 presents the results obtained when using the MOP with experimental data, as part of a complete evaluation system for perimeter surveillance. This is followed by the conclusion and references.

II. TARGET COVERAGE CONSISTENCY (TCC)

In order to develop the TCC MOP for track scoring, we begin with definitions.

A. Preliminary Definitions

Target: Intruder in the sensor coverage area that has been modeled by a trajectory in time, called the ground-truth.

Track: Output of the tracker that tries to follow the targets.

A track that is never associated to any target is a false track. A target that is never assigned any track is an undetected target. In the following definitions, assume that a given scenario contains N targets Li, i ∈ [1, …, N], and that each target exists for a time ti in the sensor coverage area (or in the area to be protected). The following measures are needed to define the TCC score:

Ri-trk: The target tracked ratio is the fraction (or percentage) of time that target Li exists and is tracked, i.e. it has at least one track associated to it (association is discussed in Section 3). If we define the tracked-time ti-trk as the time Li exists and is tracked, then Ri-trk = ti-trk / ti.

ti-div: Let the target divergent time ti-div be defined as the total time for which any track that was once associated to Li has diverged, i.e. it is too far from Li to be considered associated to it, or target Li no longer exists at that time. The divergent time therefore includes track overshoot/termination time.

Ri-div: Now define Ri-div = ti-div / ti as the target divergent ratio.

We also need the target duplicate ratio:

Ri-dup = ti-dup / ti: The ratio (or percentage) of time that target Li exists and has more than one track associated to it.

B. TCC Definition

We can now define the TCC for target Li as:

$$TCC_i = \frac{100\, R_{i\text{-}trk}}{1 + R_{i\text{-}div} + R_{i\text{-}dup}} \qquad (1)$$

For a global scenario including N targets as well as M false tracks, each existing for a time tj-false, j ∈ [1, …, M], the TCC MOP is given by:

$$TCC = \frac{\sum_{i=1}^{N} t_i^{\alpha}\, TCC_i}{\sum_{i=1}^{N} t_i^{\alpha} + \sum_{j=1}^{M} t_{j\text{-}false}^{\alpha}} \qquad (2)$$

where α ∈ [0, 1] is a constant that determines how targets are weighted with respect to one another, depending on their existence time. For example:

α = 1: The influence of targets in the global score is directly proportional to their existence time.

α = 0: All targets have the same weight in the global score, independently of their lifetime.

α = 0.5: Target weight is proportional to the square root of the lifetime.


In order to properly account for the negative effect of false alarms, the false times tj-false are typically bounded from below by some minimal acceptable value, so that even very short false tracks incur a penalty.
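To make the scoring concrete, the following is a minimal Python sketch of equations (1) and (2). The TargetStats container and function names are illustrative assumptions, not taken from the paper's implementation; the per-target times are assumed to be already measured.

```python
from dataclasses import dataclass

@dataclass
class TargetStats:
    t: float      # existence time t_i of target L_i in the coverage area
    t_trk: float  # tracked time t_i-trk (at least one associated track)
    t_div: float  # divergent time t_i-div
    t_dup: float  # duplicate time t_i-dup (more than one associated track)

def tcc_target(s: TargetStats) -> float:
    """Per-target score TCC_i, equation (1)."""
    r_trk, r_div, r_dup = s.t_trk / s.t, s.t_div / s.t, s.t_dup / s.t
    return 100.0 * r_trk / (1.0 + r_div + r_dup)

def tcc_global(targets, false_times, alpha=1.0, min_false_time=0.0):
    """Global score, equation (2). False-track times are clamped from
    below so that very short false tracks still incur a penalty."""
    false_times = [max(tf, min_false_time) for tf in false_times]
    num = sum(s.t ** alpha * tcc_target(s) for s in targets)
    den = (sum(s.t ** alpha for s in targets)
           + sum(tf ** alpha for tf in false_times))
    return num / den
```

For instance, with per-target values (t, t_trk, t_div, t_dup) of (17, 11, 8, 0), (15, 12, 8, 2) and (8, 1, 3, 0) time units and a single false track lasting 6 units, this sketch yields per-target scores of roughly 44, 48 and 9 and a global score of roughly 33 with α = 1, matching the example of Section IV.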

III. COMPUTATION OF TCC

A. Track-to-Target Associations

In order to compute the measures described above, it is necessary to determine which tracks are associated with which targets at each instant in time. In order to account for possible duplicate time, the association is not one-to-one; however, at each time instant, a track can only be associated to one ground-truth target [6].

Once the association is done, it is straightforward to compute the values of Ri-trk, Ri-div, TCCi, etc. defined in the previous section.

B. Association Scheme

Track-to-truth association should be done globally, using all target ground truths and tracks over all scans. This is difficult to achieve in practice and thus approximation schemes are typically used. Two-dimensional global assignment schemes are often used to find the truth-to-track association [9]. Techniques to do this include the Hungarian [11], Munkres [13], JVC (Jonker-Volgenant-Castanon) [14] and auction [12] algorithms. In [8], four methodologies are proposed to disallow or discourage track switches in the association process. Recently, the OSPA metric [7] was proposed to find better associations of multiple tracks to ground truths. This metric also utilizes the standard two-dimensional association techniques, and an association cost function is defined explicitly.
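As a point of reference, a one-scan 2-D assignment can be computed with an off-the-shelf solver; the snippet below uses SciPy's linear_sum_assignment (a Hungarian-style method) on a hypothetical cost matrix. The TCC association described next does not use per-scan optimal assignment directly, since it must also discourage track switches across scans.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical single-scan cost matrix: cost[j, i] is the distance
# between track j and ground-truth target i (2 tracks, 3 targets).
cost = np.array([[1.0, 7.0, 9.0],
                 [6.0, 2.0, 8.0]])

rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
# rows[k] is matched with cols[k]; here track 0 -> target 0 and
# track 1 -> target 1, leaving target 2 unassigned at this scan.
```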

Here we propose to use the following association algorithm to compute the TCC (a simplified code sketch follows the list):

1. (Time synchronization) Assuming a list of ground-truth targets is available that includes the positions of targets at specific times (typically periodic), interpolate all the tracks to the same times as the targets.

2. (Gating) At each ground-truth target time, determine which tracks, if any, are included in the target gate. In order for the TCC to be algorithm-free [4], the gates should not be related to the tracker gates; rather, they must be related to the desired precision for the application or the expected accuracy of the sensor. This approach also allows evaluating different sensors for the same application.

3. (Track association list) For each track and for each time, place the associated targets in an ordered list, starting with the best match, i.e. the closest.

4. (Final association) For each track, determine a unique target associated at each time. To do this, use the following algorithm:

a. Form the closest-association sequences for all times at which tracks are matched with targets. A closest-association sequence is defined to be a time interval for a given track in which the best-matched target is the same.

b. Find the longest (in time) closest-association sequence. If there is more than one longest sequence with the same length, choose the one with the minimal sum of distances between track and target (i.e. the best-matched sequence). This uniquely identifies a track and an associated target for that time interval.

c. For each time in the sequence, eliminate any other match to other targets.

d. For each time in the sequence, check if the identified target is present in other track assignments. If so, and if that match is not the only one, eliminate that match.

e. When this is done, re-form all the closest-association sequences (same as step a above, but accounting for removed matches) and repeat the whole process starting at step b, excluding the already-processed sequences formed from the previously identified track-target matches.

5. (Target association list) Now that unique track-to-target associations are ready, prepare the reverse list, i.e. for all ground-truth targets and all times, prepare the list of associated tracks.
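Below is a simplified, hedged Python sketch of the sequence-forming and conflict-resolution loop (steps 4a-4e). The input layout is an assumption for illustration: cands[trk][k] is the gated candidate list of (distance, target) pairs at time index k, sorted best-first, as produced by steps 1-3.

```python
def build_sequences(cands, locked):
    """Step 4a: closest-association sequences, i.e. maximal time intervals
    over which a track's best-matched target stays the same. Spans locked
    by earlier passes (step 4e) are skipped. Returns tuples
    (track, target, start, end, summed_distance)."""
    seqs = []
    for trk, timeline in cands.items():
        run_tgt, run_t0, run_cost = None, None, 0.0
        for k in range(len(timeline) + 1):       # +1: sentinel flushes the last run
            entries = timeline[k] if k < len(timeline) else []
            usable = bool(entries) and (trk, k) not in locked
            best = entries[0][1] if usable else None
            if best != run_tgt:
                if run_tgt is not None:
                    seqs.append((trk, run_tgt, run_t0, k, run_cost))
                run_tgt, run_t0, run_cost = best, k, 0.0
            if best is not None:
                run_cost += entries[0][0]
    return seqs

def final_association(cands):
    """Steps 4b-4e: repeatedly lock the longest sequence (ties broken by
    the smallest summed distance) and prune conflicting matches."""
    locked, assoc = set(), {}                    # assoc: (track, time) -> target
    while True:
        seqs = build_sequences(cands, locked)
        if not seqs:
            break
        trk, tgt, t0, t1, _ = max(seqs, key=lambda s: (s[3] - s[2], -s[4]))
        for k in range(t0, t1):
            assoc[(trk, k)] = tgt
            locked.add((trk, k))
            # Step 4c: this track keeps only tgt at the locked times.
            cands[trk][k] = [e for e in cands[trk][k] if e[1] == tgt]
            # Step 4d: drop tgt from other tracks at those times,
            # unless it is their only remaining candidate there.
            for other in cands:
                if other != trk and k < len(cands[other]) and len(cands[other][k]) > 1:
                    cands[other][k] = [e for e in cands[other][k] if e[1] != tgt]
    return assoc
```

The reverse list of step 5 is then a simple inversion of assoc (target to times to tracks). Note that this sketch assumes synchronized, dense time indices; real data would carry explicit timestamps from the interpolation of step 1.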

C. Association Matrix

Once the target-to-track association lists are ready in step 5 above, the computation of the TCC is straightforward. Assuming the update period is constant and the weighting factor α is chosen to be unity, the process is somewhat similar to building a confusion matrix C as described in [3]: given the tracks τ1, ..., τb and a set of ground-truth targets g1, ..., ga, C is:

                      Targets
             g1     g2    ...    ga
  (none)    C01    C02    ...   C0a
    τ1      C11    C12    ...   C1a
    τ2      C21    C22    ...   C2a
    ...     ...    ...    ...   ...
    τb      Cb1    Cb2    ...   Cba

where Cji is the number of associations of target gi to track τj. The first row, C0i, consists of the number of target points that were not associated to any track, i.e. undetected/untracked time. In order to compute TCC, the confusion matrix is augmented as follows:

1. For all track/target cells, add a count for duplicate track time, i.e. incorrect associations.

2. For all tracks (rows), add a column for divergent time, i.e. when the track cannot be associated with any real target.


Once this is done, TCC computations are straightforward. The divergent-time of a track is added to all targets which have non-zero tracked-time or duplicate-time for that track.
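The following sketch, under the same illustrative assumptions as above, derives the per-target times and the TCC directly from such an augmented matrix. The array names (trk, dup, div, none) are hypothetical; counts are converted to times through the constant update period dt.

```python
import numpy as np

def tcc_from_matrix(trk, dup, div, none, dt=1.0, alpha=1.0):
    """trk[j, i] / dup[j, i]: tracked / duplicate counts of track j on
    target i; div[j]: divergent count of track j; none[i]: untracked
    target points. Returns (per-target TCC_i, global TCC)."""
    trk, dup = np.asarray(trk, float), np.asarray(dup, float)
    div, none = np.asarray(div, float), np.asarray(none, float)
    t = (trk.sum(axis=0) + none) * dt              # target lifetimes t_i
    t_trk = trk.sum(axis=0) * dt                   # tracked times
    t_dup = dup.sum(axis=0) * dt                   # duplicate times
    seen = (trk > 0) | (dup > 0)                   # was track j ever on target i?
    t_div = (div[:, None] * seen).sum(axis=0) * dt # divergent-time allocation
    tcc_i = 100 * (t_trk / t) / (1 + t_div / t + t_dup / t)  # equation (1)
    t_false = div[~seen.any(axis=1)] * dt          # tracks never associated
    num = (t ** alpha * tcc_i).sum()
    den = (t ** alpha).sum() + (t_false ** alpha).sum()
    return tcc_i, num / den                        # equation (2)
```

Applied to the example association matrix of the next section, this reproduces the per-target and global TCC values reported there.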

IV. EXAMPLE AND COMPARISON

As an illustrative example, consider the following 20-scan scenario with three tracks T1, T2 and T3, and three targets L1, L2 and L3, as shown in Figure 3:


Figure 3. Example Tracking Scenario

For this simple example, the association process is illustrated in Figure 4.

[Figure content: per-scan association tables over scans 1-20, shown after Steps 1 + 2 (per-target association lists), Steps 3 + 4 (per-track association lists), and Step 5 (final unique associations), with the duplicate interval marked.]

Figure 4. Association Process

Using α = 1, the association matrix in this case is given by:

            Target 1      Target 2      Target 3     None
  Track    Trk   Dup     Trk   Dup     Trk   Dup     Div
    1       7     0       5     0       1     0       3
    2       4     0       7     2       0     0       5
    3       0     0       0     0       0     0       6
  none      6     -       3     -       7     -       -

We can now compute and compare different MOPs:

            Target 1   Target 2   Target 3   Global
  Rtrk        .65        .80        .12        -
  Rdiv        .47        .53         0         -
  Rdup         0         .13         0         -
  TCC         .44        .48        .09       .33
  TPD         .65        .67        .12       .48
  (WA)CAR     .64        .58         1        .62

As expected, the TCC gives lower scores than TPD and Rtrk, since it explicitly reduces target scores to account for redundant and divergent track times. The difference between Rtrk and TPD is that the latter only gives points for valid tracks, i.e. when the track-to-target mapping is one-to-one. If we plot all the results from the MOPs (Figure 5), we see that target 3 has a high weighted-average CAR, while targets 1 and 2 have a high TPD and tracked ratio Ri-trk.

Figure 5. Spider Plot of Different MOPs

From the association matrix, we can also see that track 3 has divergent time only (last column), i.e. it has no valid association and is therefore a false track. The TCC accounts for this and is reduced accordingly (from 0.38 to 0.33).
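As a check, reading the target lifetimes off the association matrix (t1 = 17, t2 = 15 and t3 = 8 scans, with the false track lasting 6 scans) and applying equation (2) with α = 1 to the per-target TCC values of the table (expressed as fractions):

$$TCC = \frac{17(0.44) + 15(0.48) + 8(0.09)}{17 + 15 + 8 + 6} = \frac{15.4}{46} \approx 0.33,$$

while dropping the false-track term from the denominator gives 15.4/40 ≈ 0.38.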


V. EXPERIMENTAL RESULTS

The TCC score is used in an automated tracker test program called Testor, along with other MOPs and metrics. This program inputs XML batch files that describe the tests to be run. In the example presented here, the batch consists of 27 recorded raw data files (radar returns) along with the corresponding ground-truth and exclusion-zone files. Out of the 27 files, 21 come from experimental data recorded with the Ranger R2/3 (formerly STS-1400/2800) radar at customer sites, and five come from the R3D (formerly STS-1400D) Doppler radar. The test batch also includes one simulated data file that tests tracker behavior for 40 different target maneuvers.

Figure 6 and Figure 7 show the PPI output for two different experimental test files. The Testor output statistics for the file of Figure 7 are shown in Figure 8. As can be seen in Figure 8, the program always compares the test statistics to those of the reference software version. This is particularly useful for evaluating the performance of a new algorithm or tuning, as well as for verifying that the results are identical when no changes to the algorithms were made. In the cases shown here, the newer algorithm under test simply consists of reducing the detection threshold to double the single-scan probability of false detection.

Figure 6. PPI view of R2 radar output

Figure 9 shows graphs made from the data exported by Testor. The data include all important information about the tracks, targets (ground-truth information) and contacts, as well as the tracker association information, i.e. which contacts were used to update which tracks.

Figure 7. PPI view of R3D radar output

Job#4 : 2010-08-09 11.43.08 (JobStats)

+---------------------+-------------+-------------+----------+------+
| Statistic Evaluated |         Ref |        2.00 | 2.vs.Ref | Best |
+---------------------+-------------+-------------+----------+------+
| FileTime [s]        |      647.85 |      647.85 |          |      |
| NbTracks []         |        6.00 |        9.00 |      50% |      |
| DetectTime [s]      |       20.00 |       20.00 |          |      |
| NbTargets []        |        6.00 |        6.00 |          |      |
| NbGood []           |        6.00 |        6.00 |          |      |
| NbSwap []           |        0.00 |        0.00 |          |      |
| NbDup []            |        0.00 |        0.00 |          |      |
| NbFalse []          |        0.00 |        3.00 |  -99999% | Ref  |
| NbUndet []          |        0.00 |        0.00 |          |      |
| Pt [%]              |      100.00 |      100.00 |          |      |
| TargetTime [s]      |     1922.00 |     1922.00 |          |      |
| FalseTime [s]       |        0.00 |       93.34 |  -99999% | Ref  |
| Rf [%]              |        0.00 |       14.41 |     -14% | Ref  |
| DivTime [s]         |       50.00 |       86.00 |     -72% | Ref  |
| Rdiv [%]            |        2.60 |        4.47 |      -2% | Ref  |
| DupTime [s]         |        0.00 |        0.00 |          |      |
| Rdup [%]            |        0.00 |        0.00 |          |      |
| TrackedTime [s]     |     1702.00 |     1711.00 |       1% | 2    |
| Rtrk [%]            |       88.55 |       89.02 |       0% | 2    |
| Pd [%]              |       92.04 |       94.64 |          |      |
| InitTime [s]        |        9.20 |        9.93 |          |      |
| KillTime [s]        |       10.38 |       12.30 |          |      |
| PosErr_Min [m]      |        0.05 |        0.04 |      15% | 2    |
| PosErr_Max [m]      |       52.28 |       31.58 |      40% | 2    |
| PosErr_Avg [m]      |        3.38 |        3.47 |      -2% | Ref  |
| PosErr_RMS [m]      |        5.17 |        5.11 |       1% | 2    |
+---------------------+-------------+-------------+----------+------+
| FAR [s]             |     9999.90 |      215.95 |     -98% | Ref  |
| TCC Score [%]       |       83.75 |       69.58 |     -14% | Ref  |
+---------------------+-------------+-------------+----------+------+

Figure 8. Testor results for file of Figure 7.

Figure 9. (a) Data exported for file of Figure 7.


Figure 9. (b)(c) Data exported for file of Figure 7.

Figure 10 shows the global statistics computed for the complete batch of 22 files recorded in non-Doppler mode. In these results, we use a relative target weighting factor α = 0.5.

Not surprisingly, the second results column of the statistics shows that the probability of detection and the tracked time increase, and similarly, the false and divergent track times increase as well. For the original algorithm, we have Ri-div = divergent time / target lifetime = 2370/29415 = 0.0806. Likewise, for the modified algorithm, we have 3202/29415 = 0.1089, an increase of roughly 3 percentage points (shown as a negative percentage in Testor since it is a performance degradation). The total false track time increases from 2122 to 3741 seconds, a 76% increase. The overall effect on the TCC score is negative (a 4% decrease), and the false-alarm rate increases as well (Testor reports its inverse, the mean time between false tracks, which drops from 672 to 390 seconds).

Overall Performance
-------------------

+---------------------+-------------+-------------+----------+------+
| Statistic Evaluated |         Ref |        2.00 | 2.vs.Ref | Best |
+---------------------+-------------+-------------+----------+------+
| FileTime [s]        |    12099.58 |    12099.58 |          |      |
| NbTracks []         |      221.00 |      235.00 |       6% |      |
| DetectTime [s]      |    20902.00 |    21399.00 |       2% | 2    |
| NbTargets []        |      175.00 |      175.00 |          |      |
| NbGood []           |      203.00 |      204.00 |       0% | 2    |
| NbSwap []           |        0.00 |        0.00 |          |      |
| NbDup []            |        0.00 |        0.00 |          |      |
| NbFalse []          |       18.00 |       31.00 |     -72% | Ref  |
| NbUndet []          |        4.00 |        4.00 |          |      |
| Pt [%]              |       97.71 |       97.71 |          |      |
| TargetTime [s]      |    29415.00 |    29415.00 |          |      |
| FalseTime [s]       |     2122.20 |     3741.08 |     -76% | Ref  |
| Rf [%]              |       17.54 |       30.92 |     -13% | Ref  |
| DivTime [s]         |     2370.00 |     3202.00 |     -35% | Ref  |
| Pdiv [%]            |        8.06 |       10.89 |      -3% | Ref  |
| DupTime [s]         |       76.00 |       81.00 |      -7% | Ref  |
| Rdup [%]            |        0.26 |        0.28 |      -0% | Ref  |
| TrackedTime [s]     |    22383.00 |    22487.00 |       0% | 2    |
| Rtrk [%]            |       76.09 |       76.45 |       0% | 2    |
| Pd [%]              |       71.06 |       72.75 |       2% | 2    |
| InitTime [s]        |        4.55 |        4.48 |          |      |
| KillTime [s]        |        5.02 |        5.74 |          |      |
| PosErr_Min [m]      |        0.00 |        0.00 |      14% | 2    |
| PosErr_Max [m]      |       28.74 |       38.19 |     -33% | Ref  |
| PosErr_Avg [m]      |        2.12 |        2.25 |      -6% | Ref  |
| PosErr_RMS [m]      |        3.41 |        3.55 |      -4% | Ref  |
+---------------------+-------------+-------------+----------+------+
| FAR [s]             |      672.20 |      390.31 |     -42% | Ref  |
| TCC Score [%]       |       65.58 |       61.11 |      -4% | Ref  |
+---------------------+-------------+-------------+----------+------+

Tracker: stats/stats_AllTests_2012-02-27 04.52.08.xml
TCC Score was 40% better, 9% same, 50% worse. (total of 22 files)
TCC Score was 24% better, 47% same, 28% worse. (total of 175 tracks)

FAR was 4% better, 72% same, 22% worse. (total of 22 files)

Figure 10. Testor results for R2/3 batch

In future analyses, it would be interesting to compare the effects of duplicate, divergent and false track times on different metrics and MOPs. Using the Target Coverage Consistency (TCC), all three types of degradation are accounted for; however, their effects are intertwined in the overall score.

VI. CONCLUSION

This paper presented a new track scoring MOP that was developed to rapidly assess and compare detection and tracking algorithms in the context of radar perimeter surveillance. The TCC MOP is directly related to the percentage of time a target is tracked; furthermore, it is also affected by redundant, unstable/divergent and false track time. It can therefore be seen as a general MOP that captures, namely:

• Target probability of detection
• Track continuity, maintenance and lifetime
• Track consistency/correctness/confidence
• Spurious, unstable and false tracks
• Track loss and seduction
• Premature and late track initiation and deletion
• Undetected/missed targets

To some extent, the TCC also accounts for response time and segmented/broken tracks; however, it does not quantify track purity or track identification swaps. Since these issues, especially timeliness, were identified as being important for perimeter surveillance, it would be possible to augment the TCC to account for them as well. Since the TCC already encompasses many criteria, future work will focus instead on adding more MOPs to the Testor program, namely for timeliness (e.g. the percentage of targets tracked within 10 seconds).


One of the main challenges of using the new MOP and the Testor program is the necessity of having ground-truth data. Since most experimental tests are not instrumented, ground-truth files are typically obtained by analysis of the radar contacts and reverse-engineering. In order to alleviate this task, the Testor program outputs a text file that is used as the first draft for a ground-truth scenario file. The automatic generation of better ground-truth files is currently the subject of ongoing research.

ACKNOWLEDGEMENT

For their contributions to this work over the years, the authors would like to thank Nicolas Therrien, Maxime Larocque, Rui Yang, Kevin Currie, Martin Gingras, Jean-Pierre Soucy and Patrick Lamontagne. The continuous support of Pierre Poitevin as well as Thia Kirubarajan and his team is also greatly appreciated.

REFERENCES

[1] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems, Artech House, Boston, 1999.

[2] Y. Bar-Shalom and X. Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS, New York, 1995.

[3] E. Blasch and P. Valin, “Track Purity and Current Assignment Ratio for Target Tracking and Identification Evaluation,” Int. Conf. on Information Fusion, 2011.

[4] A. A. Gorji, R. Tharmarasa and T. Kirubarajan, "Performance Measures for Multiple Target Tracking Problems," International Conference on Information Fusion, 2011.

[5] B. Ristic, "A Tool for Track-While-Scan Algorithm Evaluation," Proceedings of Information, Decision and Control, pp. 105-110, Australia, 1999.

[6] S. B. Colegrove, L. M. Davis and S. J. Davey, “Performance Assessment of Tracking Systems”, International Symposium on Signal Processing and its Applications (ISSPA), pp. 188-191, August 1996.

[7] B. Ristic, B. Vo and D. Clark, "Performance evaluation of multi-target tracking using the OSPA metric," International Conference on Information Fusion, 2010.

[8] O. E. Drummond, "Methodologies for performance evaluation of multitarget multisensor tracking," Proc. SPIE, Vol. 3809, pp. 355-369, 1999.

[9] L. D. Smith, "A Comparison of Track to Truth Assignment Methods," IEEE Aerospace Conference, 2011.

[10] W. D. Blair, “Multitarget Tracking Metrics for SIAP systems”, International Conference on Information Fusion, 2008.

[11] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, Vol. 2, pp. 83-97, 1955.

[12] S. Deb, K. Pattipati and Y. Bar-Shalom, "A multisensor-multitarget data association algorithm for heterogeneous sensors," IEEE Transactions on Aerospace and Electronic Systems, Vol. 29, No. 2, pp. 523-538, April 1997.

[13] S. Blackman, Multiple Target Tracking with Radar Applications, Dedham, MA: Artech House, 1986.

[14] M. Levedahl, "Performance comparison of 2D assignment algorithms for assigning truth objects to measured tracks," Proceedings of SPIE Signal and Data Processing of Small Targets, 2000.
