Classification Trees in Army Application

If the only tool you have is a hammer...

Barry A. Bodt

US Army Research Laboratory

• Heavy metal and rabbit sperm

• Target identification

• False alarm rates in intrusion detection

• Course of action analysis in military planning

• Network traffic

Heavy Metal and Rabbit Sperm

Background

Key Points

• Observed that males in contact with heavy metals (e.g., lead) had low fertility rates

• Low fertility rate was not associated with the percent motile sperm

• Independently found that capacitation (ability to fertilize) of sperm, unobservable, was related to an observable termed hyperactivated motility

• Conjectured that while heavy metals did not necessarily kill sperm, they might prevent the hyperactivated motility associated with capacitation

• Why did the Army care? 1) soldier exposure, 2) biomarkers

Data Collection

• Computer assisted videomicrography

• Track rabbit sperm cells at 30 frames/sec

• Solution preparation conditions consistent with hyperactivated cells (322) and non-hyperactivated cells (899)

• For individual cell motion, tracked straight line velocity (VST), curvilinear velocity (VC) [µm/sec], average amplitude of lateral head displacement (AALH) [µm], beat cross frequency (BCF), Wob, etc.

• Wobble (Wob) was the ratio of the average path velocity (VAP) (7 frame smooth) to the VC (piecewise path computation).
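The Wob ratio described above can be sketched in code. This is a minimal sketch, assuming a centered 7-frame moving average for the smoothed path and tracks given as (x, y) head positions in µm at 30 frames/sec; the function names and the two sample tracks are hypothetical and are not taken from the original tracking software.

```python
import math

FRAME_RATE = 30.0  # frames per second, as stated in the slides


def path_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def smooth_path(points, window=7):
    """Centered moving average of the track (assumed smoothing scheme)."""
    half = window // 2
    smoothed = []
    for i in range(half, len(points) - half):
        chunk = points[i - half:i + half + 1]
        smoothed.append((sum(x for x, _ in chunk) / window,
                         sum(y for _, y in chunk) / window))
    return smoothed


def wobble(points, window=7, frame_rate=FRAME_RATE):
    """Wob = VAP / VC, both measured over the same stretch of frames."""
    if len(points) < window + 1:
        raise ValueError("track is too short for the smoothing window")
    half = window // 2
    core = points[half:len(points) - half]   # frames covered by the smooth
    duration = (len(core) - 1) / frame_rate  # seconds
    vc = path_length(core) / duration                        # curvilinear velocity
    vap = path_length(smooth_path(points, window)) / duration  # average path velocity
    return vap / vc


# Hypothetical tracks: a straight swimmer and a strongly zigzagging one.
straight_track = [(float(i), 0.0) for i in range(30)]
zigzag_track = [(float(i), float((-1) ** i)) for i in range(30)]

print(round(wobble(straight_track), 3))
print(round(wobble(zigzag_track), 3))
```

A track whose raw path hugs its smoothed path gives a Wob near 1, while large side-to-side excursions lengthen the raw path and pull Wob down.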

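[Figure: sperm track with the average path (Avr Path) and the path (Path) labeled]

Summary of Individual Classification Ability

Classification Results

[Figure: plot of WOB (axis from 0.0 to 1.0) against VC (axis from 0 to 350), comparing the discriminant model with the CART partitions]

Counts per CART partition (H = hyperactivated, N = non-hyperactivated):

• 299 H, 3 N

• 2 H, 864 N

• 0 H, 9 N

• 0 H, 14 N

• 21 H, 9 N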

Note: used Systat Cart add-on module and FACT

Effect of Pb on Sperm Motility

[Figure: two plots against time and concentration: percent motile hyperactivated (PMH) and percent motile (PM); response axes from 0 to 80]

Target Identification

Background

Key Points

• Goal: Broadly, to identify certain targets based on acoustic and seismic features. Specifically, to uncover a minimal set of features that maximally separate targets

• Past Work: Simple power spectral estimates; back-propagation neural network

• Approach: Focus on features … peek inside the “black box”

Data Collection

• Four vehicles

• 2 runs for each vehicle on a predetermined far-near-far path

• 1 run consists of approximately 125 contiguous 1-second windows

• 1,041 windows were analyzed

• 11-valued acoustic feature vector for each 1-second window

• 4 seismic features for each 1-second window … first four central moments

• Data were separated into learning and test samples in the ratios 9:1, 8:2, 7:3, 6:4.
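The learning/test separation can be illustrated as follows. This is a minimal sketch, assuming a simple random split of the 1,041 windows; the slides do not say how windows were assigned to each sample, so the shuffling, the seed, and the function name `split_windows` are all assumptions.

```python
import random


def split_windows(n_windows, learn_parts, test_parts, seed=0):
    """Randomly split window indices into learning and test samples.

    The ratio learn_parts:test_parts follows the slides (e.g. 9:1);
    the random assignment itself is an assumption.
    """
    indices = list(range(n_windows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_windows * test_parts / (learn_parts + test_parts))
    return indices[n_test:], indices[:n_test]


for learn_parts, test_parts in [(9, 1), (8, 2), (7, 3), (6, 4)]:
    learn, test = split_windows(1041, learn_parts, test_parts)
    print(f"{learn_parts}:{test_parts} -> {len(learn)} learning, {len(test)} test")
```

Because the windows are contiguous in time, a random split can place near-identical neighboring windows in both samples; splitting by run would be a stricter alternative.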

Exploratory Data Analysis

Box Plot Summaries

[Figure: box plot of H3 by group and box plot of H6 by group; groups 1 to 4, feature values from 0.0 to 1.0]

Exploratory Data Analysis

Scatter Plot Summaries

[Figure: scatter plot of H4 vs H7 and scatter plot of H3 vs H7; both axes from 0.0 to 1.0, points marked by group 1 to 4]

Classification and Regression Trees Models

Model Development

CART Tree (Default) Based on the 9:1 Learning Sample

[Figure: classification tree with terminal nodes numbered 1 to 10. Split conditions shown: H5 > 0.10; Skw. > 1.25; H4 > 0.05; H2 > 0.19; H5 > 0.05; Kur. > -0.38; H3 > 0.07; H4 > 0.12; Fund. > 0.19]

Group 1: 82.0%
Group 2: 77.7%
Group 3: 71.9%
Group 4: 80.8%

Note: used Statistica Cart and Quest

Test Sample Results

Group 1: 68.8%
Group 2: 70.8%
Group 3: 87.5%
Group 4: 83.3%
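Split conditions such as "H5 > 0.10" come from a search over candidate thresholds. The following is a minimal sketch of that search for one feature using the Gini impurity, which is the usual CART criterion; it is not the Statistica implementation, and the feature values and group labels below are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value > threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, impurity_of_all_labels) if no split improves impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((low + high) / 2, score)
    return best


# Hypothetical H5 values for six windows from two vehicle groups.
h5_values = [0.02, 0.04, 0.06, 0.14, 0.16, 0.20]
groups = [1, 1, 1, 2, 2, 2]
threshold, impurity = best_threshold(h5_values, groups)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node and then recurses on the two child samples until a stopping rule, such as a depth limit, is met.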

CART Tree (Depth = 3) Based on the 9:1 Learning Sample

[Figure: classification tree with terminal nodes numbered 1 to 5. Split conditions shown: H5 > 0.099; Skw. > 1.248; H2 > 0.193; Fund. > 0.222]

Model Sensitivity

CART Tree (Depth = 3) Based on the 8:2 Learning Sample

[Figure: classification tree with terminal nodes numbered 1 to 6. Split conditions shown: H5 > 0.10; Skw. > 1.32; H4 > 0.06; H2 > 0.19; Fund. > 0.26]

Test Sample Results

        | 8:2   | 7:3   | 6:4
Group 1 | 78.6% | 64.7% | 70.9%
Group 2 | 51.2% | 62.1% | 57.7%
Group 3 | 80.3% | 76.2% | 78.6%
Group 4 | 79.1% | 81.8% | 78.3%

Discriminant Analysis

Model Development

[Figure: scatter plot of discriminant roots for the 9:1 test sample, Root 2 vs Root 1, groups G 1 to G 4; both axes from -6 to 6]

[Figure: scatter plot of discriminant roots for the 9:1 test sample, Root 3 vs Root 1, groups G 1 to G 4; Root 1 from -6 to 6, Root 3 from -4 to 4]

Discriminant and CART Model Comparisons

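Percent correctly classified by learning / test sample ratio (C = CART, D = discriminant analysis)

Group | 9:1 C | 9:1 D | 8:2 C | 8:2 D | 7:3 C | 7:3 D | 6:4 C | 6:4 D
1     | 68.8  | 87.5  | 78.6  | 83.9  | 64.7  | 64.7  | 70.9  | 76.6
2     | 70.8  | 75.0  | 51.2  | 51.2  | 62.1  | 54.5  | 57.7  | 57.7
3     | 87.5  | 87.5  | 80.3  | 84.8  | 76.2  | 79.8  | 78.6  | 90.3
4     | 83.3  | 75.0  | 79.1  | 83.7  | 81.8  | 87.0  | 78.3  | 73.6
Total | 76.9  | 81.7  | 73.6  | 77.4  | 71.5  | 72.1  | 71.4  | 75.0

Features by learning-to-test sample ratio

Features listed: FUND, H2, H3, H4, H5, H7, H8, Skewness

      | 9:1 C | 9:1 D | 8:2 C | 8:2 D | 7:3 C | 7:3 D | 6:4 C | 6:4 D
TOTAL | 4     | 5     | 5     | 6     | 5     | 4     | 5     | 6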

False Alarm Rate in Intrusion Detection

Background

Key Points

• Purpose: to develop a filter to reduce the false alarm rate

• Network intrusion is a big concern; many safeguards are imposed

• JIDS from LLNL yields one-hour snapshots of user activity and network response

• Weigh activity against hits on a flag file of strings associated with intrusion (e.g., “Permission denied”, “Hosing Trusted Host”)

• Alerts are generated when threshold counts are exceeded for flag file entries. A severity index is also considered.

• Investigate alerts in light of user log; many false alarms; need automated “collective” interpretation of strings
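The threshold rule in the bullets above can be sketched as follows. This is a minimal sketch of the general idea only: the JIDS flag file format, its thresholds, and its severity index are not described in the slides, so the `thresholds` mapping, the function name, and the sample log lines are hypothetical.

```python
def flag_alerts(snapshot_lines, thresholds):
    """Return the flag strings whose hit count exceeds their threshold.

    snapshot_lines: lines of user activity for one snapshot (hypothetical).
    thresholds: mapping of flag string -> allowed count (hypothetical values).
    """
    counts = {flag: 0 for flag in thresholds}
    for line in snapshot_lines:
        for flag in thresholds:
            counts[flag] += line.count(flag)
    return {flag: hits for flag, hits in counts.items() if hits > thresholds[flag]}


# Hypothetical one-hour snapshot and thresholds.
snapshot = [
    "cat /etc/shadow: Permission denied",
    "ls /root: Permission denied",
    "cp secrets /tmp: Permission denied",
    "login ok",
]
alerts = flag_alerts(snapshot, {"Permission denied": 2, "login ok": 5})
print(alerts)
```

A rule like this looks at each string in isolation, which is why a legitimate user who mistypes a few commands can trigger an alert; the classification tree is meant to interpret the strings collectively.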

Data Collection

• Data gathered at ARL over the past year had 940 alerts

• Network administrator classified user activity as legitimate (644), attempted break-in (285), successful break-in (11)

• A C program preprocessed the activity log, creating 259 columns augmented with 1 user-intent column x 940 rows

• Cells are frequencies of the jth string in the ith alarm

• Dimensionality and a sparse matrix pose a challenge

• Preliminary screen for useful features using SPSS discriminant analysis; Statistica CART would not allow the 260 columns

• Completed analysis with CART and Quest in Statistica
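The frequency matrix and its sparsity can be illustrated with a small example. This is a minimal sketch, assuming each alarm is available as a block of text; the alarm texts and all flag strings other than “Permission denied” are hypothetical. The column screen shown here only drops columns that are zero everywhere, which is a much simpler filter than the discriminant analysis screen named in the slides.

```python
def count_matrix(alarm_texts, flag_strings):
    """Cell (i, j) is the frequency of the j-th string in the i-th alarm."""
    return [[text.count(flag) for flag in flag_strings] for text in alarm_texts]


def sparsity(matrix):
    """Fraction of cells equal to zero."""
    cells = [cell for row in matrix for cell in row]
    return sum(1 for cell in cells if cell == 0) / len(cells)


def nonzero_columns(matrix):
    """Indices of columns with at least one nonzero cell."""
    return [j for j in range(len(matrix[0])) if any(row[j] for row in matrix)]


# Hypothetical alarms and flag strings.
flag_strings = ["Permission denied", "connection refused", "su: incorrect password"]
alarms = [
    "Permission denied\nPermission denied",
    "login ok",
    "Permission denied\nconnection refused",
]
matrix = count_matrix(alarms, flag_strings)
print(matrix)
print(round(sparsity(matrix), 2))
print(nonzero_columns(matrix))
```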

CART Tree

Figure 3. Classification Tree for User Class

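[Figure: classification tree with nodes numbered 1 to 15. Split variables and values shown: V26 (.5), V50 (.5), V32 (.5), V164 (2.5), V3 (.5), V48 (4.5), V46 (1.). Case counts at successive splits: 328 and 612; 314 and 14; 608 and 4; 605 and 3; 595 and 10; 565 and 30; 11 and 19. Node class labels take the values 1, 2, and 3]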

Classification Results

Observed User Intent | Percent Correctly Classified | Predicted: Successful Break-in | Predicted: Attempted Break-in | Predicted: Legitimate Activity
Successful Break-in  | 90.9                         | 10                             | 0                             | 1
Attempted Break-in   | 100                          | 0                              | 285                           | 0
Legitimate Activity  | 92.7                         | 8                              | 39                            | 597
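The percentages in the table follow directly from the counts. The sketch below recomputes them and adds two figures the slide does not state, the overall percent correct and the share of legitimate activity still flagged as a break-in; the variable and function names are illustrative.

```python
classes = ["Successful Break-in", "Attempted Break-in", "Legitimate Activity"]

# Rows are observed user intent; columns are predicted intent, in `classes` order.
confusion = {
    "Successful Break-in": [10, 0, 1],
    "Attempted Break-in": [0, 285, 0],
    "Legitimate Activity": [8, 39, 597],
}


def percent_correct(confusion, classes):
    """Percent of each observed class that was predicted correctly."""
    return {
        observed: 100.0 * row[classes.index(observed)] / sum(row)
        for observed, row in confusion.items()
    }


per_class = percent_correct(confusion, classes)
total = sum(sum(row) for row in confusion.values())
correct = sum(row[classes.index(observed)] for observed, row in confusion.items())
overall = 100.0 * correct / total

legit = confusion["Legitimate Activity"]
remaining_false_alarms = 100.0 * (legit[0] + legit[1]) / sum(legit)

for observed in classes:
    print(f"{observed}: {per_class[observed]:.1f}%")
print(f"Overall: {overall:.1f}%")
print(f"Legitimate activity still flagged: {remaining_false_alarms:.1f}%")
```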

 

Course of Action (COA)

Background

Key Points

• COA Goals
  • accomplish the mission
  • while positioning the force
  • to retain initiative for future operations

• Importance of battlefield metrics (during battle) to decisions in light of COA goals

• High fidelity simulation, One Semi Automated Forces (OneSAF)

• Study purpose: to uncover patterns in the data relating early battlefield conditions to ultimate battle outcome, thereby providing support for battlefield metrics

Battle Scenario

[Figure: scenario map showing the company objective, a town, and vehicle positions labeled BMP-2 (3), T-80 (4), and T-72M (5)]

Data Collection

• Scenario runs require direct oversight and take 30 to 90 minutes; preprocessing requires an additional 1 hour on average

• A variety of machines are used, including SGI Octane2, SGI Onyx HPC Sun 1000s, Sun UltraSparc 60

• Killer-victim scoreboard utility for OneSAF was developed to extract detailed battlefield information

• Data matrix is (4 responses + 429 measures [distributed over 3 time slices] + 1 time stamp + 1 machine ID) x 231 battles (25 available)

• Time-slice stopping times are linked to blue munition use: 10%, 25%, 45%.

• Example Measures
  • number of 125 HEAT taken by Platoon 1 by time 1
  • damage level of Platoon 1 by time 2
  • number of 105 SABOT hits by Platoon 3 on T80 by time 1
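One way to read the time-slice rule is that each slice ends at the first moment blue munition use reaches the given level. The following is a minimal sketch under that reading; the slides do not describe how OneSAF or the scoreboard utility records munition use, so the sampled time series, the function name, and the "first time at or above the level" rule are assumptions.

```python
def stopping_times(times, fraction_used, levels=(0.10, 0.25, 0.45)):
    """First recorded time at which munition use reaches each level.

    times: sample times in minutes (hypothetical log).
    fraction_used: cumulative fraction of blue munitions expended at each time.
    Returns None for a level that is never reached.
    """
    stops = []
    for level in levels:
        stop = next(
            (t for t, used in zip(times, fraction_used) if used >= level),
            None,
        )
        stops.append(stop)
    return stops


# Hypothetical battle log sampled every 5 minutes.
times = [0, 5, 10, 15, 20, 25]
fraction_used = [0.00, 0.04, 0.12, 0.27, 0.30, 0.50]
print(stopping_times(times, fraction_used))
```

Tying the slices to munition use rather than clock time keeps them comparable across battles that unfold at different speeds.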

Benefits of Damage Inflicted by Platoon 3

[Figure: plot involving MBTSCORE, DRP3S1, DRP3S2, DRP3S3, and DRP2S2; axis ticks from -2 to 10 and from 1 to 8]

A Trade-off Between Platoons 1 and 2

Network Traffic and Computer Security

Future Work

Key Points

• User activity profiles

• 350 MB of synthesized network traffic data

• Classification trees will have a role

References

R.J. Young, B.A. Bodt, "Development of Computer Directed Methods for the Identification of Hyperactivated Motion Using Motion Patterns Developed by Rabbit Sperm During Incubation Under Capacitation Conditions," Journal of Andrology, 15: 362-377, July 1994.

R.J. Young, B.A. Bodt, D.H. Heitkamp, "The Action of Metallic Ions on the Precocious Development by Rabbit Sperm of Motion Patterns that are Characteristic of Hyperactivated Motility," Molecular Reproduction and Development, 41: 239-248, June 1995.

B.A. Bodt, "An Analysis of the Discriminating Utility of Acoustic and Seismic Signatures for a Vehicle Classification Example," ARL-TR-1874, January 1999.

L. Eggen, B. Bodt, H. Kash, C. Hansen, "Reducing the False Alarm Rate in Information Assurance," ARL-TR-2348, July 2001.

B. Bodt et al., "Pursuit of New Battlefield Metrics through Simulation and Statistical Modeling," 70th Military Operations Research Society Symposium, June 2002.