ACE ProposersDay ProgramBrief · 2019-06-07 · ACE will apply machine learning to large training exercises to experiment, explore mosaic tactics in the real world Red Flag: two-week

Air Combat Evolution

Dan “Animal” Javorsek, PhD

Program Manager, DARPA/STO

Demonstrate trusted, scalable, human-level autonomy for air combat

A.C.E.

ACE Proposers Day

17 May 2019

Distribution A: Approved for Public Release, Distribution Unlimited

Future U.S. Combat Success Requires AI Capable UAVs

Can we use existing methods designed for humans to mature autonomy?

“In the future, it is desirable to have each operator control multiple unmanned systems, thus shifting the human’s role from operator towards mission manager.”

Unmanned Systems Roadmap, 2018


Build performance and trust the way we do with humans

[Figure: tasks plotted on two axes, Problem Complexity (lower to higher) and Cognitive Workload (lower to higher). Task labels: Striker Escort, Suppression of Enemy Air Defenses, Point Protection, Traffic Avoidance, Autopilot, Terrain Avoidance, Navigation, Mosaic Warfare, Dogfight. Region labels: Physics-Based Maneuver Systems; Nonlinear Interactive Systems.]

Dogfight is gateway to nonlinear combat autonomy

Combat autonomy is stuck here!


Technical Challenges

• Need performance from automated tactical decision making
• Must build pilot trust in combat automation
• Scale performance and maintain trust up the stack
• Demonstrate performance on increasingly realistic platforms

[Figure: autonomy stack running from local to global behaviors with increasing nonlinearity. Levels: Maneuver; Individual Tactical Behaviors (1v1); Team Tactical Behaviors (2v1, 2v2); Multi-aircraft Operational Behaviors; Heterogeneous Multi-aircraft Strategic Behaviors. Annotations: "current automation lives here"; "will push combat autonomy up the stack".]

ACE will build scalable performance and trust in combat autonomy

AlphaDogfight

AlphaMosaic

darpa.mil


ACE Program Structure



ACE Program Schedule

[Schedule chart columns: FY 2019 through FY 2023, each divided into Q1 to Q4]

Task

AFWERX AlphaDogfight Trials – AFRL

Technical Area 1: Increase performance for local behaviors
• Develop & evaluate 1v1 & 2vX dogfight algorithms
• Increase performance via complexity build-up

Technical Area 2 (Human Use): Build trust for local behaviors
• Create Human-Machine Interface
• Implementation of Dual Operational Task (DOT)

Trust Assessments for all phases

Technical Area 3: Scale performance & trust to global behaviors
• Learning transference of TA1 algorithms applied to large force exercise data analytics
• DOT Mission Commander scenario development

Technical Area 4: Full-scale experimentation infrastructure
• Full-scale (FS) aircraft purchase, modification, airworthiness, training, and testing
• Implementation and assessment of TA1 algorithms and TA2 HMI

Experimentation Integration Team (EIT)
• Interface control documentation (ICD) and application programming interface (API) development & maintenance
• Lead ICD/API working groups
• M&S and sub-scale (SS) environment development and performance assessment

Phase 1: M&S; Phase 2: Sub-scale (SS); Phase 3: Full-scale (FS)

[Schedule chart: bar and milestone labels include Competition; 1v1 Training; 1v1 Comp; 2v1 Training; 2v1 Comp; 2v2 Training; 2v2 Comp; SS Purchase; SS Modification; SS 1v1 Training; SS 2v1 Training; SS 2v2 Training; SS Competition; Adversary A/C Decision; FS Purchase; FS Modification; Mod Plan Complete; Test Plan Complete; Airworthiness Complete; Ground Demo; Flight Demo; Flight Testing. Performer annotations: This BAA; Multiple Performers; 1 Performer; Single Experimentation Integration Team; TA1 Performer(s).]



…to be released at a later date


TA1: Build combat autonomy for local behaviors

• Challenge: Dogfight represents a new class of games
- Continuous, unbounded, incomplete knowledge
- Adversary can actively conceal/deceive
- High-tempo with simultaneous players

• Insights:
- Contrary to popular belief, the dogfight is manifold bounded
- Established tech base actively addressing these challenges
- Hybridized AI approaches blended with rules-based tree search show strong promise

• Technical Area 1 (TA1) Objectives:
- Develop and demonstrate within visual range (WVR) individual and team control algorithms
- Implementation in M&S, sub-scale unmanned aerial vehicles (UAVs), and full-scale combat representative aircraft
- Success metric: Win Probability (PW)
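As a rough illustration of the TA1 success metric, win probability can be estimated as the fraction of engagements won and then compared against a threshold (Th) and an objective (Ob). The sketch below is an assumption, not the program's scoring method: the function names, the boolean outcome encoding, and the Wilson score interval are illustrative choices. The default 50% and 100% values are the Phase 1 threshold and objective quoted on the metrics slide of this brief.

```python
import math

def win_probability(outcomes):
    """Fraction of engagements won; outcomes is a list of booleans (True = win)."""
    if not outcomes:
        raise ValueError("need at least one engagement")
    return sum(outcomes) / len(outcomes)

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def meets(pw, threshold=0.50, objective=1.00):
    """Classify an estimated Pw against a threshold (Th) and an objective (Ob)."""
    if pw >= objective:
        return "objective"
    if pw >= threshold:
        return "threshold"
    return "below threshold"

# Hypothetical record of 20 simulated 1v1 engagements: 13 wins, 7 losses.
outcomes = [True] * 13 + [False] * 7
pw = win_probability(outcomes)
low, high = wilson_interval(sum(outcomes), len(outcomes))
status = meets(pw)
print(f"Pw = {pw:.2f}, 95% interval ({low:.2f}, {high:.2f}), status: {status}")
```

The interval is included because a small number of engagements gives a noisy estimate; a point estimate just above a threshold may not be distinguishable from one just below it.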

[Figure: Win Probability (Pw) versus Consequence-Normalized Crosscheck Ratio (RN) for 1v1, 2v1, and 2v2 engagements. Phase 1: M&S (Sim), Phase 2: Sub-scale (Commercial UAVs), Phase 3: Full-scale (Combat Aircraft). A 50/50 level is marked; arrows are labeled performance and trust.]

TA1 increases performance of dogfight automation in increasingly realistic scenarios

[Figure: comparison of games along three dimensions.
Game Complexity (state-space complexity): Tic-tac-toe 10^3, Checkers 10^20, Chess 10^47, Go 10^170, Starcraft 10^270.
Information Observability (perfect vs imperfect info): Board Games, Atari, Starcraft, Poker.
Tempo (actions per minute): Sequential Games, Driving or Atari, Starcraft; axis values 0.5, 60, 300.]

State of the art AI exceeds requirement for ACE in many dimensions

Not solicited in this BAA…planned for Oct 2019

Sim graphic source: Ernest et al., J Def Manag 2016


• Attract non-traditional DARPA performers
- AI video gaming world
- Utilize the AFWERX Other Transactional Consortium and AFRL’s Autonomy Research Collaboration Network (ARCNet)

• AFWERX AlphaDogfight Trials (solicitation June 2019)
- Modeled after StarCraft 2, Defense of the Ancients 2, Quake III bot ladders
- https://www.afwerxchallenge.com/
- Prove algorithms against game adversary and each other

DARPA-AFWERX-AFRL Collaboration

© 2019 Blizzard Entertainment

© 2019 FlightGear

© 2019 Digital Combat Simulator

© 2019 Falcon 40

TA2: Build trust for local behaviors

• Challenge: Modeling pilot trust
- Trust is a subjective relational experience
- Trust depends on performance, situation, & consequences

• Insights:
- Crosscheck ratio is one reflection of pilot trust if given an appropriate dual operational task paradigm
- Crosscheck ratio can be measured using commercial eyetrackers

• Technical Area 2 (TA2) Objectives:
- Develop experimental methodology for modeling and measuring pilot trust in the dogfight combat autonomy
- Design and develop Human-Machine Interfaces (HMIs)
- Model and measure pilot trust using a Dual Operational Task (DOT) implementation
- Provide plan for Institutional Review Board (IRB) approval for all Human Subjects Research (HSR)
- Success metrics: Crosscheck Ratio (RN), Trust Calibration Error (e)
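To make the crosscheck ratio concrete, the sketch below computes it from eye-tracker samples. The brief labels the ratio only as "Unmonitored/Monitored Timeshare" and gives no formula, so reading it literally as unmonitored time divided by monitored time is an assumption. The `GazeSample` record, the region names, and the sample data are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    """One eye-tracker sample (hypothetical format)."""
    duration_s: float   # time covered by this sample, in seconds
    region: str         # area of interest the pilot is looking at

def crosscheck_ratio(samples, monitored_region="dogfight"):
    """Unmonitored time divided by monitored time.

    'Monitored' means the pilot is looking at the autonomy-controlled
    dogfight task; everything else counts as unmonitored. This literal
    reading of the slide label is an assumption.
    """
    monitored = sum(s.duration_s for s in samples if s.region == monitored_region)
    unmonitored = sum(s.duration_s for s in samples if s.region != monitored_region)
    if monitored == 0:
        raise ValueError("pilot never monitored the dogfight task; ratio undefined")
    return unmonitored / monitored

# Hypothetical 60-second trial: 40 s on the dogfight display,
# 20 s on the mission commander task.
samples = [
    GazeSample(25.0, "dogfight"),
    GazeSample(12.0, "mission_commander"),
    GazeSample(15.0, "dogfight"),
    GazeSample(8.0, "mission_commander"),
]
ratio = crosscheck_ratio(samples)
print(f"crosscheck ratio R = {ratio:.2f}")  # prints "crosscheck ratio R = 0.50"
```

Under this reading, a pilot who checks on the automation less often spends more time on the second task, so the ratio rises as trust grows.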

[Figure: Win Probability (Pw) for 1v1, 2v1, and 2v2 engagements across Phase 1: M&S and Phase 3: Full-scale (UAVs), with arrows labeled performance and trust.]

[Figure: Win Probability (Pw) versus Crosscheck Ratio (R) (Unmonitored/Monitored Timeshare) for adversary levels Unaware (Cruise Missile), Limited (Bomber), and Unlimited (Fighter). Annotations: Trust Calibration Curve from M&S; Trust Calibration Error (e).]

[Figure: dual operational task. The Mission Commander Task and the Dogfight Task divide the pilot’s attention. Annotations: Measure workload distribution; Calibrate pilot trust.]

TA2 increases trust in dogfight automation in increasingly realistic scenarios

Sim graphic source: Ernest et al., J Def Manag 2016

Source: USAF

Duchowski, A. T. (2018)

©2019 Designtechnica Corporation


Cross TA interactions featuring TA2



TA3 scales performance and trust to global behaviors in simulation

TA3: Scale performance & trust to global behaviors

• Challenge: Extending learning to new scales without developing independent algorithms at each level
- Tailored algorithms for each scale can produce new behaviors
- Aircraft capabilities (weapons, sensors, performance) vary widely and must be incorporated
- Algorithm retraining is necessary when new information is introduced

• Insights:
- STO seedling data suggests that algorithms can be quickly and consistently adapted from one scale to another
- Implementation of machine learning transference neural network

• Technical Area 3 (TA3) Objectives:
- Develop data set and model for large force exercise data analytics
- Develop Dual Operational Task Mission Commander scenarios
- Scale local combat autonomy to, and develop battle management for, large force exercise data analytics
- Quantify relationship between local behavior and global behavior performance metrics
- Success metric: Kill Ratio (RK)
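A minimal sketch of how a kill ratio might be tallied over a set of simulated exercises follows. The brief does not define RK, so treating it as adversary aircraft lost per friendly aircraft lost, aggregated over all runs, is an assumption. The function names and run data are hypothetical; the 10:1 default is the lower of the two kill-ratio thresholds listed on the metrics slide.

```python
def kill_ratio(runs):
    """Aggregate kill ratio over a list of (adversary_losses, friendly_losses) runs.

    Returns float('inf') when no friendly aircraft were lost.
    """
    kills = sum(k for k, _ in runs)
    losses = sum(l for _, l in runs)
    if losses == 0:
        return float("inf")
    return kills / losses

def meets_threshold(rk, threshold=10.0):
    """True when the kill ratio is at least threshold:1."""
    return rk >= threshold

# Hypothetical results from three simulated large force exercises.
runs = [(8, 1), (12, 0), (10, 1)]
rk = kill_ratio(runs)  # 30 adversary losses, 2 friendly losses
print(f"RK = {rk:.0f}:1, meets 10:1 threshold: {meets_threshold(rk)}")
```

Aggregating before dividing avoids undefined per-run ratios when a single run has no friendly losses.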

[Figure: Kill Ratio (RK) in simulation, with arrows labeled performance and trust.]

SIA: Semi-intelligent Autonomy

Sim graphic source: Ernest et al., J Def Manag 2016

Analytical Models Can Aid in Identifying Tactics Otherwise Unthinkable*

This is a crazy idea, right?

29 Aug 2014 – Dodgers (Mattingly) employ four-man shift, Padres hitter (Smith) grounds out

*limited by multiple factors: creativity, complexity, convention, training

Standard Infield Deployment: configuration generally deployed for 100+ yrs

Source: FanGraphs

1223% increase in shifts since 2011 – Source: 538.com

Not completely analytically derived – used against Ted Williams in the ’40s

David Ortiz, one of the best hitters in baseball, becomes below average against shift

Analytically derived infield shift deployment against David Ortiz

Ortiz BABIP w/out shift (bottom left): 0.341

Ortiz BABIP w/ shift (top right): 0.284

Runs saved w/ shift: 11**

BABIP = Batting Average on Balls in Play; average BABIP ~ 0.300

**Baseball analysts generally equate 10 runs to a win
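The arithmetic behind the shift example can be checked in a few lines. The BABIP values, runs saved, and the 10-runs-per-win rule of thumb are taken from the slide. The `babip` helper uses the standard definition, (H - HR) / (AB - K - HR + SF), which the slide does not spell out, and the stat line passed to it is hypothetical and only exercises the formula.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)

# Figures quoted on the slide.
babip_no_shift = 0.341
babip_shift = 0.284
runs_saved = 11
runs_per_win = 10  # rule of thumb quoted on the slide

drop = babip_no_shift - babip_shift      # BABIP lost against the shift
wins_added = runs_saved / runs_per_win   # approximate wins gained by the defense

# Hypothetical stat line, used only to exercise the formula.
example = babip(hits=150, home_runs=30, at_bats=520, strikeouts=100, sac_flies=10)
print(f"BABIP drop: {drop:.3f}; wins: {wins_added:.1f}; example BABIP: {example:.3f}")
```

The drop takes Ortiz from above the roughly 0.300 average to below it, which is the sense in which he "becomes below average against shift".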

Currently lack real world data set to even analyze, develop mosaic tactics, strategies beyond M&S

Explosion of tracking data made it possible to apply machine learning to build increasingly fine-grained models of player and team behavior

Data-driven “Ghosting” allows for scalable quantification, analysis, and comparison of player and team behavior

ACE will apply machine learning to large training exercises to experiment, explore mosaic tactics in the real world

Red Flag: two-week advanced aerial combat training exercise held several times a year by the United States Air Force.

AlphaMosaic

source: Le, Carr, Yue, Lucey; Data-Driven Ghosting using Deep Imitation Learning 2017

source: af.mil





ACE Program Structure: TA4


TA4: Full-scale Air Combat Experimentation Infrastructure

• Aircraft modification background:
- DARPA Controlled Safety Review Process
- Aircraft capable of dogfight maneuvers
o Existing autopilots capable of 3D maneuvers
o Architectures capable of real-time insertion of data streams into the functioning operational system
o Two seats (safety pilot + evaluation pilot)

• Objectives:
- Supply full-scale aircraft and integrate dogfighting algorithms
- Develop and integrate HMIs for full-scale aircraft
- Retain safety pilot override controls and/or autopilot disconnect for trust assessments
- Perform all safety/airworthiness reviews for supervised live dogfight engagements
- Execute full-scale live flight experiments

[Figure: Win Probability (Pw) for 1v1, 2v1, and 2v2 engagements across Phase 1: M&S and Phase 3: Full-scale (UAVs), with arrows labeled performance and trust.]

Sim graphic source: Ernest et al., J Def Manag 2016




Cross TA interactions, all TAs



Metrics: Build trust in AI the same way we do with pilots

Phase 1: M&S; Phase 2: Subscale; Phase 3: Full-scale

TA1: Increase performance for local behaviors
- Phase 1: Win Probability (PW): Limited Th: 50%, Ob: 100%. For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
- Phase 2: Win Probability (PW): Limited Th: 75%, Ob: 100%. For 2D: unopposed, unaware, O/D/HA-limited, O/D/HA-unlimited; 3D: repeat all
- Phase 3: Win Probability (PW): Limited 1v1 Th: 75%, Limited 2v1 Th: 90%, Limited 2v2 Th: 80%, Ob: 100%

TA2: Build trust for local behaviors
- Crosscheck Ratio (R): Th: 0.50, Ob: 0.95
- N/A
- Trust Calibration Error (e): Th: 0.20, Ob: 0.0
- Trust Calibration Error (e): Th: 0.10, Ob: 0.0

TA3: Scale performance & trust to global behaviors (Phase 1, Phase 2, Phase 3: M&S)
- Mission Commander Scenarios: Th: 3, Ob: 5
- Kill Ratio (RK): Th: 10:1, Ob: 50:1
- Kill Ratio (RK): Th: 30:1, Ob: 50:1

Th: Threshold
Ob: Objective
O/D/HA: Offensive/Defensive/High Aspect Initial Conditions
Unopposed: Pre-planned maneuvers
Unaware: Station keeping on unaware adversary
Limited: Baseline adversary with standard gameplan, limited maneuver potential, and thrust
Unlimited: Adversary with no gameplan, maneuver potential, or thrust restrictions

Ernest et al., J Def Manag 2016


Source Selection and Evaluation Criteria

• Overall Scientific and Technical Merit
- Standard DARPA BAA language
- Technical Area 2 proposals should:
• Emphasize Dogfight task HMI which is integral to trust assessment (HMI can affect trust independent of algorithm performance)
• Develop a Mission Commander task HMI that is representative enough to perform dual operational task evaluations
- Technical Area 3 proposals should:
• Provide a detailed model architecture, data analytics plan
• Include an implementation plan for AFSIM and NGTS with relative merits for each or reason for proprietary environment
- Technical Area 4 proposals should:
• Consider different cost options (lease vs buy) with price per flight hour (including operations and maintenance) and operational tempo limitations (flights per week per aircraft)
• Consider alternate platforms and human-only adversary options to enable cost and schedule flexibility
• Include recommended partnership information (POC, availability, etc.) if considering government furnished operational aircraft

• Potential Contribution and Relevance to the DARPA Mission
- Standard DARPA BAA language

• Cost and Schedule Realism
- Standard DARPA BAA language


Submission Highlights

• Teaming encouraged
• Highlight previous experience
• Schedule
- Proposers Day – 17 May 2019
- BAA released – May 2019
- Optional 1-on-1s – 05 June, 07 June 2019
- Email: [email protected] by 29 May 2019 to request
- FAQ/Questions Due Date – 07 June 2019
- Full Proposals Due – BAA release + 45 days


