Edge Cases and Autonomous Vehicle Safety
SSS 2019, Bristol, 7 February 2019
Prof. Philip Koopman
@PhilKoopman
© 2019 Philip Koopman
Overview
Making safe robots: Doer/Checker safety
Edge cases matter: robust perception matters
The heavy tail distribution: fixing stuff you see in testing isn't enough
Perception stress testing: finding the weaknesses in perception
UL 4600: autonomy safety standard
[General Motors]
98% Solved For 20+ Years
Washington DC to San Diego, July 1995: CMU Navlab 5, Dean Pomerleau, Todd Jochem
https://www.cs.cmu.edu/~tjochem/nhaa/nhaa_home_page.html
AHS San Diego demo, Aug 1997
4© 2019 Philip Koopman
1985 1990 1995 2000 2005 2010
DARPAGrand
Challenge
DARPALAGR
ARPADemo
II
DARPASC-ALV
NASALunar Rover
NASADante II
AutoExcavator
AutoHarvesting
Auto Forklift
Mars Rovers
Urban Challenge
DARPAPerceptOR
DARPAUPI
Auto Haulage
Auto Spraying
Laser Paint RemovalArmy
FCS
Carnegie Mellon University Faculty, staff, studentsOff-campus Robotics Institute facility
NREC: 30+ Years Of Cool Robots
Software Safety

Before Autonomy Software Safety
The Big Red Button era
Traditional Validation Meets Machine Learning
Use traditional software safety where you can ... BUT ...
Machine learning (inductive training):
– No requirements: training data is difficult to validate
– No design insight: generally inscrutable; prone to gaming and brittleness
APD (Autonomous Platform Demonstrator)
Target GVW: 8,500 kg; target speed: 80 km/hr
Safety-critical speed limit enforcement
Approved for Public Release. TACOM Case #20247, Date: 07 OCT 2009
Safety Envelope Approach to ML Deployment
Specify unsafe regions
Specify safe regions: under-approximate to simplify
Trigger system safety response upon transition to unsafe region
Architecting a Safety Envelope System: Doer/Checker Pair
"Doer" subsystem (ML, low SIL): implements normal, untrusted functionality
"Checker" subsystem (traditional SW, high SIL): a simple safety envelope checker that implements failsafes (safety functions)
Checker is entirely responsible for safety: the doer can be at a low Safety Integrity Level; the checker must be at a higher SIL
(Also known as a "safety bag" approach)
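The Doer/Checker split can be sketched in a few lines of code. This is a minimal illustration, not Koopman's implementation: the class and function names (`Command`, `EnvelopeChecker`, `doer_propose`) and the speed limit are hypothetical, with the limit chosen to echo the APD's 80 km/h target.

```python
# Hypothetical sketch of a Doer/Checker pair. The untrusted "doer"
# (e.g., an ML planner) proposes commands; a simple, trusted "checker"
# enforces a conservative safety envelope and overrides on violation.
from dataclasses import dataclass

@dataclass
class Command:
    speed_mps: float          # requested vehicle speed, m/s

class EnvelopeChecker:
    """High-SIL checker: under-approximated safe region (speed only)."""
    def __init__(self, max_safe_speed_mps: float):
        self.max_safe_speed_mps = max_safe_speed_mps

    def check(self, cmd: Command) -> Command:
        # Trigger the safety response on transition into the unsafe region.
        if cmd.speed_mps > self.max_safe_speed_mps:
            return Command(speed_mps=0.0)   # failsafe: commanded stop
        return cmd

def doer_propose(speed: float) -> Command:
    # Stand-in for the untrusted, low-SIL ML doer.
    return Command(speed_mps=speed)

checker = EnvelopeChecker(max_safe_speed_mps=22.2)  # ~80 km/h
safe_cmd = checker.check(doer_propose(30.0))
print(safe_cmd.speed_mps)   # 0.0: failsafe engaged
```

The key design point is that only the checker needs full safety assurance; the ML doer can misbehave arbitrarily without compromising the envelope.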
Validating an Autonomous Vehicle Pipeline
– Control systems → control software validation
– Autonomy interface to vehicle → traditional software validation
– Randomized & heuristic algorithms → run-time safety envelopes (Doer/Checker architecture)
– Machine-learning-based approaches → ???
Perception presents a uniquely difficult assurance challenge
Validation Via Brute Force Road Testing?
If 100M miles per critical mishap, test 3x–10x longer than the mishap rate: need ~1 billion miles of testing, with fewer than 10 critical mishaps
That's ~25 round trips on every road in the world
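The mileage claim above is simple arithmetic; a quick sanity check (using the slide's assumed rates, with the multiplier taken at the upper end of the 3x–10x rule of thumb):

```python
# Back-of-envelope check of the brute-force road-testing math.
mishap_rate_miles = 100e6     # assumed: 1 critical mishap per 100M miles
test_multiplier = 10          # upper end of the 3x-10x rule of thumb
test_miles = mishap_rate_miles * test_multiplier
print(f"{test_miles:.0e} miles")   # 1e+09: about a billion miles
```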
Brute Force AV Validation: Public Road Testing
Good for identifying "easy" cases
Expensive and potentially dangerous
http://bit.ly/2toadfa
Closed Course Testing
Safer, but expensive; not scalable
Only tests things you have thought of!
[Volvo / Motor Trend]
Simulation
Highly scalable; less expensive, but need to manage fidelity vs. cost
Only tests things you have thought of!
http://bit.ly/2K5pQCN
Udacity: http://bit.ly/2toFdeT
Apollo
15© 2019 Philip Koopman
You should expect theextreme, weird, unusual Unusual road obstacles Extreme weather Strange behaviors
Edge Case are surprises You won’t see these in testing
Edge cases are the stuff you didn’t think of!
What About Edge Cases?
https://www.clarifai.com/demo
http://bit.ly/2In4rzj
Just A Few Edge Cases
Unusual roads & obstacles
Extreme weather
Strange behaviors
http://bit.ly/2top1KD
http://bit.ly/2tvCCPK
https://dailym.ai/2K7kNS8
https://en.wikipedia.org/wiki/Magic_Roundabout_(Swindon)
https://goo.gl/J3SSyu
Why Edge Cases Matter
Where will you be after 1 billion miles of validation testing? Assume 1 million miles between unsafe "surprises."
Example #1: 100 "surprises" @ 100M miles/surprise
– All surprises seen about 10 times during testing
– With luck, all bugs are fixed
Example #2: 100,000 "surprises" @ 100B miles/surprise
– Only 1% of surprises seen during 1B-mile testing
– Bug fixes give no real improvement (1.01M miles/surprise)
https://goo.gl/3dzguf
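Example #2's counterintuitive conclusion is worth reproducing numerically. The sketch below just redoes the slide's arithmetic (all rates are the slide's assumptions): with a heavy tail of rare surprise types, fixing everything seen in a billion test miles barely moves the aggregate rate.

```python
# Reproducing Example #2: many rare surprise types (heavy tail).
n_surprises = 100_000
miles_per_surprise_each = 100e9   # each surprise type: 1 per 100B miles
test_miles = 1e9

# Aggregate rate before testing: 1 unsafe surprise per 1M miles.
agg_before = miles_per_surprise_each / n_surprises
assert agg_before == 1e6

# Expected fraction of surprise types encountered during testing: ~1%.
frac_seen = test_miles / miles_per_surprise_each   # 0.01

# Fix everything seen; the heavy tail barely moves.
remaining = n_surprises * (1 - frac_seen)          # 99,000 types left
agg_after = miles_per_surprise_each / remaining
print(round(agg_after))   # 1010101: ~1.01M miles/surprise, only ~1% better
```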
The Real World: Heavy Tail Distribution(?)
[Chart: scenario frequency; common things seen in testing vs. edge cases not seen in testing (heavy tail distribution)]
The Heavy Tail Testing Ceiling
Need to find "triggering events" to inject into sims/testing
Edge Cases Part 1: Triggering Event Zoo
Need to collect surprises: novel objects, novel operational conditions
Corner cases: infrequent combinations
– Not all corner cases are edge cases
Edge cases: combinations that behave unexpectedly
Issue: novel for a person ≠ novel for machine learning
– ML can have "edges" in unexpected places
– ML might train on features that seem irrelevant to people
https://goo.gl/Ni9HhU (Not A Pedestrian)
What We're Learning With Hologram
A scalable way to test & train on edge cases:
– Your fleet and your data lake
– Hologram cluster tests your CNN
– Hologram cluster identifies weaknesses & helps retrain your CNN
– Your CNN becomes more robust
Edge Cases Part 2: Brittleness
Malicious image attacks reveal brittleness:
– QuocNet: car + magnified difference → not a car
– AlexNet: bus + magnified difference → not a bus
Szegedy, Christian, et al., "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199 (2013).
https://goo.gl/5sKnZV
https://goo.gl/ZB5s4Q (NYU Back Door Training)
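The mechanism behind these attacks can be shown on a toy model. This is not the Szegedy et al. method on a real network, just a fast-gradient-sign step on a hand-built linear classifier with made-up weights and input, to show how a tiny, targeted perturbation flips a label:

```python
# Toy adversarial-perturbation demo (all numbers invented for illustration).
import numpy as np

w = np.array([1.0, -2.0, 0.5, 1.5])   # toy linear "classifier" weights
x = np.array([0.3, 0.1, 0.4, 0.2])    # toy input, classified positive

def predict(v):
    return 1 if w @ v > 0 else 0      # 1 = "car", 0 = "not a car"

# Fast-gradient-sign step: nudge every input element by epsilon against
# the score gradient (for a linear model, the gradient of w @ x is w).
epsilon = 0.2
x_adv = x - epsilon * np.sign(w)

print(predict(x))                      # 1: original label
print(predict(x_adv))                  # 0: label flipped
print(np.abs(x_adv - x).max())         # 0.2: perturbation stays tiny
```

The same gradient-following logic, scaled up to a deep network, produces the "magnified difference" images on the slide: imperceptible to people, decisive to the model.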
ML Is Brittle To Environment Changes
Sensor data corruption experiments: synthetic equipment faults (e.g., Gaussian blur)
Defocus & haze are a significant issue
Gaussian blur & Gaussian noise cause similar failures
Exploring the response of a DNN to environmental perturbations, from "Robustness Testing for Perception Systems," RIOT Project, NREC, DIST-A.
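The corruption-sweep pattern behind such experiments is straightforward to sketch. This is only illustrative of the approach, not the RIOT harness: `perceive` is a hypothetical stand-in for a real detector (a thresholded region brightness), and a real study would sweep blur, haze, and defocus the same way.

```python
# Sketch of a sensor-corruption robustness sweep: degrade an input with
# increasing Gaussian noise and check whether the perception output
# still agrees with the clean frame.
import numpy as np

def add_gaussian_noise(img: np.ndarray, sigma: float, rng) -> np.ndarray:
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)   # keep valid pixel range

def perceive(img: np.ndarray) -> bool:
    # Hypothetical detector: "obstacle present" if the region of
    # interest is bright enough. A real CNN would go here.
    return img[8:24, 8:24].mean() > 0.5

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 0.9               # synthetic bright "obstacle"

for sigma in (0.0, 0.1, 0.5, 1.0):
    corrupted = add_gaussian_noise(clean, sigma, rng)
    agree = perceive(corrupted) == perceive(clean)
    print(f"sigma={sigma:.1f}  detector agrees with clean: {agree}")
```

The useful output of such a sweep is the degradation level at which agreement with the clean frame breaks down, per corruption type.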
Context-Dependent Perception Failures
Perception failures are often context-dependent; false positives and false negatives are both a problem:
– False positive on lane marking
– False negative on real bicyclist
– False negative when in front of dark vehicle
– False negative when person next to light pole
Will this pass a "vision test" for bicyclists?
Example Triggering Events via Hologram
Mask R-CNN: examples of systemic problems we found:
– "Red objects"
– "Columns"
– "Camouflage"
– "Sun glare"
– "Bare legs"
– "Children"
– "Single lane control"
Notes: these are baseline, un-augmented images. (Your mileage may vary on your own trained neural network.)
UL 4600 – An Autonomy Safety Standard
Centered on a safety case
– Credit for existing safety standards
– Mix & match cross-standards techniques
– Discourages questionable practices
"Unknowns" are first-class citizens
– Balance between analysis & field experience
– Field monitoring required to feed back to argumentation
– Assessment findings & field data used to update standard
Plan: public standard by end of 2019
– Standards Technical Panel forming Spring 2019 to review draft
"Credible Autonomy Safety Argumentation"
SSS 2019, Phil Koopman, Aaron Kane, Jen Black
List of pitfalls we've seen in autonomy safety arguments:
– Conformance to an existing standard
– Proven in use
– Field testing
– Vehicle simulation
– Formal proof of correctness
– Other issues (e.g., unwarranted independence assumptions)
Also, SafeAI paper on taxonomy of ODD/OEDR:
Koopman, P. & Fratrik, F., "How many operational design domains, objects, and events?" SafeAI 2019, AAAI, Jan 27, 2019.
Ways To Improve AV Safety
More safety transparency: independent safety assessments; industry collaboration on safety
Minimum performance standards; share data on scenarios and obstacles; safety for on-road testing (driver & vehicle)
Autonomy software safety standards: traditional software safety ... PLUS ... dealing with surprises and brittleness; data collection and feedback on field failures
http://bit.ly/2MTbT8F (sign modified)
Thanks!