Vijay John, Yuquan Xu, Seiichi Mita,
Smart Vehicle Research Center
Hossein Tehrani, Tomoyoki Oishi,
Masataka Konishi, Hakusho Chin
Advanced Mobility Development
Kazuhisa Ishimaru, Sakiko Nishino
Research Division 2.0
• ADAS and Automated Driving
• World 3D Reconstruction
• 3D Deep Sensor Fusion
• Future Plan
• Conclusion
ADAS Applications are booming
• Adaptive Cruise Control (ACC)
• Adaptive Front Lights (AFL)
• Driver Monitoring System (DMS)
• Forward Collision Warning (FCW)
• Intelligent Speed Adaptation (ISA)
• Lane Departure Warning (LDW)
• Pedestrian Detection System (PDS)
• Surround-View Cameras (SVC)
• Autonomous Emergency Braking (AEB)
Vehicle Platform
Sensor configuration: Cameras, Stereo, Laser Sensor, RADAR, Sonar, GPS/IMU
Processing pipeline: Sensor Fusion & Perception (360 deg Scene Understanding) → Path Planning / Behavior Generation → Control
Deep Understanding of Environment
Sensors
<Stereo Vision>
[Figure: left/right image pair. The horizontal shift of a matched pixel between the two images is the disparity: far objects give a small shift, close objects a big shift. Disparity therefore encodes distance and separates the road surface from 3D objects such as cars and barriers.]
<Process>
Calibration → Disparity Calculation (Stereo matching) → Detection
The resulting disparity map gives metric distance via
$\mathrm{distance} = \dfrac{B \cdot f}{\mathrm{disparity}}$
where $B$ is the stereo baseline and $f$ the focal length.
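As a minimal illustration of this formula, the sketch below converts a disparity map to metric distance with NumPy; the baseline and focal-length values are made-up examples, not the actual rig parameters.

```python
import numpy as np

def disparity_to_distance(disparity, baseline_m, focal_px, min_disp=0.5):
    """distance = B * f / disparity, applied per pixel.
    Disparities below min_disp are treated as invalid (near-infinite
    distance) and returned as NaN."""
    disparity = np.asarray(disparity, dtype=np.float64)
    distance = np.full(disparity.shape, np.nan)
    valid = disparity >= min_disp
    distance[valid] = baseline_m * focal_px / disparity[valid]
    return distance

# Illustrative numbers: a 0.30 m baseline and 1400 px focal length give
# 0.30 * 1400 / 42 = 10 m for a 42 px disparity.
print(disparity_to_distance([[42.0, 0.0]], baseline_m=0.30, focal_px=1400.0))
```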
Each pixel has such a matching-cost curve; together these curves form the Matching Cost Space.
<Cost Space>
[Figure: the Matching Cost Space spans image height × image width × candidate disparities (e.g. 0, 24, 40, 55).]
Goal: finding the true disparity value for every pixel in the Matching Cost Space.
< Matching Cost >
[Figure: left/right image patches at candidate disparities 55, 40, and 24, together with one pixel's matching-cost curve (matching cost vs. disparity). The ground-truth disparity sits at a low-cost point, and neighboring pixels also show low matching cost there.]
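To make the cost-space idea concrete, here is a small sketch that builds a matching-cost volume from a rectified image pair. The slides do not say which matching cost is used, so window-averaged absolute differences (a common choice) are assumed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_cost_volume(left, right, max_disp, window=5):
    """Build a cost volume of shape (H, W, max_disp + 1) from rectified
    grayscale images. cost[y, x, d] compares left[y, x] with
    right[y, x - d] using absolute differences averaged over a local
    window; the slice cost[y, x, :] is one pixel's matching-cost curve."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    cost = np.full((h, w, max_disp + 1), np.inf, dtype=np.float32)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[:, d:, d] = uniform_filter(diff, size=window, mode='nearest')
    return cost

# Plain block matching is the per-pixel winner-take-all over this volume:
#   disparity = sad_cost_volume(L, R, 64).argmin(axis=2)
# which is exactly the noisy estimate the optimization below improves on.
```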
The Viterbi algorithm can find the global optimum.
[Figure: Viterbi optimization along one scanline. Nodes are (pixel, disparity) pairs with associated path costs; Block Matching picks a wrong low-cost node, while the Viterbi path recovers the optimal disparity.]
Exploiting the neighbors' matching costs can be translated into a mathematical optimization: a shortest-path problem, as sketched below.
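A minimal sketch of that shortest-path view for a single scanline: each pixel is a stage, each disparity a state, and SGM-style transition penalties (the p1/p2 values here are assumptions, not the paper's parameters) reward neighboring pixels with similar disparity.

```python
import numpy as np

def viterbi_scanline(cost, p1=8.0, p2=32.0):
    """Viterbi optimization of one scanline's cost slice (W, D):
    the returned path minimizes matching cost plus smoothness penalties
    (0 for equal disparity, p1 for a 1-pixel change, p2 for larger jumps)."""
    w, dmax = cost.shape
    d = np.arange(dmax)
    jump = np.abs(d[:, None] - d[None, :])           # |d_new - d_prev|
    penalty = np.where(jump == 0, 0.0, np.where(jump == 1, p1, p2))
    acc = cost.copy()                                # accumulated path cost
    back = np.zeros((w, dmax), dtype=np.int32)       # best predecessor state
    for x in range(1, w):
        trans = acc[x - 1][None, :] + penalty        # rows: d_new, cols: d_prev
        back[x] = trans.argmin(axis=1)
        acc[x] += trans.min(axis=1)
    path = np.empty(w, dtype=np.int32)               # backtrack the shortest path
    path[-1] = acc[-1].argmin()
    for x in range(w - 2, -1, -1):
        path[x] = back[x + 1, path[x + 1]]
    return path                                      # disparity per pixel
```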
[Figure: parallel regrouping of costs. The costs of the vertical scanlines VSLj (j = 1, …, 640) are regrouped into diagonal scanlines DSL_RDLUk (k = 1, …, K) for the right-down-to-left-up and left-up-to-right-down Viterbi directions; the per-direction costs are then merged by averaging.]
Huge Networks with Parallel Optimization
< SGBM > vs < Proposed Method (Multi-Path Viterbi) >
[Figure: SGBM runs Viterbi passes over the cost space in each direction (→, ←, ↑, ↓ and the diagonals ↖, ↘, ↗, ↙) independently and merges all outputs in a single step to produce the disparity. MPV instead merges hierarchically: merged costs from one group of directions feed the next Viterbi stage.]
SGBM merges independently; MPV merges hierarchically, optimizing the accumulated information step by step.
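The difference between the two merge strategies can be sketched as follows; the exact MPV merge rule is not given on the slide, so pairwise averaging followed by re-optimization is an assumption used only to show the hierarchical structure.

```python
import numpy as np

def sgbm_merge(direction_costs):
    """SGBM-style: every directional Viterbi pass runs independently on
    the raw cost space; the accumulated volumes are merged in one step."""
    return np.mean(direction_costs, axis=0)

def mpv_merge(direction_costs, reoptimize):
    """MPV-style hierarchical sketch (merge rule assumed): each merged
    volume is re-optimized before the next direction is folded in, so
    accumulated information is refined step by step."""
    merged = direction_costs[0]
    for vol in direction_costs[1:]:
        merged = reoptimize((merged + vol) / 2.0)
    return merged

# direction_costs would hold the accumulated volumes of the horizontal,
# vertical, and diagonal Viterbi passes; reoptimize would rerun a
# Viterbi pass (e.g. viterbi_scanline per row) on the merged volume.
```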
< MPV >
[Figure: disparity outputs of the individual Viterbi directions and the hierarchically merged MPV result. Compared with the conventional methods (SGBM, Block Matching), the Multi-Path Viterbi result is dense & smooth.]
Image Size: 1280×960
Calculation Time: 15 ms/frame (GPU: GeForce GTX 1080)
[Demo sequences: Nagoya Urban Road and Tokyo Metropolitan Highway, both at 15 ms/frame on the GeForce GTX 1080.]
Electromagnetic Wave Range: Camera, Laser, Radar
[Figure: sensors placed on a wavelength axis from 0.1 μm to 10 mm. Electromagnetic-wave sensors: Visible Light Camera, Infrared Camera, Laser, Car Radar; sound-wave sensor: Car Sonar. Shorter wavelengths can see detail; longer wavelengths can see far. Also marked: sun light, fog & dust (~100/cm³), rain (~100/m³), and snow.]
Training a learning framework for perception tasks
[Diagram: sensors (Cameras, Stereo, Laser Sensor) feed the perception learning framework, which outputs Free Space and Objects.]
Traditional Learning: Feature Extraction (HOG, DPM, etc.) followed by Feature Classification (SVM, Random Forest, etc.)
Deep Learning: Feature Extraction + Feature Classification learned jointly
• Single sensor-based learning is not robust or descriptive enough
• Challenges
– Environmental Variation (occlusion, illumination variation, etc.)
– High Inter-Class and Intra-Class Variability
[Diagram: a single sensor (Camera, Lidar, or Stereo) and labels feed the perception learning framework.]
There are many vehicle varieties, seen at many different orientations.
On-road objects are numerous and highly varied.
Road boundaries also come in different types: the free-space boundary is highly varied.
Concrete Curb, Guardrail, Wall, Pylon, Divider
Illumination variation as observed in the appearance features of a monocular camera image
• Sensor Fusion-based learning with Complementary Sensors addresses these issues
• Monocular Camera appearance features and depth features are Complementary Features
[Diagram: multiple sensors (Cameras, Stereo, Laser Sensor) and labels feed sensor fusion and the perception learning framework.]
Monocular Camera ⇒ Rich Appearance Information; inexpensive; sensitive to illumination variation
Depth Camera ⇒ Depth Information (3D Data); inexpensive stereo-based depth; illumination invariant due to the robust stereo algorithm [1]
[1] Xu et al., "Real-time Stereo Disparity Quality Improvement for Challenging Traffic Environments," IV 2018
Depth information from the stereo camera is robust to illumination variation.
Appearance and Depth Features are Fused within a Deep Learning Framework for Environment Perception
[Diagram: descriptive appearance features (monocular camera) and illumination-invariant depth features (stereo camera/laser) enter the deep learning framework: sensor fusion with complementary features for 3D environment perception.]
Image & Depth
Sensor Fusion: Raw Data Level Fusion
[Diagram: image and depth are combined at the input and passed through a single feature-extraction stage, then to free space detection and object detection.]
Sensor Fusion: Feature Level Fusion
[Diagram: image and depth pass through separate image and depth feature-extraction stages; the resulting deep features are combined in a feature-integration stage, then passed to free space detection and object detection.]
Feature level fusion: Good. Raw data level fusion: Bad.
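A minimal Keras sketch of feature-level fusion in this spirit: two encoder branches, concatenation as feature integration, and two decoder heads. Layer counts, channel widths, and the input shape are illustrative assumptions, not the actual network configuration.

```python
from keras import layers, models  # Keras functional API, backend-agnostic

def fusion_net(input_shape=(96, 320, 1)):
    """Two-branch feature-level fusion sketch: intensity and depth
    encoders, concatenated feature integration, and separate free-space
    and object heads. Sizes are placeholders, not ChiNet's."""
    intensity = layers.Input(input_shape, name='intensity')
    depth = layers.Input(input_shape, name='depth')

    def encoder(x, prefix):
        # Branch-specific feature extraction (names echo the slides).
        x = layers.Conv2D(32, 3, padding='same', activation='relu',
                          name=f'C1_{prefix}')(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(64, 3, padding='same', activation='relu',
                          name=f'C2_{prefix}')(x)
        return layers.MaxPooling2D(2)(x)

    # Feature integration: concatenate the two branches' deep features.
    fused = layers.Concatenate(name='feature_integration')(
        [encoder(intensity, 'Int'), encoder(depth, 'Dep')])

    def head(name):
        # Decoder head restoring the input resolution (2 poolings -> x4).
        x = layers.Conv2D(64, 3, padding='same', activation='relu')(fused)
        x = layers.UpSampling2D(4)(x)
        return layers.Conv2D(1, 1, activation='sigmoid', name=name)(x)

    return models.Model([intensity, depth],
                        [head('free_space'), head('objects')])
```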
[Architecture figure: the intensity and depth inputs feed separate encoder branches (C1_Int, C2_Int and C1_Dep, C2_Dep). Two decoder branches (DC1_Int/DC1_Dep and DC2_Int/DC2_Dep, with Upsampling layers) produce the free-space and object outputs; encoder feature maps from both branches are concatenated into both decoders.]
Skip Connections
Entire Intensity Encoder Feature Maps (m, n, n) are transferred to the Free Space and Object Decoder Feature Maps (o, n, n) for Concatenation (m+o, n, n); the same is done with the entire Depth Encoder Feature Maps.
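In the channels-first notation used on the slide, one such skip connection is just a channel-axis concatenation; the sizes m, o, n below are placeholders.

```python
from keras import layers

m, o, n = 64, 32, 40  # placeholder channel/spatial sizes
encoder_map = layers.Input((m, n, n))   # entire encoder feature maps (m, n, n)
decoder_map = layers.Input((o, n, n))   # decoder feature maps (o, n, n),
                                        # already upsampled to the same n x n
skip = layers.Concatenate(axis=1)([encoder_map, decoder_map])  # (m + o, n, n)
```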
[Pipeline figure: intensity and depth → image feature extraction and depth feature extraction → feature integration → free-space detection and object detection, yielding the Free Space and Objects outputs.]
ChiNet
[Result figure: intensity image, disparity image, and the detected free space.]
• Trained with 9000 samples from a Japanese highway dataset
• Manually annotated free space and objects
• Trained on Keras with Theano backend
• Trained with an Nvidia Titan X GPU
[Demo figure: detected objects and free space.]
Implemented on GeForce Titan X using Keras with Theano backend
Comparison: "Intensity" vs "Intensity and Depth"
[Result figures: with the intensity image only, boundaries are wrong, pylons are not detected, car detection is inaccurate or cars are missed, and false objects appear. With intensity and depth (disparity) fusion, cars and pylons are detected, detection is more accurate, boundaries are accurate, and no false objects appear.]
Evaluation Result: Comparison of "Intensity" vs "Intensity and Depth"
Some of the Learned Image Features
[Feature-map figure (strong to weak response) over the intensity image and depth, highlighting: vehicle lower part & free space; sky & driving lane; edges & free space.]
Some of the Learned Depth Features
[Feature-map figure (strong to weak response) over the depth input, highlighting: close-distance objects & close free space; edges; far-distance objects & far free space.]
Pixel Data After Mean Centering
Day-time and night-time pixel data follow different distributions even after mean centering.
[Histogram figure: day-time vs night-time pixel distributions.]
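A toy NumPy check of this point, with made-up stand-ins for day and night pixel intensities:

```python
import numpy as np

rng = np.random.default_rng(0)
day = rng.normal(140, 50, 10000).clip(0, 255)    # illustrative, not measured
night = rng.normal(40, 15, 10000).clip(0, 255)

for name, px in [("day", day), ("night", night)]:
    centered = px - px.mean()
    print(name, "std after mean centering:", round(float(centered.std()), 1))
# Both are now zero-mean, but spread and shape still differ, so mean
# centering alone does not align the day and night domains.
```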
Electromagnetic Wave Range: Camera, Laser, Radar (recap)
[The same wavelength figure as before, now marking the robust and reliable area of the spectrum.]
Thermal Camera vs Normal Camera
Stability against a variety of light conditions
[Figure: side-by-side thermal camera and normal (RGB) camera images; pedestrians remain clearly visible in the thermal images across lighting conditions.]
3D Dense Data
[Pipeline: RGB camera, depth, and thermal camera feed RGB, depth, and thermal feature-extraction branches; the features are integrated and passed to free space detection and object detection.]
Automated Driving Unit: NVIDIA DRIVE PX PEGASUS (320 TOPS)
Supports the high-speed processing requirements for Level 5
Processes and integrates: Laser, Stereo, Sonar; Far Infrared Camera, Visible Camera; Millimeter-wave Radar; IMU & GNSS & Map
• Sensor fusion of appearance and depth features for environment perception
• Increased robustness and perception accuracy
• ChiNet advantages
– Precise object boundary detection
– Detection of small objects in the road
– Detection of far-away objects
• Computational time
– Reduction of computational time to ~15 ms is possible with optimized CUDA libraries and advances in GPU computing