Vijay John, Yuquan Xu, Seiichi Mita,
Smart Vehicle Research Center
Hossein Tehrani, Tomoyoki Oishi,
Masataka Konishi, Hakusho Chin
Advanced Mobility Development
Kazuhisa Ishimaru, Sakiko Nishino
Research Division 2.0
• ADAS and Automated Driving
• World 3D Reconstruction
• 3D Deep Sensor Fusion
• Future Plan
• Conclusion
ADAS Applications are booming
• Adaptive Cruise Control (ACC)
• Adaptive Front Lights (AFL)
• Driver Monitoring System (DMS)
• Forward Collision Warning (FCW)
• Intelligent Speed Adaptation (ISA)
• Lane Departure Warning (LDW)
• Pedestrian Detection System (PDS)
• Surround-View Cameras (SVC)
• Autonomous Emergency Braking (AEB)
Vehicle Platform
Sensor configuration: Cameras, Stereo, Laser Sensor, RADAR, Sonar, GPS/IMU
Processing pipeline: Sensor Fusion & Perception (360 deg Scene Understanding) → Path Planning / Behavior Generation → Control
Deep Understanding of Environment
Sensors
<Stereo Vision>
[Figure: left/right image pair. The horizontal shift of a matched pixel between the two images is the disparity: far objects give a small shift, close objects a big shift. Disparity therefore encodes distance and separates the road surface from 3D objects such as cars and barriers.]
<Process>
Calibration → Disparity Calculation (Stereo matching) → Detection
The resulting disparity map gives metric distance via
$\mathrm{distance} = \dfrac{B \cdot f}{\mathrm{disparity}}$
where $B$ is the stereo baseline and $f$ the focal length.
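As a minimal illustration of this formula, the sketch below converts a disparity map to metric distance with NumPy; the baseline and focal-length values are made-up examples, not the actual rig parameters.

```python
import numpy as np

def disparity_to_distance(disparity, baseline_m, focal_px, min_disp=0.5):
    """distance = B * f / disparity, applied per pixel.
    Disparities below min_disp are treated as invalid (near-infinite
    distance) and returned as NaN."""
    disparity = np.asarray(disparity, dtype=np.float64)
    distance = np.full(disparity.shape, np.nan)
    valid = disparity >= min_disp
    distance[valid] = baseline_m * focal_px / disparity[valid]
    return distance

# Illustrative numbers: a 0.30 m baseline and 1400 px focal length give
# 0.30 * 1400 / 42 = 10 m for a 42 px disparity.
print(disparity_to_distance([[42.0, 0.0]], baseline_m=0.30, focal_px=1400.0))
```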
Each pixel has such a matching-cost curve; together these curves form the Matching Cost Space.
<Cost Space>
[Figure: the Matching Cost Space spans image height × image width × candidate disparities (e.g. 0, 24, 40, 55).]
Goal: finding the true disparity value for every pixel in the Matching Cost Space.
< Matching Cost >
[Figure: left/right image patches at candidate disparities 55, 40, and 24, together with one pixel's matching-cost curve (matching cost vs. disparity). The ground-truth disparity sits at a low-cost point, and neighboring pixels also show low matching cost there.]
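To make the cost-space idea concrete, here is a small sketch that builds a matching-cost volume from a rectified image pair. The slides do not say which matching cost is used, so window-averaged absolute differences (a common choice) are assumed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sad_cost_volume(left, right, max_disp, window=5):
    """Build a cost volume of shape (H, W, max_disp + 1) from rectified
    grayscale images. cost[y, x, d] compares left[y, x] with
    right[y, x - d] using absolute differences averaged over a local
    window; the slice cost[y, x, :] is one pixel's matching-cost curve."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    cost = np.full((h, w, max_disp + 1), np.inf, dtype=np.float32)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:] - right[:, :w - d])
        cost[:, d:, d] = uniform_filter(diff, size=window, mode='nearest')
    return cost

# Plain block matching is the per-pixel winner-take-all over this volume:
#   disparity = sad_cost_volume(L, R, 64).argmin(axis=2)
# which is exactly the noisy estimate the optimization below improves on.
```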
The Viterbi algorithm can find the global optimum.
[Figure: Viterbi optimization along one scanline. Nodes are (pixel, disparity) pairs with associated path costs; Block Matching picks a wrong low-cost node, while the Viterbi path recovers the optimal disparity.]
Exploiting the neighbors' matching costs can be translated into a mathematical optimization: a shortest-path problem, as sketched below.
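A minimal sketch of that shortest-path view for a single scanline: each pixel is a stage, each disparity a state, and SGM-style transition penalties (the p1/p2 values here are assumptions, not the paper's parameters) reward neighboring pixels with similar disparity.

```python
import numpy as np

def viterbi_scanline(cost, p1=8.0, p2=32.0):
    """Viterbi optimization of one scanline's cost slice (W, D):
    the returned path minimizes matching cost plus smoothness penalties
    (0 for equal disparity, p1 for a 1-pixel change, p2 for larger jumps)."""
    w, dmax = cost.shape
    d = np.arange(dmax)
    jump = np.abs(d[:, None] - d[None, :])           # |d_new - d_prev|
    penalty = np.where(jump == 0, 0.0, np.where(jump == 1, p1, p2))
    acc = cost.copy()                                # accumulated path cost
    back = np.zeros((w, dmax), dtype=np.int32)       # best predecessor state
    for x in range(1, w):
        trans = acc[x - 1][None, :] + penalty        # rows: d_new, cols: d_prev
        back[x] = trans.argmin(axis=1)
        acc[x] += trans.min(axis=1)
    path = np.empty(w, dtype=np.int32)               # backtrack the shortest path
    path[-1] = acc[-1].argmin()
    for x in range(w - 2, -1, -1):
        path[x] = back[x + 1, path[x + 1]]
    return path                                      # disparity per pixel
```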
[Figure: parallel regrouping of costs. The costs of the vertical scanlines VSLj (j = 1, …, 640) are regrouped into diagonal scanlines DSL_RDLUk (k = 1, …, K) for the right-down-to-left-up and left-up-to-right-down Viterbi directions; the per-direction costs are then merged by averaging.]
Huge Networks with Parallel Optimization
< SGBM > vs < Proposed Method (Multi-Path Viterbi) >
[Figure: SGBM runs Viterbi passes over the cost space in each direction (→, ←, ↑, ↓ and the diagonals ↖, ↘, ↗, ↙) independently and merges all outputs in a single step to produce the disparity. MPV instead merges hierarchically: merged costs from one group of directions feed the next Viterbi stage.]
SGBM merges independently; MPV merges hierarchically, optimizing the accumulated information step by step.
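The difference between the two merge strategies can be sketched as follows; the exact MPV merge rule is not given on the slide, so pairwise averaging followed by re-optimization is an assumption used only to show the hierarchical structure.

```python
import numpy as np

def sgbm_merge(direction_costs):
    """SGBM-style: every directional Viterbi pass runs independently on
    the raw cost space; the accumulated volumes are merged in one step."""
    return np.mean(direction_costs, axis=0)

def mpv_merge(direction_costs, reoptimize):
    """MPV-style hierarchical sketch (merge rule assumed): each merged
    volume is re-optimized before the next direction is folded in, so
    accumulated information is refined step by step."""
    merged = direction_costs[0]
    for vol in direction_costs[1:]:
        merged = reoptimize((merged + vol) / 2.0)
    return merged

# direction_costs would hold the accumulated volumes of the horizontal,
# vertical, and diagonal Viterbi passes; reoptimize would rerun a
# Viterbi pass (e.g. viterbi_scanline per row) on the merged volume.
```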
< MPV >
[Figure: disparity outputs of the individual Viterbi directions and the hierarchically merged MPV result. Compared with the conventional methods (SGBM, Block Matching), the Multi-Path Viterbi result is dense & smooth.]
Image Size: 1280×960
Calculation Time: 15 ms/frame (GPU: GeForce GTX 1080)
[Demo sequences: Nagoya Urban Road and Tokyo Metropolitan Highway, both at 15 ms/frame on the GeForce GTX 1080.]
Electromagnetic Wave Range: Camera, Laser, Radar
[Figure: sensors placed on a wavelength axis from 0.1 μm to 10 mm. Electromagnetic-wave sensors: Visible Light Camera, Infrared Camera, Laser, Car Radar; sound-wave sensor: Car Sonar. Shorter wavelengths can see detail; longer wavelengths can see far. Also marked: sun light, fog & dust (~100/cm³), rain (~100/m³), and snow.]
Training a learning framework for perception tasks
[Diagram: sensors (Cameras, Stereo, Laser Sensor) feed the perception learning framework, which outputs Free Space and Objects.]
Traditional Learning: Feature Extraction (HOG, DPM, etc.) followed by Feature Classification (SVM, Random Forest, etc.)
Deep Learning: Feature Extraction + Feature Classification learned jointly
• Single sensor-based learning is not robust or descriptive enough
• Challenges
– Environmental Variation (occlusion, illumination variation, etc.)
– High Inter-Class and Intra-Class Variability
[Diagram: a single sensor (Camera, Lidar, or Stereo) and labels feed the perception learning framework.]
There are many vehicle varieties, seen at many different orientations.
On-road objects are numerous and highly varied.
Road boundaries also come in different types: the free-space boundary is highly varied.
Concrete Curb, Guardrail, Wall, Pylon, Divider
Illumination variation as observed in the appearance features of a monocular camera image
• Sensor Fusion-based learning with Complementary Sensors addresses these issues
• Monocular Camera appearance features and depth features are Complementary Features
[Diagram: multiple sensors (Cameras, Stereo, Laser Sensor) and labels feed sensor fusion and the perception learning framework.]
Monocular Camera ⇒ Rich Appearance Information; inexpensive; sensitive to illumination variation
Depth Camera ⇒ Depth Information (3D Data); inexpensive stereo-based depth; illumination invariant due to the robust stereo algorithm [1]
[1] Xu et al., "Real-time Stereo Disparity Quality Improvement for Challenging Traffic Environments," IV 2018
Depth information from the stereo camera is robust to illumination variation.
Appearance and Depth Features are Fused within a Deep Learning Framework for Environment Perception
[Diagram: descriptive appearance features (monocular camera) and illumination-invariant depth features (stereo camera/laser) enter the deep learning framework: sensor fusion with complementary features for 3D environment perception.]
Image & Depth
Sensor Fusion: Raw Data Level Fusion
[Diagram: image and depth are combined at the input and passed through a single feature-extraction stage, then to free space detection and object detection.]
Sensor Fusion: Feature Level Fusion
[Diagram: image and depth pass through separate image and depth feature-extraction stages; the resulting deep features are combined in a feature-integration stage, then passed to free space detection and object detection.]
Feature level fusion: Good. Raw data level fusion: Bad.
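A minimal Keras sketch of feature-level fusion in this spirit: two encoder branches, concatenation as feature integration, and two decoder heads. Layer counts, channel widths, and the input shape are illustrative assumptions, not the actual network configuration.

```python
from keras import layers, models  # Keras functional API, backend-agnostic

def fusion_net(input_shape=(96, 320, 1)):
    """Two-branch feature-level fusion sketch: intensity and depth
    encoders, concatenated feature integration, and separate free-space
    and object heads. Sizes are placeholders, not ChiNet's."""
    intensity = layers.Input(input_shape, name='intensity')
    depth = layers.Input(input_shape, name='depth')

    def encoder(x, prefix):
        # Branch-specific feature extraction (names echo the slides).
        x = layers.Conv2D(32, 3, padding='same', activation='relu',
                          name=f'C1_{prefix}')(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(64, 3, padding='same', activation='relu',
                          name=f'C2_{prefix}')(x)
        return layers.MaxPooling2D(2)(x)

    # Feature integration: concatenate the two branches' deep features.
    fused = layers.Concatenate(name='feature_integration')(
        [encoder(intensity, 'Int'), encoder(depth, 'Dep')])

    def head(name):
        # Decoder head restoring the input resolution (2 poolings -> x4).
        x = layers.Conv2D(64, 3, padding='same', activation='relu')(fused)
        x = layers.UpSampling2D(4)(x)
        return layers.Conv2D(1, 1, activation='sigmoid', name=name)(x)

    return models.Model([intensity, depth],
                        [head('free_space'), head('objects')])
```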
[Architecture figure: the intensity and depth inputs feed separate encoder branches (C1_Int, C2_Int and C1_Dep, C2_Dep). Two decoder branches (DC1_Int/DC1_Dep and DC2_Int/DC2_Dep, with Upsampling layers) produce the free-space and object outputs; encoder feature maps from both branches are concatenated into both decoders.]
Skip Connections
Entire Intensity Encoder Feature Maps (m, n, n) are transferred to the Free Space and Object Decoder Feature Maps (o, n, n) for Concatenation (m+o, n, n); the same is done with the entire Depth Encoder Feature Maps.
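In the channels-first notation used on the slide, one such skip connection is just a channel-axis concatenation; the sizes m, o, n below are placeholders.

```python
from keras import layers

m, o, n = 64, 32, 40  # placeholder channel/spatial sizes
encoder_map = layers.Input((m, n, n))   # entire encoder feature maps (m, n, n)
decoder_map = layers.Input((o, n, n))   # decoder feature maps (o, n, n),
                                        # already upsampled to the same n x n
skip = layers.Concatenate(axis=1)([encoder_map, decoder_map])  # (m + o, n, n)
```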
[Pipeline figure: intensity and depth → image feature extraction and depth feature extraction → feature integration → free-space detection and object detection, yielding the Free Space and Objects outputs.]
ChiNet
[Result figure: intensity image, disparity image, and the detected free space.]
• Trained with 9000 samples from a Japanese highway dataset
• Manually annotated free space and objects
• Trained on Keras with Theano backend
• Trained with an Nvidia Titan X GPU
[Demo figure: detected objects and free space.]
Implemented on GeForce Titan X using Keras with Theano backend
Comparison: "Intensity" vs "Intensity and Depth"
[Result figures: with the intensity image only, boundaries are wrong, pylons are not detected, car detection is inaccurate or cars are missed, and false objects appear. With intensity and depth (disparity) fusion, cars and pylons are detected, detection is more accurate, boundaries are accurate, and no false objects appear.]
Evaluation Result: Comparison of "Intensity" vs "Intensity and Depth"
Some of the Learned Image Features
[Feature-map figure (strong to weak response) over the intensity image and depth, highlighting: vehicle lower part & free space; sky & driving lane; edges & free space.]
Some of the Learned Depth Features
[Feature-map figure (strong to weak response) over the depth input, highlighting: close-distance objects & close free space; edges; far-distance objects & far free space.]
Pixel Data After Mean Centering
Day-time and night-time pixel data follow different distributions even after mean centering.
[Histogram figure: day-time vs night-time pixel distributions.]
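A toy NumPy check of this point, with made-up stand-ins for day and night pixel intensities:

```python
import numpy as np

rng = np.random.default_rng(0)
day = rng.normal(140, 50, 10000).clip(0, 255)    # illustrative, not measured
night = rng.normal(40, 15, 10000).clip(0, 255)

for name, px in [("day", day), ("night", night)]:
    centered = px - px.mean()
    print(name, "std after mean centering:", round(float(centered.std()), 1))
# Both are now zero-mean, but spread and shape still differ, so mean
# centering alone does not align the day and night domains.
```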
Electromagnetic Wave Range: Camera, Laser, Radar (recap)
[The same wavelength figure as before, now marking the robust and reliable area of the spectrum.]
Thermal Camera vs Normal Camera
Stability against a variety of light conditions
[Figure: side-by-side thermal camera and normal (RGB) camera images; pedestrians remain clearly visible in the thermal images across lighting conditions.]
3D Dense Data
[Pipeline: RGB camera, depth, and thermal camera feed RGB, depth, and thermal feature-extraction branches; the features are integrated and passed to free space detection and object detection.]
Automated Driving Unit: NVIDIA DRIVE PX PEGASUS (320 TOPS)
Supports the high-speed processing requirements for Level 5
Processes and integrates: Laser, Stereo, Sonar; Far Infrared Camera, Visible Camera; Millimeter-wave Radar; IMU & GNSS & Map
• Sensor fusion of appearance and depth features for environment perception
• Increased robustness and perception accuracy
• ChiNet advantages
– Precise object boundary detection
– Detection of small objects in the road
– Detection of far-away objects
• Computational time
– Reduction of computational time to ~15 ms is possible with optimized CUDA libraries and advances in GPU computing