New Perspective on Perception and Prediction Pipeline for ......2021/06/25 · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani

Robotics Institute, Carnegie Mellon University

Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation

June 25, 2021

1

2

Perception and prediction are important components in the autonomous driving stack

Standard Perception and Prediction Pipeline

3

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Perception

Prediction

4

LiDAR RGB

3D ObjectDetection



Sensor Data


5

3D ObjectDetection

Sensor Data

Detection results




6

3D ObjectDetection

Sensor Data

Tracking results




7

3D ObjectDetection

Sensor Data



Forecasting results


Any possible improvement at the pipeline level?

8

Lots of progress on improving each individual module

The pipeline stays the same!

3D ObjectDetection



Sensor Data

9

Limitation of the Perception and Prediction Pipeline

3D ObjectDetection



Sensor Data

Is this really the best place to perform prediction?

Sequential processes propagate errors (rarely evaluated in papers)

Limited evaluation based on down-stream tasks like planning and control

No consensus on how to propagate uncertainty

Requires predefined object categories

10


3D ObjectDetection



Sensor Data

Can we do prediction here?

11


3D ObjectDetection



Sensor DataCan we do prediction here?

12

Our recent work on new perception and prediction pipelines

1. A forecast-then-track pipeline

2. Parallelized tracking and prediction

SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

• Track-then-forecast pipeline:• Detection -> MOT -> Trajectory Forecasting

• Forecast-then-track pipeline• Sequential Pointcloud Forecasting -> Detection -> MOT

• Differences• Invert the order of forecasting

• Forecast at the sensor level (not object positions)

13

Move forecasting upfront

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

SPF2 Provides Stable Object Trajectory Prediction

• Predicted point clouds preserve object information

• Equivalent to results obtained from standard pipeline, i.e., object trajectories

14Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

• Advantages?

• Does not require human annotation for forecasting

• Point cloud data is prevalent nowadays

• The key is this new task -- Sequential Pointcloud Forecasting (SPF)

15

Performance Scaling with More Point Cloud Data


• CD: Chamfer distance (lower is better)

• EMD: Earth Mover’s Distance (lower is better)

16

Performance Scaling with More Point Cloud Data


SPF: Sequential Pointcloud Forecasting

• Goal: a sequence of past clouds -> a sequence of future clouds

• Predict the entire scene, including background

• Deal with large-scale points (1.5M) rather than 1k points


Comparison to trajectory forecasting Comparison to point cloud generation

SPFNet• How to learn representations from large-scale point cloud sequences?

• Baseline: FC-LSTM autoencoder model

• Four modules

• Frame-wise point cloud encoder

• FC-LSTM for temporal modeling

• Frame-wise point cloud decoder

• Losses: chamfer distance


Predicted Point Clouds Preserves Objects


Detection on Predicted Point Clouds

20

• Green: detected 3D boxes

• Yellow: GT 3D boxes

• Detections mostly match with GT on predicted point clouds from SPFNet


Tracking on Predicted Point Clouds

21

• Color represents the object identity

• Predicted point clouds from SPFNet preserves objects


22

Our recent work on new perception and prediction pipelines

1. A forecast-then-track pipeline

2. Parallelized tracking and prediction

Limitation of the Standard Pipeline

• Pipeline in a sequential order• Downstream takes the upstream outputs as inputs

• Limitation?• Errors from upstream are propagated to downstream

• Can we go beyond the sequential pipeline?

23

3D ObjectDetection



Sensor Data

GT past trajectories

Predicted trajectories

Tracking results

Predicted trajectories

Data association

error in tracking

PTP: Parallelized Tracking and Prediction

24

3D ObjectDetection


Trajectory Prediction

Sensor Data

Sequential Pipeline Parallelized Tracking and Prediction (PTP)

Feature Extraction

Matching

Feature Extraction

Trajectory Decoder

3D ObjectDetection



Sensor Data

Shared Feature Learning

Matching Trajectory Decoder

Similar components, which aims to encode object features from past information

Module-specific components

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Our Method Can Jointly Track and Predict

26X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

3D tracking only Tracking and prediction

Parallelized Tracking and Prediction

• Advantages

• Reduce error propagation from tracking to prediction

• Share feature learning -> efficient and improves performance

• Overview

27

Forecasting

Predicted trajectoriesin future T frames

Edge features

DiversitysamplingNode features

GNN for featureinteraction

Detected objects in current frame

Objects trajectories inpast H frames

Last frameCurrent frame

Featureextraction

Featureextraction

3D MOThead

Trajectoryprediction

head

Shared Feature Learning3D MOT

3D ObjectDetection



Sensor Data




• Shared feature learning• Use LSTM/MLP to learn motion features from trajectories

• Encode features from nearby objects by modeling interaction with GNNs

28

3D ObjectDetection



Sensor Data




• 3D multi-object tracking• MLP uses edge features as inputs to regress pairwise similarity scores

• During training, estimated affinity matrix is supervised with GT

• During testing, estimated affinity matrix is fed to Hungarian algorithm

29

3D ObjectDetection



Sensor Data




• Trajectory prediction• Conditional VAE is used to predict future trajectories

• Diversity sampling function to increase trajectory sample diversity

30

3D ObjectDetection



Sensor Data



31

Limitation of PTP framework

1. Posterior collapse in trajectory prediction

2. Loss of information in feature encoding

Latent space

Learn a generativemodel

DataContext feature

Future trajectories

Limitation of Random Sampling in CVAE

Low sample efficiency!

Trajectory Space

Generator

32Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020

Latent space

Diversity SamplingFunction (DSF)

Latent codes

Diversity Sampling Function

Diversity losson samples

Trajectory Space

Generator

DataContext feature

Future trajectories

33Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020

34

Limitation of PTP framework

1. Posterior collapse in trajectory prediction

2. Loss of information in feature encoding

Limitation of the Feature Encoding

• Temporal and social information are encoded separately: LSTM + GNNs

• Loss of information in the later feature encoding stage

35X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Encoding Temporal and Social Features Jointly using Transformer

• Allowing an agent's state at one time to directly interact with another agent's state at any timestamp (future or past)

36Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021

Temporal-then-Social (Standard)

Temporal and Social together (Ours)

State-of-the-art Performance on ETH/UCY/nuScenes

37Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021

ETH/UCY

nuScenes

Our paper is on arXiv!

All-Inclusive Perception Dataset: AIODrive

• Complete annotations for all mainstream perception tasks

• Comprehensive sensor modalities (SPAD, high-density long-range LiDAR)

• Diverse environmental variations (out of distribution data)

• Built with Carla, state-of-the-art high-fidelity simulator

38Weng et al. All-In-One Drive: A Comprehensive Perception Dataset with High-Density Long-Range Point Clouds. 2021

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani

Robotics Institute, Carnegie Mellon University

Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation

June 25, 2021

39

Documents

New Perspective on Perception and Prediction Pipeline for ......2021/06/25 · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani