38
New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani Robotics Institute, Carnegie Mellon University Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation June 25, 2021 1

New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani

Robotics Institute, Carnegie Mellon University

Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation

June 25, 2021

1

Page 2: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

2

Perception and prediction are important components in the autonomous driving stack

Page 3: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Standard Perception and Prediction Pipeline

3

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Perception

Prediction

Page 4: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

4

LiDAR RGB

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Standard Perception and Prediction Pipeline

Page 5: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

5

3D ObjectDetection

Sensor Data

Detection results

3D Multi-Object Tracking

Trajectory Forecasting

Standard Perception and Prediction Pipeline

Page 6: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

6

3D ObjectDetection

Sensor Data

Tracking results

3D Multi-Object Tracking

Trajectory Forecasting

Standard Perception and Prediction Pipeline

Page 7: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

7

3D ObjectDetection

Sensor Data

3D Multi-Object Tracking

Trajectory Forecasting

Forecasting results

Standard Perception and Prediction Pipeline

Page 8: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Any possible improvement at the pipeline level?

8

Lots of progress on improving each individual module

The pipeline stays the same!

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Page 9: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

9

Limitation of the Perception and Prediction Pipeline

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Is this really the best place to perform prediction?

Sequential processes propagate errors (rarely evaluated in papers)

Limited evaluation based on down-stream tasks like planning and control

No consensus on how to propagate uncertainty

Requires predefined object categories

Page 10: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

10

Any possible improvement at the pipeline level?

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

Can we do prediction here?

Page 11: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

11

Any possible improvement at the pipeline level?

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor DataCan we do prediction here?

Page 12: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

12

Our recent work on new perception and prediction pipelines

1. A forecast-then-track pipeline

2. Parallelized tracking and prediction

Page 13: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting

• Track-then-forecast pipeline:• Detection -> MOT -> Trajectory Forecasting

• Forecast-then-track pipeline• Sequential Pointcloud Forecasting -> Detection -> MOT

• Differences• Invert the order of forecasting

• Forecast at the sensor level (not object positions)

13

Move forecasting upfront

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 14: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

SPF2 Provides Stable Object Trajectory Prediction

• Predicted point clouds preserve object information

• Equivalent to results obtained from standard pipeline, i.e., object trajectories

14Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 15: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

• Advantages?

• Does not require human annotation for forecasting

• Point cloud data is prevalent nowadays

• The key is this new task -- Sequential Pointcloud Forecasting (SPF)

15

Performance Scaling with More Point Cloud Data

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 16: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

• CD: Chamfer distance (lower is better)

• EMD: Earth Mover’s Distance (lower is better)

16

Performance Scaling with More Point Cloud Data

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 17: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

SPF: Sequential Pointcloud Forecasting

• Goal: a sequence of past clouds -> a sequence of future clouds

• Predict the entire scene, including background

• Deal with large-scale points (1.5M) rather than 1k points

17Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Comparison to trajectory forecasting Comparison to point cloud generation

Page 18: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

SPFNet• How to learn representations from large-scale point cloud sequences?

• Baseline: FC-LSTM autoencoder model

• Four modules

• Frame-wise point cloud encoder

• FC-LSTM for temporal modeling

• Frame-wise point cloud decoder

• Losses: chamfer distance

18Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 19: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Predicted Point Clouds Preserves Objects

19Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 20: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Detection on Predicted Point Clouds

20

• Green: detected 3D boxes

• Yellow: GT 3D boxes

• Detections mostly match with GT on predicted point clouds from SPFNet

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 21: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Tracking on Predicted Point Clouds

21

• Color represents the object identity

• Predicted point clouds from SPFNet preserves objects

Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020

Page 22: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

22

Our recent work on new perception and prediction pipelines

1. A forecast-then-track pipeline

2. Parallelized tracking and prediction

Page 23: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Limitation of the Standard Pipeline

• Pipeline in a sequential order• Downstream takes the upstream outputs as inputs

• Limitation?• Errors from upstream are propagated to downstream

• Can we go beyond the sequential pipeline?

23

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Forecasting

Sensor Data

GT past trajectories

Predicted trajectories

Tracking results

Predicted trajectories

Data association

error in tracking

Page 24: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

PTP: Parallelized Tracking and Prediction

24

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Sequential Pipeline Parallelized Tracking and Prediction (PTP)

Feature Extraction

Matching

Feature Extraction

Trajectory Decoder

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Shared Feature Learning

Matching Trajectory Decoder

Similar components, which aims to encode object features from past information

Module-specific components

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 25: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Our Method Can Jointly Track and Predict

26X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

3D tracking only Tracking and prediction

Page 26: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Parallelized Tracking and Prediction

• Advantages

• Reduce error propagation from tracking to prediction

• Share feature learning -> efficient and improves performance

• Overview

27

Forecasting

Predicted trajectoriesin future T frames

Edge features

DiversitysamplingNode features

GNN for featureinteraction

Detected objects in current frame

Objects trajectories inpast H frames

Last frameCurrent frame

Featureextraction

Featureextraction

3D MOThead

Trajectoryprediction

head

Shared Feature Learning3D MOT

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Shared Feature Learning

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 27: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Parallelized Tracking and Prediction

• Shared feature learning• Use LSTM/MLP to learn motion features from trajectories

• Encode features from nearby objects by modeling interaction with GNNs

28

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Shared Feature Learning

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 28: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Parallelized Tracking and Prediction

• 3D multi-object tracking• MLP uses edge features as inputs to regress pairwise similarity scores

• During training, estimated affinity matrix is supervised with GT

• During testing, estimated affinity matrix is fed to Hungarian algorithm

29

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Shared Feature Learning

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 29: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Parallelized Tracking and Prediction

• Trajectory prediction• Conditional VAE is used to predict future trajectories

• Diversity sampling function to increase trajectory sample diversity

30

3D ObjectDetection

3D Multi-Object Tracking

Trajectory Prediction

Sensor Data

Shared Feature Learning

X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 30: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

31

Limitation of PTP framework

1. Posterior collapse in trajectory prediction

2. Loss of information in feature encoding

Page 31: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Latent space

Learn a generativemodel

DataContext feature

Future trajectories

Limitation of Random Sampling in CVAE

Low sample efficiency!

Trajectory Space

Generator

32Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020

Page 32: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Latent space

Diversity SamplingFunction (DSF)

Latent codes

Diversity Sampling Function

Diversity losson samples

Trajectory Space

Generator

DataContext feature

Future trajectories

33Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020

Page 33: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

34

Limitation of PTP framework

1. Posterior collapse in trajectory prediction

2. Loss of information in feature encoding

Page 34: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Limitation of the Feature Encoding

• Temporal and social information are encoded separately: LSTM + GNNs

• Loss of information in the later feature encoding stage

35X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021

Page 35: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

Encoding Temporal and Social Features Jointly using Transformer

• Allowing an agent's state at one time to directly interact with another agent's state at any timestamp (future or past)

36Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021

Temporal-then-Social (Standard)

Temporal and Social together (Ours)

Page 36: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

State-of-the-art Performance on ETH/UCY/nuScenes

37Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021

ETH/UCY

nuScenes

Our paper is on arXiv!

Page 37: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

All-Inclusive Perception Dataset: AIODrive

• Complete annotations for all mainstream perception tasks

• Comprehensive sensor modalities (SPAD, high-density long-range LiDAR)

• Diverse environmental variations (out of distribution data)

• Built with Carla, state-of-the-art high-fidelity simulator

38Weng et al. All-In-One Drive: A Comprehensive Perception Dataset with High-Density Long-Range Point Clouds. 2021

Page 38: New Perspective on Perception and Prediction Pipeline for ......2021/06/25  · New Perspective on Perception and Prediction Pipeline for Autonomous Driving Xinshuo Weng, Kris Kitani

New Perspective on Perception and Prediction Pipeline for Autonomous Driving

Xinshuo Weng, Kris Kitani

Robotics Institute, Carnegie Mellon University

Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation

June 25, 2021

39