Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
New Perspective on Perception and Prediction Pipeline for Autonomous Driving
Xinshuo Weng, Kris Kitani
Robotics Institute, Carnegie Mellon University
Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation
June 25, 2021
1
2
Perception and prediction are important components in the autonomous driving stack
Standard Perception and Prediction Pipeline
3
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
Perception
Prediction
4
LiDAR RGB
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
Standard Perception and Prediction Pipeline
5
3D ObjectDetection
Sensor Data
Detection results
3D Multi-Object Tracking
Trajectory Forecasting
Standard Perception and Prediction Pipeline
6
3D ObjectDetection
Sensor Data
Tracking results
3D Multi-Object Tracking
Trajectory Forecasting
Standard Perception and Prediction Pipeline
7
3D ObjectDetection
Sensor Data
3D Multi-Object Tracking
Trajectory Forecasting
Forecasting results
Standard Perception and Prediction Pipeline
Any possible improvement at the pipeline level?
8
Lots of progress on improving each individual module
The pipeline stays the same!
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
9
Limitation of the Perception and Prediction Pipeline
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
Is this really the best place to perform prediction?
Sequential processes propagate errors (rarely evaluated in papers)
Limited evaluation based on down-stream tasks like planning and control
No consensus on how to propagate uncertainty
Requires predefined object categories
10
Any possible improvement at the pipeline level?
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
Can we do prediction here?
11
Any possible improvement at the pipeline level?
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor DataCan we do prediction here?
12
Our recent work on new perception and prediction pipelines
1. A forecast-then-track pipeline
2. Parallelized tracking and prediction
SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting
• Track-then-forecast pipeline:• Detection -> MOT -> Trajectory Forecasting
• Forecast-then-track pipeline• Sequential Pointcloud Forecasting -> Detection -> MOT
• Differences• Invert the order of forecasting
• Forecast at the sensor level (not object positions)
13
Move forecasting upfront
Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
SPF2 Provides Stable Object Trajectory Prediction
• Predicted point clouds preserve object information
• Equivalent to results obtained from standard pipeline, i.e., object trajectories
14Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
• Advantages?
• Does not require human annotation for forecasting
• Point cloud data is prevalent nowadays
• The key is this new task -- Sequential Pointcloud Forecasting (SPF)
15
Performance Scaling with More Point Cloud Data
Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
• CD: Chamfer distance (lower is better)
• EMD: Earth Mover’s Distance (lower is better)
16
Performance Scaling with More Point Cloud Data
Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
SPF: Sequential Pointcloud Forecasting
• Goal: a sequence of past clouds -> a sequence of future clouds
• Predict the entire scene, including background
• Deal with large-scale points (1.5M) rather than 1k points
17Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
Comparison to trajectory forecasting Comparison to point cloud generation
SPFNet• How to learn representations from large-scale point cloud sequences?
• Baseline: FC-LSTM autoencoder model
• Four modules
• Frame-wise point cloud encoder
• FC-LSTM for temporal modeling
• Frame-wise point cloud decoder
• Losses: chamfer distance
18Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
Predicted Point Clouds Preserves Objects
19Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
Detection on Predicted Point Clouds
20
• Green: detected 3D boxes
• Yellow: GT 3D boxes
• Detections mostly match with GT on predicted point clouds from SPFNet
Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
Tracking on Predicted Point Clouds
21
• Color represents the object identity
• Predicted point clouds from SPFNet preserves objects
Weng et al. Inverting the Pose Forecasting Pipeline with SPF2: Sequential Pointcloud Forecasting for Sequential Pose Forecasting. CoRL 2020
22
Our recent work on new perception and prediction pipelines
1. A forecast-then-track pipeline
2. Parallelized tracking and prediction
Limitation of the Standard Pipeline
• Pipeline in a sequential order• Downstream takes the upstream outputs as inputs
• Limitation?• Errors from upstream are propagated to downstream
• Can we go beyond the sequential pipeline?
23
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Forecasting
Sensor Data
GT past trajectories
Predicted trajectories
Tracking results
Predicted trajectories
Data association
error in tracking
PTP: Parallelized Tracking and Prediction
24
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Sequential Pipeline Parallelized Tracking and Prediction (PTP)
Feature Extraction
Matching
Feature Extraction
Trajectory Decoder
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Shared Feature Learning
Matching Trajectory Decoder
Similar components, which aims to encode object features from past information
Module-specific components
X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
Our Method Can Jointly Track and Predict
26X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
3D tracking only Tracking and prediction
Parallelized Tracking and Prediction
• Advantages
• Reduce error propagation from tracking to prediction
• Share feature learning -> efficient and improves performance
• Overview
27
Forecasting
Predicted trajectoriesin future T frames
Edge features
DiversitysamplingNode features
GNN for featureinteraction
Detected objects in current frame
Objects trajectories inpast H frames
Last frameCurrent frame
Featureextraction
Featureextraction
3D MOThead
Trajectoryprediction
head
Shared Feature Learning3D MOT
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Shared Feature Learning
X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
Parallelized Tracking and Prediction
• Shared feature learning• Use LSTM/MLP to learn motion features from trajectories
• Encode features from nearby objects by modeling interaction with GNNs
28
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Shared Feature Learning
X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
Parallelized Tracking and Prediction
• 3D multi-object tracking• MLP uses edge features as inputs to regress pairwise similarity scores
• During training, estimated affinity matrix is supervised with GT
• During testing, estimated affinity matrix is fed to Hungarian algorithm
29
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Shared Feature Learning
X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
Parallelized Tracking and Prediction
• Trajectory prediction• Conditional VAE is used to predict future trajectories
• Diversity sampling function to increase trajectory sample diversity
30
3D ObjectDetection
3D Multi-Object Tracking
Trajectory Prediction
Sensor Data
Shared Feature Learning
X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
31
Limitation of PTP framework
1. Posterior collapse in trajectory prediction
2. Loss of information in feature encoding
Latent space
Learn a generativemodel
DataContext feature
Future trajectories
Limitation of Random Sampling in CVAE
Low sample efficiency!
Trajectory Space
Generator
32Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020
Latent space
Diversity SamplingFunction (DSF)
Latent codes
Diversity Sampling Function
Diversity losson samples
Trajectory Space
Generator
DataContext feature
Future trajectories
33Y. Yuan, K. Kitani. Diverse Trajectory Forecasting with Determinantal Point Processes. ICLR 2020
34
Limitation of PTP framework
1. Posterior collapse in trajectory prediction
2. Loss of information in feature encoding
Limitation of the Feature Encoding
• Temporal and social information are encoded separately: LSTM + GNNs
• Loss of information in the later feature encoding stage
35X. Weng*, Y. Yuan*, K. Kitani. PTP: Parallelized Tracking and Prediction with Graph Neural Networks and Diversity Sampling. ICRA 2021
Encoding Temporal and Social Features Jointly using Transformer
• Allowing an agent's state at one time to directly interact with another agent's state at any timestamp (future or past)
36Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021
Temporal-then-Social (Standard)
Temporal and Social together (Ours)
State-of-the-art Performance on ETH/UCY/nuScenes
37Y. Yuan, X. Weng, Y. Ou, K. Kitani. AgentFormer: Agent-Aware Transformers for Socio-Temporal Multi-Agent Forecasting. 2021
ETH/UCY
nuScenes
Our paper is on arXiv!
All-Inclusive Perception Dataset: AIODrive
• Complete annotations for all mainstream perception tasks
• Comprehensive sensor modalities (SPAD, high-density long-range LiDAR)
• Diverse environmental variations (out of distribution data)
• Built with Carla, state-of-the-art high-fidelity simulator
38Weng et al. All-In-One Drive: A Comprehensive Perception Dataset with High-Density Long-Range Point Clouds. 2021
New Perspective on Perception and Prediction Pipeline for Autonomous Driving
Xinshuo Weng, Kris Kitani
Robotics Institute, Carnegie Mellon University
Presented at the Workshop on Robust Video Scene Understanding: Tracking and Video Segmentation
June 25, 2021
39