1
High-resolution 3D Reconstruction on a Mobile Processor
Michael Mangan
Senior Product Manager
Qualcomm Technologies, Inc.
May 3, 2016
2
30 years of driving the evolution of wireless
#1 in 3G/4G LTE modem
#1 in RF
Source: Qualcomm Incorporated data. Currently, Qualcomm semiconductors are products of Qualcomm Technologies, Inc. or its subsidiaries
IHS, Jan. ’16 (RF); Strategy Analytics, Dec. ’15 (modem, AP)
3
Qualcomm® Snapdragon™ Chipsets drive new experiences
Context aware computing
Machine learning
Computing performance
VR / AR - beyond small screen
360 degree camera
3D and low-light photography
Security
Biometric sensor
Virtual SIM/Multiple devices
Ultra HD VoLTE / audio quality
4G+
Wi-Fi
Superior converged connectivity
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
Gaming
4
What is Active Depth Capture?
Depth provides z-dimension to a scene; a photograph provides only x-y information.
Two ways to capture depth information from a scene or object:
Passive Depth Capture:
(No IR Transmitter)
• Stereo RGB cameras can passively generate a depth map of a scene.
• Baseline separation between the cameras causes parallax between the two received images.
• Parallax can be used to infer a disparity estimate, which in turn is used to generate a depth map.
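The disparity-to-depth step in the passive stereo bullets can be sketched with the standard triangulation relation for a rectified camera pair, Z = f * B / d. This is a minimal illustration, not the presenter's implementation; the function name and the camera values (600 px focal length, 5 cm baseline) are assumptions chosen for the example.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity (or no match was found).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 600 px focal length, 5 cm baseline (assumed values).
for disparity in (60.0, 30.0, 15.0):
    print(disparity, depth_from_disparity(disparity, 600.0, 0.05))
```

Halving the disparity doubles the estimated depth, which is why depth resolution degrades for distant objects.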
Active Depth Capture:
(IR Transmitter)
• An IR laser illuminates the scene; various techniques are used to infer depth from the reflected light:
» Time of Flight
» Active Stereo
» Structured Light
5
Depth from Structured Light: Technology Overview
Depth information is generated using a structured light sensor
• A coded pattern is projected onto the scene using near infrared (NIR) light
• An NIR camera receives the reflected, distorted pattern
• Codes in the received image are matched against known codes in the transmitted pattern
• Depth at each code location is estimated from the disparity between the original and received code positions, leading to a dense depth map
[Figure: transmitter projects the coded pattern; receiver captures the NIR image; output is the depth map]
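The match-then-triangulate steps listed above can be sketched on a single image row. This is a simplified illustration under stated assumptions, not the actual decoder: the projector and camera are assumed to be rectified like a stereo pair, each code is assumed to be uniquely identifiable, and the dictionary layout, function name, and numeric values are hypothetical.

```python
def decode_depth(tx_codes, rx_codes, focal_length_px, baseline_m):
    """Match received codes to transmitted codes on one row and triangulate.

    tx_codes / rx_codes map a code id to its column (pixels) in the
    transmitted pattern and in the received NIR image respectively.
    Returns {received column: depth in metres}.
    """
    depths = {}
    for code, rx_col in rx_codes.items():
        tx_col = tx_codes.get(code)
        if tx_col is None:
            continue  # code not recognised in the known pattern: skip it
        disparity = abs(tx_col - rx_col)
        if disparity == 0:
            continue  # no measurable shift: depth is undefined here
        depths[rx_col] = focal_length_px * baseline_m / disparity
    return depths


# Hypothetical codes and geometry (assumed values).
transmitted = {"A": 100, "B": 200}
received = {"A": 70, "B": 180, "Z": 5}  # "Z" is a decoding error
print(decode_depth(transmitted, received, 600.0, 0.05))
```

Unmatched codes are dropped rather than guessed, so decoding errors leave holes in the depth map instead of wrong depths.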
6
Scanner Flow in Action
3DR_Demo.mp4
7
Scanner Block Diagram
[Diagram blocks, as labeled on the slide:]
• Scan Starts
• Color + Depth (Structured Light Based Depth Generation)
• Tracking / Alignment
• Computer Vision Based Initial Pose Estimation
• Inertial Motion Sensor Fusion
• Bundle Adjustment
• 3D Mesh Generation
• Color Correction
• HD Texture Generation
• Live 3D Renderer/Viewer
• Scan Finishes
• Use case: 3D Printing, Social Networking, Gaming Avatars, etc.
Transitions: USER MOVES, USER STOPS
8
Scanner System Architecture
3D Scanner Application
• RGBD Image Grabber
• Camera 2 API
• Depth JNI
• 3D Scanner JNI
• Depth Engine (DSP/HVX)
• RGB Grabber
• NIR Grabber
• 3D Scanner Engine (CPU/GPU)
• SysFS
• Camera HAL
• Camera HAL
• Raw RGB Data
• Raw NIR Data
• Driver
• Laser
• NIR Camera
• RGB Camera
• Active Sensing Module
Note: Arrows indicate
dependency, not dataflow
Layers (top to bottom): Apps (Java), Middleware (C++), Drivers (C), Hardware
9
3DR Workload Summary: Running on Snapdragon 820
3D Reconstruction requires running several computationally demanding processes simultaneously:
1. Camera Pose Tracking
2. Sensor Fusion
3. Bundle Adjustment
4. Rendering
5. Mesh Generation
6. Texture Mapping
7. Structured Light Sensor Decoding
Thanks to the heterogeneous computational framework of the Snapdragon 820, we are able to do all of this at 15 FPS:
Kryo (CPU/NEON):
• Pose Tracking
• Bundle Adjustment
• Sensor Fusion
• Mesh Generation
Adreno (GPU):
• Rendering
• Texture Mapping
Hexagon (DSP/HVX):
• Depth from Structured Light
Spectra ISP:
• RGB sensor processing
• Depth sensor interface
[Figure label: 3DR powered by Snapdragon 820]
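To make the 15 FPS target concrete, the task-to-core mapping above can be written down as data and checked against the per-frame time budget (1000 ms / 15, about 66.7 ms). This is only a back-of-the-envelope sketch: it assumes the cores run concurrently while tasks on the same core run back to back, and every timing in `example_ms` is a hypothetical placeholder, not a measurement from the talk.

```python
# Task-to-core mapping as listed on the slide.
WORKLOAD = {
    "Kryo CPU/NEON": ["Pose Tracking", "Bundle Adjustment",
                      "Sensor Fusion", "Mesh Generation"],
    "Adreno GPU": ["Rendering", "Texture Mapping"],
    "Hexagon DSP/HVX": ["Depth from Structured Light"],
    "Spectra ISP": ["RGB sensor processing", "Depth sensor interface"],
}

TARGET_FPS = 15
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 66.7 ms per frame


def busiest_core_ms(stage_ms):
    """Largest per-core total, assuming cores overlap and tasks on a core do not."""
    return max(
        sum(stage_ms.get(task, 0.0) for task in tasks)
        for tasks in WORKLOAD.values()
    )


# Hypothetical per-frame timings in milliseconds (illustrative only, not measured).
example_ms = {
    "Pose Tracking": 20.0, "Bundle Adjustment": 15.0,
    "Sensor Fusion": 2.0, "Mesh Generation": 18.0,
    "Rendering": 12.0, "Texture Mapping": 10.0,
    "Depth from Structured Light": 30.0,
    "RGB sensor processing": 5.0, "Depth sensor interface": 3.0,
}
print(busiest_core_ms(example_ms) <= FRAME_BUDGET_MS)
```

Under this simple model, the frame rate is limited by the most heavily loaded core, which is the motivation for spreading the pipeline across CPU, GPU, DSP, and ISP.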
10
Lessons Learned
The highest-quality 3DR requires great HW and SW. Efficient CV SW algorithms, operating with accurate depth sensors and power-efficient processors, bring commercial-grade 3DR to mobile platforms.
Running 3DR on mobile requires tuning algorithms for power as well as performance. Power-efficient heterogeneous processors are mandatory for 3DR to run within mobile power and thermal envelopes.
The heterogeneous processing cores on Snapdragon 820 enable a high-quality 3DR experience on mobile platforms.
11
3DR Algorithmic Details
13
Based on the Iterative Closest Point (ICP) concept, align images by minimizing the sum of squared pixel intensity errors plus the weighted sum of squared depth errors:
$$\text{cost} = \sum (\text{Pixel Intensity Error})^2 + \lambda \sum (\text{Pixel Depth Error})^2$$
[Figure panels: Pixel Intensity Error; Depth Error]
• F. Steinbruecker et al., “Real-Time Visual Odometry from Dense RGB-D Images”, ICCV 2011
• C. Kerl et al., “Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras”, ICCV 2015
Computer Vision Based
Pose Estimation (6-DOF)
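The combined cost on this slide can be evaluated directly once per-pixel residuals are available. The sketch below assumes the residuals have already been computed for a candidate pose and are passed in as flat lists; the function name and the sample numbers are illustrative, and the weight `lam` stands for the λ in the cost.

```python
def alignment_cost(intensity_errors, depth_errors, lam):
    """Sum of squared intensity errors plus lam times sum of squared depth errors."""
    photometric = sum(e * e for e in intensity_errors)
    geometric = sum(e * e for e in depth_errors)
    return photometric + lam * geometric


# Hypothetical residuals for two pixels (intensity) and one pixel (depth).
print(alignment_cost([1.0, 2.0], [0.5], lam=4.0))
```

The weight λ balances the two terms because intensity and depth errors are expressed in different units.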
14
Flow
• Inputs: Reference Image, Current Image
• Warp one image into the other's view, then subtract: Warped Image – Image = Error Image
• Repeat to minimize the error
Computer Vision Based
Pose Estimation (6-DOF)
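The warp, subtract, repeat loop can be illustrated with a one-dimensional toy problem: estimate the shift that aligns a "current" signal with a "reference" signal using Gauss-Newton steps. This is a deliberately reduced stand-in for the 6-DOF image warp on the slide, which also uses depth. The choice to warp the current signal toward the reference, the linear interpolation, and the Gaussian test signals are all assumptions made for this example.

```python
import math


def sample(signal, x):
    """Linearly interpolate signal at fractional index x (clamped to the ends)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    frac = x - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]


def estimate_shift(reference, current, iterations=20):
    """Find shift s such that current(i + s) matches reference(i)."""
    shift = 0.0
    for _ in range(iterations):
        num = den = 0.0
        for i, ref_value in enumerate(reference):
            warped = sample(current, i + shift)      # warp
            error = warped - ref_value               # subtract
            gradient = 0.5 * (sample(current, i + shift + 1)
                              - sample(current, i + shift - 1))
            num += gradient * error
            den += gradient * gradient
        if den == 0.0:
            break
        shift -= num / den                           # repeat to minimize error
    return shift


# Toy signals: the current signal is the reference moved right by 3 samples.
reference = [math.exp(-((i - 20) / 5.0) ** 2) for i in range(40)]
current = [math.exp(-((i - 23) / 5.0) ** 2) for i in range(40)]
print(estimate_shift(reference, current))
```

In the full problem the single shift becomes a six-parameter pose and the gradient becomes a Jacobian, but the loop structure is the same.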
15
Example
Computer Vision Based
Pose Estimation (6-DOF)
Track.mp4
16
The vision pose will likely contain some errors.
• One cause is a lack of geometric and textural structure in the scene
This can be overcome by fusing the vision pose with data from the Inertial Measurement Unit (IMU) of the tablet.
Using the Extended Kalman Filter (EKF) concept, one can predict poses from the IMU.
These are then fused with the vision pose in the EKF update step to obtain the fused pose estimate.
Motion Sensor Fusion
• M. Li et al., “3-D motion estimation and online temporal calibration for camera-IMU systems”, ICRA 2013
• S. Weiss et al., “Real-Time Metric State Estimation for Modular Vision-Inertial Systems”, ICRA 2011
[Diagram: Gyro and Accelerometer feed Extended Kalman Filter (Predict); Vision Based Pose Estimation feeds Extended Kalman Filter (Update)]
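The predict/update split can be sketched with a scalar Kalman filter tracking position along one axis. This is a simplification under stated assumptions: the motion and measurement models here are linear, so no linearization is needed, whereas a real EKF for 6-DOF pose linearizes nonlinear models at each step. The function names, the IMU-derived displacement input, and the noise values are hypothetical.

```python
def predict(x, p, imu_delta, process_noise):
    """Propagate the state with an IMU-derived displacement; uncertainty grows."""
    return x + imu_delta, p + process_noise


def update(x, p, vision_pose, vision_noise):
    """Fuse the prediction with the vision-based pose measurement."""
    gain = p / (p + vision_noise)          # Kalman gain: trust in the measurement
    x_new = x + gain * (vision_pose - x)
    p_new = (1.0 - gain) * p               # uncertainty shrinks after fusion
    return x_new, p_new


# Hypothetical single step: IMU suggests +0.5 m, vision measures 0.7 m.
x, p = predict(0.0, 1.0, imu_delta=0.5, process_noise=0.1)
x, p = update(x, p, vision_pose=0.7, vision_noise=1.1)
print(x, p)
```

When the vision pose is unreliable (for example in a textureless scene), a larger `vision_noise` lowers the gain and the estimate leans on the IMU prediction.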
17
Fused poses need to be refined to reduce visual errors.
• Reason: poses are computed locally, “between consecutive frames”
We use bundle adjustment to find optimal global or semi-global poses:
• Construct links (red lines) between captured frames (blue nodes). A link is established if the re-projection between the captured images is above a certain threshold
• Jointly optimize the connected nodes
Bundle Adjustment
• V. Indelman et al., “Incremental Light Bundle Adjustment for Robotics Navigation”, IROS 2013
• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011
• K. Konolige et al., “FrameSLAM: from Bundle Adjustment to Realtime Visual Mapping”, IEEE Transactions on Robotics 2008
[Figure: graph of captured frames (blue nodes) connected by links (red lines)]
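The idea of jointly optimizing linked frames can be shown on a one-dimensional pose graph: each link stores a measured offset between two frames, and the poses are adjusted to minimize the squared disagreement over all links. This toy uses coordinate descent on 1-D positions and is not the bundle adjustment used in the talk, which would optimize full camera poses; the function name, link format, and measurements are assumptions.

```python
def optimize_pose_graph(num_frames, links, iterations=100):
    """Least-squares 1-D pose graph solved by coordinate descent.

    links: (i, j, offset) means frame j was measured `offset` ahead of frame i.
    Frame 0 is held fixed at 0.0 as the anchor.
    """
    poses = [0.0] * num_frames
    for _ in range(iterations):
        for k in range(1, num_frames):
            estimates = []
            for i, j, offset in links:
                if j == k:
                    estimates.append(poses[i] + offset)
                elif i == k:
                    estimates.append(poses[j] - offset)
            if estimates:
                # The mean minimizes the squared error of links touching frame k.
                poses[k] = sum(estimates) / len(estimates)
    return poses


# Hypothetical measurements: two frame-to-frame links plus one longer link
# between frames 0 and 2 that disagrees with their sum (1.0 + 1.0 vs 2.3).
links = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.3)]
print(optimize_pose_graph(3, links))
```

The extra link between non-consecutive frames spreads the accumulated drift over all poses instead of leaving it in the last frame.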
18
Having computed the 3D points, we need to generate the 3D surface mesh that best describes the scene while reducing noise.
Many surface reconstruction methods are available in the literature: Moving Least Squares (MLS), Truncated Signed Distance Function (TSDF), and Poisson. In theory any of them can be used. TSDF is the least computationally demanding; MLS and Poisson are more demanding.
The chosen method is then followed by marching cubes to generate the mesh.
Surface Reconstruction / Mesh Generation
• S. Fleishman et al., “Robust Moving Least-squares Fitting with Sharp Features”, ACM SIGGRAPH 2005
• M. Kazhdan et al., “Poisson Surface Reconstruction”, Symposium on Geometry Processing 2006
• R. Newcombe et al., “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, IEEE ISMAR 2011
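A TSDF can be illustrated along a single camera ray: each voxel stores a running average of the truncated signed distance to the observed surface, and the surface is recovered at the zero crossing. This is a one-dimensional sketch under stated assumptions, not the system's implementation; the zero-crossing search plays the role that marching cubes plays in 3D, and the voxel spacing, truncation distance, and depth readings are hypothetical.

```python
def integrate(tsdf, weights, voxel_centers, surface_depth, trunc):
    """Fuse one depth observation into the per-voxel running average."""
    for i, z in enumerate(voxel_centers):
        sdf = surface_depth - z            # positive in front of the surface
        if sdf < -trunc:
            continue                       # far behind the surface: unobserved
        value = min(1.0, sdf / trunc)      # truncate and normalize
        tsdf[i] = (tsdf[i] * weights[i] + value) / (weights[i] + 1)
        weights[i] += 1


def zero_crossing(tsdf, weights, voxel_centers):
    """Locate the surface by linear interpolation between observed voxels."""
    for i in range(len(tsdf) - 1):
        if weights[i] and weights[i + 1] and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_centers[i] + t * (voxel_centers[i + 1] - voxel_centers[i])
    return None


# Hypothetical ray: voxels every 10 cm, three noisy depth readings near 0.5 m.
centers = [i * 0.1 for i in range(11)]
tsdf, weights = [0.0] * 11, [0] * 11
for depth in (0.52, 0.48, 0.50):
    integrate(tsdf, weights, centers, depth, trunc=0.3)
print(zero_crossing(tsdf, weights, centers))
```

Averaging over observations is what suppresses depth noise, and the per-voxel update is a handful of arithmetic operations, which fits the claim that TSDF is the cheapest of the three methods.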
19
Captured color images can suffer from color casts for many reasons, such as different light sources. These must be corrected so that the overall color of the 3D model is consistent.
Solution: estimate the color casts and remove them
• Gray points provide the best estimate of the color cast
• Estimate the gray pixels and shift the appropriate channel gain to bring them to neutral gray
• Repeat until convergence
Color Correction
• J. Huo et al., “Robust Automatic White Balance Algorithm Using Gray Color Points in Images”, IEEE Trans. Consumer Electronics, 2006
[Figure: BEFORE / AFTER]
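The estimate, adjust, repeat loop for cast removal can be sketched as follows. This is a simplified variant written for illustration, not the cited algorithm: it selects near-gray pixels with a simple RGB chroma test (an assumption), rescales the red and blue gains by the full ratio to the green mean in each pass, and stops when the gains no longer change. The threshold and test pixels are hypothetical.

```python
def chroma(pixel):
    """Spread of the channels relative to their mean; 0 for neutral gray."""
    mean = sum(pixel) / 3.0
    return (max(pixel) - min(pixel)) / mean if mean > 0 else float("inf")


def white_balance(pixels, gray_threshold=0.5, max_iterations=20, tolerance=1e-6):
    """Iteratively adjust red/blue gains so near-gray pixels become neutral."""
    gain_r = gain_b = 1.0
    for _ in range(max_iterations):
        corrected = [(r * gain_r, g, b * gain_b) for r, g, b in pixels]
        gray = [p for p in corrected if chroma(p) < gray_threshold]
        if not gray:
            break                                  # no gray candidates found
        mean_r = sum(p[0] for p in gray) / len(gray)
        mean_g = sum(p[1] for p in gray) / len(gray)
        mean_b = sum(p[2] for p in gray) / len(gray)
        if mean_r <= 0 or mean_b <= 0:
            break
        step_r, step_b = mean_g / mean_r, mean_g / mean_b
        gain_r *= step_r
        gain_b *= step_b
        if abs(step_r - 1.0) < tolerance and abs(step_b - 1.0) < tolerance:
            break                                  # converged
    balanced = [(r * gain_r, g, b * gain_b) for r, g, b in pixels]
    return balanced, (gain_r, gain_b)


# Hypothetical scene: three gray patches and one red pixel under a warm cast
# that multiplies red by 1.2 and blue by 0.8.
scene = [(0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8), (0.9, 0.1, 0.1)]
cast = [(r * 1.2, g, b * 0.8) for r, g, b in scene]
balanced, gains = white_balance(cast)
print(gains)
```

Strongly colored pixels, such as the red one here, fail the gray test and therefore do not bias the gain estimate.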
20
The captured images need to be combined into one or more images called Texture Maps.
Texture mapping can be thought of as “3D stitching of the images on the 3D model”.
Obtaining the Texture Map generally consists of two steps:
• Determine where to put the pixels on the 3D model (texture coordinates)
• Determine the color of each pixel given a sequence of input images
Texture Mapping
• P. Debevec et al., “Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping”, Eurographics Rendering Workshop 1998
• M. Waechter et al., “Let There Be Color! Large-Scale Texturing of 3D Reconstructions”, ECCV 2015
[Figure panels: Input Camera Images; Output Texture Map; Colored 3D Model Using the Texture Map]
Rabbit.mp4
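The two steps above can be sketched for a single mesh vertex: pick the input view that sees the surface most directly, then project the vertex into that view to find the pixel that supplies its color. This is a minimal illustration, not the method used in the talk; it assumes a pinhole camera model, and the selection rule (largest cosine between the surface normal and the direction to the camera), function names, and intrinsics are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a point in camera coordinates to pixel (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        return None                      # behind the camera: not visible
    return (fx * x / z + cx, fy * y / z + cy)


def best_view(vertex, normal, camera_positions):
    """Index of the camera that faces the surface at `vertex` most directly."""
    def facing(cam):
        to_cam = [c - v for c, v in zip(cam, vertex)]
        length = math.sqrt(sum(t * t for t in to_cam))
        if length == 0.0:
            return -1.0                  # camera sits on the vertex: reject
        return sum(n * t for n, t in zip(normal, to_cam)) / length
    return max(range(len(camera_positions)),
               key=lambda i: facing(camera_positions[i]))


# Hypothetical vertex at the origin with its normal along +z, and three cameras.
cameras = [(1.0, 0.0, 0.1), (0.0, 0.0, 2.0), (0.0, 1.0, 1.0)]
print(best_view((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cameras))
print(project((0.1, -0.2, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Choosing a head-on view reduces stretching of the texture, and blending several views instead of picking one is a common refinement.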
21
Some 3DR Examples
22
Using our system we can scan a small toy, a human face or body, or another object.
All of this runs on the Snapdragon 820, thanks to its powerful heterogeneous computational framework.
Some Results
Sairam.mp4 Suzy.mp4 Printer.mp4 Bunny.mp4
Thank you
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Why Wait is a trademark of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.
23