
Page 1

Video-Based In Situ Tagging on Mobile Phones

Wonwoo Lee, Youngmin Park, Vincent Lepetit, Woontack Woo

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011

Page 2

Outline

• Introduction
• Online Target Learning
• Detection and Tracking
• Experimental Results
• Conclusion

Page 3

Introduction

Objective: Augment a real-world scene with minimal user intervention on a mobile phone.

“Anywhere Augmentation” Considerations:

• Avoid reconstruction of the 3D scene
• Perspective patch recognition
• Mobile phone processing power
• Mobile phone accelerometers
• Mobile phone Bluetooth connectivity

http://www.youtube.com/watch?v=Hg20kmM8R1A

Page 4

Introduction

The proposed method follows a standard procedure of target learning and detection.

Input Image → Online Learning → Real-time Detection


Page 6

Online Target Learning

Input: image of the target plane
Output: patch data and camera poses

Assumptions:
• Known camera parameters
• Horizontal or vertical target surface

Page 7

Online Target Learning

Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

Page 8

Frontal View Generation

We need a frontal view to create the patch data and their associated poses.

Targets whose frontal views are available.

Page 9

Frontal View Generation

However, frontal views are not always available in the real world.

Targets whose frontal views are NOT available.

Page 10

Frontal View Generation

Objective: Generate a fronto-parallel view of the target from the input image.

Approach: Exploit the phone’s built-in accelerometer.

Assumption: The patch lies on a horizontal or vertical surface.

Page 11

Frontal View Generation

The orientation of the target (horizontal or vertical) is suggested based on the current pose of the phone (a sketch of the decision rule follows the diagram).

[Diagram: the gravity vector G detected by the accelerometer splits phone poses at ±π/4 around the direction parallel to the ground; within that band a vertical target is suggested, and beyond it a horizontal one.]
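A minimal sketch of this decision rule, assuming gravity is available in the camera frame with z along the viewing direction and that the threshold is the ±π/4 split shown in the diagram; the function name and frame convention are illustrative, not from the paper.

```python
import numpy as np

def suggest_orientation(gravity):
    """Suggest a target orientation from the accelerometer's gravity vector.

    gravity: 3-vector in the camera frame, z along the viewing direction.
    Looking nearly straight down or up (viewing axis within pi/4 of
    gravity) suggests a horizontal surface; otherwise a vertical one.
    """
    g = np.asarray(gravity, dtype=float)
    g /= np.linalg.norm(g)
    angle = np.arccos(abs(g[2]))  # angle between viewing axis and gravity
    return "horizontal" if angle < np.pi / 4 else "vertical"

# Phone tilted down toward a desk -> horizontal target suggested.
print(suggest_orientation([0.1, 0.2, -0.97]))   # horizontal
# Phone held upright facing a wall -> vertical target suggested.
print(suggest_orientation([0.0, -0.98, 0.05]))  # vertical
```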

Page 12

Frontal View Generation

Under the one-degree-of-freedom assumption:
• Frontal-view camera: [I | 0]
• Captured-view camera: [R | c], with translation t = −Rc
• A function warps the image to the virtual frontal view [12]; the homography is written out below.

[12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
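The slides only cite [12] for this warping function. A standard way to write it, assuming the target plane has unit normal n at distance d in the frontal camera’s frame (the paper’s exact parametrization may differ), is the plane-induced homography:

```latex
% Frontal camera P = K[I|0], captured camera P' = K[R|t], with t = -Rc.
% Target plane: n^T X = d in the frontal camera frame.
H = K \left( R - \frac{t\, n^{\top}}{d} \right) K^{-1}
% x' = H x maps frontal-view pixels to captured-view pixels, so the
% virtual frontal view is obtained by warping the input image by H^{-1}.
```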

Page 13

Frontal View Generation

Page 14

Online Target Learning

Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

Page 15

Blurred Patch Generation

Objective: Quickly learn the appearance of the target surface.

Approach: Adopt the patch-learning approach of “Gepard” [6], which learns a patch in real time on a desktop computer.

[6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.

Page 16

Review: Gepard [6]

Fast patch learning by linearizing image warping with principal component analysis.

• A “mean patch” serves as the patch descriptor.
• Difficult to apply directly on the mobile phone platform: low CPU performance, and a large amount of pre-computed data is required (about 90 MB).

Page 17

Modified Gepard [6]

• Remove the need for a fronto-parallel view: use the phone’s accelerometers and limit targets to two plane orientations.
• Skip the feature-point detection step: use larger patches for robustness instead.
• Replace how templates are constructed: by blurring instead.
• Add Bluetooth sharing of the AR configuration.

Page 18

Blurred Patch Generation

Approach: Use a blurred patch instead of the mean patch.

Page 19

Blurred Patch Generation

Generate blurred patches through multi-pass rendering on the GPU; the GPU’s parallelism speeds up the image processing.

Page 20

Blurred Patch Generation

1st Pass: Warping. Render the input patch from a given viewpoint; much faster than on the CPU.

Page 21

Blurred Patch Generation

2nd Pass: Radial blurring of the warped patch. Lets the blurred patch cover a range of poses close to the exact pose.

Page 22

Blurred Patch Generation

3rd Pass: Gaussian blurring of the radially blurred patch. Makes the blurred patch robust to image noise. (A CPU-side sketch of all three passes follows.)
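A CPU-side numpy/OpenCV sketch of what the first three GPU passes compute for one sampled viewpoint. The function name and the approximation of the radial blur by averaged in-plane rotations are assumptions, not the paper’s shader code; the 10-degree range and 11×11 kernel come from the experimental settings later in the deck.

```python
import cv2
import numpy as np

def blurred_patch(frontal, H, radial_deg=10.0, steps=7, ksize=11):
    """Approximate GPU passes 1-3 for one sampled viewpoint.

    frontal: frontal-view patch (float32 grayscale, e.g. 128x128)
    H:       3x3 homography from the frontal view to the viewpoint
    """
    h, w = frontal.shape
    # Pass 1: warp the frontal patch to the sampled viewpoint.
    warped = cv2.warpPerspective(frontal, H, (w, h))
    # Pass 2: radial blur, approximated here by averaging small in-plane
    # rotations so the patch covers poses close to the exact one.
    center = (w / 2.0, h / 2.0)
    acc = np.zeros((h, w), np.float32)
    for a in np.linspace(-radial_deg, radial_deg, steps):
        M = cv2.getRotationMatrix2D(center, a, 1.0)
        acc += cv2.warpAffine(warped, M, (w, h))
    radial = acc / steps
    # Pass 3: Gaussian blur for robustness to image noise.
    return cv2.GaussianBlur(radial, (ksize, ksize), 0)
```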

Page 23

Blurred Patch Generation

Fig. 7. Effectiveness of radial blur. Combining the radial blur and the Gaussian blur outperforms simple Gaussian blurring.

Page 24

Blurred Patch Generation

4th Pass: Accumulation of blurred patches in a texture unit. Reduces the number of readbacks from GPU memory to CPU memory.

Page 25

Online Target Learning

Input Image → Frontal View Generation → Blurred Patch Generation → Post-processing

Page 26

Post-Processing

• Downsample the blurred patches from 128×128 to 32×32.
• Normalize to zero mean and a standard deviation of 1 (see the sketch below).
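A small numpy/OpenCV sketch of this step; the interpolation choice and the epsilon guarding against division by zero are assumptions.

```python
import cv2
import numpy as np

def postprocess(patch128):
    """Downsample a 128x128 blurred patch to 32x32, then normalize it
    to zero mean and unit standard deviation."""
    small = cv2.resize(patch128, (32, 32), interpolation=cv2.INTER_AREA)
    small = small.astype(np.float32)
    return (small - small.mean()) / (small.std() + 1e-8)
```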

Page 27

Detection & Tracking

The user points the camera at the target.

A square patch at the center of the image is used for detection.

Page 28

Detection & Tracking

The initial pose is retrieved by comparing the input patch with the learned mean patches (a matching sketch follows the reference below).

ESM-Blur [20] is applied for further pose refinement.

NEON instructions are used to speed up the refinement.

[20] Y. Park, V. Lepetit, and W. Woo, “ESM-blur: Handling and rendering blur in 3D tracking and augmentation,” in Proc. Int. Symp. Mixed Augment. Reality, 2009, pp. 163–166.
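A sketch of the initial pose retrieval, assuming the patches were normalized to zero mean and unit standard deviation as above, so normalized cross-correlation reduces to a dot product. The ESM-Blur refinement [20] is omitted, and all names are illustrative; the deck later reports this comparison takes about 3 ms for 225 views.

```python
import numpy as np

def retrieve_initial_pose(input_patch, learned_patches, poses):
    """Compare the normalized 32x32 input patch against the learned
    patches (one per training view) and return the best view's pose.

    learned_patches: (N, 32, 32) array of normalized patches
    poses:           list of N camera poses, one per training view
    """
    x = input_patch.reshape(-1)
    scores = learned_patches.reshape(len(poses), -1) @ x  # NCC up to a scale
    best = int(np.argmax(scores))
    return poses[best], scores[best] / x.size  # pose and NCC in [-1, 1]
```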

Page 29

Experimental Results

• Patch size: 128 × 128
• Number of views used for learning: 225
• Maximum radial blur range: 10 degrees
• Gaussian blur kernel: 11 × 11
• Memory requirement: about 900 KB per target
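As a sanity check on that last figure (assuming each of the 225 views stores one 32×32 patch as 32-bit floats, a format the slides do not specify): 225 × 32 × 32 × 4 bytes = 921,600 bytes ≈ 900 KB.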

Pages 30–33

Experimental Results

[Figure-only slides.]
Page 34

Experimental Results

              iPhone 3GS / 4       PC
CPU           600 MHz / 1 GHz      Intel Quad-Core 2.4 GHz
GPU           PowerVR SGX 535      GeForce 8800 GTX
Renderer      OpenGL ES 2.0        OpenGL 2.0
Video         480×360              640×480

Page 35

Experimental Results

• More views require more rendering.
• The radial blur is slow on the mobile phone.
• Possible speed improvement through shader optimization.

[Chart comparing PC, iPhone 3GS, and iPhone 4.]

Page 36

Experimental Results: Comparison with Gepard [6]

[6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.

Fig. 11. Planar targets used for evaluation. (a) Sign-1. (b) Sign-2. (c) Car. (d) Wall. (e) City. (f) Cafe. (g) Book. (h) Grass. (i) MacMini. (j) Board. The patches delimited by the yellow squares are used as reference patches.

Page 37

Experimental Results

Our approach performs slightly worse in terms of recognition rates, but it is better adapted to mobile phones.

Page 38

Experimental Results

Comparing the mean patches takes about 3 ms with 225 views.

The speed of pose estimation and tracking with ESM-Blur depends on the accuracy of the initial pose provided by patch detection.

Page 39

Limitations

Performs poorly on repetitive textures and reflective surfaces.

Currently limited to a single target.

Page 40

Conclusion

Potential applications:
• AR tagging of the real world
• AR apps “anywhere, anytime”

Future work:
• Further optimization on mobile phones
• Detection of multiple targets at the same time

Page 41

Video

http://www.youtube.com/watch?v=DLegclJVa0E

Page 42

Thank you for listening.