Evolving Algorithmic Requirements for Recognition and Classification in Augmented Reality

Simon Morris / Tom Wilson, CogniVue
May 29, 2014

Copyright © 2014 CogniVue Corporation

Outline

• The challenge of "always-on" vision for augmented reality in mobile devices: power for performance
• Marker-based augmented reality algorithm flow
• Computational loading examples for marker-based AR
• "Marker-less" algorithm flow
• Computational loading estimates for Natural Feature Tracking (NFT)
• Architectural implications of always-on feature detection, extraction and tracking on mobile apps processors

Challenge of Always-On Vision

• IoT devices and wearables are full of various sensors, but the most valuable sensor is missing: the ability to see
• Why?
  • Heat dissipation, battery life, cost effectiveness

Challenge of Always-On Vision

• Why mobile? It has everything but the kitchen sink
• There is a gap between always-on vision and existing processing technology:
  1. Power
     • 30 to at most 60 minutes on a 2000 mAh battery
     • Eyewear like Google Glass manages <1 hr of video, with no vision processing
  2. Performance & cost
     • Expensive use of apps processor resources: it takes a quad-core Cortex CPU, a GPU, a DSP and a video codec
     • Not efficient: vision operations are divided among the cores
     • Best case today is stereo VGA; stereo 1080p at 60 fps needs ~30x that throughput (see the back-of-envelope sketch below)
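The ~30x figure can be sanity-checked with pixel-rate arithmetic in a few lines of Python. The slide does not state the baseline frame rate, so the 15 fps below is an assumption chosen only to show how the ratio is formed:

    # Pixel-rate comparison; the 15 fps stereo-VGA baseline is an assumption.
    vga = 640 * 480            # pixels per VGA frame
    fhd = 1920 * 1080          # pixels per 1080p frame

    baseline = 2 * vga * 15    # stereo VGA at an assumed 15 fps
    target = 2 * fhd * 60      # stereo 1080p at 60 fps

    print(target / baseline)   # -> 27.0, close to the ~30x quoted above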

High-Level AR Processing Flow

Acquisition → Detection → Rendering (flow taken from Fernandez, Orduna, Morillo, 2011)

Latency (ms)    Acquisition    Detection
iPhone 4        17.66          23.3

The end-to-end budget is roughly 50 ms from acquisition to display, of which detection should take <10 ms.

Marker Design

• Markers are simply a special case of a feature
  • Well defined, to assist in rapid pose calculation
  • High contrast enables easier detection
  • Most markers are simple black-and-white squares
• Four known points are important: they allow for subsequent distortion correction and marker decoding (see the sketch below)
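A minimal sketch of why four known points are enough, using OpenCV's perspective transform; the corner coordinates and file name are hypothetical:

    import cv2
    import numpy as np

    # Hypothetical corners of a detected marker in the camera image,
    # ordered top-left, top-right, bottom-right, bottom-left.
    corners = np.float32([[412, 180], [598, 205], [571, 389], [388, 360]])

    # Four point correspondences fully determine a perspective transform,
    # so map the marker onto a canonical 100x100 square.
    side = 100
    square = np.float32([[0, 0], [side, 0], [side, side], [0, side]])
    H = cv2.getPerspectiveTransform(corners, square)

    # Undo the perspective distortion; the fronto-parallel view can then
    # be decoded, e.g. by sampling its black-and-white cells.
    frame = cv2.imread("frame.png")
    rectified = cv2.warpPerspective(frame, H, (side, side))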

Marker-Based Camera Pose

• With marker-based AR, only CV functions with low processor demand are needed for camera pose estimation
• Toolkits are available for building these applications (e.g. for iOS see http://www.packtpub.com/article/marker-based-augmented-reality-on-iPhone-or-iPad)

Grayscale → Binarization → Contours → Candidates → Distortion Correction
(pipeline images from En-Co Software Ltd)
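A minimal sketch of these stages with OpenCV (4.x return conventions); the Otsu threshold, the 0.02 polygon-approximation factor and the area cutoff are illustrative choices, not values from the slides:

    import cv2

    frame = cv2.imread("frame.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                 # Grayscale
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # Binarization
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)        # Contours

    # Candidates: large convex quadrilaterals are plausible markers.
    candidates = []
    for c in contours:
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if (len(approx) == 4 and cv2.isContourConvex(approx)
                and cv2.contourArea(approx) > 1000):
            candidates.append(approx.reshape(4, 2))

Each candidate's four corners then feed the distortion-correction and decoding step sketched under Marker Design.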

Marker-Based Processor Loading Examples

• Detection and tracking on the device must be real time: <10 ms
• Example computational loading figures: see Fernandez, Orduna, Morillo (2011)
• Higher-performance chips are needed to achieve real-time performance while also keeping power significantly lower
• A CogniVue G2-APEX at 600 MHz can process a 1 MP frame in <2 ms (real time) for milliwatts

Why Markerless?

• Marker-based AR is unsuitable in many scenarios (e.g. outdoor tracking)
• Markerless tracking depends on natural features rather than fiducial markers
• The AR system needs to use some other tracking method:
  • Sensors for tracking (e.g. GPS); or
  • A visual tracking method to estimate the camera's pose (camera-based tracking, optical tracking, or natural feature tracking); or
  • A hybrid, e.g. GPS plus a MEMS gyroscope for position and visual tracking for orientation
• Tracking and registration become more complex with Natural Feature Tracking (NFT)
• Markerless AR apps with NFT will be widely adopted

How is NFT Different?

• There are different approaches to pose estimation in a markerless application
• NFT requires feature detection, extraction and matching (a sketch of the full flow follows)

Grayscale → Keypoint Detection → Feature Extraction → Feature Matching → Pose Estimation
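A minimal end-to-end sketch of this flow with OpenCV, using ORB as a stand-in for the detector/descriptor pair (the next slide lists alternatives); file names, the feature count and the RANSAC threshold are illustrative:

    import cv2
    import numpy as np

    reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2GRAY)  # Grayscale

    orb = cv2.ORB_create(nfeatures=500)
    kp_ref, des_ref = orb.detectAndCompute(reference, None)  # Detection + extraction
    kp_frm, des_frm = orb.detectAndCompute(frame, None)

    # Feature matching: Hamming distance suits ORB's binary descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_ref, des_frm), key=lambda m: m.distance)

    # Pose estimation: fit a reference-to-frame homography with RANSAC.
    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_frm[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)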

Feature Detection and Extraction

• Interest point detectors:
  • Before tracking, features or keypoints must be detected
  • Examples: Harris corner detection, GFTT, FAST
  • FAST has been preferred in mobile apps because it requires less processor performance, but it is not necessarily the best for accuracy and precision (see the comparison sketch below)
• Feature descriptors for matching: SIFT, SURF, ORB, HIP
• Mobile AR also involves pose estimation (e.g. with RANSAC)
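A small sketch contrasting two of the detectors named above; the threshold and feature-count parameters are illustrative:

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # FAST: a cheap segment test, hence its popularity on mobile.
    fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
    fast_kp = fast.detect(gray, None)

    # GFTT (Shi-Tomasi "good features to track"): costlier per pixel,
    # but its corners are often more repeatable.
    gftt = cv2.goodFeaturesToTrack(gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=7)

    print(len(fast_kp), 0 if gftt is None else len(gftt))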

Mobile Performance: SIFT Detection/Extraction

• SIFT is a good "acid test" for detection/extraction performance
• A CPU alone experiences long processing latency with SIFT
• An example of GPU acceleration shows about a 5x to 10x improvement
• The CogniVue G2-APEX core shows a ~50x improvement, and >100x in terms of performance per power
• G3-APEX will provide a further 4x-8x

SIFT detection/extraction time (ms, corrected to VGA):

CPU*:
  iPad Air          877
  iPad 4            1379
  iPad 3            3518
  iPad Mini 1       3586
  iPad 2            3515
  iPhone 4S         4320
  iPhone 5          1474
  iPhone 5S         1028
GPU + CPU**:
  Snapdragon S4     404
  Nexus 7           472
  Galaxy Note II    528
  Tegra 250         508
CogniVue ICP***:
  G2-APEX ICP       8.6
  G3-APEX ICP       ~2

* CPU figures adapted from Hudelist, Cobârzan, Schoeffmann (2014), corrected to VGA
** GPU + CPU figures adapted from Rister, Wang, Wu and Cavallaro (2013), corrected to VGA
*** Estimate with an APEX-1284 configuration @ 600 MHz
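Per-frame figures like those above can be gathered with a simple timing loop; a sketch assuming OpenCV 4.4+ (where SIFT_create lives in the main module) and a hypothetical VGA test image:

    import time
    import cv2

    gray = cv2.imread("frame_vga.png", cv2.IMREAD_GRAYSCALE)  # a 640x480 frame
    sift = cv2.SIFT_create()

    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        keypoints, descriptors = sift.detectAndCompute(gray, None)
    print((time.perf_counter() - start) / runs * 1000, "ms per frame")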

Impact of Optimization: HOG Example

• The HOG feature descriptor is very similar to SIFT (HOG was inspired by SIFT)
• As a feature descriptor it has computational complexity similar to SIFT's
• UncannyVision has shown an 8x improvement in HOG performance after optimizing the OpenCV code
• G2-APEX has a significant performance-per-power advantage even over the highly optimized implementation

Core and implementation            HOG + SVM, VGA (ms)
Cortex-A9 1.2 GHz, OpenCV          2320
Cortex-A9 1.2 GHz, optimized       340
Cortex-A15 1.2 GHz, OpenCV         1265
Cortex-A15 1.2 GHz, optimized      135
G2-APEX ICP 600 MHz                10.5
G3-APEX ICP 600 MHz                ~2.5
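For reference, the unoptimized OpenCV baseline in the table looks roughly like this: the stock HOG descriptor with its bundled pedestrian SVM run over a VGA frame (file name hypothetical):

    import cv2

    # Stock OpenCV HOG + linear SVM (the default people detector).
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    frame = cv2.resize(cv2.imread("frame.png"), (640, 480))   # VGA input
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)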

Architectural Implications

• Real-time performance is ~50 ms from image acquisition to display (glasses); therefore feature detection, tracking and matching must finish in <10 ms, and in <5 ms for low-power operation
• NFT processing for "always-on" mobile AR needs a >100x improvement in performance per power at resolutions >1 MP
• Wearable AR applications need this level of performance to make power-efficient, always-on AR and vision applications possible
• G2-APEX ICP technology offers the necessary acceleration, with G3-APEX ICP cores bringing an additional 4x-8x