
© Copyright National University of Singapore. All Rights Reserved.

AUTOMATED LINK GENERATION FOR SENSOR-ENRICHED SMARTPHONE IMAGES

IBM Interview talk, Sept 29th 2015, Singapore
Presenter: Seshadri Padmanabha Venkatagiri
Advisor: Prof. Chan Mun Choon
Collaborating Advisor: Prof. Ooi Wei Tsang


MOBILE USER GENERATED CONTENT: PHOTO

SLIDE: 2

More than 1.8 billion photo uploads in 2014 to date

Source: Kleiner Perkins Caufield & Byers, Internet Trends 2014


AD-HOC EVENTS: AN IMPORTANT SOURCE OF MOBILE UGC

SLIDE: 3

Exhibitions

Street Performances

Mishaps

Social events

Well attended

Attendees share the same event context

Lack of prior information, e.g., training data

Lack of planned infrastructure, e.g., GPS, camera deployments


OBJECTIVE OF AUTOLINK

SLIDE: 4

To organize noisy, unstructured photo collections so as to improve user interaction


APPLICATIONS OF STRUCTURED PHOTO COLLECTIONS

SLIDE: 5

Content Analytics/Discovery

Crowd-sourced surveillance

Photograph this!

Scene recommendation


CHALLENGE 1: NOISE IN CONTENT

SLIDE: 6

Occlusion

Varying lighting conditions

Diverse views

Diverse regions of interest

Diversity in moments captured

Scene-localization issues

Redundancy


CHALLENGE 2: RESOURCE BOTTLENECK

SLIDE: 7

UGC from ad-hoc events (exhibitions, street performances, mishaps) must pass through a resource bottleneck of battery and bandwidth before reaching the applications: Photograph this!, content analytics/discovery, scene recommendation, and surveillance.


OUR APPROACH: DETAIL-ON-DEMAND PARADIGM

SLIDE: 8

Does not overwhelm users with lots of content; suited to small form-factor devices.

Retrieves specific content on demand; consumes less bandwidth and power.

Organizing content into progressively detailed views helps analytics and elicits user interest.


DETAIL-ON-DEMAND PARADIGM: EXISTING SOLUTIONS

SLIDE: 9

Hyperlinking: provides progressive content; can be adapted to link content from multiple sources.

Zooming: provides progressively detailed content; not suitable for relating content from multiple sources.

WE CHOOSE HYPERLINK-BASED DETAIL-ON-DEMAND TO PROVIDE PROGRESSIVELY DETAILED CONTENT


COMPARISON WITH STATE-OF-THE-ART

SLIDE: 10

Technique | Automated | Prior information (e.g., training, visual dictionary) | Camera calibration | Distinguishes "same" from "similar" scenes | Requires GPS
Hyper-Hitchcock (manual technique) | No | No | No | Yes | No
Photo Tourism (Photosynth) / Google Street View | Yes | No | Yes | No | Yes
ImageWebs | Yes | Yes | Yes | No | No
Geo-tagging | Yes | No | No | No | Yes
AutoLink | Yes | No | No | Yes | No


COMPARISON WITH STATE-OF-THE-ART

SLIDE: 11

GPS error/non-availability and camera calibration are difficult in ad-hoc events.


COMPARISON WITH STATE-OF-THE-ART

SLIDE: 12


A visual vocabulary is not always available in ad-hoc events.


COMPARISON WITH STATE-OF-THE-ART

SLIDE: 13


GPS error/non-availability in ad-hoc events.


COMPARISON WITH STATE-OF-THE-ART

SLIDE: 14

AutoLink avoids GPS, visual vocabulary, and camera calibration.


APPLICATION CONTEXT

SLIDE: 15

Multiple scenes, located indoors, outdoors, or both.

People move between scenes and capture photos.

Photos are captured with different orientations and regions of interest.


PROBLEM

SLIDE: 16

Given: an image collection, an inertial sensor log, and the set of scenes the user is interested in.

Produce: a detail-on-demand image hierarchy, from high context to high detail.


ARCHITECTURE

SLIDE: 17

The AutoLink client communicates with the AutoLink server over a mobile wireless network.

(1) The client uploads metadata: the sensor log and content characteristics extracted from each photo. Photos themselves are uploaded on demand by the smartphones.

(2) The server runs AutoLink and performs inter-scene and intra-scene clustering to build the image content hierarchy (high context to high detail). Users can request photos by navigating through these links.


AUTOLINK OUTLINE

SLIDE: 18

(1) Capture photo
(2) Sensor log (angle, time, steps)
(3) User-selected scenes
(4) Sensor-assisted hybrid scene classification (using image features)
(5) Region estimation (photo to scene region)
(6) Hierarchy creation


STEP 1: RATIONALE OF SCENE CLASSIFICATION

SLIDE: 19

Order of reliability:

Use content features to find the matching scene for an image.

If content does not provide a match, use sensors and already-labeled images to improve the chance of a match.


STEP 1: SCENE CLASSIFICATION

SLIDE: 20

Inputs: content features and the user-specified scenes.

Match content features, then apply Naïve Bayes Nearest Neighbor classification.

Scene match? Label the image with the scene.

No scene match? The image remains unlabeled.


CONTENT FEATURES

SLIDE: 21

Color-SIFT [Abdel2006]: captures scale- and color-invariant characteristics.

Maximally Stable Extremal Regions [Forssen2007]: describes region-level features.

ORB [Ethan2011]: rotation- and noise-resistant features.

Color: global color histogram description.

[Ethan2011] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. "ORB: An efficient alternative to SIFT or SURF." ICCV 2011.
[Abdel2006] A. Abdel-Hakim and A. Farag. "CSIFT: A SIFT descriptor with color invariant characteristics." CVPR 2006.
[Forssen2007] P.-E. Forssén. "Maximally stable colour regions for recognition and matching." CVPR 2007.
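To make the four feature types concrete, here is a minimal extraction sketch in Python/OpenCV. Plain SIFT stands in for Color-SIFT and grayscale MSER for the colour variant (neither colour version ships with stock OpenCV), so treat this as an approximation of the pipeline, not the talk's actual implementation.

```python
import cv2
import numpy as np

def extract_features(path):
    """Extract the four per-photo feature types described above."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Scale-invariant keypoints (stand-in for Color-SIFT [Abdel2006]).
    sift = cv2.SIFT_create()
    _, desc_sift = sift.detectAndCompute(gray, None)

    # Region-level features (stand-in for colour MSER [Forssen2007]).
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)

    # Rotation- and noise-resistant binary descriptors [Ethan2011].
    orb = cv2.ORB_create()
    _, desc_orb = orb.detectAndCompute(gray, None)

    # Global colour histogram: 8 bins per BGR channel, L1-normalised.
    hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    hist = hist.flatten() / hist.sum()

    return desc_sift, regions, desc_orb, hist
```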


NAÏVE BAYES NEAREST NEIGHBOUR

SLIDE: 22

1. Compute the descriptors d1, d2, …, dn of the photo.

2. Compute descriptors for all the scenes (a preprocessing step, done only once).

3. For each descriptor di and each scene c, compute the nearest neighbour of di within that scene, NNc(di).

4. Choose the matching scene as ARG MIN over scenes c of SUM_i ||di − NNc(di)||² (sketched below).
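A minimal NumPy sketch of this classifier. It assumes each scene's reference descriptors are pooled into one array (the one-time preprocessing step) and uses a brute-force nearest-neighbour search; the talk does not specify the index structure actually used.

```python
import numpy as np

def nbnn_classify(photo_desc, scene_descriptors):
    """photo_desc: (n x d) descriptors of the photo.
    scene_descriptors: dict mapping scene label -> (m x d) array.
    Returns the scene minimising sum_i ||d_i - NN_scene(d_i)||^2."""
    best_scene, best_cost = None, np.inf
    for scene, descs in scene_descriptors.items():
        # Squared distances from every photo descriptor to every
        # scene descriptor; nearest neighbour per photo descriptor.
        d2 = ((photo_desc[:, None, :] - descs[None, :, :]) ** 2).sum(-1)
        cost = d2.min(axis=1).sum()
        if cost < best_cost:
            best_scene, best_cost = scene, cost
    return best_scene
```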


STEP 1: SCENE CLASSIFICATION

SLIDE: 23

Inputs: content (interest points) and the user-specified scenes.

Match content features, then apply Naïve Bayes Nearest Neighbor classification.

Scene match? Label the image with the scene.

No scene match? Combine content, time, and sensor features (angle, time, steps) to find a matching scene label; failing that, the image remains unlabeled.


TIME FEATURE: TEMPORAL CLUSTERING

SLIDE: 24

[Plot: cluster ID (0 to 25) vs. photo timestamp (0 to 100 minutes); inter-photo time gaps delimit the photo-clusters.]


TIME FEATURE: GAP LABELING

SLIDE: 25

(a) Timestamps: t1 = 0 min, t2 = 1.5 min, t3 = 2 min, t4 = 5 min, t5 = 5.1 min; Max{tgap} = 3 min.
Before gap labeling: i1 = Scene 1, i2 = no label, i3 = no label, i4 = no label, i5 = Scene 2.

(b) After gap labeling: i1 = Scene 1, i2 = Scene 1, i3 = Scene 1, i4 = Scene 2, i5 = Scene 2.

If the time gaps are smaller than a threshold ΔT, obtained from the traces collected in the dataset, gap labeling is not applied, because the photos are too close together to distinguish cluster boundaries.
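A minimal sketch of the gap-labelling rule from examples (a) and (b) above: an unlabelled photo inherits the scene of the temporally nearest labelled photo, provided the gap stays under Max{tgap}. The record format is assumed for illustration.

```python
MAX_TGAP = 3.0  # minutes, the Max{tgap} from the example above

def gap_label(photos):
    """photos: list of {'t': minutes, 'scene': str or None} records."""
    labelled = [p for p in photos if p['scene'] is not None]
    for p in photos:
        if p['scene'] is None:
            nearest = min(labelled, key=lambda q: abs(q['t'] - p['t']))
            if abs(nearest['t'] - p['t']) <= MAX_TGAP:
                p['scene'] = nearest['scene']
    return photos

photos = [{'t': 0.0, 'scene': 'Scene 1'}, {'t': 1.5, 'scene': None},
          {'t': 2.0, 'scene': None},      {'t': 5.0, 'scene': None},
          {'t': 5.1, 'scene': 'Scene 2'}]
gap_label(photos)  # i2, i3 -> Scene 1; i4 -> Scene 2, as in (b)
```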


ACCELEROMETER FEATURE: STEP COUNT BETWEEN PHOTO CAPTURES

SLIDE: 26

Steps between photo captures are counted by running a step-counting algorithm over the accelerometer log.
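The talk does not name the step-counting algorithm, so the sketch below uses simple peak detection on the accelerometer magnitude; the threshold and minimum peak spacing are illustrative values.

```python
import numpy as np

def count_steps(accel, fs=50, thresh=11.0, min_gap_s=0.3):
    """accel: (n x 3) accelerometer samples in m/s^2 at fs Hz."""
    mag = np.linalg.norm(accel, axis=1)
    steps, last = 0, -np.inf
    for i in range(1, len(mag) - 1):
        is_peak = mag[i - 1] < mag[i] > mag[i + 1]
        if is_peak and mag[i] > thresh and (i - last) / fs >= min_gap_s:
            steps, last = steps + 1, i
    return steps
```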


PHOTO ORIENTATION

SLIDE: 27

Photo orientation is obtained by sensor fusion of the magnetic field sensor, gyroscope, and accelerometer.
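As a rough illustration of the fusion, the sketch below computes a tilt-compensated heading from gravity and the magnetic field, in the spirit of Android's SensorManager.getRotationMatrix; the gyroscope term (used for smoothing) is omitted.

```python
import numpy as np

def heading_deg(accel, mag):
    """accel, mag: 3-vectors in the phone frame; returns azimuth in
    degrees, measured clockwise from magnetic north."""
    g = accel / np.linalg.norm(accel)   # gravity (up) direction
    east = np.cross(mag, g)             # points roughly east
    east /= np.linalg.norm(east)
    north = np.cross(g, east)           # completes the frame
    return np.degrees(np.arctan2(east[1], north[1])) % 360
```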


STEP 2: REGION ESTIMATION

SLIDE: 28

An image is matched to a region of the scene with two estimators: window-based estimation and super-pixel based estimation.


STEP 2: REGION ESTIMATION

SLIDE: 30

Super-pixel based estimation (see the sketch below):
1. Apply SEEDS super-pixels to the image and the scene.
2. Find the super-pixels that have matching features.
3. Estimate a bounding box around them.
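A minimal sketch of this branch using OpenCV's SEEDS implementation (opencv-contrib, ximgproc module). `matched_pts`, the photo coordinates of features that matched the scene, is assumed to come from the earlier matching step.

```python
import cv2
import numpy as np

def superpixel_bbox(img, matched_pts, n_superpixels=200):
    """Bounding box around super-pixels containing matched features."""
    h, w, c = img.shape
    seeds = cv2.ximgproc.createSuperpixelSEEDS(w, h, c,
                                               n_superpixels, 4)
    seeds.iterate(img, 10)
    labels = seeds.getLabels()

    # Super-pixels hit by at least one matched feature point.
    hit = {labels[int(y), int(x)] for x, y in matched_pts}
    mask = np.isin(labels, list(hit))

    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()
```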


STEP 2: REGION ESTIMATION

SLIDE: 31

Window-based estimation:
1. Estimate the region center using the horizontal and vertical shifts between matching features of the image and the scene.
2. Correct the estimate using the compass sensor z-axis.


ESTIMATING REGION CENTER

SLIDE: 32

The region center is estimated by computing the horizontal and vertical shift in image features between the candidate image and a scene reference image.

[Figure: scene reference image and candidate image with matched features.]
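A minimal sketch of the shift computation with ORB matches; the median displacement of matched keypoints gives a robust shift estimate, and the compass-based correction from the previous slide is omitted.

```python
import cv2
import numpy as np

def region_center(reference, candidate):
    """Project the candidate's centre into the scene reference image
    using the median shift of matched ORB keypoints."""
    orb = cv2.ORB_create()
    kp_r, d_r = orb.detectAndCompute(reference, None)
    kp_c, d_c = orb.detectAndCompute(candidate, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d_c, d_r)  # query: candidate, train: ref

    # Median shift is robust to a few bad matches.
    shifts = np.array([np.subtract(kp_r[m.trainIdx].pt,
                                   kp_c[m.queryIdx].pt)
                       for m in matches])
    dx, dy = np.median(shifts, axis=0)

    h, w = candidate.shape[:2]
    return w / 2 + dx, h / 2 + dy
```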


ENERGY-DELAY-ACCURACY PERFORMANCE OF REGION CENTER ESTIMATION

SLIDE: 33

Image resize factor (%) | Delay (s) | Energy (mJ) | Accuracy (%)
20 | 0.098 | 3.6 | 75.7
40 | 0.311 | 12.1 | 82.5
60 | 0.899 | 39.3 | 91.5
80 | 2.558 | 117.4 | 93.4
100 | 6.494 | 292.6 | 93.5

Subsampling to 60% of the original image size gives only a 2-point drop in accuracy compared to the original image, while cutting both energy and computation time by roughly 86%.


STEP 2: REGION ESTIMATION

SLIDE: 34

Find the region center and run super-pixel estimation, then:
1. Iterate over bounding boxes around the region center.
2. Find the bounding box that best matches the scene (a matching-score sketch follows).
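The matching score for the window search is not spelled out in the talk; the sketch below tries a few window sizes around the estimated centre and keeps the one whose colour histogram correlates best with the photo, purely as an illustration.

```python
import cv2

def _hist(img):
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                     [0, 256, 0, 256, 0, 256])
    return cv2.normalize(h, h).flatten()

def best_window(scene, photo, cx, cy, sizes=(100, 150, 200, 300)):
    """Best-matching bounding box around centre (cx, cy) in the scene."""
    target, best, best_score = _hist(photo), None, -1.0
    H, W = scene.shape[:2]
    for s in sizes:
        x0, y0 = max(0, int(cx - s / 2)), max(0, int(cy - s / 2))
        x1, y1 = min(W, x0 + s), min(H, y0 + s)
        score = cv2.compareHist(_hist(scene[y0:y1, x0:x1]), target,
                                cv2.HISTCMP_CORREL)
        if score > best_score:
            best, best_score = (x0, y0, x1, y1), score
    return best
```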


STEP 2: REGION ESTIMATION

SLIDE: 37

The final region estimate is the average of the window-based and super-pixel based estimates.

[Figure: red = our estimate, blue = ground truth.]


DEMO: REGION CENTER AND REGION WINDOW ESTIMATION

SLIDE: 38


STEP 3: ADDING TO IMAGE HIERARCHY USING BOUNDING BOX (BB)

SLIDE: 39

Hierarchy levels, from high context to high detail: L1, L2, L3, L4.

Is BB <= L1 and > L2?


STEP 3: ADDING TO IMAGE HIERARCHY USING BOUNDING BOX (BB)

SLIDE: 40

Is BB <= L2 and > L3?


STEP 3: ADDING TO IMAGE HIERARCHY

SLIDE: 41

Is BB <= L3 and > L4?


STEP 3: ADDING TO IMAGE HIERARCHY USING BOUNDING BOX (BB)

SLIDE: 42

BB matches L4!
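Pulling slides 39 to 42 together, a minimal sketch of the level test: the bounding box is attached at the first level Lk whose interval it falls into (BB <= Lk and BB > Lk+1). The thresholds here are illustrative fractions of the scene area, not the talk's values.

```python
LEVELS = [1.0, 0.5, 0.25, 0.1]  # L1..L4, illustrative thresholds

def hierarchy_level(bb_area, scene_area):
    """Level (1 = high context .. 4 = high detail) for a bounding box."""
    frac = bb_area / scene_area
    for level, (hi, lo) in enumerate(zip(LEVELS, LEVELS[1:]), start=1):
        if lo < frac <= hi:      # BB <= L_level and BB > L_(level+1)
            return level
    return len(LEVELS)           # smallest boxes land at L4
```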


DEMO: AUTOLINK IMAGE HIERARCHY

SLIDE: 43


EVALUATION: METHODOLOGY

SLIDE: 44

Metrics: precision and recall, running time, bounding-box accuracy, ranking accuracy.

AutoLink compared with: NBNN [Boiman2008], bag-of-visual-words, structure-from-motion [Wu2013], and NBNN + time only.

Evaluation with two datasets: (1) single participant, (2) six participants.

[Wu2013] Changchang Wu. "Towards Linear-time Incremental Structure from Motion." 3DV 2013.
[Boiman2008] O. Boiman, E. Shechtman, and M. Irani. "In Defense of Nearest-Neighbor Based Image Classification." CVPR 2008.


EVALUATION: ACCURACY

SLIDE: 45

Approach | Precision (single user) | Recall (single user) | Precision (multiple users) | Recall (multiple users)
AutoLink | 0.70 | 0.78 | 0.71 | 0.51
NBNN + Sensors | 0.76 | 0.38 | 0.74 | 0.37
NBNN + Time | 0.64 | 0.73 | 0.64 | 0.32
NBNN | 0.78 | 0.10 | 0.78 | 0.25
Structure-from-Motion | 0.53 | 0.53 | 0.37 | 0.37
Bag-of-Visual-Words | 0.19 | 0.029 | 0.19 | 0.19

Up to 70% precision and 78% recall.

BoW performs poorly without training.

SfM does not distinguish scenes with similar features.


EVALUATION: RUNNING TIME

SLIDE: 46

Approach | Average running time per image (ms)
AutoLink | 298
NBNN + Sensors | 290
NBNN + Time | 292
NBNN | 287
Structure-from-Motion | 5946.7
Bag-of-Visual-Words | 35.9

About 20 times faster than SfM.

BoW is faster but has poor accuracy.


EVALUATION: REGION MATCHING

SLIDE: 47

58% of photos have at least 50% overlap.

40% of photos have at least 65% overlap.


EVALUATION: TOP-M PREDICTIONS FOR RANDOM BOUNDING BOXES

SLIDE: 48

Top-M predictions | Accuracy (%)
1 | 56.6
2 | 70.4
3 | 74.9
5 | 76.9
10 | 80.14
15 | 81.11
20 | 81.11

The top-2 results contain the best match to the requested bounding box 70% of the time.


THANK YOU


CONTENT + TIME + SENSOR

SLIDE: 50

1. For every pair of an unlabeled image * and a labeled image i, estimate:
   - the number of steps traversed, s
   - the time gap, t

2. For every scene j, compute |s − si,j| and |t − ti,j| using the transition maps. Rank the scenes j in increasing order of this difference (closest match first).

3. Combine the content ranks with these ranks using mean reciprocal rank, as in the sketch after this list.

4. Label the unlabeled image * with the highest-ranking scene.

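A minimal sketch of the rank fusion in step 3, assuming each cue (content, steps, time) has already produced an ordering of the same set of scenes from best to worst; the scene with the highest mean reciprocal rank wins.

```python
import numpy as np

def fuse_ranks(rankings):
    """rankings: list of scene orderings, best first, one per cue.
    All orderings are assumed to cover the same scenes."""
    scenes = set(rankings[0])
    score = {s: np.mean([1.0 / (r.index(s) + 1) for r in rankings])
             for s in scenes}
    return max(score, key=score.get)

content_rank = ['Scene 2', 'Scene 1', 'Scene 3']
steps_rank   = ['Scene 1', 'Scene 2', 'Scene 3']  # smallest |s - si,j| first
time_rank    = ['Scene 1', 'Scene 3', 'Scene 2']  # smallest |t - ti,j| first
fuse_ranks([content_rank, steps_rank, time_rank])  # -> 'Scene 1'
```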


TRANSITION MAPS FOR TIME AND STEPS

SLIDE: 51

[Figure: four scenes (1, 2, 3, 4) connected by edges labeled (s1,2, t1,2), (s2,3, t2,3), (s3,4, t3,4), and (s1,4, t1,4).]

[si,j]: Matrix of steps taken from every scene i to every other scene j

[ti,j]: Matrix of time taken from every scene i to every other scene j
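A minimal sketch of the two transition maps as matrices, assuming entries are filled in whenever consecutive labelled photos span a pair of scenes; how AutoLink aggregates repeated transitions is not specified here.

```python
import numpy as np

N_SCENES = 4
steps_map = np.full((N_SCENES, N_SCENES), np.nan)  # [si,j]
time_map = np.full((N_SCENES, N_SCENES), np.nan)   # [ti,j]

def record_transition(i, j, steps, minutes):
    """Store the steps/time observed moving from scene i to scene j."""
    steps_map[i, j] = steps
    time_map[i, j] = minutes

record_transition(0, 1, steps=42, minutes=1.5)  # (s1,2, t1,2)
record_transition(1, 2, steps=65, minutes=2.0)  # (s2,3, t2,3)
```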