
Semantic and Diverse Summarization of Egocentric Photo Events


Page 1: Semantic and Diverse Summarization of Egocentric Photo Events

Semantic and Diverse Summarization of Egocentric Photo Events

Aniol Lidon Baulida

Master in Computer Vision (UAB, UPC, UPF, UOC)

Advisors:
Xavier Giró Nieto, Image Processing Group, Universitat Politècnica de Catalunya
Petia Radeva, Barcelona Perceptual Computing Lab, Universitat de Barcelona

Page 2

Collaboration

Barcelona Perceptual Computing Laboratory:

Marc Bolaños, Petia Radeva

Image Processing Group:

Xavier Giró

Grup de Recerca Cervell, Cognició i Conducta:

Maite Garolera

Institute of Creative Media Technologies:

Matthias Zeppelzauer


Page 3

Motivation

• In 2013, 44.4 million people were living with dementia worldwide.
• “Cognitive Stimulation Therapy”

Page 4

Motivation

• Lifelogging with the Narrative Clip wearable camera.
• Up to 2,000 to 3,000 images per day!
• Summarization is needed.

Page 5

Goal

Automatically summarize events:
• Sorting images by priority.
• Trading off relevance and diversity.
• Producing ranked lists of images.

Page 6

Goal

[Figure: RELEVANCE]

Page 7

Goal

[Figure: RELEVANCE and DIVERSITY]

Page 8

State of the art

• This project continues the work started by Ricard Mestre:
– Event segmentation and selection of the most repetitive image from each event.
• Off-the-shelf algorithms used:
– Informativeness network: provided by Marc Bolaños (to be published).
– Blur detection: Crete et al., The blur effect: perception and estimation with a new no-reference perceptual blur metric.
– Saliency maps: provided by Kevin McGuinness (to be published).
– Face detection: Zhu et al., Face detection, pose estimation, and landmark localization in the wild.
– Object candidates: Arbelaez et al., Multiscale Combinatorial Grouping.
– Object detector: Hoffman et al., Large Scale Detection through Adaptation.
– Affective: Campos et al., Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction.

Page 9

Pipeline


Page 11

Prefiltering

Aim: Removing uninformative images.

Informativeness network

Fine-tuned with human annotations.

Filtering out: Discarding absolutely uninformative frames.
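The slide does not state how the informativeness network's output becomes a keep/discard decision. A minimal sketch, assuming the network gives each frame a probability of being informative and that only frames below a low, hypothetical threshold count as absolutely uninformative:

```python
from typing import List, Sequence


def prefilter(image_ids: Sequence[str], p_informative: Sequence[float],
              threshold: float = 0.1) -> List[str]:
    """Drop frames whose informativeness probability is below `threshold`.

    Assumptions: p_informative[i] is the CNN's probability that image_ids[i]
    is informative, and 0.1 is a hypothetical cut-off chosen so that only
    clearly uninformative frames are removed. Temporal order is preserved.
    """
    if len(image_ids) != len(p_informative):
        raise ValueError("expected one probability per image")
    return [img for img, p in zip(image_ids, p_informative) if p >= threshold]


# Usage: the second frame is discarded.
print(prefilter(["f1", "f2", "f3"], [0.92, 0.03, 0.41]))  # ['f1', 'f3']
```

Keeping the threshold low favors recall: the relevance ranking that follows can still push weak frames down, whereas a discarded frame cannot be recovered.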

Page 12

Pipeline


Page 14

Relevance

What is relevance? Frame-level criteria:

• Repeated.
• Unusual.
• WHAT? Representative of an activity.
• WHO? Social interactions.
• WHERE? Environment.
• WHEN? When an event occurred.
• HOW? How the activity occurred.

Page 15

Relevance

What is relevance? Frame-level criteria and the tools used to measure them:

• WHAT? Representative of an activity.
– Saliency maps
– Object detection
• WHO? Social interactions.
– Face detection
– Sentiment analysis (affectivity)

Page 16

Relevance ranking: pipeline

[Figure: pipeline stages, from Prefiltering to Diversity re-ranking]

Page 17

Relevance ranking: saliency maps

SalNet CNN

Aim: Determining regions of interest.

Scoring for relevance: Averaging all saliency-map values.
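As a sketch, assuming the saliency map is a 2-D array of per-pixel values (for instance SalNet's output for the frame), the averaging rule and a ranking of frames by it could look like this; the function names are hypothetical:

```python
from typing import List, Sequence

import numpy as np


def saliency_relevance(saliency_map: np.ndarray) -> float:
    """Relevance of one frame: the mean of all its saliency-map values."""
    return float(np.mean(saliency_map))


def rank_by_saliency(saliency_maps: Sequence[np.ndarray]) -> List[int]:
    """Frame indices sorted from most to least salient on average."""
    scores = [saliency_relevance(m) for m in saliency_maps]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


maps = [np.full((4, 4), 0.2), np.array([[0.0, 1.0], [1.0, 0.0]])]
print(rank_by_saliency(maps))  # [1, 0]
```

Because the mean does not depend on resolution, maps of different sizes can be compared directly.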


Page 18

Relevance ranking: objects

Object detector: LSDA (Large Scale Detection through Adaptation)

Aim: Finding well-defined objects.

Scoring for relevance: Summing the scores of all detected objects.
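A minimal sketch of the object-based score, assuming each frame's detections are already available as a list of confidence values (the input format and function names are hypothetical; LSDA itself is not called here):

```python
from typing import Dict, List, Sequence


def object_relevance(detection_scores: Sequence[float]) -> float:
    """Relevance of one frame: the sum of the scores of all detected objects.

    A frame with no detections scores 0, so frames with many confidently
    detected objects rise to the top of the ranking.
    """
    return float(sum(detection_scores))


def rank_frames_by_objects(detections: Dict[str, List[float]]) -> List[str]:
    """Frame ids sorted by decreasing object relevance."""
    return sorted(detections, key=lambda f: object_relevance(detections[f]),
                  reverse=True)


print(rank_frames_by_objects({"f1": [0.2], "f2": [0.5, 0.4], "f3": []}))
# ['f2', 'f1', 'f3']
```

Summing, rather than averaging, rewards frames that contain several objects and not just a single confident one.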

Page 19

Relevance ranking: faces

Face detector: Zhu et al., Face detection, pose estimation, and landmark localization in the wild.

Aim: Finding well-defined faces.

Scoring for relevance: Summing the exponentials of the confidences of all detected faces.
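The exact form of the exponential sum is not spelled out on the slide; one plausible reading is a sum of exponentiated detector confidences, sketched below under that assumption (the function name and input format are hypothetical):

```python
import math
from typing import Sequence


def face_relevance(face_confidences: Sequence[float]) -> float:
    """Sum of exp(confidence) over all faces detected in one frame.

    Assumed reading of the slide: exponentiation keeps every term positive
    even if raw detector scores can be negative, and it amplifies the gap
    between confident and weak detections.
    """
    return float(sum(math.exp(c) for c in face_confidences))


print(face_relevance([]))                    # 0.0
print(round(face_relevance([0.0, 0.0]), 6))  # 2.0
```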


Page 21

Pipeline


Page 23

Diversity re-ranking

Re-ranking by Soft Max Diversity Fusion

Color similarity

Faces similarity
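The slides name the re-ranking scheme (Soft Max Diversity Fusion) and two of its similarity cues, color and faces, but do not give its formula. To illustrate the relevance/diversity trade-off it addresses, here is a swapped-in, well-known greedy scheme in the style of Maximal Marginal Relevance (MMR); it is not the author's method, and the function name and the `trade_off` parameter are hypothetical:

```python
from typing import List, Optional

import numpy as np


def greedy_diverse_rerank(relevance: np.ndarray, similarity: np.ndarray,
                          trade_off: float = 0.5,
                          k: Optional[int] = None) -> List[int]:
    """MMR-style greedy re-ranking (illustrative stand-in).

    relevance:  shape (n,), higher is more relevant, assumed in [0, 1].
    similarity: shape (n, n), pairwise similarity, assumed in [0, 1]
                (it could, for instance, combine color and face similarity).
    trade_off:  1.0 ranks by relevance only; lower values favor diversity.
    """
    n = len(relevance)
    k = n if k is None else min(k, n)
    remaining = list(range(n))
    ranked: List[int] = []
    while remaining and len(ranked) < k:
        if ranked:
            # Redundancy = highest similarity to any image already ranked.
            redundancy = similarity[np.ix_(remaining, ranked)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = trade_off * relevance[remaining] - (1 - trade_off) * redundancy
        best = remaining[int(np.argmax(scores))]
        ranked.append(best)
        remaining.remove(best)
    return ranked


rel = np.array([1.0, 0.9, 0.5])
sim = np.array([[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]])
print(greedy_diverse_rerank(rel, sim))                 # [0, 2, 1]
print(greedy_diverse_rerank(rel, sim, trade_off=1.0))  # [0, 1, 2]
```

Image 1 is almost a duplicate of image 0, so with an even trade-off it drops below the less relevant but different image 2.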


Page 26

Similarity measure

ImageNet: Euclidean distance between features (L2 norm).

CNN trained on the ImageNet database (1000 classes) using the CaffeNet architecture.

Fully connected layer 8 (fc8) removed.
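A sketch of the distance computation, assuming the CNN features are already extracted (with fc8 removed, the descriptor is the output of the preceding layer, fc7 in CaffeNet). The mapping from distance to a [0, 1] similarity is a hypothetical choice, not taken from the slides:

```python
import numpy as np


def pairwise_l2(features: np.ndarray) -> np.ndarray:
    """Euclidean (L2) distance between every pair of rows of an (n, d) array."""
    sq_norms = np.sum(features ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def distance_to_similarity(dist: np.ndarray) -> np.ndarray:
    """Hypothetical rescaling: 1 for identical features, 0 for the farthest pair."""
    max_d = dist.max()
    return np.ones_like(dist) if max_d == 0 else 1.0 - dist / max_d


feats = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
dist = pairwise_l2(feats)
print(dist[0, 1])                          # 5.0
print(distance_to_similarity(dist)[0, 2])  # 1.0
```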

Page 27

Pipeline


Page 29

Assessment

INTERMEDIATE VALIDATION: validation of the automatic approach against manually annotated summaries.
• 7 datasets with labelled ground truth.

FINAL EVALUATION: psychologists' feedback.
• 2 online questionnaires.
• Mean Opinion Score.

Page 30

Subjective problem

[Figure: Precision, comparing the GROUND-TRUTH images with the SELECTED images]

Page 31

Metric

Mean Normalized Sum of Max Similarities (MNSMS)

Normalization in both axes:
• Y: divide by the number of ground-truth samples.
• X: reshape the samples into N bins.

[Figure: MNSMS (y-axis) against n (%) (x-axis). Ground-Truth images compared with the Sorted List (Results) at n = 1, with the resulting Similarity Sum.]

Page 32

Metric

Mean Normalized Sum of Max Similarities (MNSMS)

[Figure: Ground-Truth images compared with the Sorted List (Results) at n = 2, with the resulting Similarity Sum.]

Page 33

Metric

Mean Normalized Sum of Max Similarities (MNSMS)

[Figure: Ground-Truth images compared with the Sorted List (Results) at n = 3, with the resulting Similarity Sum.]

Page 34

Metric

Mean Normalized Sum of Max Similarities (MNSMS)

[Figure: Ground-Truth images compared with the Sorted List (Results) at n = 4, with the resulting Similarity Sum.]

Page 35

Metric

Mean Normalized Sum of Max Similarities (MNSMS)

[Figure: MNSMS curve (y-axis) against n (%) (x-axis) at n = 4, with the area under the curve (AUC).]
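The slides describe MNSMS only through its construction, so the following is a sketch of one reading of it: for each prefix of the sorted result list, every ground-truth image takes its highest similarity to the results seen so far; these maxima are summed and divided by the number of ground-truth images (Y normalization), and the x-axis (n as a percentage of the list) is resampled onto N bins. Averaging the per-event curves for the leading "Mean", and approximating the AUC by the mean of the bins, are assumptions; the function names are hypothetical:

```python
from typing import Sequence

import numpy as np


def nsms_curve(sim: np.ndarray, n_bins: int = 100) -> np.ndarray:
    """Normalized sum of max similarities for one event, on n_bins x-positions.

    sim[g, r] is the similarity between ground-truth image g and the r-th
    image of the sorted result list (shape (G, R)).
    """
    n_gt, n_res = sim.shape
    # Column n-1: each ground-truth image's best match among the top-n results.
    best_so_far = np.maximum.accumulate(sim, axis=1)
    y = best_so_far.sum(axis=0) / n_gt             # Y: divide by GT samples
    x = np.arange(1, n_res + 1) / n_res            # n as a fraction of the list
    grid = np.linspace(1.0 / n_bins, 1.0, n_bins)  # X: reshape to N bins
    return np.interp(grid, x, y)


def mnsms(curves: Sequence[np.ndarray]) -> np.ndarray:
    """Mean of the per-event curves (assumed meaning of the leading 'Mean')."""
    return np.mean(np.stack(curves), axis=0)


def auc(curve: np.ndarray) -> float:
    """Area under a binned curve on a [0, 1] x-axis (Riemann-sum approximation)."""
    return float(np.mean(curve))


sim = np.array([[0.2, 1.0], [0.6, 0.1]])
curve = nsms_curve(sim, n_bins=2)
print(curve)                 # [0.4 0.8]
print(round(auc(curve), 6))  # 0.6
```

The curve never decreases, because adding results can only raise each ground-truth image's best match; a ranking that covers all ground-truth moments early climbs quickly and earns a larger AUC.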

Page 36

Assessment

INTERMEDIATE VALIDATION: validation of the automatic approach against manually annotated summaries.
• 7 datasets with labelled ground truth.
• Metric: AUC of MNSMS (ImageNet features).

FINAL EVALUATION: psychologists' feedback.
• 2 online questionnaires.
• Mean Opinion Score.

Page 37

Intermediate validation

Prefiltering
• Informativeness network
• Hand-crafted estimators
• No prefiltering

Page 38

Intermediate validation

Saliency relevance
• SalNet
• SalNet + Gaussian

Objects relevance
• LSDA (object detector)
• MCG (object candidates)

[Figure: charts of Saliency relevance AUC (SalNet vs. SalNet + Gauss) and Objects relevance AUC (LSDA vs. MCG), y-axis from 0.70 to 0.90]

Page 39

Intermediate validation

Affective relevance
• Positive
• Negative
• Extremum
• Random

Sentiment analysis CNN
• 2 classes: positive / negative
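The four affective strategies are listed without definitions. A sketch under stated assumptions: the sentiment CNN returns P(positive) per frame, P(negative) is its complement, and "Extremum" is read as the strength of the dominant sentiment, whichever its sign (the function name is hypothetical):

```python
import random
from typing import List, Optional, Sequence


def affective_scores(p_positive: Sequence[float], strategy: str,
                     seed: Optional[int] = None) -> List[float]:
    """Per-frame affective relevance from a 2-class sentiment CNN.

    p_positive[i] is assumed to be P(positive) for frame i, so that
    P(negative) = 1 - P(positive). 'extremum' is read as the strength of the
    dominant sentiment; 'random' is a baseline that ignores the CNN.
    """
    if strategy == "positive":
        return [float(p) for p in p_positive]
    if strategy == "negative":
        return [1.0 - p for p in p_positive]
    if strategy == "extremum":
        return [max(p, 1.0 - p) for p in p_positive]
    if strategy == "random":
        rng = random.Random(seed)
        return [rng.random() for _ in p_positive]
    raise ValueError(f"unknown strategy: {strategy!r}")


probs = [0.9, 0.2]
print([round(s, 3) for s in affective_scores(probs, "extremum")])  # [0.9, 0.8]
```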

Page 40

Assessment

INTERMEDIATE VALIDATION: validation of the automatic approach against manually annotated summaries.
• 7 datasets with labelled ground truth.
• Metric: AUC of MNSMS (ImageNet features).

FINAL EVALUATION: psychologists' feedback.
• 2 rounds of online questionnaires.
• Mean Opinion Score.

Page 41

Final evaluation

SIMILARITY
• ImageNet CNN (fc8 removed)
• Places CNN (fc8 removed)
• LSDA (only spatial NMS)
• Fusion (ImageNet + Places + LSDA)

(Diversity re-ranking + weight fusion in MNSMS)
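The final evaluation fuses the ImageNet, Places and LSDA similarities, with weights tuned using MNSMS. A minimal sketch of such a weighted fusion, assuming each feature space already yields an (n, n) distance matrix; the max-normalization and the function name are hypothetical choices:

```python
from typing import Sequence

import numpy as np


def fuse_distances(distances: Sequence[np.ndarray],
                   weights: Sequence[float]) -> np.ndarray:
    """Weighted average of several (n, n) distance matrices.

    Each matrix is first divided by its largest entry so that feature spaces
    with different scales (e.g. ImageNet, Places, LSDA) are comparable.
    """
    if len(distances) != len(weights):
        raise ValueError("expected one weight per distance matrix")
    fused = np.zeros(distances[0].shape)
    for dist, weight in zip(distances, weights):
        peak = dist.max()
        fused += weight * (dist / peak if peak > 0 else dist)
    return fused / sum(weights)


d_imagenet = np.array([[0.0, 2.0], [2.0, 0.0]])
d_places = np.array([[0.0, 10.0], [10.0, 0.0]])
print(fuse_distances([d_imagenet, d_places], [0.7, 0.3]))
```

Choosing the weights, for example by keeping the combination with the highest MNSMS AUC on the annotated events, is left outside this sketch.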

Page 43

Final evaluation

MEAN OPINION SCORE
• ImageNet configuration
• Uniform sampling
• Ground truth (previous manual annotation)

Page 45

Final results

Representativeness of summaries:

Preferred summary:

Mean Opinion Score (1 = worst, 5 = best)

Page 46

Generalization: MediaEval diversity task

• APPLICATION: Finding more information about a place to visit.
• GOAL: Provide a ranked list of Flickr photos for a predefined set of queries. The refined list should be both relevant to the query and diverse.

A. Lidon, M. Bolaños, M. Seidl, X. Giro-i Nieto, P. Radeva, and M. Zeppelzauer, “UPC-UB-STP @ MediaEval 2015 Diversity Task: Iterative Reranking of Relevant Images,” in MediaEval 2015 Workshop, Wurzen, Germany, 2015.

[Figure: Run 1 F1@20 (Visual); y-axis from 0.40 to 0.56]
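In the MediaEval diversity task, F1@20 is the harmonic mean of precision (P@20) and cluster recall (CR@20), the share of the query's ground-truth clusters represented in the top 20 photos. A sketch of the computation; the input format and function name are assumptions:

```python
from typing import Mapping, Sequence, Set


def f1_at_k(ranked: Sequence[str], relevant: Set[str],
            cluster_of: Mapping[str, int], k: int = 20) -> float:
    """F1@k = harmonic mean of precision@k and cluster recall@k.

    ranked:     photo ids in the order returned by the system.
    relevant:   ids judged relevant to the query.
    cluster_of: ground-truth cluster of each relevant photo.
    """
    hits = [p for p in ranked[:k] if p in relevant]
    precision = len(hits) / k
    all_clusters = set(cluster_of.values())
    found = {cluster_of[p] for p in hits if p in cluster_of}
    recall = len(found) / len(all_clusters) if all_clusters else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ranked = ["a", "b", "c", "d"]
clusters = {"a": 1, "b": 1, "c": 2, "e": 3}
print(round(f1_at_k(ranked, {"a", "b", "c", "e"}, clusters, k=4), 4))  # 0.7059
```

Photos a and b are both relevant but fall in the same cluster, so they raise precision while counting only once towards cluster recall; this is what diversity re-ranking tries to avoid.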

Page 47

Conclusions

• Contributions:
– Mean Normalized Sum of Max Similarities (MNSMS).
– New criterion for semantic diversity (based on LSDA).
– New method for diversity fusion.
– Online evaluation questionnaires.

Page 48

Conclusions

• Tested in two applications:
– Memory reinforcement for people with mild dementia.
– Diverse Social Images Task from the scientific MediaEval benchmark.
• Mean Opinion Score of 4.6 out of 5.
• Publications:
– Working-notes paper in the MediaEval challenge.
– Wearable and Ego-vision Systems for Augmented Experience, in the journal IEEE Transactions on Human-Machine Systems.
• Code available: https://imatge.upc.edu/web/resources/semantic-and-diverse-summarization-egocentric-photo-events-software

Page 49

Future work

• Explore further relevance criteria.
• Reach a higher level of semantics.
• Automatically determine the summary length.

Page 50

Thanks for your attention!

Page 51

Prefiltering

Hand-crafted estimators
• Blur: Crete et al.
• Black and burned frames: color mean

Informativeness network, by Marc Bolaños (UB)
• CNN trained with ImageNet + Places.
• Fine-tuned with human annotations: relevant / irrelevant.
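The slide names the hand-crafted estimators: blur (Crete et al.) and a color mean for black and burned frames. The following sketch assumes grayscale frames scaled to [0, 1]; the blur function follows the general re-blurring idea of Crete et al. with simplified border handling and is not the authors' implementation, and all thresholds and function names are hypothetical:

```python
import numpy as np


def mean_intensity(gray: np.ndarray) -> float:
    """Color mean of a grayscale frame in [0, 1]: near 0 is black, near 1 burned."""
    return float(gray.mean())


def crete_blur(gray: np.ndarray) -> float:
    """Re-blur estimate in the spirit of Crete et al.: 0 = sharp, 1 = blurred.

    The frame is smoothed with a 9-tap averaging filter along each axis.
    A sharp frame loses much of its neighbour-to-neighbour variation when
    smoothed; an already blurred frame loses little.
    """
    f = gray.astype(float)
    if min(f.shape) < 9:
        raise ValueError("frame must be at least 9x9 pixels")
    kernel = np.ones(9) / 9.0
    smooth_v = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, f)
    smooth_h = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, f)

    def blur_along(smoothed: np.ndarray, axis: int) -> float:
        d_orig = np.abs(np.diff(f, axis=axis))
        d_smooth = np.abs(np.diff(smoothed, axis=axis))
        total = d_orig.sum()
        if total == 0:
            return 1.0  # no variation at all: treat as completely blurred
        lost = np.maximum(0.0, d_orig - d_smooth).sum()
        return float((total - lost) / total)

    return max(blur_along(smooth_v, 0), blur_along(smooth_h, 1))


def is_uninformative(gray: np.ndarray, dark: float = 0.1, bright: float = 0.9,
                     max_blur: float = 0.5) -> bool:
    """Hypothetical thresholds for black, burned and blurred frames."""
    m = mean_intensity(gray)
    return m < dark or m > bright or crete_blur(gray) > max_blur


checker = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)
ramp = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
print(crete_blur(checker) < 0.2, crete_blur(ramp) > 0.9)  # True True
print(is_uninformative(np.zeros((16, 16))))               # True
```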

Page 52

Relevance ranking

Affective
• VitorNet CNN (2-class sentiment prediction), by Victor Campos (UPC)

Page 53

Relevance ranking

Late fusion
• Score normalization:
– By rank
– By score
• Aggregate scores

The fusion weights will be learned using MNSMS.
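Both normalizations put criteria with different ranges (a saliency mean, a sum of face confidences) on a common [0, 1] scale before they are weighted and added. A sketch, assuming one score array per criterion over the same frames; the weights in the usage example are hypothetical placeholders for the ones learned with MNSMS:

```python
from typing import Sequence

import numpy as np


def normalize_by_score(scores: np.ndarray) -> np.ndarray:
    """Min-max normalization of raw scores to [0, 1]."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores, dtype=float)
    return (scores - lo) / (hi - lo)


def normalize_by_rank(scores: np.ndarray) -> np.ndarray:
    """Replace each score by its rank, mapped linearly: best -> 1, worst -> 0."""
    n = len(scores)
    if n == 1:
        return np.ones(1)
    ranks = np.empty(n)
    ranks[np.argsort(-scores, kind="stable")] = np.arange(n)  # 0 = best
    return 1.0 - ranks / (n - 1)


def late_fusion(score_lists: Sequence[np.ndarray], weights: Sequence[float],
                by_rank: bool = False) -> np.ndarray:
    """Weighted sum of per-criterion relevance scores after normalization."""
    normalize = normalize_by_rank if by_rank else normalize_by_score
    fused = np.zeros(len(score_lists[0]))
    for scores, weight in zip(score_lists, weights):
        fused += weight * normalize(np.asarray(scores, dtype=float))
    return fused


saliency = np.array([0.2, 0.8, 0.5])
faces = np.array([3.0, 0.0, 1.0])
print([round(float(v), 3) for v in late_fusion([saliency, faces], [0.7, 0.3])])
# [0.3, 0.7, 0.45]
print([round(float(v), 3) for v in late_fusion([saliency, faces], [0.7, 0.3], by_rank=True)])
# [0.3, 0.7, 0.5]
```

Rank normalization ignores how far apart the raw scores are, which makes it robust to outliers; score normalization keeps those gaps.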

Page 54

Similarity measure

ImageNet: CNN trained on the ImageNet database (1000 classes) using the CaffeNet architecture; fully connected layer 8 removed.

Places: CNN trained on the Places database (476 classes) using the CaffeNet architecture; fully connected layer 8 removed.

LSDA: object detector, Large Scale Detection through Adaptation (7500 classes).
• Knowledge transfer: turns classifiers trained without bounding-box annotations into detectors.
• Two post-processing steps of non-maximum suppression.

Page 55

Result: MediaEval diversity task

Pipeline:
• Ranking for relevance: informativeness network, textual.
• Filtering: keep the top N% of results.
• Distance computation: ImageNet, Places, textual.
• Diversity: diverse top results.
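A sketch of the filtering and diversity stages, assuming relevance scores (for example from the informativeness network or the text) and a precomputed distance matrix are already available. The fraction N% is a free parameter here, the function names are hypothetical, and the farthest-first selection is a simple stand-in, not necessarily the diversity method used in the submitted runs:

```python
from typing import List

import numpy as np


def keep_top_fraction(relevance: np.ndarray, fraction: float) -> np.ndarray:
    """Indices of the top `fraction` of items by relevance, best first (at least one)."""
    n_keep = max(1, int(round(fraction * len(relevance))))
    return np.argsort(-relevance, kind="stable")[:n_keep]


def diversify(candidates: np.ndarray, dist: np.ndarray, k: int) -> List[int]:
    """Farthest-first selection: start from the most relevant candidate, then
    repeatedly add the candidate whose nearest already-selected item is farthest."""
    selected = [int(candidates[0])]
    pool = [int(c) for c in candidates[1:]]
    while pool and len(selected) < k:
        nearest = dist[np.ix_(pool, selected)].min(axis=1)
        selected.append(pool.pop(int(np.argmax(nearest))))
    return selected


relevance = np.array([0.9, 0.8, 0.7, 0.1])
dist = np.array([[0.0, 0.1, 1.0, 1.0],
                 [0.1, 0.0, 0.9, 1.0],
                 [1.0, 0.9, 0.0, 1.0],
                 [1.0, 1.0, 1.0, 0.0]])
kept = keep_top_fraction(relevance, 0.75)  # indices 0, 1, 2
print(diversify(kept, dist, k=2))          # [0, 2]
```

Filtering first keeps the diversity step from promoting distinctive but irrelevant photos, such as index 3 here.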

Page 56

Result: MediaEval diversity task

[Figure: results of the runs Visual, Textual, Multi, Crediv. and Multi]