Information Extraction from Multimedia Content on the Social Web Stefan Siersdorfer L3S Research Centre, Hannover, Germany


Information Extraction from Multimedia Content on the Social Web

Stefan Siersdorfer, L3S Research Centre, Hannover, Germany


Meta Data and Visual Data on the Social Web

Meta Data:
• Tags
• Title, Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links

Visual Data:
• Photos
• Videos

How to exploit combined information from visual data and meta data?


Example 1: Photos in Flickr


Example 2: Videos on YouTube


Social Web Environments as Graph Structure

[Figure: example graph connecting User 1, User 2, User 3, Video 1, Video 2, Video 3, tag1, tag2, tag3, and Group 2]

Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups

Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resources: Ownership, Favorite Assignment, Rating
• User-Groups: Membership
• Resource-Resource: visual similarity, meta data similarity
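The node and edge types listed above can be held in a small typed graph. The following is a minimal sketch, not the structure used in the cited work; the class name `SocialGraph`, the relation labels, and the example weight are assumptions made for illustration.

```python
from collections import defaultdict


class SocialGraph:
    """Typed graph: every node has an entity type, every edge a relation label."""

    def __init__(self):
        self.node_type = {}
        self.edges = defaultdict(list)  # node -> [(neighbor, relation, weight)]

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, dst, relation, weight=1.0):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[src].append((dst, relation, weight))

    def neighbors(self, node_id, relation=None):
        """Return neighbors, optionally restricted to one relation label."""
        return [dst for dst, rel, _ in self.edges[node_id]
                if relation is None or rel == relation]


graph = SocialGraph()
for node_id, node_type in [("user1", "user"), ("user2", "user"),
                           ("video1", "resource"), ("video2", "resource"),
                           ("group2", "group")]:
    graph.add_node(node_id, node_type)

graph.add_edge("user1", "user2", "contact")
graph.add_edge("user1", "video1", "ownership")
graph.add_edge("user1", "video2", "favorite")
graph.add_edge("user2", "group2", "membership")
# Hypothetical similarity weight, for illustration only
graph.add_edge("video1", "video2", "visual_similarity", weight=0.8)
```

Keeping the relation label on each edge lets one query a single relationship type (for example only ownership edges) without maintaining a separate graph per type.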


User Feedback on the Social Web

• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content

How can we exploit the community feedback?


Outline

• Part 1: Photos on the Social Web
  1.1) Photo Attractiveness
  1.2) Generating Photo Maps
  1.3) Sentiment in Photos

• Part 2: Videos on the Social Web
  Video Tagging


Part 1: Photos on the Social Web


1.1) Photo Attractiveness *

* Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain


Attractiveness of Images

Landscape Portrait Flower

Which factors influence the human perception of attractiveness?


Attractiveness Visual Features

Human visual perception is mainly influenced by
• Color distribution
• Coarseness

These are complex concepts
• They convey multiple orthogonal aspects
• It is necessary to consider different low-level features


Attractiveness Visual Features

Color Features
• Brightness
• Contrast
  Luminance, RGB
• Colorfulness
• Naturalness
• Saturation
  Mean, Variance
  Intensity of the colors
  Saturation is 0 for grey scale images
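A few of the color features named above can be computed directly from pixel values. This is a minimal sketch under stated assumptions: pixels are given as `(r, g, b)` tuples in [0, 1], luminance uses Rec. 601 weights, and contrast is taken as the standard deviation of luminance (RMS contrast). The slide does not give the exact definitions, so the function name `color_features` and these choices are illustrative only; colorfulness and naturalness are omitted.

```python
import colorsys
from statistics import mean, pvariance


def color_features(pixels):
    """pixels: list of (r, g, b) tuples with channel values in [0, 1]."""
    # Luminance with Rec. 601 weights (an assumed choice)
    luminance = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    # Saturation is the S channel of the HSV representation
    saturation = [colorsys.rgb_to_hsv(r, g, b)[1] for r, g, b in pixels]
    return {
        "brightness": mean(luminance),
        "saturation_mean": mean(saturation),
        "saturation_variance": pvariance(saturation),
        "rms_contrast": pvariance(luminance) ** 0.5,
    }


grey_image = [(0.2, 0.2, 0.2), (0.8, 0.8, 0.8)]
red_image = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
grey_features = color_features(grey_image)
red_features = color_features(red_image)
```

For the grey scale example every pixel has equal R, G, and B values, so its saturation mean and variance are both 0, matching the remark on the slide.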


Visual Features

Coarseness
• Resolution + Acutance
• Sharpness
• Critical importance for final appearance of photos [Savakis 2000]
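One common proxy for sharpness is the average magnitude of the image Laplacian. The sketch below assumes that proxy; the slide does not say how sharpness was measured, so `laplacian_sharpness` is a hypothetical helper and not the authors' feature.

```python
def laplacian_sharpness(gray):
    """gray: 2D list of luminance values; mean |Laplacian| over interior pixels."""
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("need at least a 3x3 image")
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 4-neighbor discrete Laplacian
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            total += abs(lap)
    return total / ((height - 2) * (width - 2))


flat = [[0.5] * 3 for _ in range(3)]
gradient = [[float(x) for x in range(4)] for _ in range(4)]
checkerboard = [[(x + y) % 2 for x in range(3)] for y in range(3)]
```

A flat image and a linear gradient both score 0 because they contain no edges, while the checkerboard, which alternates at every pixel, scores high.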


Textual Features

• We consider user-generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information


Attractiveness of Photos

Community-based models for classifying/ranking images according to their appeal. [WWW´09]

[Figure: attractiveness model pipeline. Inputs from the Flickr photo stream: Content (visual features), Metadata (textual features, e.g. tags "cat, fence, house"), Community Feedback (photo's interestingness: #views, #comments, #favorites, ...). Further labels: Classification & Regression, Attractiveness Models, Generator]


Classification & Regression Models


Experiments


1.2) Generating Photo Maps *

* Work and illustrations from David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain


Outline: Photo Maps

• Use geo-location, tags, and visual features of photos to
  - Identify popular locations and landmarks
  - Find out the location of photos
  - Estimate representative images


Spatial Clustering

Each data point corresponds to the (longitude, latitude) of an image

Mean shift clustering is applied to get a hierarchical structure

Most distinctive popular tags are used as labels (# photos with tag in cluster / # photos with tag in overall set)

[Figure: map clusters with example labels london, paris, eiffel, louvre, trafalgarsquare, tatemodern]
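The clustering and labelling steps on this slide can be sketched as follows. This is a simplified illustration, not the implementation from the cited paper: it assumes a flat-kernel mean shift on raw (longitude, latitude) pairs with a single bandwidth, and the helper names, toy coordinates, and tags are made up. Running it with several bandwidths would presumably give coarser and finer levels of a hierarchy.

```python
import math
from collections import Counter


def mean_shift(points, bandwidth, iterations=30):
    """Flat-kernel mean shift; returns the mode each point converges to."""
    modes = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for mx, my in modes:
            near = [(x, y) for x, y in points
                    if math.hypot(x - mx, y - my) <= bandwidth]
            shifted.append((sum(x for x, _ in near) / len(near),
                            sum(y for _, y in near) / len(near)))
        modes = shifted
    return modes


def assign_clusters(modes, tolerance):
    """Give points whose modes lie within `tolerance` the same cluster id."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.hypot(m[0] - c[0], m[1] - c[1]) <= tolerance:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def distinctive_tags(photo_tags, labels, cluster, top_k=2):
    """Rank tags by (# photos with tag in cluster) / (# photos with tag overall)."""
    overall = Counter(t for tags in photo_tags for t in set(tags))
    inside = Counter(t for tags, label in zip(photo_tags, labels)
                     if label == cluster for t in set(tags))
    score = {t: inside[t] / overall[t] for t in inside}
    # Ties on the ratio are broken by in-cluster frequency, then alphabetically
    return sorted(score, key=lambda t: (-score[t], -inside[t], t))[:top_k]


# Toy data (made up): three photos near London, three near Paris
points = [(-0.13, 51.51), (-0.12, 51.50), (-0.10, 51.51),
          (2.29, 48.86), (2.34, 48.86), (2.35, 48.85)]
photo_tags = [["london", "travel"], ["london", "trafalgarsquare"], ["london"],
              ["paris", "eiffel"], ["paris", "travel"], ["paris", "louvre"]]

labels = assign_clusters(mean_shift(points, bandwidth=0.5), tolerance=0.25)
london_labels = distinctive_tags(photo_tags, labels, cluster=0)
```

The ratio favors tags that occur almost only inside one cluster, so a generic tag such as "travel" that appears in both clusters ranks below location-specific tags.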


Estimating Location of Photos without Geo-Tags

• Train SVMs on Clusters
  - Positive Examples: Photos in the Cluster
  - Negative Examples: Photos outside the Cluster

• Feature Representation
  - Tags
  - Visual features (SIFT)

• Best Performance for Combination of Tags and SIFT features
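The training setup above can be illustrated with a tiny linear SVM. The sketch below is an assumption-laden stand-in: it concatenates a binary tag vector with a visual-word histogram and trains with Pegasos-style subgradient descent on the hinge loss, which is a simple substitute for whatever SVM solver was used in the cited work. The vocabulary, histograms, and function names are made up for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_features(tags, visual_histogram, tag_vocabulary):
    """Concatenate a binary tag vector with a visual-word histogram."""
    tag_part = [1.0 if t in tags else 0.0 for t in tag_vocabulary]
    return tag_part + list(visual_histogram)


def train_linear_svm(samples, labels, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1 / -1."""
    w = [0.0] * len(samples[0])
    step = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * dot(w, x) < 1.0
            # Shrink the weights (regularization), then correct if the margin is violated
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def in_cluster(w, x):
    return dot(w, x) > 0.0


# Toy data (made up): +1 = photo inside the cluster, -1 = photo outside
vocabulary = ["eiffel", "paris", "bigben", "london"]
photos = [({"eiffel", "paris"}, [1.0, 0.0], 1),
          ({"paris"}, [1.0, 0.0], 1),
          ({"bigben", "london"}, [0.0, 1.0], -1),
          ({"london"}, [0.0, 1.0], -1)]

samples = [build_features(tags, hist, vocabulary) for tags, hist, _ in photos]
labels = [label for _, _, label in photos]
weights = train_linear_svm(samples, labels)
predictions = [in_cluster(weights, x) for x in samples]
```

One such classifier would be trained per cluster; dropping either half of the feature vector gives the tags-only or visual-only variants that the combination is compared against.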


Finding Representative Images

Construct Weighted Graph:
- Weight based on visual similarity of images (using SIFT features)
- Use Graph Clustering (e.g. spectral clustering) to identify tightly connected components
- Choose image from this connected component
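A simplified version of these steps is sketched below. Instead of spectral clustering it thresholds the similarity graph and takes connected components, which is a much cruder substitute, and it picks the image with the highest total similarity inside the largest component. The function name, the threshold, and the similarity values are assumptions for illustration.

```python
def representative_image(similarity, threshold):
    """similarity: dict mapping unordered image pairs (a, b) to a weight."""
    adjacency = {}
    for (a, b), weight in similarity.items():
        adjacency.setdefault(a, {})
        adjacency.setdefault(b, {})
        if weight >= threshold:  # keep only strong visual links
            adjacency[a][b] = weight
            adjacency[b][a] = weight

    seen, largest = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects one connected component
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) > len(largest):
            largest = component

    if not largest:
        return None
    # Most central image: highest summed similarity to the rest of the component
    return max(sorted(largest), key=lambda n: sum(adjacency[n].values()))


# Made-up similarity scores between five images
similarity = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
              ("c", "d"): 0.1, ("d", "e"): 0.6}
chosen = representative_image(similarity, threshold=0.5)
```

With the threshold at 0.5 the weak link between "c" and "d" is dropped, leaving the components {a, b, c} and {d, e}; "a" has the largest summed similarity in the bigger one.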


Example 1: Europe


Example 2: New York


1.3) Sentiment in Photos *

* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy


Sentiment Analysis of Images

Data: more than 500,000 Flickr photos

Image Features
• Global Color Histogram: a color is present in the image
• Local Color Histogram: a color is present at a particular location
• SIFT Visual Terms: b/w patterns, rotated and scaled

Image Sentiment
• SentiWordNet: provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
• Used for obtaining sentiment categories, training set + ground truth for experiments
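Deriving a sentiment category for an image from the terms attached to it could look roughly like this. The sketch assumes a plain dictionary as the lexicon, averages the positive and negative values of the known terms, and applies a hypothetical margin; only the entry for "good" comes from the slide, the other entry is a made-up placeholder and not a real SentiWordNet value.

```python
# (pos, neg, obj) triples; only "good" is taken from the slide
LEXICON = {
    "good": (0.875, 0.0, 0.125),
    "awful": (0.0, 0.875, 0.125),  # placeholder value for illustration
}


def sentiment_category(terms, lexicon, margin=0.1):
    """Return "pos", "neg", or None when the terms give no clear signal."""
    scored = [lexicon[t] for t in terms if t in lexicon]
    if not scored:
        return None
    pos = sum(s[0] for s in scored) / len(scored)
    neg = sum(s[1] for s in scored) / len(scored)
    if pos - neg > margin:
        return "pos"
    if neg - pos > margin:
        return "neg"
    return None


category = sentiment_category(["good", "sunset"], LEXICON)
```

Images whose terms are missing from the lexicon, or whose positive and negative averages are close, stay unlabeled and would be left out of the training set.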


Which are the most discriminative visual terms?

• Use the Mutual Information measure to determine these features

• Probabilities (estimated through counting in the image corpus):
  - P(t): probability that visual term t occurs in an image
  - P(c): probability that an image has sentiment category c ("pos" or "neg")
  - P(t,c): probability that an image is in category c and has visual term t

• Intuition: "Terms that have high co-occurrence with a category are more characteristic for that category."
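The formula itself did not survive on this slide, so the sketch below assumes the pointwise form log2( P(t,c) / (P(t) P(c)) ), which uses exactly the three probabilities defined above. The full expected mutual information, summed over presence and absence of term and category, would be an alternative reading. The corpus and the name `pointwise_mi` are made up for illustration.

```python
import math


def pointwise_mi(images, term, category):
    """images: list of (set_of_visual_terms, category) pairs."""
    n = len(images)
    # Probabilities estimated through counting in the corpus
    p_t = sum(1 for terms, _ in images if term in terms) / n
    p_c = sum(1 for _, c in images if c == category) / n
    p_tc = sum(1 for terms, c in images
               if term in terms and c == category) / n
    if p_tc == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log2(p_tc / (p_t * p_c))


# Toy corpus (made up): visual terms per image plus sentiment category
corpus = [({"v1", "v2"}, "pos"), ({"v1"}, "pos"),
          ({"v2"}, "neg"), ({"v3"}, "neg")]

ranking = sorted(["v1", "v2", "v3"],
                 key=lambda t: pointwise_mi(corpus, t, "pos"), reverse=True)
```

Here "v1" occurs only in positive images and ranks first for "pos", "v2" is spread evenly over both categories and scores 0, and "v3" never co-occurs with "pos".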


Most Discriminative Features

Most discriminative visual features: extracted using the Mutual Information measure [ACM MM’10]


Part 2: Videos on the Social Web *

* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

  Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009


Near-duplicate Video Content

YouTube: most important video sharing environment [SIGCOM’07]: 85 M videos, 65 k videos/day, 100 M downloads per day

Traffic to/from YouTube = 10% / 20% of the Web total

Redundancy: 25% of the videos are near duplicates

Can we use redundancy to obtain richer video annotations? Automatic tagging


Automatic Tagging

What is it good for?
• Additional information
• Better user experience
• Richer feature vectors for ...
  - Automatic data organization (classification and clustering)
  - Video Search
  - Knowledge Extraction (creating ontologies)


Overlap Graph

[Figure: Video 1 to Video 5 and the overlap graph built from them]


Neighbor-based Tagging (1): Idea

• Video 4 contains original tags A, B; tags F, E are obtained from neighbors

• Criteria for automatic tagging:
  - Prefer tags used by many neighbors
  - Prefer tags from neighbors with a strong link

[Figure: Video 4 and its neighbors Video 1, Video 2, Video 3; tag sets shown are ABC, AE, BEF, and ABFE, with the added tags marked as automatically generated]


Neighbor-based Tagging (2): Formal

Given: directed overlap graph $G_O = (V_O, E_O)$ with weights

\[ w(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_j|} \]

(the weights correspond to the overlap)

Relevance of tag $t$ for video $v_i$:

\[ rel(t, v_i) = \sum_{(v_j, v_i) \in E_O} I(t, v_j) \, w(v_j, v_i) \]

where $I(t, v_j)$ is the indicator function (1 if video $v_j$ carries tag $t$, 0 otherwise) and the sum runs over all neighbors of $v_i$.
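The relevance sum can be written out in a few lines. The sketch below assumes that each video is represented as a set of frame or segment identifiers, and that an edge (v_j, v_i) exists whenever the two sets overlap; both are assumptions for illustration, as are the function names and the toy data.

```python
def overlap_weight(frames_a, frames_b):
    """w(v_a, v_b) = |v_a ∩ v_b| / |v_b|."""
    return len(frames_a & frames_b) / len(frames_b)


def tag_relevance(tag, target, frames, tags):
    """rel(t, v_i): sum of w(v_j, v_i) over neighbors v_j that carry tag t."""
    total = 0.0
    for other in frames:
        if other == target:
            continue
        weight = overlap_weight(frames[other], frames[target])  # w(v_j, v_i)
        if weight > 0 and tag in tags[other]:  # indicator I(t, v_j)
            total += weight
    return total


# Toy data (made up): frame ids per video and user-provided tags
frames = {
    "v4": set(range(1, 11)),       # target video with 10 frames
    "v1": {1, 2, 3, 4, 5, 20, 21}, # shares 5 frames with v4
    "v2": {6, 7, 30},              # shares 2 frames with v4
    "v3": {100},                   # no overlap, so not a neighbor
}
tags = {"v4": {"A", "B"}, "v1": {"A", "E"}, "v2": {"E", "F"}, "v3": {"Z"}}

relevance = {t: tag_relevance(t, "v4", frames, tags) for t in ["E", "F", "Z"]}
```

Tag "E" is used by both neighbors and collects both overlap weights, so it scores higher than "F", which comes from a single weakly linked neighbor; this reflects the two criteria of preferring tags from many neighbors and from strong links.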


Neighbor-based Tagging (3)

Apply additional smoothing for redundant regions

\[ rel(t, v) = \sum_{X \in P(N(v))} \; \sum_{i=0}^{k(X)-1} \alpha^{i} \cdot \frac{\left| \left( v \cap \bigcap_{x \in X} x \right) \setminus \bigcup_{u \in N(v) \setminus X} u \right|}{|v|} \]

where
• $P(N(v))$: subsets of neighbors
• $k(X)$: number of neighbors in $X$ with tag $t$
• $\alpha$: smoothing factor
• numerator of the fraction: overlap region
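A direct, brute-force evaluation of the smoothed relevance might look as follows. It assumes videos are sets of frame identifiers, neighbors are videos that overlap the target, the smoothing factor enters as powers alpha^i, and the empty subset contributes nothing because it contains no neighbor with the tag. Names and toy data are made up, and the enumeration of all subsets is exponential in the number of neighbors, so this is for illustration only.

```python
from itertools import combinations


def smoothed_relevance(tag, target, frames, tags, alpha=0.5):
    v = frames[target]
    neighbors = [u for u in frames if u != target and frames[u] & v]
    total = 0.0
    for size in range(1, len(neighbors) + 1):
        for subset in combinations(neighbors, size):
            # k(X): neighbors in the subset that carry the tag
            k = sum(1 for x in subset if tag in tags[x])
            if k == 0:
                continue
            # Region of v covered by every video in X and by no other neighbor
            region = set(v)
            for x in subset:
                region &= frames[x]
            for u in neighbors:
                if u not in subset:
                    region -= frames[u]
            smoothing = sum(alpha ** i for i in range(k))
            total += smoothing * len(region) / len(v)
    return total


# Toy data (made up): two neighbors that overlap each other inside the target
frames = {"v": set(range(1, 11)),
          "n1": set(range(1, 7)),   # frames 1..6
          "n2": set(range(4, 9))}   # frames 4..8
tags = {"v": set(), "n1": {"E"}, "n2": {"E"}}

score = smoothed_relevance("E", "v", frames, tags, alpha=0.5)
```

Frames 4 to 6 are covered by both neighbors. Without smoothing they would count twice (0.6 + 0.5 = 1.1 in total); with alpha = 0.5 the second vote for the same region counts only half, giving 0.3 + 0.2 + 1.5 * 0.3 = 0.95.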


TagRank

• Also takes transitive relationships into account
• PageRank-like weight propagation

\[ rel(t, v_i) = TR(v_i, t) = \sum_{(v_j, v_i) \in E_O} TR(v_j, t) \, w(v_j, v_i) \]

or in matrix form as an eigenvector equation

\[ TR(t) = \begin{pmatrix} w(v_1, v_1) & w(v_1, v_2) & \cdots & w(v_1, v_n) \\ w(v_2, v_1) & w(v_2, v_2) & \cdots & w(v_2, v_n) \\ \vdots & \vdots & \ddots & \vdots \\ w(v_n, v_1) & w(v_n, v_2) & \cdots & w(v_n, v_n) \end{pmatrix}^{T} \cdot \begin{pmatrix} TR(v_1, t) \\ TR(v_2, t) \\ \vdots \\ TR(v_n, t) \end{pmatrix} \]

with start vector

\[ TR(t) = \left( I(t, v_1), \ldots, I(t, v_n) \right)^{T} \]
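The propagation can be sketched as a power iteration starting from the indicator vector. The L1 normalization after every step and the fixed iteration count are assumptions added here to keep the scores bounded; the slide does not state how the iteration is normalized or stopped. The function name and the toy weight matrix are made up.

```python
def tag_rank(weights, has_tag, iterations=50):
    """weights[j][i] = w(v_j, v_i); has_tag[i] = I(t, v_i) as 0 or 1."""
    n = len(has_tag)
    scores = [float(x) for x in has_tag]  # start vector
    for _ in range(iterations):
        # One multiplication with the transposed weight matrix
        updated = [sum(scores[j] * weights[j][i] for j in range(n))
                   for i in range(n)]
        norm = sum(updated)
        if norm == 0:
            break
        scores = [s / norm for s in updated]
    return scores


# Toy chain v0 -> v1 -> v2 with self-weights w(v, v) = 1; only v0 has the tag
weights = [[1.0, 0.5, 0.0],
           [0.0, 1.0, 0.5],
           [0.0, 0.0, 1.0]]
has_tag = [1, 0, 0]

after_one = tag_rank(weights, has_tag, iterations=1)
after_two = tag_rank(weights, has_tag, iterations=2)
```

After one step only the direct neighbor v1 receives weight; after two steps v2 also gets a score through v1, which is the transitive effect that the purely neighbor-based relevance cannot capture. With many iterations the normalized scores approach the dominant eigenvector, so the start vector matters mostly for short runs.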


Applications of Extended Tag Representation

• Use relevance values rel(t, v_i) for constructing enriched feature vectors for videos: combine original tags with new tags weighted by relevance values

• Automatic annotation: use thresholding to select the most relevant tags for a given video
  - Manual assessment of tags shows their relevance

• Data organization: clustering and classification experiments (ground truth: YouTube categories of videos)
  - Improved performance through enriched feature representation
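Both uses of the relevance values can be shown in a few lines. In this sketch the original tags get weight 1.0 and propagated tags get their relevance value as weight; that weighting, the threshold of 0.5, and the function names are assumptions for illustration.

```python
def enriched_vector(original_tags, relevance):
    """Sparse feature vector: original tags at 1.0, new tags at their relevance."""
    vector = {t: 1.0 for t in original_tags}
    for tag, value in relevance.items():
        if tag not in vector and value > 0:
            vector[tag] = value
    return vector


def select_tags(relevance, threshold):
    """Automatic annotation: keep tags whose relevance reaches the threshold."""
    kept = [t for t, value in relevance.items() if value >= threshold]
    return sorted(kept, key=lambda t: (-relevance[t], t))


# Made-up relevance values for one video
relevance = {"E": 0.7, "F": 0.2}
features = enriched_vector({"A", "B"}, relevance)
suggested = select_tags(relevance, threshold=0.5)
```

The enriched vector keeps weakly relevant tags such as "F" with a small weight, which suits clustering and classification, while thresholding keeps only "E" as a tag to show to users.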


Summary

• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)

• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)

• Visual information and annotations can be combined to obtain enhanced feature representations

• Visual information can help to establish links between resources such as videos (application: information propagation)

• Feature representations in combination with community feedback can be used for machine learning (application: classification, mapping)


References

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009

Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain

David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference, WWW 2009, Madrid, Spain