Information Extraction from Multimedia Content on the Social Web
Stefan Siersdorfer, L3S Research Centre, Hannover, Germany
Meta Data and Visual Data on the Social Web
Meta Data:
• Tags
• Titles, Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links

Visual Data:
• Photos
• Videos
How to exploit combined information from visual data and meta data?
Example 1: Photos in Flickr
Example 2: Videos in YouTube
Social Web Environments as Graph Structure
[Figure: Social Web environment as a graph, with users (User 1-3), videos (Video 1-3), tags (tag1-tag3), and a group (Group 2) as nodes connected by edges]
Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups

Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resource: Ownership, Favorite Assignment, Rating
• User-Group: Membership
• Resource-Resource: Visual Similarity, Meta Data Similarity
User Feedback on the Social Web
• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content
How can we exploit this community feedback?
Outline

• Part 1: Photos on the Social Web
  1.1) Photo Attractiveness
  1.2) Generating Photo Maps
  1.3) Sentiment in Photos
• Part 2: Videos on the Social Web
  Video Tagging
Part 1: Photos on the Social Web
1.1) Photo Attractiveness *
* Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain
Attractiveness of Images
[Example photos: landscape, portrait, flower]
Which factors influence the human perception of attractiveness?
Attractiveness: Visual Features

• Human visual perception is mainly influenced by color distribution and coarseness
• These are complex concepts that convey multiple orthogonal aspects
• Hence the necessity to consider several different low-level features
Attractiveness: Visual Features

Color Features:
• Brightness: mean and variance of the luminance (RGB)
• Contrast
• Colorfulness
• Naturalness
• Saturation: intensity of the colors; saturation is 0 for greyscale images
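A minimal sketch of how such low-level color features could be computed, assuming Pillow and NumPy; the talk does not prescribe a particular implementation, so the luminance weights and the exact feature set here are illustrative:

import numpy as np
from PIL import Image

def color_features(path):
    """Simple color statistics for one photo."""
    img = Image.open(path).convert("RGB")
    rgb = np.asarray(img, dtype=np.float64) / 255.0

    # Brightness: mean and variance of the luminance (weighted RGB average).
    luminance = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # Saturation: intensity of the colors; 0 everywhere for greyscale images.
    hsv = np.asarray(img.convert("HSV"), dtype=np.float64) / 255.0
    saturation = hsv[..., 1]

    return {
        "brightness_mean": luminance.mean(),
        "brightness_var": luminance.var(),
        "contrast": luminance.std(),        # simple RMS contrast
        "saturation_mean": saturation.mean(),
        "saturation_var": saturation.var(),
    }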
Visual Features: Coarseness

• Sharpness: resolution + acutance
• Of critical importance for the final appearance of photos [Savakis 2000]
Textual Features

• We consider user-generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information
Attractiveness of Photos
Community-based models for classifying/ranking images according to their appeal [WWW'09]
[Figure: system overview. Inputs from the Flickr photo stream: content (visual features), metadata (textual features, e.g. the tags "cat, fence, house"), and community feedback on each photo's interestingness (#views, #comments, #favorites, ...) feed a classification & regression model generator that produces the attractiveness models.]
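A minimal sketch of such a model generator, assuming scikit-learn; the random feature vectors and the favorites-based labelling rule are illustrative placeholders, not the paper's exact setup:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))     # placeholder visual + textual feature vectors
favs = rng.poisson(3.0, size=1000)  # placeholder community feedback (#favorites)

# Community feedback provides the labels: "attractive" = many favorites.
y = (favs >= np.median(favs)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))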
Classification & Regression Models
Experiments
1.2) Generating Photo Maps *
* Work and illustrations from: David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain
Outline: Photo Maps

• Use geo-location, tags, and visual features of photos to:
  • Identify popular locations and landmarks
  • Estimate the location of photos
  • Select representative images
Spatial Clustering

• Each data point corresponds to the (longitude, latitude) of an image
• Mean shift clustering is applied to obtain a hierarchical structure
• The most distinctive popular tags are used as labels
  (# photos with the tag in the cluster / # photos with the tag in the overall set)
[Map with example cluster labels: london, paris, eiffel, louvre, trafalgarsquare, tatemodern]
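A minimal sketch of the clustering step with scikit-learn's MeanShift; the bandwidth value and the toy coordinates are assumptions (the authors' implementation may differ):

import numpy as np
from sklearn.cluster import MeanShift

# Each data point is the (longitude, latitude) of one geo-tagged photo.
coords = np.array([
    [-0.1276, 51.5072], [-0.1281, 51.5080],  # around London
    [2.3522, 48.8566], [2.2945, 48.8584],    # around Paris
])

# Running mean shift at several bandwidths yields the hierarchical structure.
ms = MeanShift(bandwidth=0.5).fit(coords)
print(ms.cluster_centers_)  # one peak per popular location
print(ms.labels_)           # cluster id per photo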
Estimating the Location of Photos without Geo-Tags

• Train SVMs on clusters:
  • Positive examples: photos in the cluster
  • Negative examples: photos outside the cluster
• Feature representation: tags and visual features (SIFT)
• Best performance for the combination of tags and SIFT features (see the sketch below)
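A hedged sketch of the per-cluster SVM training, assuming bag-of-words vectors for tags and quantized SIFT visual words; the random sparse matrices merely stand in for real data:

import numpy as np
from scipy.sparse import hstack, random as sparse_random
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 500
tags = sparse_random(n, 1000, density=0.01, random_state=0)  # tag bag-of-words
sift = sparse_random(n, 2000, density=0.02, random_state=1)  # SIFT visual words

# Positive examples: photos inside the cluster; negatives: all others.
in_cluster = rng.integers(0, 2, size=n)

# Combining both feature sets gives the best reported performance.
X = hstack([tags, sift]).tocsr()
clf = LinearSVC().fit(X, in_cluster)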
Finding Representative Images

• Construct a weighted graph: weights based on the visual similarity of images (using SIFT features)
• Use graph clustering (e.g. spectral clustering) to identify tightly connected components
• Choose an image from such a tightly connected component
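A sketch of this step, assuming a precomputed pairwise visual-similarity matrix; scikit-learn's SpectralClustering and the final selection heuristic are illustrative choices, not the paper's exact method:

import numpy as np
from sklearn.cluster import SpectralClustering

sim = np.random.default_rng(0).random((50, 50))  # placeholder similarities
sim = (sim + sim.T) / 2                          # symmetric affinity matrix
np.fill_diagonal(sim, 1.0)

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(sim)

# Pick, from the largest cluster, the image most similar to the other members.
biggest = np.bincount(labels).argmax()
members = np.where(labels == biggest)[0]
rep = members[sim[np.ix_(members, members)].sum(axis=1).argmax()]
print("representative image index:", rep)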
Example 1: Europe
Example 2: New York
1.3) Sentiment in Photos *
* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy
Sentiment Analysis of Images

Data: more than 500,000 Flickr photos

Image Features:
• Global color histogram: whether a color is present in the image
• Local color histogram: whether a color is present at a particular location
• SIFT visual terms: b/w patterns, rotated and scaled

Image Sentiment:
• SentiWordNet provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
• Used for obtaining sentiment categories
• Provides the training set + ground truth for the experiments
Which are the most discriminative visual terms?

• Use the Mutual Information measure to determine these features
• Probabilities (estimated by counting in the image corpus):
  • P(t): probability that visual term t occurs in an image
  • P(c): probability that an image has sentiment category c ("pos" or "neg")
  • P(t,c): probability that an image is in category c and has visual term t
• Intuition: "Terms that have a high co-occurrence with a category are more characteristic for that category."
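The formula itself did not survive extraction; in terms of the probabilities above, the standard form of the Mutual Information measure is

MI(t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} P(e_t, e_c) \log \frac{P(e_t, e_c)}{P(e_t)\,P(e_c)}

where e_t indicates the presence of visual term t in an image and e_c its membership in category c.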
Most Discriminative Features

The most discriminative visual features, extracted using the Mutual Information measure [ACM MM'10]
Part 2: Videos on the Social Web *

* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009
Near-duplicate Video Content

• YouTube: the most important video sharing environment [SIGCOMM'07]:
  85 M videos, 65 k new videos per day, 100 M downloads per day
• Traffic to/from YouTube = 10% / 20% of the total Web traffic
• Redundancy: 25% of the videos are near-duplicates
• Can we use this redundancy to obtain richer video annotations? → Automatic tagging
Automatic Tagging

What is it good for?
• Additional information
• Better user experience
• Richer feature vectors for:
  • Automatic data organization (classification and clustering)
  • Video search
  • Knowledge extraction (e.g. creating ontologies)
Overlap Graph
[Figure: two views of an overlap graph over Videos 1-5, with directed edges between videos that share near-duplicate content]
Neighbor-based Tagging (1): Idea

• Video 4 contains the original tags A, B; tags F, E are obtained from its neighbors
• Criteria for automatic tagging:
  • Prefer tags used by many neighbors
  • Prefer tags from neighbors with a strong link
[Figure: Videos 1-3, with tag sets {A, B, C}, {A, E}, and {B, E, F}, are neighbors of Video 4; its original tags A, B are automatically extended by the generated tags F, E]
Neighbor-based Tagging (2): Formal

Given: a directed overlap graph G_O = (V_O, E_O), with weights corresponding to the overlap:

w(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_j|}

Relevance of tag t for video v_i (a sum over all neighbors, with indicator function I(t, v_j) = 1 iff v_j carries tag t):

rel(t, v_i) = \sum_{(v_j, v_i) \in E_O} I(t, v_j) \, w(v_j, v_i)
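A minimal sketch of this scheme on the example graph from the "Idea" slide; the dictionary-based encoding of in-edges and tag sets is an assumption, not the paper's data structure:

from collections import defaultdict

# in_edges[v] = list of (neighbor u, overlap weight w(u, v))
in_edges = {"video4": [("video1", 0.6), ("video2", 0.5), ("video3", 0.4)]}
tags = {
    "video1": {"A", "B", "C"},
    "video2": {"A", "E"},
    "video3": {"B", "E", "F"},
}

def tag_relevance(v):
    """rel(t, v) = sum over in-neighbors u of I(t, u) * w(u, v)."""
    rel = defaultdict(float)
    for u, w in in_edges.get(v, []):
        for t in tags[u]:  # I(t, u) = 1 exactly for the tags of u
            rel[t] += w
    return dict(rel)

print(sorted(tag_relevance("video4").items(), key=lambda kv: -kv[1]))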
Neighbor-based Tagging (3)

Apply additional smoothing for redundant regions:

rel(t, v) = \sum_{X \in \mathcal{P}(N(v))} \sum_{i=0}^{k(X)-1} \alpha^i \cdot \frac{\left| v \cap \bigcap_{x \in X} x \,-\, \bigcup_{u \in N(v) - X} u \right|}{|v|}

where \mathcal{P}(N(v)) ranges over the subsets of neighbors of v, k(X) is the number of neighbors in X with tag t, \alpha is a smoothing factor, and the numerator measures the overlap region of v covered exactly by the neighbors in X.
TagRank
• Also takes transitive relationships into account
• PageRank-like weight propagation:

rel(t, v_i) = TR(v_i, t) = \sum_{(v_j, v_i) \in E_O} TR(v_j, t) \, w(v_j, v_i)

or, in matrix form, as an eigenvector equation:

\vec{TR}(t) = \begin{pmatrix} w(v_1, v_1) & w(v_1, v_2) & \cdots & w(v_1, v_n) \\ w(v_2, v_1) & w(v_2, v_2) & \cdots & w(v_2, v_n) \\ \vdots & & \ddots & \vdots \\ w(v_n, v_1) & w(v_n, v_2) & \cdots & w(v_n, v_n) \end{pmatrix}^{T} \cdot \begin{pmatrix} TR(v_1, t) \\ TR(v_2, t) \\ \vdots \\ TR(v_n, t) \end{pmatrix}

with start vector \vec{TR}(t) = \left( I(t, v_1), \ldots, I(t, v_n) \right)^{T}
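A hedged power-iteration sketch of TagRank; the tiny weight matrix, the per-step normalization, and the fixed iteration count are illustrative assumptions:

import numpy as np

# W[i, j] = w(v_i, v_j) = |v_i ∩ v_j| / |v_j|, the overlap weights.
W = np.array([
    [0.0, 0.5, 0.2],
    [0.3, 0.0, 0.4],
    [0.0, 0.6, 0.0],
])

# Start vector: I(t, v_i) = 1 iff video v_i already carries tag t.
tr = np.array([1.0, 0.0, 0.0])

# PageRank-like propagation: iterate TR(t) = W^T · TR(t) towards a fixpoint,
# normalizing each step so the scores stay bounded.
for _ in range(50):
    tr = W.T @ tr
    tr = tr / tr.sum()

print(tr)  # relevance of tag t per video, transitive links included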
Applications of the Extended Tag Representation

• Use the relevance values rel(t, v_i) to construct enriched feature vectors for videos: combine the original tags with the new tags, weighted by their relevance values
• Automatic annotation: use thresholding to select the most relevant tags for a given video; manual assessment of the selected tags shows their relevance
• Data organization: clustering and classification experiments (ground truth: YouTube categories of the videos) show improved performance through the enriched feature representation
Summary
• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)
• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)
• Visual information & annotations can be combined to obtain enhanced feature representations
• Visual information can help to establish links between resources such as videos (application: information propagation)
• Feature representations in combination with community feedback can be used for machine learning (applications: classification, mapping)
References
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009

Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain

David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain