Information Extraction from Multimedia Content on the Social Web
Stefan Siersdorfer, L3S Research Centre, Hannover, Germany
Meta Data and Visual Data on the Social Web
Meta Data:
• Tags
• Titles, Descriptions
• Timestamps
• Geo-Tags
• Comments
• Numerical Ratings
• Users and Social Links

Visual Data:
• Photos
• Videos
How to exploit combined information from visual data and meta data?
Example 1: Photos in Flickr
Example 2: Videos in YouTube
Social Web Environments as Graph Structure
[Figure: Social Web environment as a graph, with users (User 1-3), videos (Video 1-3), tags (tag1-tag3), and a group (Group 2) as nodes connected by edges]
Entities (Nodes):
• Resources (Videos, Photos)
• Users
• Tags
• Groups

Relationships (Edges):
• User-User: Contacts, Friendship
• User-Resource: Ownership, Favorite Assignment, Rating
• User-Group: Membership
• Resource-Resource: Visual Similarity, Meta Data Similarity
User Feedback on the Social Web
• Numeric Ratings, Favorite Assignments
• Comments
• Clicks/Views
• Contacts, Friendships
• Community Tagging
• Blog Entries
• Upload of Content
How can we exploit this community feedback?
Outline

• Part 1: Photos on the Social Web
  1.1) Photo Attractiveness
  1.2) Generating Photo Maps
  1.3) Sentiment in Photos
• Part 2: Videos on the Social Web
  Video Tagging
Part 1: Photos on the Social Web
1.1) Photo Attractiveness *
* Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain
Attractiveness of Images
[Example photos: landscape, portrait, flower]
Which factors influence the human perception of attractiveness?
Attractiveness: Visual Features

• Human visual perception is mainly influenced by color distribution and coarseness
• These are complex concepts that convey multiple orthogonal aspects
• Hence the necessity to consider several different low-level features
Attractiveness: Visual Features

Color Features:
• Brightness: mean and variance of the luminance (RGB)
• Contrast
• Colorfulness
• Naturalness
• Saturation: intensity of the colors; saturation is 0 for greyscale images
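A minimal sketch of how such low-level color features could be computed, assuming Pillow and NumPy; the talk does not prescribe a particular implementation, so the luminance weights and the exact feature set here are illustrative:

import numpy as np
from PIL import Image

def color_features(path):
    """Simple color statistics for one photo."""
    img = Image.open(path).convert("RGB")
    rgb = np.asarray(img, dtype=np.float64) / 255.0

    # Brightness: mean and variance of the luminance (weighted RGB average).
    luminance = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # Saturation: intensity of the colors; 0 everywhere for greyscale images.
    hsv = np.asarray(img.convert("HSV"), dtype=np.float64) / 255.0
    saturation = hsv[..., 1]

    return {
        "brightness_mean": luminance.mean(),
        "brightness_var": luminance.var(),
        "contrast": luminance.std(),        # simple RMS contrast
        "saturation_mean": saturation.mean(),
        "saturation_var": saturation.var(),
    }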
Visual Features: Coarseness

• Sharpness: resolution + acutance
• Of critical importance for the final appearance of photos [Savakis 2000]
Textual Features

• We consider user-generated meta data
• Correlation of topics with image appeal (ground truth: favorite assignments)
• Tags seem appropriate to capture this information
Attractiveness of Photos
Community-based models for classifying/ranking images according to their appeal [WWW'09]
[Figure: system overview. Inputs from the Flickr photo stream: content (visual features), metadata (textual features, e.g. the tags "cat, fence, house"), and community feedback on each photo's interestingness (#views, #comments, #favorites, ...) feed a classification & regression model generator that produces the attractiveness models.]
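A minimal sketch of such a model generator, assuming scikit-learn; the random feature vectors and the favorites-based labelling rule are illustrative placeholders, not the paper's exact setup:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))     # placeholder visual + textual feature vectors
favs = rng.poisson(3.0, size=1000)  # placeholder community feedback (#favorites)

# Community feedback provides the labels: "attractive" = many favorites.
y = (favs >= np.median(favs)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))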
Classification & Regression Models
Experiments
1.2) Generating Photo Maps *
* Work and illustrations from: David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain
Outline: Photo Maps

• Use geo-location, tags, and visual features of photos to:
  • Identify popular locations and landmarks
  • Estimate the location of photos
  • Select representative images
Spatial Clustering

• Each data point corresponds to the (longitude, latitude) of an image
• Mean shift clustering is applied to obtain a hierarchical structure
• The most distinctive popular tags are used as labels
  (# photos with the tag in the cluster / # photos with the tag in the overall set)
[Map with example cluster labels: london, paris, eiffel, louvre, trafalgarsquare, tatemodern]
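A minimal sketch of the clustering step with scikit-learn's MeanShift; the bandwidth value and the toy coordinates are assumptions (the authors' implementation may differ):

import numpy as np
from sklearn.cluster import MeanShift

# Each data point is the (longitude, latitude) of one geo-tagged photo.
coords = np.array([
    [-0.1276, 51.5072], [-0.1281, 51.5080],  # around London
    [2.3522, 48.8566], [2.2945, 48.8584],    # around Paris
])

# Running mean shift at several bandwidths yields the hierarchical structure.
ms = MeanShift(bandwidth=0.5).fit(coords)
print(ms.cluster_centers_)  # one peak per popular location
print(ms.labels_)           # cluster id per photo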
Estimating the Location of Photos without Geo-Tags

• Train SVMs on clusters:
  • Positive examples: photos in the cluster
  • Negative examples: photos outside the cluster
• Feature representation: tags and visual features (SIFT)
• Best performance for the combination of tags and SIFT features (see the sketch below)
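A hedged sketch of the per-cluster SVM training, assuming bag-of-words vectors for tags and quantized SIFT visual words; the random sparse matrices merely stand in for real data:

import numpy as np
from scipy.sparse import hstack, random as sparse_random
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 500
tags = sparse_random(n, 1000, density=0.01, random_state=0)  # tag bag-of-words
sift = sparse_random(n, 2000, density=0.02, random_state=1)  # SIFT visual words

# Positive examples: photos inside the cluster; negatives: all others.
in_cluster = rng.integers(0, 2, size=n)

# Combining both feature sets gives the best reported performance.
X = hstack([tags, sift]).tocsr()
clf = LinearSVC().fit(X, in_cluster)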
Finding Representative Images

• Construct a weighted graph: weights based on the visual similarity of images (using SIFT features)
• Use graph clustering (e.g. spectral clustering) to identify tightly connected components
• Choose an image from such a tightly connected component
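A sketch of this step, assuming a precomputed pairwise visual-similarity matrix; scikit-learn's SpectralClustering and the final selection heuristic are illustrative choices, not the paper's exact method:

import numpy as np
from sklearn.cluster import SpectralClustering

sim = np.random.default_rng(0).random((50, 50))  # placeholder similarities
sim = (sim + sim.T) / 2                          # symmetric affinity matrix
np.fill_diagonal(sim, 1.0)

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(sim)

# Pick, from the largest cluster, the image most similar to the other members.
biggest = np.bincount(labels).argmax()
members = np.where(labels == biggest)[0]
rep = members[sim[np.ix_(members, members)].sum(axis=1).argmax()]
print("representative image index:", rep)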
Example 1: Europe
Example 2: New York
1.3) Sentiment in Photos *
* Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy
Sentiment Analysis of Images

Data: more than 500,000 Flickr photos

Image Features:
• Global color histogram: whether a color is present in the image
• Local color histogram: whether a color is present at a particular location
• SIFT visual terms: b/w patterns, rotated and scaled

Image Sentiment:
• SentiWordNet provides sentiment values for terms, e.g. (pos, neg, obj) = (0.875, 0.0, 0.125) for the term "good"
• Used for obtaining sentiment categories
• Provides the training set + ground truth for the experiments
Which are the most discriminative visual terms?

• Use the Mutual Information measure to determine these features
• Probabilities (estimated by counting in the image corpus):
  • P(t): probability that visual term t occurs in an image
  • P(c): probability that an image has sentiment category c ("pos" or "neg")
  • P(t,c): probability that an image is in category c and has visual term t
• Intuition: "Terms that have a high co-occurrence with a category are more characteristic for that category."
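The formula itself did not survive extraction; in terms of the probabilities above, the standard form of the Mutual Information measure is

MI(t, c) = \sum_{e_t \in \{0,1\}} \sum_{e_c \in \{0,1\}} P(e_t, e_c) \log \frac{P(e_t, e_c)}{P(e_t)\,P(e_c)}

where e_t indicates the presence of visual term t in an image and e_c its membership in category c.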
Most Discriminative Features

The most discriminative visual features, extracted using the Mutual Information measure [ACM MM'10]
Part 2: Videos on the Social Web *

* Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009
Near-duplicate Video Content

• YouTube: the most important video sharing environment [SIGCOMM'07]:
  85 M videos, 65 k new videos per day, 100 M downloads per day
• Traffic to/from YouTube = 10% / 20% of the total Web traffic
• Redundancy: 25% of the videos are near-duplicates
• Can we use this redundancy to obtain richer video annotations? → Automatic tagging
Automatic Tagging

What is it good for?
• Additional information
• Better user experience
• Richer feature vectors for:
  • Automatic data organization (classification and clustering)
  • Video search
  • Knowledge extraction (e.g. creating ontologies)
Overlap Graph
[Figure: two views of an overlap graph over Videos 1-5, with directed edges between videos that share near-duplicate content]
Neighbor-based Tagging (1): Idea

• Video 4 contains the original tags A, B; tags F, E are obtained from its neighbors
• Criteria for automatic tagging:
  • Prefer tags used by many neighbors
  • Prefer tags from neighbors with a strong link
[Figure: Videos 1-3, with tag sets {A, B, C}, {A, E}, and {B, E, F}, are neighbors of Video 4; its original tags A, B are automatically extended by the generated tags F, E]
Neighbor-based Tagging (2): Formal

Given: a directed overlap graph G_O = (V_O, E_O), with weights corresponding to the overlap:

w(v_i, v_j) = \frac{|v_i \cap v_j|}{|v_j|}

Relevance of tag t for video v_i (a sum over all neighbors, with indicator function I(t, v_j) = 1 iff v_j carries tag t):

rel(t, v_i) = \sum_{(v_j, v_i) \in E_O} I(t, v_j) \, w(v_j, v_i)
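A minimal sketch of this scheme on the example graph from the "Idea" slide; the dictionary-based encoding of in-edges and tag sets is an assumption, not the paper's data structure:

from collections import defaultdict

# in_edges[v] = list of (neighbor u, overlap weight w(u, v))
in_edges = {"video4": [("video1", 0.6), ("video2", 0.5), ("video3", 0.4)]}
tags = {
    "video1": {"A", "B", "C"},
    "video2": {"A", "E"},
    "video3": {"B", "E", "F"},
}

def tag_relevance(v):
    """rel(t, v) = sum over in-neighbors u of I(t, u) * w(u, v)."""
    rel = defaultdict(float)
    for u, w in in_edges.get(v, []):
        for t in tags[u]:  # I(t, u) = 1 exactly for the tags of u
            rel[t] += w
    return dict(rel)

print(sorted(tag_relevance("video4").items(), key=lambda kv: -kv[1]))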
Neighbor-based Tagging (3)

Apply additional smoothing for redundant regions:

rel(t, v) = \sum_{X \in \mathcal{P}(N(v))} \sum_{i=0}^{k(X)-1} \alpha^i \cdot \frac{\left| v \cap \bigcap_{x \in X} x \,-\, \bigcup_{u \in N(v) - X} u \right|}{|v|}

where \mathcal{P}(N(v)) ranges over the subsets of neighbors of v, k(X) is the number of neighbors in X with tag t, \alpha is a smoothing factor, and the numerator measures the overlap region of v covered exactly by the neighbors in X.
TagRank
• Also takes transitive relationships into account
• PageRank-like weight propagation:

rel(t, v_i) = TR(v_i, t) = \sum_{(v_j, v_i) \in E_O} TR(v_j, t) \, w(v_j, v_i)

or, in matrix form, as an eigenvector equation:

\vec{TR}(t) = \begin{pmatrix} w(v_1, v_1) & w(v_1, v_2) & \cdots & w(v_1, v_n) \\ w(v_2, v_1) & w(v_2, v_2) & \cdots & w(v_2, v_n) \\ \vdots & & \ddots & \vdots \\ w(v_n, v_1) & w(v_n, v_2) & \cdots & w(v_n, v_n) \end{pmatrix}^{T} \cdot \begin{pmatrix} TR(v_1, t) \\ TR(v_2, t) \\ \vdots \\ TR(v_n, t) \end{pmatrix}

with start vector \vec{TR}(t) = \left( I(t, v_1), \ldots, I(t, v_n) \right)^{T}
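A hedged power-iteration sketch of TagRank; the tiny weight matrix, the per-step normalization, and the fixed iteration count are illustrative assumptions:

import numpy as np

# W[i, j] = w(v_i, v_j) = |v_i ∩ v_j| / |v_j|, the overlap weights.
W = np.array([
    [0.0, 0.5, 0.2],
    [0.3, 0.0, 0.4],
    [0.0, 0.6, 0.0],
])

# Start vector: I(t, v_i) = 1 iff video v_i already carries tag t.
tr = np.array([1.0, 0.0, 0.0])

# PageRank-like propagation: iterate TR(t) = W^T · TR(t) towards a fixpoint,
# normalizing each step so the scores stay bounded.
for _ in range(50):
    tr = W.T @ tr
    tr = tr / tr.sum()

print(tr)  # relevance of tag t per video, transitive links included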
Applications of the Extended Tag Representation

• Use the relevance values rel(t, v_i) to construct enriched feature vectors for videos: combine the original tags with the new tags, weighted by their relevance values
• Automatic annotation: use thresholding to select the most relevant tags for a given video; manual assessment of the selected tags shows their relevance
• Data organization: clustering and classification experiments (ground truth: YouTube categories of the videos) show improved performance through the enriched feature representation
Summary
• The Social Web contains visual information (photos, videos) and meta data (tags, time stamps, social links, spatial information, ...)
• A large variety of users provide explicit and implicit feedback in social web environments (ratings, views, favorite assignments, comments, content of uploaded material)
• Visual information & annotations can be combined to obtain enhanced feature representations
• Visual information can help to establish links between resources such as videos (application: information propagation)
• Feature representations in combination with community feedback can be used for machine learning (applications: classification, mapping)
References
Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Content Redundancy in YouTube and its Application to Video Tagging. ACM Transactions on Information Systems (TOIS), 2011

Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng: Analyzing and Predicting Sentiment of Images on the Social Web. 18th ACM Multimedia Conference (MM 2010), Florence, Italy

Stefan Siersdorfer, Jose San Pedro, Mark Sanderson: Automatic Video Tagging using Content Redundancy. 32nd ACM SIGIR Conference, Boston, USA, 2009

Stefan Siersdorfer, Jose San Pedro: Ranking and Classifying Attractiveness of Photos in Folksonomies. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain

David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg: Mapping the World's Photos. 18th International World Wide Web Conference (WWW 2009), Madrid, Spain