Task 1 (I1.1): Fundamentals of Context-aware Real-time Data Fusion
Fundamentals of Multi-modal Data Fusion on Multimedia Information Networks
Principal Investigator: Thomas Huang
Postdoctoral Researcher: Xi Zhou
PhD Student: Guo-Jun Qi
Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Project Team
• Principal Investigator: Thomas Huang
• Collaborators:
  – IBM: Charu Aggarwal (QoI and sensor networks)
  – IBM: Zhen Wen (social networks)
  – UIUC: Tarek Abdelzaher (communication networks)
  – CUNY: Heng Ji (natural language processing)
• Postdoctoral researcher: Xi Zhou
• PhD students: Guo-Jun Qi, Mert Dickman and Zhaowen Wang
• Undergraduate student: Shiyu Chang
Motivation
• Structured information networks
  – Can handle heterogeneous structure with various input types
  – Can effectively model large structured ontological networks at the semantic level
  – Structure is a way to represent context
• Utilization
  – Efficient and effective inference engine
  – Information and knowledge extraction from ontological networks
Contributions to I1.1
• Connections to constrained conditional model (CCM)
  – Discover constraint links
    • Between heterogeneous objects
    • Between concept nodes
• Connections to latency analysis
  – Reveal cross-media redundancy/relationship
  – Trade-off between low latency and high quality
Multimedia Information Network
• A graph with both data nodes and concept nodes
  – Edges linking concepts: ontology
  – Edges linking data nodes: similarity, association and co-occurrence
  – Edges linking concept and data: attachment of concept to data
Multimedia Information Network (MINet)
• Data nodes: heterogeneous networks with cross-media contents
  – Videos/Images/Speech
  – Surrounding text/user tags
  – GPS metadata
• Concept nodes: ontological networks with correlated categories
  – Non-flat concept structure
  – Example links between concepts
    • A is a subclass of B
    • C is a part of D
    • X attacks Y
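The node and edge types described above can be sketched as a small typed graph. This is only an illustrative data structure, not the project's implementation: the class name `MINet`, the relation strings, and the example node names are all hypothetical.

```python
from collections import defaultdict


class MINet:
    """Toy multimedia information network with typed nodes and labeled edges."""

    def __init__(self):
        self.node_kind = {}              # node id -> "data" or "concept"
        self.edges = defaultdict(list)   # node id -> [(neighbor id, relation)]

    def add_node(self, node_id, kind):
        if kind not in ("data", "concept"):
            raise ValueError("kind must be 'data' or 'concept'")
        self.node_kind[node_id] = kind

    def add_edge(self, src, dst, relation):
        # Edges are directed because ontology relations such as
        # "subclass_of" or "part_of" are asymmetric.
        if src not in self.node_kind or dst not in self.node_kind:
            raise KeyError("both endpoints must be added before linking them")
        self.edges[src].append((dst, relation))

    def edge_type(self, src, dst):
        """Classify an edge by the kinds of its two endpoints."""
        kinds = {self.node_kind[src], self.node_kind[dst]}
        if kinds == {"concept"}:
            return "ontology"      # concept-concept link
        if kinds == {"data"}:
            return "data-data"     # similarity, association or co-occurrence
        return "attachment"        # concept attached to data


# Hypothetical example nodes, for illustration only.
net = MINet()
for node, kind in [("img1", "data"), ("tags1", "data"),
                   ("tank", "concept"), ("vehicle", "concept")]:
    net.add_node(node, kind)
net.add_edge("tank", "vehicle", "subclass_of")
net.add_edge("img1", "tags1", "co_occurrence")
net.add_edge("tank", "img1", "attached_to")
```

Storing the relation name on each edge keeps the three edge families (ontology, data-data, attachment) in one structure while still letting an inference step filter by type.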
Network Structure
Potential Army Impact
• Construct large-scale MINets combining
  – Cross-media heterogeneous data networks
    • Examples
      – Battlefield videos/images
      – Satellite images
      – Acoustic sensor signals
  – Ontological concept networks
    • Military-related concepts and their links
• Make better military decisions
  – More timely and more accurate
  – More robust to missing information
Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW’11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring the concept correlations to enhance the inference accuracy
  – To appear in CVPR’11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Context-Aware Image Similarity
  – Using Hierarchical Gaussianization (HG), ICCV’09
  – Applications to disaster assessment (collaboration with Prof. Tarek Abdelzaher)
  – KDD’11, submitted
Cross-Domain Knowledge Propagation: Two Steps
• How to bridge the domain gap between text and images?
  – Our approach: we construct a translator function between text and images that establishes “virtual” links between them.
• How can we annotate images with labels from text?
  – Our approach: the text labels are propagated to the images via the learned translator.
Challenges
• The model can
  – Work in constrained environments
    • Missing links between text and images
    • Learn a translation function to link text and images
  – Be resistant to noisy cross-media links, improving QoI
    • Misleading related text surrounding images
    • Use a compact intermediate representation to remove nonessential and noisy links
  – Low-rank principle with the fewest topics for cross-domain translation
Cross-Domain Label Propagation
Label propagation:
$$f_T\big(x^{(t)}\big) = \sum_{i=1}^{n_s} y_i^{(s)}\, T\big(x_i^{(s)}, x^{(t)}\big)$$
• Source labels: $y_i^{(s)}$
• Cross-domain translator: $T\big(x_i^{(s)}, x^{(t)}\big)$
• Prediction function: $f_T\big(x^{(t)}\big)$
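The label propagation rule can be sketched in a few lines: the score of a target image is the sum of source labels weighted by the translator response. The function name `predict`, the toy matrix `S`, and the bilinear form used as the toy translator are assumptions for illustration; any function of a (text, image) pair could be plugged in.

```python
import numpy as np


def predict(x_t, X_src, y_src, translator):
    """f_T(x_t) = sum_i y_i^(s) * T(x_i^(s), x_t)."""
    return sum(y * translator(x_s, x_t) for x_s, y in zip(X_src, y_src))


# Toy translator (assumption): a bilinear form between 3-dim text
# features and 2-dim image features.
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])


def toy_translator(x_s, x_t):
    return float(x_s @ S @ x_t)


X_src = np.array([[1.0, 0.0, 0.0],    # text document labeled +1
                  [0.0, 1.0, 0.0]])   # text document labeled -1
y_src = np.array([1, -1])
x_img = np.array([2.0, 0.5])          # unlabeled target image

score = predict(x_img, X_src, y_src, toy_translator)
label = 1 if score >= 0 else -1
```

In this toy case the image responds more strongly to the positively labeled document, so the propagated score is positive.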
Learning Optimal Translator
Learning formulation via optimizing the translator function:
$$\min_T \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \lambda\, \Omega(T)$$
• The first term: maximize cross-domain association over a set of co-occurrence pairs of source-target instances.
• The second term: minimize the training loss.
• The third term: a regularizer that prefers a concise translator over a needlessly complex one.
• Improve QoI: remove nonessential and noisy observations from the translation process.
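To make the three terms concrete, the sketch below evaluates such an objective for a candidate translator. Several pieces are assumptions rather than the authors' choices: the translator is taken to be bilinear, `T(x_s, x_t) = x_s' S x_t`; the logistic function is used both as the decreasing function on co-occurrence pairs and as the training loss; the regularizer is the trace (nuclear) norm of `S`; and the weights `gamma` and `lam` are hypothetical trade-off parameters.

```python
import numpy as np


def logistic(z):
    # log(1 + exp(-z)), computed in a numerically stable way
    return np.logaddexp(0.0, -z)


def objective(S, pairs, X_src, y_src, X_tgt, y_tgt, gamma=1.0, lam=0.1):
    # Term 1: association over co-occurrence pairs (c_kl, x_s, x_t).
    assoc = sum(logistic(c * (x_s @ S @ x_t)) for c, x_s, x_t in pairs)
    # Term 2: training loss of f_T on the labeled target examples.
    f = y_src @ (X_src @ S @ X_tgt.T)
    loss = logistic(y_tgt * f).sum()
    # Term 3: trace-norm regularizer on the translator matrix.
    reg = np.linalg.norm(S, ord="nuc")
    return gamma * assoc + loss + lam * reg


X_src = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_src = np.array([1.0, -1.0])
X_tgt = np.array([[1.0, 0.0], [0.0, 1.0]])
y_tgt = np.array([1.0, -1.0])
pairs = [(2, X_src[0], X_tgt[0]), (1, X_src[1], X_tgt[1])]

S_zero = np.zeros((3, 2))                                  # no translation at all
S_good = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # links matching pairs

obj_zero = objective(S_zero, pairs, X_src, y_src, X_tgt, y_tgt)
obj_good = objective(S_good, pairs, X_src, y_src, X_tgt, y_tgt)
```

A translator that responds to the observed co-occurring pairs lowers the first two terms by more than the regularizer adds, so `obj_good` is smaller than `obj_zero` in this toy setting.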
Constructing Cross-Domain Translator
[Diagram: source instances (text) and target instances (images). How to bridge the cross-domain gap? The projections $W^{(s)}$ and $W^{(t)}$ map both domains into a common latent space.]
Inner product in latent space as translator:
$$T\big(x^{(s)}, x^{(t)}\big) = \big\langle W^{(s)} x^{(s)},\, W^{(t)} x^{(t)} \big\rangle = x^{(s)\prime}\, W^{(s)\prime} W^{(t)}\, x^{(t)} = x^{(s)\prime} S\, x^{(t)}, \quad S = W^{(s)\prime} W^{(t)}$$
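A quick numerical check of this construction: projecting both inputs into a shared latent space and taking the inner product gives the same value as the bilinear form with `S = W_s' W_t`. The dimensions and the random projections below are arbitrary assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, k = 6, 4, 2                # text dim, image dim, latent dim (assumed)

W_s = rng.normal(size=(k, d_s))      # projects text into the latent space
W_t = rng.normal(size=(k, d_t))      # projects images into the latent space
S = W_s.T @ W_t                      # (d_s, d_t) translator matrix


def translator(x_s, x_t):
    """Bilinear translator T(x_s, x_t) = x_s' S x_t."""
    return x_s @ S @ x_t


x_s = rng.normal(size=d_s)
x_t = rng.normal(size=d_t)

latent_inner = (W_s @ x_s) @ (W_t @ x_t)   # inner product in latent space
bilinear = translator(x_s, x_t)            # same quantity via S
rank_S = np.linalg.matrix_rank(S)          # at most k
```

Because `S` is a product of a `d_s x k` and a `k x d_t` matrix, its rank cannot exceed the latent dimension `k`, which is what ties a small latent space to a low-rank translator.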
Constructing Cross-Domain Translator
• A low-dimensional latent space is preferred
  – Imposing the usual l2 (Frobenius-norm) regularizer on $W^{(s)}$ and $W^{(t)}$ to improve prediction accuracy yields the trace norm of $S$:
$$\|S\|_{\Sigma} = \inf_{S = W^{(s)\prime} W^{(t)}} \tfrac{1}{2}\Big(\big\|W^{(s)}\big\|_F^2 + \big\|W^{(t)}\big\|_F^2\Big)$$
  – Equivalent to a low-rank prior on the latent space
  – Principle of concise cross-domain translation: “fewer latent topics (lower dimensionality) are preferred!”
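The link between Frobenius-norm regularization of the two projections and the trace norm of `S` can be checked numerically. The sketch below uses the standard variational identity (trace norm equals the minimum of half the sum of squared Frobenius norms over all factorizations) and builds the minimizing factorization from the SVD; the matrix sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))   # a rank-2 translator

U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
trace_norm = sigma.sum()                                # sum of singular values

# Balanced factorization S = W_s' W_t that attains the infimum.
W_s = (U * np.sqrt(sigma)).T            # (k, d_s)
W_t = np.sqrt(sigma)[:, None] * Vt      # (k, d_t)
balanced = 0.5 * (np.linalg.norm(W_s, "fro") ** 2
                  + np.linalg.norm(W_t, "fro") ** 2)

# Any other factorization of the same S costs more, e.g. a rescaled one.
c = 3.0
unbalanced = 0.5 * (np.linalg.norm(c * W_s, "fro") ** 2
                    + np.linalg.norm(W_t / c, "fro") ** 2)
```

The balanced factorization reproduces `S` exactly and its cost equals the trace norm, while the rescaled factorization yields the same `S` at a strictly higher cost.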
Experiments: Cross-Domain Dataset
• The text corpus and associated images are crawled from Flickr.com and wikipedia.com.
• We extract and stem all tokens in each text document; the token frequencies are used as text features.
• For each image, visual words are extracted using a codebook of size 500.
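The two feature types can be sketched as follows. This is a simplified illustration, not the pipeline used in the experiments: `crude_stem` is a hypothetical stand-in for a real stemmer, the local descriptors and the codebook are random placeholders, and nearest-codeword assignment is assumed for building the visual-word histogram.

```python
import re
from collections import Counter

import numpy as np


def crude_stem(token):
    # Hypothetical stand-in for a real stemmer: strips a few suffixes.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def text_features(doc):
    """Frequencies of stemmed tokens in one text document."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return Counter(crude_stem(t) for t in tokens)


def visual_word_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and count."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))


feats = text_features("Two birds flying over a bird")

rng = np.random.default_rng(0)
codebook = rng.normal(size=(500, 8))      # 500 visual words (placeholder values)
descriptors = codebook[[3, 3, 7]]         # three descriptors sitting on codewords
hist = visual_word_histogram(descriptors, codebook)
```

Both outputs are plain count vectors, one over stemmed tokens and one over the 500 visual words.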
Dataset Statistics
The number of text and image pairs for each category

Category    Number of crawled pairs
Birds       930
Buildings   9216
Cars        728
Cat         229
Dog         486
Horses      654
Mountain    4153
Plane       1356
Train       457
Waterfall   22006
Compared Algorithms
• Image only
  – Only the visual features are used to train classifiers on the target image domain.
• Translated Learning by minimizing Risk (TLRisk)
  – Transfers text labels in the source domain to the target image domain via a Markov chain.
• Heterogeneous Transfer Learning (HTL)
  – Implicitly constructs a distance function between images by a matrix factorization between images and text documents.
Results
• Average error rates with respect to the number of training samples in the image domain.
Results
• Average error rates with respect to the number of text/image co-occurrence pairs (with five training examples).
Results
• Number of topics in the latent space for establishing the cross-domain translator

Category    # topics
Birds       11
Buildings   88
Cars        19
Cat         18
Dog         7
Horses      4
Mountain    6
Plane       15
Train       6
Waterfall   21

Too many building variants!
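One way to read a "number of topics" off a learned translator, assuming it is the bilinear matrix `S` and that the topic count is its numerical rank, is to count the singular values that are not negligible. The function name `num_topics` and the relative tolerance are assumptions for illustration.

```python
import numpy as np


def num_topics(S, rel_tol=1e-6):
    """Numerical rank of S: singular values above rel_tol times the largest."""
    sigma = np.linalg.svd(S, compute_uv=False)   # sorted in descending order
    if sigma.size == 0 or sigma[0] == 0.0:
        return 0
    return int(np.sum(sigma > rel_tol * sigma[0]))


rng = np.random.default_rng(2)
S_rank3 = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 8))
topics = num_topics(S_rank3)
topics_empty = num_topics(np.zeros((10, 8)))
```

Under this reading, a category with many visually distinct variants would need more latent directions to link its text and images, which shows up as a larger rank.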
Revisit Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW’11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring the concept correlations to enhance the inference accuracy
  – To appear in CVPR’11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Image Similarity
  – Hierarchical Gaussianization (HG), ICCV’09
  – Applications to disaster assessment (collaboration with Prof. Tarek Abdelzaher)
Future Work (Q3)
• Resource allocation based on heterogeneous links for communication
  – Low redundancy: at the base station, send the most informative message (text/multimedia data)
  – High quality: at the data center, recover lost information based on redundancy in cross-media links
• Effective linkage analysis with constraints in CCM
Future Work (Q4)
• Develop a stochastic and dynamic model and theory for MINet
  – The effect of structural changes in MINet
    • For latency analysis in communication networks
    • For constrained linkage discovery in CCM
  – The changes of QoI in a dynamic MINet
Path Ahead: Theory and Algorithm
• Construct Cross-Media Analysis (CMA) Theory
  – Stochastic model for cross-media relation and redundancy
    • QoI theory in cross-media networks
    • Information recovery based on cross-media redundancy
    • Dynamic model for cross-media networks
    • Analyze constrained links for CCM
• Practical algorithms for sharing and transmitting information in cross-media links
• Improve latency and quality in communication networks based on cross-media analysis
• Applications to CCM for robust constrained link discovery
• Cross-media knowledge sharing and discovery
Collaboration Summary
• INARC 1.1: Prof. Tarek Abdelzaher
  – Cross-media analysis for communication networks
  – Trade-off between low latency and high quality
• INARC 1.2: Dr. Charu Aggarwal
  – Cross-domain knowledge propagation
  – Cross-category knowledge sharing
  – Quality of Information
Publications
• Collaboration with Dr. Charu Aggarwal (IBM)
  – Guo-Jun Qi, Charu Aggarwal and Thomas Huang. Towards Cross-Domain Knowledge Propagation from Text Corpus to Web Images. To appear in Proc. of the International World Wide Web Conference (WWW 2011), Hyderabad, India, March 28-April 1, 2011.
  – Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang and Thomas Huang. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts. To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011.
  – Guo-Jun Qi, Charu Aggarwal and Thomas Huang. Transfer Learning with Distance Functions between Text and Web Images. Submitted to the ACM KDD Conference, 2011.
• Collaboration with Prof. Tarek Abdelzaher
  – Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher, Thomas Huang and Guohong Cao. PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness. IPSN Demo, April 2011.
Thanks! Q&A
Dataset in Target Domain
The number of images for each category in the target domain

Category    # Positive examples    # Negative examples
Birds       338                    349
Buildings   2301                   2388
Cars        120                    125
Cat         67                     72
Dog         132                    142
Horses      263                    268
Mountain    927                    1065
Plane       509                    549
Train       52                     53
Waterfall   5153                   5737
Learning Optimal Translator
Learning formulation via optimizing the translator function:
$$\min_T \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \lambda\, \Omega(T)$$
• The first term: measures consistency with the observed co-occurrence of text and images.
• Occurrence set: $\mathcal{C} = \big\{\big(c_{k,l},\, x_k^{(s)},\, x_l^{(t)}\big)\big\}$, where $\big(x_k^{(s)}, x_l^{(t)}\big)$ is a source/target pair with occurrence number $c_{k,l}$.
• $\chi$ is a monotonically decreasing function, so that a pair with a larger occurrence number $c_{k,l}$ is weighted more.
• Co-occurring pairs of source and target samples probably share the same labels, so the translator $T$ should respond more strongly to them in order to propagate the labels between them.
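The occurrence set can be assembled by counting how often each (text, image) pair is observed together. The sketch below assumes the crawl yields a flat list of identifier pairs and that features are stored in dictionaries keyed by those identifiers; the function name and all identifiers are hypothetical.

```python
from collections import Counter


def build_occurrence_set(observed_pairs, text_features, image_features):
    """Return triples (c_kl, x_k_source, x_l_target), one per distinct pair."""
    counts = Counter(observed_pairs)
    return [(c, text_features[k], image_features[l])
            for (k, l), c in sorted(counts.items())]


# Hypothetical identifiers and feature vectors, for illustration only.
text_features = {"doc1": [1.0, 0.0, 0.0], "doc2": [0.0, 1.0, 0.0]}
image_features = {"img1": [1.0, 0.0], "img3": [0.0, 1.0]}
observed = [("doc1", "img1"), ("doc2", "img3"), ("doc1", "img1")]

occurrence_set = build_occurrence_set(observed, text_features, image_features)
```

Pairs seen more than once carry a larger count, which is what lets the first term of the objective weight them more heavily.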
Learning Optimal Predictor
Learning formulation via optimizing the translator function:
$$\min_T \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \lambda\, \Omega(T)$$
• The second term: the loss function of the predictor $f_T$ on the training set (e.g., logistic loss): $\ell(z) = \log\big(1 + \exp(-z)\big)$
• Encodes the discriminative knowledge in the training set.
• Large-margin principle: it can reduce the noisy information in the occurrence set for the classification task.
Learning Optimal Predictor
Learning formulation via optimizing the translator function:
$$\min_T \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \lambda\, \Omega(T)$$
• The third term: encodes the preference for a concise semantic translation over a needlessly complex one.
• The principle of constructing the “cross-domain translator.”
• Nonessential and noisy observations can be filtered out of the translation process.
Results
• Number of topics in the latent space for establishing the cross-domain translator

Category    Two training examples    Ten training examples
Birds       11                       9
Buildings   88                       102
Cars        19                       3
Cat         18                       2
Dog         7                        5
Horses      4                        1
Mountain    6                        1
Plane       15                       25
Train       6                        3
Waterfall   21                       26

Too many building variants!
Modeling Context-Aware Image Similarity
• Current method
  – Image visual similarity
  – Hierarchical Gaussianization, ICCV’09 (Zhou, Huang et al.)
  – Hard to model image similarity at the semantic level
• Model image semantic similarity
  – Link images to text documents by the translator
  – Compare the similarity of the associated text to compare image semantics
  – Advantages
    • The “semantic gap” in text documents is smaller
    • Such similarity reflects semantic-level information
[Diagram: image similarity (target domain) is linked to text similarity (source domain) through the text-image association given by the learned translator T(x, y).]
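The idea of comparing images through their associated text can be sketched as follows. This is one possible reading, not the authors' method: it assumes a bilinear translator matrix `S`, represents each image by an association-weighted mix of text vectors (keeping only positive associations), and compares those text profiles with cosine similarity. All names and toy values are hypothetical.

```python
import numpy as np


def text_profile(x_img, X_text, S):
    """Represent an image by the text documents the translator links it to."""
    weights = X_text @ S @ x_img          # T(x_i^(s), x_img) for every document i
    weights = np.maximum(weights, 0.0)    # keep positive associations (assumption)
    return weights @ X_text               # association-weighted mix of text vectors


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def semantic_similarity(x1, x2, X_text, S):
    return cosine(text_profile(x1, X_text, S), text_profile(x2, X_text, S))


X_text = np.array([[1.0, 0.0, 0.0],       # document about one topic
                   [0.0, 1.0, 0.0]])      # document about another topic
S = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

img_a = np.array([1.0, 0.0])
img_b = np.array([0.9, 0.1])              # mostly linked to the first document
img_c = np.array([0.0, 1.0])              # linked to the second document

sim_ab = semantic_similarity(img_a, img_b, X_text, S)
sim_ac = semantic_similarity(img_a, img_c, X_text, S)
```

Images that the translator links to the same documents end up with similar text profiles, so the comparison happens in the text domain where the semantic gap is smaller.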
Path ahead
• Improve the Quality of Information (QoI) transmitted across domains.
  – In some cases, the transmitted information may have a negative effect on the classification task (negative information transfer).
  – Construct a new model that predicts from the target domain alone when the cross-domain information is detected to be noise.
Future Work
• Semantic-level image similarity in heterogeneous networks
  – Different sources of heterogeneous sensors, e.g., cameras, human annotations and textual descriptions
  – Fusing heterogeneous sources in the networks to learn a more descriptive image similarity
  – Collaboration with Dr. Charu Aggarwal at IBM on sensor networks and Prof. Tarek Abdelzaher at UIUC on Fact Finder
Linked to INARC Projects
• Collaborators
  – Prof. Tarek Abdelzaher in CS, UIUC (I 1.1)
    • Fact Finder: compare image similarity at the semantic level to discover trustworthy sources
  – Dr. Charu Aggarwal at IBM (I 1.2)
    • Sensor networks: compare signal similarity using cross-domain knowledge