Task 1 (I1.1): Fundamentals of Context-aware Real-time Data Fusion

Fundamentals of Multi-modal Data Fusion on Multimedia Information Networks

Principal Investigator: Thomas Huang
Postdoctoral Researcher: Xi Zhou
PhD Student: Guo-Jun Qi

Electrical and Computer Engineering, University of Illinois at Urbana-Champaign

Project Team
• Principal Investigator: Thomas Huang
• Collaborators:
  – IBM: Charu Aggarwal (QoI and sensor networks)
  – IBM: Zhen Wen (social networks)
  – UIUC: Tarek Abdelzaher (communication networks)
  – CUNY: Heng Ji (natural language processing)
• Postdoctoral researcher: Xi Zhou
• PhD students: Guo-Jun Qi, Mert Dickman and Zhaowen Wang
• Undergraduate student: Shiyu Chang

Motivation
• Structured information networks
  – Can handle heterogeneous structure with various input types
  – Can effectively model large structured ontological networks at the semantic level
  – Structure is a way to represent context
• Utilization
  – Efficient and effective inference engine
  – Information and knowledge extraction from ontological networks

Contributions to I1.1
• Connections to the constrained conditional model (CCM)
  – Discover constraint links
    • Between heterogeneous objects
    • Between concept nodes
• Connections to latency analysis
  – Reveal cross-media redundancy/relationships
  – Trade-off between low latency and high quality

Multimedia Information Network
• A graph with both data nodes and concept nodes
  – Edges linking concepts: ontology
  – Edges linking data nodes: similarity, association and co-occurrence
  – Edges linking concepts and data: attachment of a concept to data

Multimedia Information Network (MINet)
• Data nodes: heterogeneous networks with cross-media contents
  – Videos/images/speech
  – Surrounding text/user tags
  – GPS metadata
• Concept nodes: ontological networks with correlated categories
  – Non-flat concept structure
  – Example links between concepts
    • A is a subclass of B
    • C is a part of D
    • X attacks Y
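The node and edge types listed above can be held in a very small graph structure. The sketch below is only an illustration: the class name `MINet`, the relation names (`subclass_of`, `co_occurs_with`, `has_concept`) and the node ids are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MINet:
    """Toy multimedia information network: typed nodes plus labeled edges."""
    data_nodes: dict = field(default_factory=dict)    # node id -> modality
    concept_nodes: set = field(default_factory=set)   # concept names
    edges: dict = field(default_factory=lambda: defaultdict(list))  # id -> [(relation, id)]

    def add_data(self, node_id, modality):
        self.data_nodes[node_id] = modality

    def add_concept(self, name):
        self.concept_nodes.add(name)

    def link(self, src, relation, dst):
        # Both endpoints must already exist as data or concept nodes.
        for node in (src, dst):
            if node not in self.data_nodes and node not in self.concept_nodes:
                raise KeyError(f"unknown node: {node}")
        self.edges[src].append((relation, dst))

    def concepts_of(self, node_id):
        """Concepts attached to a data node."""
        return [dst for rel, dst in self.edges[node_id] if rel == "has_concept"]


net = MINet()
net.add_data("img_001", "image")
net.add_data("txt_001", "text")
net.add_concept("vehicle")
net.add_concept("tank")
net.link("tank", "subclass_of", "vehicle")         # concept-concept edge (ontology)
net.link("img_001", "co_occurs_with", "txt_001")   # data-data edge (co-occurrence)
net.link("img_001", "has_concept", "tank")         # data-concept edge (attachment)
print(net.concepts_of("img_001"))
```

Keeping the relation name on every edge is what lets one structure carry ontology links, data similarity links and concept attachments at the same time.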

Network Structure

Potential Army Impact
• Construct large-scale MINets combining
  – Cross-media heterogeneous data networks
    • Examples: battlefield videos/images, satellite images, acoustic sensor signals
  – Ontological concept networks
    • Military-related concepts and their links
• Make better military decisions
  – More timely and more accurate
  – More robust to missing information

Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW’11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring the concept correlations to enhance the inference accuracy
  – To appear in CVPR’11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Context-Aware Image Similarity
  – Using Hierarchical Gaussianization (HG), ICCV’09
  – Applications to disaster assessment (collaboration with Prof. Tarek Abdelzaher)
  – KDD’11, submitted

Cross-Domain Knowledge Propagation: Two Steps
• How to bridge the domain gap between text and images?
  – Our approach: we construct a translator function between text and images that establishes “virtual” links between them.
• How can we annotate image labels from text labels?
  – Our approach: the labels of text are propagated to the images via the learned translator.

Challenges
• The model can
  – Work in a constrained environment
    • Missing links between text and images
    • Learn a translation function to link text and images
  – Be resistant to noisy cross-media links, improving QoI
    • Misleading related text surrounding images
    • Use a compact intermediate representation to remove nonessential and noisy links
  – Low-rank principle with the fewest topics for cross-domain translation

Cross-Domain Label Propagation

Label propagation:

\[ f_T\big(x^{(t)}\big) = \sum_{i=1}^{n_s} y_i^{(s)} \, T\big(x_i^{(s)}, x^{(t)}\big) \]

• y_i^{(s)}: source labels
• T(·, ·): cross-domain translator
• f_T: prediction function
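As a minimal sketch of the propagation rule, the snippet below scores one image by summing source labels weighted by translator responses. It assumes the bilinear translator T(x_s, x_t) = x_s' S x_t that appears later in the deck; the function name `propagate_labels` and all numbers are made up for illustration.

```python
import numpy as np


def propagate_labels(X_s, y_s, S, x_t):
    """f_T(x_t) = sum_i y_s[i] * T(X_s[i], x_t), with T(x_s, x_t) = x_s' S x_t (assumed)."""
    responses = X_s @ S @ x_t      # T(x_i^(s), x^(t)) for every source instance i
    return float(y_s @ responses)


# Hypothetical toy data: 3 text documents (4-dim features), one image (2-dim features).
X_s = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
y_s = np.array([1.0, 1.0, -1.0])   # source labels in {+1, -1}
S = np.array([[0.5, 0.0],
              [0.2, 0.1],
              [0.0, 0.4],
              [0.0, 0.0]])
x_t = np.array([1.0, 1.0])

score = propagate_labels(X_s, y_s, S, x_t)
label = 1 if score >= 0 else -1
print(score, label)
```

Text documents with a strong translator response to the image contribute most to its predicted label; documents with a zero response contribute nothing.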

Learning Optimal Translator

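Learning formulation via optimizing the translator function:

\[ \min_{T} \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \eta \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \Omega(T) \]

Here χ is a monotonically decreasing function, ℓ is the training loss, Ω is the regularizer, and γ, η are trade-off weights.

• The first term: maximize the cross-domain association over a set of co-occurrence pairs of source-target instances.
• The second term: minimize the training loss.
• The third term: a regularizer that prefers a concise translator over a verbose one.
• Improve QoI: remove nonessential and noisy observations from the translation process.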
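To make the three terms concrete, here is a rough sketch that evaluates such an objective for a given translator matrix. Several choices are assumptions rather than the slides' definitions: the translator is taken to be bilinear (T = x_s' S x_t), the logistic function is used both as the decreasing weighting function and as the training loss, the regularizer is the trace norm of S, and `gamma`, `eta`, `lam` are hypothetical trade-off weights.

```python
import numpy as np


def logistic(z):
    """log(1 + exp(-z)), a monotonically decreasing function of z."""
    return np.logaddexp(0.0, -z)


def objective(S, X_s, y_s, X_t, y_t, pairs, gamma=1.0, eta=1.0, lam=0.1):
    """gamma * co-occurrence term + eta * training loss + lam * trace norm of S."""
    T = X_s @ S @ X_t.T                                      # T[k, l] = x_k^(s)' S x_l^(t)
    cooc = sum(logistic(c * T[k, l]) for c, k, l in pairs)   # first term
    f = y_s @ T                                              # f_T(x_j^(t)) for each target j
    loss = logistic(y_t * f).sum()                           # second term
    reg = np.linalg.norm(S, ord="nuc")                       # third term
    return gamma * cooc + eta * loss + lam * reg


# Hypothetical toy problem: 2 texts, 2 images, co-occurrence triples (count, k, l).
X_s, y_s = np.eye(2), np.array([1.0, -1.0])
X_t, y_t = np.eye(2), np.array([1.0, -1.0])
pairs = [(3, 0, 0), (2, 1, 1)]

obj_aligned = objective(np.eye(2), X_s, y_s, X_t, y_t, pairs)
obj_zero = objective(np.zeros((2, 2)), X_s, y_s, X_t, y_t, pairs)
print(obj_aligned, obj_zero)
```

A translator that responds strongly on the co-occurring pairs and reproduces the target labels scores lower than the all-zero translator, even after paying the regularization cost.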

Constructing Cross-Domain Translator

Bridge the cross-domain gap?

Diagram: source instances (text) are projected by W^{(s)} and target instances (images) by W^{(t)} into a common latent space.

\[ T\big(x^{(s)}, x^{(t)}\big) = \big\langle W^{(s)} x^{(s)}, \, W^{(t)} x^{(t)} \big\rangle = x^{(s)\prime} W^{(s)\prime} W^{(t)} x^{(t)} = x^{(s)\prime} S \, x^{(t)} \]

Inner product in latent space as translator
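The identity between the latent-space inner product and the bilinear form can be checked numerically. This is a small sketch with random matrices; the dimensions and variable names are arbitrary choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d_s, d_t, k = 6, 4, 2                  # text dim, image dim, latent dim (all hypothetical)
W_s = rng.normal(size=(k, d_s))        # projects text features into the latent space
W_t = rng.normal(size=(k, d_t))        # projects image features into the latent space
S = W_s.T @ W_t                        # d_s x d_t matrix of rank at most k

x_s = rng.normal(size=d_s)
x_t = rng.normal(size=d_t)

latent_inner = np.dot(W_s @ x_s, W_t @ x_t)   # <W_s x_s, W_t x_t>
bilinear = x_s @ S @ x_t                      # x_s' S x_t
print(np.isclose(latent_inner, bilinear))
```

Because S = W_s' W_t, the rank of S can never exceed the latent dimension k, which is why limiting the rank of S limits the number of latent topics.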


• A low-dimensional latent space is preferred
  – Impose the usual ℓ2 (Frobenius-norm) regularizer on W^{(s)} and W^{(t)} to improve the prediction accuracy

Trace norm:

\[ \|S\|_{\Sigma} = \inf_{S = W^{(s)\prime} W^{(t)}} \tfrac{1}{2} \left( \big\|W^{(s)}\big\|_F^2 + \big\|W^{(t)}\big\|_F^2 \right) \]

  – Equivalent to a low-rank prior on the latent space
  – Indicates the principle of concise cross-domain translation: “fewer latent topics (dimensionality) are preferred!”
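The link between the Frobenius-norm penalty on the two projections and the trace norm of S can be verified on a toy matrix. The sketch below builds the balanced factorization from the SVD, which attains the infimum, and compares it with a rescaled factorization of the same S; the matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))   # toy rank-3 translator matrix

U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
trace_norm = sigma.sum()                                 # sum of singular values

# Balanced factorization S = W_s' W_t built from the SVD.
W_s = (U * np.sqrt(sigma)).T          # k x d_s
W_t = np.sqrt(sigma)[:, None] * Vt    # k x d_t
balanced = 0.5 * (np.linalg.norm(W_s, "fro") ** 2 + np.linalg.norm(W_t, "fro") ** 2)

# A rescaled factorization of the same S costs more.
W_s2, W_t2 = 3.0 * W_s, W_t / 3.0
rescaled = 0.5 * (np.linalg.norm(W_s2, "fro") ** 2 + np.linalg.norm(W_t2, "fro") ** 2)

print(trace_norm, balanced, rescaled)
```

Since the penalty equals the sum of singular values at its minimum, shrinking it pushes small singular values to zero, which is the low-rank effect the slide refers to.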

Experiments: Cross-Domain Dataset

• The text corpus and associated images are crawled from Flickr.com and wikipedia.com.

• We extract and stem all tokens in each text document; their frequencies are used as text features.

• For each image, visual words are extracted using a codebook of size 500.
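The two feature types can be sketched as simple counting operations. The code below is illustrative only: it assumes tokens are already stemmed, assigns each local descriptor to its nearest codeword by Euclidean distance, and uses a random codebook with a made-up descriptor dimension of 8; the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def term_frequencies(tokens, vocabulary):
    """Count how often each vocabulary term occurs in a (pre-stemmed) token list."""
    counts = Counter(tokens)
    return np.array([counts[term] for term in vocabulary], dtype=float)


def visual_word_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and count the assignments."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(codebook)).astype(float)


vocabulary = ["bird", "fly", "tree"]
text_features = term_frequencies(["bird", "fly", "bird", "sky"], vocabulary)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(500, 8))         # 500 codewords; 8-dim descriptors are hypothetical
descriptors = codebook[[3, 3, 41]] + 1e-3    # three descriptors lying next to codewords 3 and 41
image_features = visual_word_histogram(descriptors, codebook)
print(text_features, image_features.sum())
```

Both outputs are fixed-length count vectors, so text and images can each be fed to a linear projection even though their dimensions differ.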

Dataset Statistics

The number of text and image pairs for each category

Category     Number of crawled pairs     Category     Number of crawled pairs
Birds          930                       Horses         654
Buildings     9216                       Mountain      4153
Cars           728                       Plane         1356
Cat            229                       Train          457
Dog            486                       Waterfall    22006

Dataset (cont’d)

Compared Algorithms
• Image only
  – Only the visual features are used to train classifiers on the target image domain.
• Translated Learning by minimizing Risk (TLRisk)
  – Transfers text labels in the source domain to the target image domain via a Markov chain.
• Heterogeneous Transfer Learning (HTL)
  – Implicitly constructs a distance function between images by a matrix factorization between images and text documents.

Results
• Average error rates with respect to the number of training samples in the image domain.

Results
• Average error rates with respect to the number of text/image co-occurrence pairs (with five training examples).

Results
• Number of topics in the latent space for establishing the cross-domain translator

Category     # topics
Birds          11
Buildings      88
Cars           19
Cat            18
Dog             7
Horses          4
Mountain        6
Plane          15
Train           6
Waterfall      21

Too many building variants!
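If the translator is represented by a matrix S = W_s' W_t, one plausible way to read off a topic count is the numerical rank of S. The sketch below takes that reading as an assumption; the helper name `count_topics` and the relative tolerance are hypothetical, and the slides do not say how the reported counts were obtained.

```python
import numpy as np


def count_topics(S, rel_tol=1e-6):
    """Number of singular values of S above rel_tol times the largest one."""
    sigma = np.linalg.svd(S, compute_uv=False)
    if sigma.size == 0 or sigma[0] == 0.0:
        return 0
    return int((sigma > rel_tol * sigma[0]).sum())


rng = np.random.default_rng(0)
S = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 30))   # toy translator with 4 latent topics
n_topics = count_topics(S)
print(n_topics)
```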

Revisit Technical Contributions
• Cross-Domain Knowledge Propagation
  – Propagating knowledge in surrounding text to visual data
  – Published in WWW’11, collaboration with Dr. Charu Aggarwal, IBM
• Cross-Category Knowledge Sharing
  – Exploring the concept correlations to enhance the inference accuracy
  – To appear in CVPR’11, collaboration with Dr. Charu Aggarwal, IBM
• Modeling Image Similarity
  – Hierarchical Gaussianization (HG), ICCV’09
  – Applications to disaster assessment (collaboration with Prof. Tarek Abdelzaher)

Future Work (Q3)
• Resource allocation based on heterogeneous links for communication
  – Low redundancy: at the base station, send the most informative message (text/multimedia data)
  – High quality: at the data center, recover lost information based on the redundancy in cross-media links
• Effective linkage analysis with constraints in CCM

Future Work (Q4)
• Develop the stochastic and dynamic model and theory for MINet
  – The effect of structural changes in MINet
    • For latency analysis in communication networks
    • For constrained linkage discovery in CCM
  – The changes of QoI in a dynamic MINet

Path Ahead: Theory and Algorithm
• Construct Cross-Media Analysis (CMA) theory
  – Stochastic model for cross-media relations and redundancy
    • QoI theory in cross-media networks
    • Information recovery based on cross-media redundancy
    • Dynamic model for cross-media networks
    • Analyze constrained links for CCM
• Practical algorithms for sharing and transmitting information in cross-media links
• Improve low latency and high quality in communication networks based on cross-media analysis
• Applications to CCM for robust constrained link discovery
• Cross-media knowledge sharing and discovery

Collaboration Summary
• INARC 1.1: Prof. Tarek Abdelzaher
  – Cross-media analysis for communication networks
  – Trade-off between low latency and high quality
• INARC 1.2: Dr. Charu Aggarwal
  – Cross-domain knowledge propagation
  – Cross-category knowledge sharing
  – Quality of Information

Publications
• Collaboration with Dr. Charu Aggarwal (IBM)
  – Guo-Jun Qi, Charu Aggarwal and Thomas Huang. Towards Cross-Domain Knowledge Propagation from Text Corpus to Web Images. To appear in Proc. of International World Wide Web Conference (WWW 2011), Hyderabad, India, March 28-April 1, 2011.
  – Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang and Thomas Huang. Towards Cross-Category Knowledge Propagation for Learning Visual Concepts. To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011.
  – Guo-Jun Qi, Charu Aggarwal and Thomas Huang. Transfer learning with distance functions between text and web images. Submitted to the ACM KDD Conference, 2011.
• Collaboration with Prof. Tarek Abdelzaher
  – Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher, Thomas Huang and Guohong Cao. PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness. IPSN Demo, April 2011.

Thanks! Q&A

Dataset in Target Domain

The number of images for each category in the target domain

Category     # Positive examples     # Negative examples
Birds           338                     349
Buildings      2301                    2388
Cars            120                     125
Cat              67                      72
Dog             132                     142
Horses          263                     268
Mountain        927                    1065
Plane           509                     549
Train            52                      53
Waterfall      5153                    5737

Learning Optimal Translator


\[ \min_{T} \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \eta \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \Omega(T) \]

• The first term measures the consistency between the translator and the observed co-occurrence of text and images.
• Occurrence set:

\[ \mathcal{C} = \big\{ \big(c_{k,l}, \, x_k^{(s)}, \, x_l^{(t)}\big) \big\} \]

  where (x_k^{(s)}, x_l^{(t)}) is a source/target pair with occurrence number c_{k,l}.
• χ is a monotonically decreasing function, so that a pair with a larger occurrence number c_{k,l} is weighted more.
• Co-occurring pairs of source and target samples probably share the same labels, so the translator T should give a larger response on them to propagate the labels between them.
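An occurrence set of this kind can be assembled by counting how often each text/image pair is observed together. In the sketch below the crawl log, the helper `build_occurrence_set` and the choice of the logistic function as the decreasing weighting function are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_occurrence_set(observations):
    """Turn observed (text_id, image_id) co-occurrences into (c_kl, k, l) triples."""
    counts = Counter(observations)
    return sorted((c, k, l) for (k, l), c in counts.items())


def chi(z):
    """Assumed weighting function: log(1 + exp(-z)), monotonically decreasing in z."""
    return math.log1p(math.exp(-z))


# Hypothetical crawl log: each entry says text k appeared alongside image l.
observations = [(0, 0), (0, 0), (1, 2), (0, 0), (1, 2), (2, 1)]
occurrence_set = build_occurrence_set(observations)

# Penalty paid when the translator response on a pair is negative (here -0.5):
# pairs with a larger occurrence count are penalized more.
penalties = {c: chi(c * -0.5) for c, _, _ in occurrence_set}
print(occurrence_set, penalties)
```

Multiplying the translator response by the count before applying the decreasing function is what makes frequently co-occurring pairs dominate the first term.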

Learning Optimal Predictor


\[ \min_{T} \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \eta \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \Omega(T) \]

• The second term is the loss function of the predictor f_T on the training set (e.g., the logistic loss):

\[ \ell(z) = \log\big(1 + \exp(-z)\big) \]

• It encodes the discriminative knowledge in the training set.
• Large-margin principle: it can reduce the noisy information in the occurrence set for the classification task.
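A direct implementation of log(1 + exp(-z)) overflows for large negative margins. The sketch below uses NumPy's `logaddexp` to evaluate the same quantity stably; the margin values are arbitrary examples of z = y · f_T(x).

```python
import numpy as np


def logistic_loss(z):
    """log(1 + exp(-z)) evaluated without overflow for large |z|."""
    return np.logaddexp(0.0, -np.asarray(z, dtype=float))


margins = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])   # z = y * f_T(x)
losses = logistic_loss(margins)
print(losses)
```

The loss keeps decreasing as the margin grows, so confidently correct predictions are penalized very little and badly wrong ones roughly linearly.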

Learning Optimal Predictor


\[ \min_{T} \; \gamma \sum_{\mathcal{C}} \chi\big(c_{k,l}\, T(x_k^{(s)}, x_l^{(t)})\big) + \eta \sum_{j=1}^{n_t} \ell\big(y_j^{(t)} f_T(x_j^{(t)})\big) + \Omega(T) \]

• The third term, Ω(T), encodes the preference for a concise semantic translation over a verbose one.
• This is the principle for constructing the cross-domain translator.
• Nonessential and noisy observations can be filtered out of the translation process.

Results
• Number of topics in the latent space for establishing the cross-domain translator

Category     Two Trn. Ex.     Ten Trn. Ex.
Birds           11                9
Buildings       88              102
Cars            19                3
Cat             18                2
Dog              7                5
Horses           4                1
Mountain         6                1
Plane           15               25
Train            6                3
Waterfall       21               26

Too many building variants!

Modeling Context-Aware Image Similarity
• Current method
  – Image visual similarity: Hierarchical Gaussianization, ICCV’09 (Zhou, Huang et al.)
  – Hard to model image similarity at the semantic level
• Model image semantic similarity
  – Link images to text documents by the translator
  – Compare the similarity of the associated text to compare image semantics
  – Advantages
    • The “semantic gap” in text documents is smaller
    • Such similarity reflects semantic-level information

Diagram
• Image similarity (target domain)
• Text-image association by the learned translator T(x, y)
• Text similarity (source domain)
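One possible reading of this diagram is to describe each image by its translator responses to a set of text documents and to compare images through those text-side profiles. The sketch below follows that reading under stated assumptions: a bilinear translator T(x_s, x_t) = x_s' S x_t, cosine similarity between profiles, and made-up toy data; the function names are hypothetical and the slides do not specify the actual similarity measure.

```python
import numpy as np


def text_profile(X_s, S, x_t):
    """Responses T(x_i^(s), x_t) = x_i^(s)' S x_t of one image to every text document."""
    return X_s @ S @ x_t


def semantic_similarity(X_s, S, x_a, x_b):
    """Cosine similarity between the text profiles of two images."""
    a, b = text_profile(X_s, S, x_a), text_profile(X_s, S, x_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Hypothetical toy setup: 3 text documents, 2-dim image features.
X_s = np.eye(3)
S = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
img_a = np.array([1.0, 0.0])
img_b = np.array([2.0, 0.1])   # different visual scale, associated with the same documents
img_c = np.array([0.0, 1.0])   # associated with a different document

sim_ab = semantic_similarity(X_s, S, img_a, img_b)
sim_ac = semantic_similarity(X_s, S, img_a, img_c)
print(sim_ab, sim_ac)
```

Two images that trigger the same text documents come out as similar even if their raw visual features differ, which is the semantic-level behavior the slide is after.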

Path ahead

• Improve the Quality of Information (QoI) transmitted across domains.
  – In some cases, the transmitted information may have a negative effect on the classification task (negative information transfer).
  – Construct a new model that predicts from the target domain alone when the cross-domain information is detected to be noise.

Future Work

• Semantic-level image similarity in heterogeneous networks
  – Different sources of heterogeneous sensors, e.g., cameras, human annotations and textual descriptions
  – Fusing heterogeneous sources in the networks to learn a more descriptive image similarity
  – Collaboration with Dr. Charu Aggarwal at IBM on sensor networks and Prof. Tarek Abdelzaher at UIUC on Fact Finder

Linked to INARC Projects

• Collaborators
  – Prof. Tarek Abdelzaher, CS, UIUC (I1.1)
    • Fact Finder: compare image similarity at the semantic level to discover trustworthy sources
  – Dr. Charu Aggarwal, IBM (I1.2)
    • Sensor networks: compare signal similarity using cross-domain knowledge