I1.1 Fundamentals for Context-aware Real-time Data Fusion
Lead: Roth (UIUC), Abdelzaher (UIUC), Huang (UIUC), Lei (IBM)
Presented by: Tarek Abdelzaher


Task Goal and Overview
Goal:
- Foundations for utilizing context and prior knowledge in fusion
- Foundations for analysis of fusion latency
[Figure labels: Accounting for Prior Knowledge with Constrained Conditional Models; Prior Knowledge; Information Network; Uncovering Links in Heterogeneous Content?; Communication Network; Resource Bottleneck; Sensor, text, image, and human sources; Latency; Latency Analysis]

Data Fusion Threads
- Thread 1: Enable exploitation of prior knowledge and information network links in the design of algorithms for data fusion (Dan Roth, UIUC)
- Thread 2: Enhance ability to uncover links between heterogeneous content items, such as text and video (Huang, UIUC)
- Thread 3: Advance latency analysis of distributed data fusion algorithms (Abdelzaher, UIUC)
- Thread 4: Validate the results on viable platforms and crowd-sourcing applications (Lei, IBM Research)


Thread 1: A Framework for Integrating Prior Knowledge: Fundamentals of Context-aware Real-time Data Fusion
- Advances in Learning & Inference of Constrained Conditional Models
- CCM: A computational framework for learning and inference with interdependent variables in constrained settings
- Formulating Information Fusion as CCMs
- Preliminary theoretical and experimental work on Information Fusion
Key Publications:
- R. Samdani and D. Roth, Efficient Learning for Constrained Structured Prediction, submitted.
- M. Chang, M. Connor and D. Roth, The Necessity of Combining Adaptation Methods, EMNLP 2010.
- M. Chang, V. Srikumar, D. Goldwasser and D. Roth, Structured Output Learning with Indirect Supervision, ICML 2010.
- M. Chang, D. Goldwasser, D. Roth and V. Srikumar, Discriminative Learning over Constrained Latent Representations, NAACL 2010.
- G. Kundu, D. Roth and R. Samdani, Constrained Conditional Models for Information Fusion, submitted.

Fusion as a Decision Problem
- Predict values of multiple, interdependent labels (in contexts as diverse as information extraction, information trustworthiness, and information fusion).
- Modeling complex dependencies makes learning and inference (decision making) intractable, which leads to over-simplification and unjustified independence assumptions.
- Constrained conditional models (CCMs) pair relatively simple learning models with expressive prior knowledge, in the form of declarative constraints, to support global decisions.
- Learn models for sub-problems; incorporate the models' information, along with prior knowledge/constraints, in making globally coherent decisions.

Learn models; Acquire knowledge/constraints; Make decisions.
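The "learn models; acquire constraints; make decisions" pipeline can be illustrated with a minimal sketch. It is not the CCM implementation from the cited papers: the route names, the integer scores, and the exhaustive search are all assumptions made for illustration, loosely following the optimal-path example with a "no left turns at B" constraint.

```python
from itertools import product

# Hypothetical local scores, as if produced by two independently trained
# sub-problem models (higher is better). Names and values are illustrative.
AB_SCORES = {"AB_north": 9, "AB_south": 6}
BC_SCORES = {"BC_left": 8, "BC_straight": 7, "BC_right": 4}


def no_left_turn_at_b(ab_route, bc_route):
    """Declarative constraint (prior knowledge): no left turns at B."""
    return bc_route != "BC_left"


def constrained_argmax(ab_scores, bc_scores, constraints):
    """Return the best-scoring joint assignment that satisfies every constraint."""
    best, best_score = None, float("-inf")
    for ab, bc in product(ab_scores, bc_scores):
        if not all(check(ab, bc) for check in constraints):
            continue  # hard constraint violated: discard this joint assignment
        score = ab_scores[ab] + bc_scores[bc]
        if score > best_score:
            best, best_score = (ab, bc), score
    return best, best_score


unconstrained = constrained_argmax(AB_SCORES, BC_SCORES, [])
constrained = constrained_argmax(AB_SCORES, BC_SCORES, [no_left_turn_at_b])
print(unconstrained)  # local optima combined without prior knowledge
print(constrained)    # globally coherent decision under the constraint
```

Exhaustive enumeration is only workable for tiny output spaces; it is used here to keep the separation between local scoring and constraint checking visible.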

Recent Progress
- LoCL (Locally Consistent Learning): a scheme that is consistent with Global Learning under certain conditions while being efficient.
- Theoretical contribution and experimental confirmation on information extraction tasks.

Illustrative Example: Learning an optimal path from A to C
[Figure: path through A, B, and C, split into sub-problem AB (optimum ABopt) and sub-problem BC (optimum BCopt); the global optimum is marked.]
- Constraint: No left turns at B
- LoCL: Using local models + constraints, find global optima

Application: Disaster Scenario

[Figure labels: Information Sources (Text Messages, Images, Data from sensors); Resource constraints: Router; Selected Information; Command Center; Predicted States @ various locations: {y1, y2, ..., yn}; Feedback]
- Predict output states of different locations over consecutive time steps.
- Output space is spatially and temporally structured.
- Expressing this structure using constraints can help make coherent predictions and boost accuracy.
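To make the benefit of temporal structure concrete, here is a small sketch for a single location over four time steps. Everything specific in it is an assumption for illustration: the two states, the integer local scores, and the constraint that a "flooded" location does not revert to "clear" within the window. It uses a Viterbi-style dynamic program with forbidden transitions, not the authors' inference method.

```python
def constrained_viterbi(local_scores, allowed):
    """Best state sequence under local scores and hard transition constraints.

    local_scores: list of dicts, one per time step, mapping state -> score.
    allowed: function (prev_state, next_state) -> bool.
    """
    # best[s] = (score of the best valid sequence ending in s, that sequence)
    best = {s: (score, [s]) for s, score in local_scores[0].items()}
    for step in local_scores[1:]:
        new_best = {}
        for s, score in step.items():
            candidates = [
                (prev_score + score, path + [s])
                for p, (prev_score, path) in best.items()
                if allowed(p, s)
            ]
            if candidates:
                new_best[s] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])


# Hypothetical per-step scores from a local classifier (step 3 is noisy).
SCORES = [
    {"clear": 5, "flooded": 1},
    {"clear": 2, "flooded": 4},
    {"clear": 3, "flooded": 2},
    {"clear": 1, "flooded": 5},
]


def no_recovery(prev_state, next_state):
    """Assumed prior knowledge: a flooded location stays flooded."""
    return not (prev_state == "flooded" and next_state == "clear")


independent = [max(step, key=step.get) for step in SCORES]
score, coherent = constrained_viterbi(SCORES, no_recovery)
print(independent)  # per-step argmax, ignores temporal structure
print(coherent)     # sequence consistent with the constraint
```

The per-step argmax flips back to "clear" at the noisy third step, while the constrained sequence stays consistent with the assumed prior knowledge.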


Thread 2: Constructing Cross-Domain Translator (UIUC, IBM)
[Figure labels: Source instances (text); Target instances (images); Bridge the cross-domain gap?]
- Make concrete the translators used for cross-domain knowledge propagation.

Constructing Cross-Domain Translator
[Figure labels: Source instances (text); Target instances (images); W(s); W(t); Common Latent Space]
- Inner product in latent space as translator
- The two domains are mapped into a common intermediate space of the same dimension.
- The inner product in this intermediate representation space is used to bridge the two heterogeneous spaces.
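The inner-product translator can be sketched in a few lines. This is a hedged illustration only: the projection matrices W_S and W_T stand in for the slide's W(s) and W(t), and their values, the feature vectors, and the dimensions are all made up. How the projections are learned is not described in the slides and is not shown here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def translator(w_s, x_s, w_t, x_t):
    """Inner product of the two instances after mapping both into the latent space."""
    z_s = matvec(w_s, x_s)  # source (text) instance in the common latent space
    z_t = matvec(w_t, x_t)  # target (image) instance in the common latent space
    return sum(a * b for a, b in zip(z_s, z_t))


# Hypothetical projections: 3-d text features and 2-d image features,
# both mapped into a 2-d common latent space.
W_S = [[1, 0, 1],
       [0, 1, 0]]
W_T = [[2, 0],
       [0, 1]]

text = [1, 0, 1]
image_a = [1, 0]
image_b = [0, 1]

score_a = translator(W_S, text, W_T, image_a)
score_b = translator(W_S, text, W_T, image_b)
print(score_a, score_b)  # a larger value means a stronger cross-domain link
```

Because both projections share the same output dimension, the inner product is defined even though the text and image feature spaces have different sizes.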

Technical Contributions
- Cross-Domain Knowledge Propagation
  - Propagating knowledge in surrounding text to visual data
  - Published in WWW 2011, collaboration with Dr. Charu Aggarwal, IBM
- Cross-Category Knowledge Sharing
  - Exploring the concept correlations to enhance the inference accuracy
  - To appear in CVPR 2011, collaboration with Dr. Charu Aggarwal, IBM
- Modeling Context-Aware Image Similarity
  - Applications to disaster assessment (collaboration with Prof. Tarek Abdelzaher)
  - Submitted to KDD 2011

Cross-Domain: heterogeneous data fusion (special concept)


Thread 3: Latency Analysis
In collaboration with Aylin Yener, CNARC
Goal:
- Answer the question: how much work can be done on time, given different data fusion workflows and different end-to-end deadlines?
- Derive the real-time capacity region (the load region where deadlines are met).
Model:
- Different data flows share distributed computational and communication resources.
- Each flow is represented by its own workflow graph.
- Different flows have different end-to-end deadlines (worst-case allowable end-to-end latency).
Results:
- An algebra for reducing distributed workflows to equivalent canonical centralized systems
- A real-time capacity region for the canonical system

A Reduction Theory for Distributed Systems
In collaboration with CNARC (OICC)
- Based on reduction of distributed systems to an equivalent uniprocessor
[Figure: two flows, F1 and F2, through a two-stage pipeline (Stage 1, Stage 2). Stage times: C1,1 = 2, C1,2 = 1.1, C2,1 = 1, C2,2 = 1.8. Per-flow maxima: C1,max = 2, C2,max = 1.8. Per-stage maxima: Cmax,1 = 2, Cmax,2 = 1.8. Equivalent uniprocessor: C1eq = 2, C2eq = 1.8.]

Reduction of Busy Pipelines
[Figure: same two-stage pipeline and values. (a) Original Pipeline Execution: 10 pipeline jobs of F1 and 10 pipeline jobs of F2 on Stage 1 and Stage 2 over time. (b) Uniprocessor Approximation: 10 uniprocessor jobs of C1,max each and 10 uniprocessor jobs of C2,max each.]
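The numbers on these two slides can be reproduced with a short sketch. It assumes, as the slide values suggest, that C i,j is the time flow i needs at stage j and that a flow's equivalent uniprocessor cost is its largest stage time. The function names are hypothetical, and the full reduction algebra is not given in the slides.

```python
# Stage times read from the slide: C[flow] = [stage 1 time, stage 2 time].
C = {"F1": [2.0, 1.1], "F2": [1.0, 1.8]}


def per_flow_max(times):
    """C_i,max: the largest stage time of each flow (taken here as C_i,eq)."""
    return {flow: max(stages) for flow, stages in times.items()}


def per_stage_max(times):
    """C_max,j: the largest time any flow needs at each stage."""
    return [max(column) for column in zip(*times.values())]


def uniprocessor_workload(times, jobs_per_flow):
    """Total demand when each pipeline job is replaced by one job of C_i,max."""
    c_eq = per_flow_max(times)
    return sum(jobs_per_flow[flow] * c_eq[flow] for flow in times)


c_eq = per_flow_max(C)
c_stage = per_stage_max(C)
workload = uniprocessor_workload(C, {"F1": 10, "F2": 10})
print(c_eq)      # per-flow equivalent costs
print(c_stage)   # per-stage maxima
print(workload)  # demand of the uniprocessor approximation for 10 + 10 jobs
```

The sketch only computes the quantities labeled in the figures; it does not check deadlines or derive the capacity region.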

New: Reduction of Data Fusion Trees
[Figure: (a) Distributed Data Fusion System of Three Workflows (F1, F2, F3); (b) Equivalent Uniprocessor]

The Real-time Capacity Region
The real-time capacity theorem: In a system with a set, $S$, of processing workflows, where each workflow $F_i$ in $S$ incurs an effective utilization $u_i^{\text{effect}}$ on an equivalent uniprocessor and has a job rate $R_i$ and a per-job end-to-end maximum latency constraint, $D_i$, all jobs meet their end-to-end deadlines if:
[Equation image]
where:
[Equation image]

[Figure label: Guaranteed (safe) real-time capacity region]

Performance Evaluation
- The theoretically predicted real-time capacity bound is very close to the empirical onset of deadline misses.

Thread 4: Validation (IBM, UIUC)
- Develop a general platform, reusable across different mobile crowd-sensing applications, to experiment with data fusion applications.
[Figure labels: Mobile Sensing Devices; Access Appliances; Wide Area Network; Application Gateway; Data Center; Smart supply chain; Smart grid; Smart healthcare; Smart building; app1, app2, app3; MCS Data Broker; MCS Gateway; Domain Analytics Library; Social Architecture; MCS Device Agent; MCS Data Collector]

Road Ahead
- Analysis of trade-offs between timeliness and fusion quality
- Investigation of the dependency of fusion quality and timeliness on distributed resource allocation
- Integration of prior knowledge, constraints, and resource distribution issues into future data fusion algorithms
- Improving quality/cost trade-offs via link discovery (between text and video)
- Information-network-aware real-time capacity of data fusion
- Validation, documentation and publications

Collaborations
[Figure labels: Fusion Task I1.1; New fusion algorithms; Accurate, timely; QoI Task I1.2; Better storage policies; Better fusion from human sources; Capacity Task I1.2; Community Modeling S2.2; CNARC QoI Task I1.2; In-network Storage I2.1/C2.1; Decisions under Stress S3.1; Provenance Task T1.3; Characterization of QoI/cost trade-offs; Improved diagnostic capabilities in fusion systems; Improved network QoI optimization for fusion systems; Improved effective operational capacity]

Papers
Thread 1 (Q1):
- (UIUC) Gourab Kundu, Rajhans Samdani, Dan Roth, Constrained Conditional Models for Information Fusion, submitted to Fusion 2011
- (UIUC) Dan Roth et al., Efficient Learning for Constrained Structured Prediction, submitted to ICML 2011

Thread 2 (Q2):
- (INARC+CNARC) Forrest Iandola, Fatemeh Saremi, Tarek Abdelzaher, Praveen Jayachandran, Aylin Yener, Real-time Capacity of Networked Data Fusion, submitted to Fusion 2011

More Papers
Thread 3 (I1.1-I1.2 Collaboration/Multi-institution):
- (UIUC+IBM) G. Qi, C. Aggarwal, T. Huang, Towards Semantic Knowledge Propagation between text and web images, WWW Conference, 2011.
- (UIUC+IBM) Guo-Jun Qi, Charu Aggarwal, Yong Rui, Qi Tian, Shiyu Chang and Thomas Huang, Towards Cross-Category Knowledge Propagation for Learning Cross-domain Concepts, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, Colorado, June 21-23, 2011.
- (IBM+UIUC) C. Aggarwal, Y. Zhao, P. Yu, On Wavelet Decomposition of Uncertain Text Streams, CIKM Conference, 2011.
- (UIUC+IBM) G. Qi, C. Aggarwal, T. Huang, Transfer learning with distance functions between text and web images, submitted to the ACM KDD Conference, 2011.
- (UIUC+IBM) G. Qi, C. Aggarwal, H. Ji, T. Huang, Exploring Content and Context-based Links in Social Media: A Latent Space Method, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
Thread 4 (Q3/Q4):
- Raghu Ganti, Fan Ye, Hui Lei, Mobile Crowdsensing: Current State and Future Challenges, in submission to IEEE Comm. Magazine

Military Relevance
- Enhanced warfighters' ability to interpret reports, sensory data, and soft information sources for making the right decisions
- Enhanced exploitation of semantic links between information items to improve data fusion accuracy
- Improved ability to utilize context and background knowledge in interpreting data
- Significantly improved situation assessment in the presence of heterogeneous content
- Improved latency analysis algorithms for data fusion systems to ensure timeliness of fusion results