Information Integration: BIRN supports integration across complex data sources

Page 1:

Information Integration

• BIRN supports integration across complex data sources
– Can process a wide variety of structured & semi-structured sources (DBMS, XML, HTML, Excel, SOAP)
• Use Schema & Data
– Source Modeling & Record Linkage
• Infrastructure capabilities
– Security, Efficient Query Execution, SQL-like syntax across multiple sources

[Diagram labels: Decision Support; Application Programs; Mediator; Knowledge Bases; Databases; Computer Programs; The Web]

Page 2:

Information Mediator

• Virtual Integration Architecture:
– Virtual organization: community of data providers and consumers that want to share data for a specific purpose
– Autonomous sources: data and control remain at the sources; no change to access methods or schemas; data accessed in real time in response to user queries
– Mediator: integrator defines domain schema and describes source contents
  • Domain schema: agreed-upon view of the domain preferred by the virtual organization
  • Source descriptions: logical formulas relating source and domain schemas
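
One way to picture a source description is as a rule that maps a source relation onto the domain schema. The sketch below is a simplification for illustration only: the domain relation `Subject`, the source `site_a_patients`, and all column names are hypothetical, and a plain column mapping stands in for the logical formulas the slide mentions.

```python
from dataclasses import dataclass

# Hypothetical domain schema: one relation, Subject(subject_id, age, site).
DOMAIN_SCHEMA = {"Subject": ("subject_id", "age", "site")}


@dataclass
class SourceDescription:
    """Relates one source relation to one domain relation.

    Stands in for a rule such as:
        Subject(subject_id, age, site) :- site_a_patients(pid, age_years, clinic)
    """
    source_relation: str
    domain_relation: str
    column_map: dict  # source column -> domain attribute

    def to_domain(self, source_row):
        """Translate one source row into the domain vocabulary."""
        return {self.column_map[col]: value
                for col, value in source_row.items()
                if col in self.column_map}

    def covers(self, attributes):
        """True if this source can supply every requested domain attribute."""
        return set(attributes) <= set(self.column_map.values())


# Hypothetical description of one site's patient table.
site_a = SourceDescription(
    source_relation="site_a_patients",
    domain_relation="Subject",
    column_map={"pid": "subject_id", "age_years": "age", "clinic": "site"},
)

# Columns with no counterpart in the domain schema are simply dropped.
row = site_a.to_domain(
    {"pid": "A-001", "age_years": 34, "clinic": "UCI", "internal_flag": 1}
)
```

The source keeps its own schema; only the description held by the mediator knows how the two vocabularies line up.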

Page 3:

BIRN MEDIATOR

Project Overview

Page 4:

Information Mediator

• Query Answering
– User writes query in domain schema
– Mediator:
  • Determines sources relevant to user query
  • Rewrites query in source schemas
  • Breaks query into sub-queries for sources
  • Optimizes query evaluation plan
  • Combines answers from sources
– Efficient query evaluation
  • Streaming dataflow
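
The query-answering steps above can be sketched with Python generators, which give a simple form of streaming dataflow. This is a toy illustration under stated assumptions, not the BIRN mediator's implementation: the sources, column names, and rows are hypothetical, each source description is reduced to a column mapping, and the plan-optimization step is omitted.

```python
from itertools import chain

# Hypothetical sources: name -> (source column -> domain attribute, rows).
SOURCES = {
    "site_a": ({"pid": "subject_id", "age_years": "age"},
               [{"pid": "A1", "age_years": 34}, {"pid": "A2", "age_years": 61}]),
    "site_b": ({"subj": "subject_id", "age": "age", "dx": "diagnosis"},
               [{"subj": "B1", "age": 58, "dx": "control"}]),
    "scanner_log": ({"scan": "scan_id"}, [{"scan": "S9"}]),
}


def relevant_sources(attributes):
    """Keep sources whose description covers every requested attribute."""
    return [name for name, (cmap, _) in SOURCES.items()
            if set(attributes) <= set(cmap.values())]


def rewrite(name, attributes):
    """Express the domain query as a sub-query in the source's own columns."""
    cmap, _ = SOURCES[name]
    inverse = {domain: src for src, domain in cmap.items()}
    return {inverse[a]: a for a in attributes}  # source column -> domain attribute


def run_subquery(name, subquery, predicate):
    """Evaluate one sub-query lazily, yielding rows in domain vocabulary."""
    _, rows = SOURCES[name]
    for row in rows:
        out = {domain: row[src] for src, domain in subquery.items()}
        if predicate(out):
            yield out


def answer(attributes, predicate=lambda row: True):
    """Combine the per-source streams into a single stream of answers."""
    streams = (run_subquery(n, rewrite(n, attributes), predicate)
               for n in relevant_sources(attributes))
    return chain.from_iterable(streams)


# Domain-level query: subjects older than 50, wherever they are stored.
results = list(answer(["subject_id", "age"], lambda r: r["age"] > 50))
```

Because every stage is a generator, no source is read until the caller starts consuming answers, and rows flow through one at a time rather than being materialized per source.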

Page 5:

The Information Mediator

[Architecture diagram labels: User Queries / Web Portal / Services; Security; Information Integration ‘Capabilities’; Other BIRN ‘Capabilities’; Application Specific; key; Execution Engine; Optimizer; Reformulation; Wrappers; Logical Source Descriptions; Data Sources]

Page 6:

Database Consolidation

• Construct a ‘virtual organization’
  • A community of data providers and consumers sharing data for a common specific purpose
• All sources are autonomous
  • data and control remain at the sources
  • no change to access methods or schemas
  • data accessed in real time in response to user queries
• Work consists of modeling domain schema and source contents
  • Domain schema = agreed-upon view of the domain preferred by the virtual organization
  • Source descriptions = logical formulas relating source and domain schemas
• Implemented solutions in multiple domains:
  • fMRI: Ashish, et al. (2010) “Neuroscience Data Integration through Mediation: An (F)BIRN Case Study.” Front. Neuroinf. 4:118
  • Cardiovascular Research Grid
  • Non-Human Primate Research Consortium
  • Child Neurodevelopmental Disorders

Use Cases

Scientists from different groups want to query across two databases with different schemas

Databases may be completely different (e.g., one group uses Excel spreadsheets, another uses FileMaker Pro, and a third uses Oracle)
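
The role of a wrapper in this use case can be sketched as follows: each wrapper hides how its source is stored and exposes plain rows. The example is illustrative only. CSV text stands in for a spreadsheet export, an in-memory SQLite database stands in for a relational DBMS, and the table, column names, and rows are hypothetical.

```python
import csv
import io
import sqlite3


def csv_wrapper(text):
    """Yield rows from CSV text (a stand-in for a spreadsheet export)."""
    yield from csv.DictReader(io.StringIO(text))


def sqlite_wrapper(conn, table):
    """Yield rows from one table of a relational source."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


# Hypothetical data held by two different groups.
spreadsheet = "subject,age\nA1,34\nA2,61\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (subject TEXT, age INTEGER)")
conn.execute("INSERT INTO participants VALUES ('B1', 58)")

# Both wrappers produce the same shape of output: an iterator of dict rows.
all_rows = list(csv_wrapper(spreadsheet)) + list(sqlite_wrapper(conn, "participants"))
```

Note that the CSV wrapper returns every value as a string while SQLite returns typed values, so a fuller wrapper would probably also normalize types before handing rows to the mediator.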

Page 7:

Database Extension

• Reapply the mediator technology to sources from different subjects
  • e.g., link genetics data to imaging.
• Not dependent on a universal, global ontology, but on a locally defined model specific to the application

Use Cases

Scientists want to bring together data from different sources into a single, common domain model

Sources will require linkage at the level of schema and data
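
Linkage at both levels can be illustrated with a small sketch. Schema-level linkage here means recognizing that two differently named columns refer to the same thing; data-level linkage means matching identifiers that are written differently. Everything below is hypothetical (field names, identifiers, values), and simple string normalization stands in for whatever record linkage method a real deployment would use.

```python
import re


def normalize_id(raw):
    """Data-level linkage: lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


def link(genetics, imaging):
    """Join genetics and imaging records that refer to the same subject.

    Schema-level linkage: genetics calls the identifier "sample" and imaging
    calls it "subject"; both map to the domain attribute "subject".
    """
    by_id = {normalize_id(g["sample"]): g for g in genetics}
    for scan in imaging:
        key = normalize_id(scan["subject"])
        match = by_id.get(key)
        if match is not None:
            yield {"subject": key,
                   "genotype": match["genotype"],
                   "scan_id": scan["scan_id"]}


# Hypothetical records from two sources.
genetics = [
    {"sample": "SUBJ-001", "genotype": "variant A"},
    {"sample": "SUBJ-002", "genotype": "variant B"},
]
imaging = [
    {"subject": "subj_001", "scan_id": "S9"},
    {"subject": "subj_003", "scan_id": "S10"},
]

linked = list(link(genetics, imaging))
```

Only subjects present in both sources appear in the result; unmatched records on either side are left out rather than guessed at.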

Page 8:

Screenshots

Page 9:

Screenshots