Information Integration
• BIRN supports integration across complex data sources
  – Can process a wide variety of structured & semi-structured sources (DBMS, XML, HTML, Excel, SOAP)
• Uses schema & data
  – Source modeling & record linkage
• Infrastructure capabilities
  – Security, efficient query execution, and SQL-like syntax across multiple sources
[Figure: mediator architecture layers: Decision Support, Application Programs, Mediator, Knowledge Bases, Databases, Computer Programs, The Web]
Information Mediator
• Virtual integration architecture:
  – Virtual organization: a community of data providers and consumers that want to share data for a specific purpose
  – Autonomous sources: data and control remain at the sources; no change to access methods or schemas; data are accessed in real time in response to user queries
  – Mediator: the integrator defines a domain schema and describes the source contents
    • Domain schema: an agreed-upon view of the domain preferred by the virtual organization
    • Source descriptions: logical formulas relating the source and domain schemas
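The slides describe source descriptions only as "logical formulas" and do not say which mapping formalism the BIRN mediator uses. As a minimal sketch, assuming a simple global-as-view style mapping and a hypothetical domain relation `Subject`, a source description can be modeled as a column mapping plus constants that the source implies:

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical domain relation agreed on by the virtual organization:
#   Subject(subject_id, site, age, diagnosis)
DOMAIN_ATTRS = ("subject_id", "site", "age", "diagnosis")


@dataclass
class SourceDescription:
    """Relates one source relation to the domain relation Subject.

    Simplified global-as-view style: each domain attribute is either read
    from a named source column or fixed to a constant the source implies
    (e.g. the acquisition site), so a description encodes a rule like
        Subject(id, 'A', age, dx) :- site_a_subjects(id, age, dx)
    """
    source_name: str
    column_map: Dict[str, str]                     # domain attr -> source column
    constants: Dict[str, object] = field(default_factory=dict)

    def to_domain(self, row: Dict[str, object]) -> Dict[str, object]:
        """Translate one source row into a tuple of the domain schema."""
        out: Dict[str, object] = {}
        for attr in DOMAIN_ATTRS:
            if attr in self.constants:
                out[attr] = self.constants[attr]
            else:
                out[attr] = row[self.column_map[attr]]
        return out


# Two hypothetical sites storing the same facts under different column names.
site_a = SourceDescription(
    "site_a_subjects",
    {"subject_id": "SubjID", "age": "AgeYrs", "diagnosis": "Dx"},
    {"site": "A"},
)
site_b = SourceDescription(
    "site_b_patients",
    {"subject_id": "patient", "age": "age", "diagnosis": "group"},
    {"site": "B"},
)

print(site_a.to_domain({"SubjID": "0001", "AgeYrs": 34, "Dx": "SZ"}))
# {'subject_id': '0001', 'site': 'A', 'age': 34, 'diagnosis': 'SZ'}
```

The sources keep their own schemas; only the descriptions change when a new site joins, which matches the "no change to access methods, schemas" principle above.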
BIRN MEDIATOR
Project Overview
Information Mediator
• Query answering
  – User writes a query in the domain schema
  – Mediator:
    • Determines the sources relevant to the user query
    • Rewrites the query in the source schemas
    • Breaks the query into sub-queries for the sources
    • Optimizes the query evaluation plan
    • Combines the answers from the sources
  – Efficient query evaluation
    • Streaming dataflow
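The query-answering steps above can be sketched as a small pipeline, assuming hypothetical in-memory sources and selection-only queries. Python generators provide the streaming: each sub-query yields answers one at a time, and `itertools.chain` unions them without materializing any source's full result. Cost-based plan optimization is omitted; the only optimization here is pushing each selection down into the source's own sub-query.

```python
from itertools import chain
from typing import Dict, Iterator, List

# Hypothetical in-memory sources standing in for wrapped databases.
# "map" is a simplified source description: domain attr -> source column.
SOURCES = {
    "site_a": {
        "rows": [{"SubjID": "a1", "Dx": "SZ"}, {"SubjID": "a2", "Dx": "HC"}],
        "map": {"subject_id": "SubjID", "diagnosis": "Dx"},
        "constants": {"site": "A"},
    },
    "site_b": {
        "rows": [{"patient": "b1", "group": "SZ"}],
        "map": {"subject_id": "patient", "diagnosis": "group"},
        "constants": {"site": "B"},
    },
}


def relevant_sources(selection: Dict[str, object]) -> List[str]:
    """Keep only sources whose fixed constants do not contradict the query."""
    return [
        name
        for name, src in SOURCES.items()
        if all(src["constants"].get(a, v) == v for a, v in selection.items())
    ]


def rewrite(name: str, selection: Dict[str, object]) -> Dict[str, object]:
    """Rewrite the domain-level selection into the source's column names."""
    col = SOURCES[name]["map"]
    return {col[a]: v for a, v in selection.items() if a in col}


def run_subquery(name: str, source_filter: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Evaluate one sub-query lazily, yielding rows in the domain schema."""
    src = SOURCES[name]
    to_domain = {c: a for a, c in src["map"].items()}
    for row in src["rows"]:
        if all(row[c] == v for c, v in source_filter.items()):
            out = {to_domain[c]: val for c, val in row.items() if c in to_domain}
            out.update(src["constants"])
            yield out


def answer(selection: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Plan one sub-query per relevant source and stream the union of answers."""
    plan = [(name, rewrite(name, selection)) for name in relevant_sources(selection)]
    return chain.from_iterable(run_subquery(n, f) for n, f in plan)


for tup in answer({"diagnosis": "SZ"}):
    print(tup)
```

A query such as `answer({"site": "B", "diagnosis": "SZ"})` never touches `site_a`, because its description fixes `site` to `"A"`.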
[Figure: The Information Mediator. Components shown: User Queries / Web Portal / Services; Security; Reformulation; Optimizer; Execution Engine; Logical Source Descriptions; Wrappers; Data Sources. A key distinguishes Information Integration ‘Capabilities’, Other BIRN ‘Capabilities’, and Application Specific components.]
Database Consolidation
• Construct a ‘virtual organization’
  – A community of data providers and consumers sharing data for a common, specific purpose
• All sources are autonomous
  – Data and control remain at the sources
  – No change to access methods or schemas
  – Data accessed in real time in response to user queries
• Work consists of modeling the domain schema and source contents
  – Domain schema = agreed-upon view of the domain preferred by the virtual organization
  – Source descriptions = logical formulas relating source and domain schemas
• Implemented solutions in multiple domains:
  – fMRI: Ashish, et al. (2010) “Neuroscience Data Integration through Mediation: An (F)BIRN Case Study.” Front. Neuroinf. 4:118
  – Cardiovascular Research Grid
  – Non-Human Primate Research Consortium
  – Child Neurodevelopmental Disorders
Use Cases
Scientists from different groups want to query across multiple databases with different schemas
The databases may be built on completely different systems (e.g., one group uses Excel spreadsheets, another uses FileMaker Pro, and a third uses Oracle)
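The slides do not show how wrappers are written. As a minimal sketch, assuming a CSV export stands in for the Excel spreadsheet and an in-memory SQLite table stands in for the Oracle database (FileMaker Pro is omitted), each wrapper hides its source technology behind one hypothetical `rows()` method that returns dictionaries:

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, Protocol


class Wrapper(Protocol):
    """The uniform interface the mediator sees, whatever the source technology."""

    def rows(self) -> Iterator[Dict[str, str]]: ...


class CsvWrapper:
    """Wraps a spreadsheet exported to CSV (a stand-in for an Excel source)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self.text))


class SqliteWrapper:
    """Wraps one relational table (SQLite standing in for, e.g., Oracle)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # trusted configuration, never user input

    def rows(self) -> Iterator[Dict[str, str]]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [d[0] for d in cur.description]
        for rec in cur:
            # Stringify values so both wrappers return the same row shape.
            yield {c: str(v) for c, v in zip(cols, rec)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (subject TEXT, scanner TEXT)")
conn.execute("INSERT INTO scans VALUES ('s01', '3T')")

sources: List[Wrapper] = [CsvWrapper("subject,age\ns01,34\n"), SqliteWrapper(conn, "scans")]
rows_seen = [row for w in sources for row in w.rows()]
print(rows_seen)
```

Because the mediator only calls `rows()`, adding a third system means writing one more wrapper class rather than changing the query pipeline.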
Database Extension
• Reapply the mediator technology to sources from different subject areas
  – e.g., link genetics data to imaging data
• Not dependent on a universal, global ontology, but on a locally defined model specific to the application
Use Cases
Scientists want to bring together data from different sources into a single, common domain model
Sources will require linkage at the level of schema and data
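Linkage at the schema level means deciding which columns correspond across sources; linkage at the data level means deciding which records describe the same entity. A minimal sketch of both, assuming hypothetical genetics and imaging sources whose subject IDs differ only in formatting, uses exact matching on a normalized ID. This is simple deterministic linkage; the slides do not say which linkage method BIRN uses.

```python
import re
from typing import Dict, List, Tuple

Row = Dict[str, str]


def normalize_id(raw: str) -> str:
    """Hypothetical data-level rule: keep only digits, drop leading zeros."""
    digits = re.sub(r"\D", "", raw)
    return digits.lstrip("0")


def link(left: List[Row], left_key: str, right: List[Row], right_key: str) -> List[Tuple[Row, Row]]:
    """Pair records whose normalized IDs agree.

    left_key/right_key carry the schema-level correspondence
    (which column in each source identifies the subject).
    """
    index: Dict[str, List[Row]] = {}
    for rec in right:
        key = normalize_id(rec[right_key])
        if key:  # skip records with no usable identifier
            index.setdefault(key, []).append(rec)
    return [
        (rec, match)
        for rec in left
        for match in index.get(normalize_id(rec[left_key]), [])
    ]


# Hypothetical genetics and imaging sources with different ID conventions.
genetics = [{"SubjectID": "SUBJ-0007", "genotype": "AA"},
            {"SubjectID": "SUBJ-0012", "genotype": "AG"}]
imaging = [{"subj": "7", "scan": "T1"}, {"subj": "0099", "scan": "DTI"}]

pairs = link(genetics, "SubjectID", imaging, "subj")
print(pairs)
```

Only subject 7 appears in both sources, so one linked pair results; fuzzier identifiers would call for probabilistic matching instead of an exact key.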