XWRAPComposer
1
Towards Automating Complex Associative Access to Multiple Bioinformatics Data Sources
Ling Liu, Calton Pu
David Buttler, Wei Han
Henrique Paques, Dan Rocco
Georgia Tech
2
Outline
State of Art
  Users’ Perspective
  Technology Perspective
Why SDM Technology – XWRAP Composer
  Users’ Perspective
  Technology Perspective
Progress Report and Near Term Deliverables
Related Long Term Research
3
Why Automating Complex Associative Access
[Figure: two panels]
Today: Simple Query-Based Searching
  Web: Large & Unorganized Document Collections
  Complex Associative Access requires experts
Tomorrow with SDM Technology
  Semantic Web: Query 1, Query 2, Query 3, Query 4
  Complex Associative Access is automated (one stop shopping)
4
Why Automating Complex Associative Access
[Figure: two panels]
Today: Simple Query-Based Searching
  Web: Large & Unorganized Document Collections
  Characterize, Sort, Partition, Filter, Summarize
Tomorrow with SDM Technology
  Semantic Web: Query 1, Query 2, Query 3, Query 4
5
Automating Complex Associative Access
Wrapper Technology
Workflow Technology
Semantic Web Technology
  Service Discovery
  Service Selection
  Service Composition
Research Issues
  Semantic Data Integration, Interoperability
  Scalability, High Performance
  Trusted Computing, Dependable, Survivable
6
XWRAPComposer
What is it?
  A wrapper generation system that can semi-automatically generate wrappers (information extraction programs) capable of accessing multiple scientific Web pages in one shot.
What makes it different from other existing XWRAP tools?
  Capable of generating wrappers that extract information from multiple Web pages connected by URLs (page links) and compose them into an integrated XML document.
  Extremely useful for automating complex associative access to multiple scientific data sources.
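Illustration (not from the original slides): a minimal Python sketch of the multi-page idea, assuming a tiny in-memory set of pages in place of real Web sources. The page contents, the `PageExtractor` class, and the `wrap` function are hypothetical names chosen for this example; the wrappers XWRAPComposer generates are likely structured differently.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical in-memory "Web": URL -> HTML. A real wrapper would fetch pages over HTTP.
PAGES = {
    "/summary": '<html><body><h1>Hits for AA045112</h1>'
                '<a href="/detail/1">hit 1</a><a href="/detail/2">hit 2</a></body></html>',
    "/detail/1": '<html><body><h1>Detail 1</h1></body></html>',
    "/detail/2": '<html><body><h1>Detail 2</h1></body></html>',
}


class PageExtractor(HTMLParser):
    """Collects the <h1> text and all link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def wrap(url, pages, seen=None):
    """Extract one page, follow its links, and return one nested XML element."""
    seen = set() if seen is None else seen
    seen.add(url)
    extractor = PageExtractor()
    extractor.feed(pages[url])
    node = ET.Element("page", {"url": url})
    ET.SubElement(node, "title").text = extractor.title
    for link in extractor.links:
        # Skip links to unknown pages and pages that were already visited.
        if link in pages and link not in seen:
            node.append(wrap(link, pages, seen))
    return node


document = wrap("/summary", PAGES)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The result is a single XML document in which the data from each linked page is nested under the page that pointed to it.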
7
SDM Enabling Technology: XWRAPComposer
Existing Wrapper Technology
Extracting Data from a single Web Document
[Figure]
Queries: Query 1, Query 2, Query 3, Query 4
Wrappers: Seq. Link Wrapper, Sequence Wrapper, Blast Sum Wrapper, Blast Detail Wrapper
Example data:
AA045112
CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC
CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA
TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT
htgs
8
SDM Enabling Technology: XWRAPComposer
WrapperComposer Technology
Extracting Data from Multiple Web Documents
[Figure]
Queries: Query 1, Query 2
Wrappers: Full Seq Wrapper, Blast Wrapper
Example data:
AA045112
CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC
CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA
TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT
htgs
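Illustration (not from the original slides): one way to picture wrapper composition is as function composition, where the output of one wrapper feeds the next. In this hedged Python sketch the accession and the sequence prefix come from the slide, but the `TOY_DB` entries, the function names, and the exact-prefix matching are assumptions made for the example; the matching is a stand-in and not the BLAST algorithm.

```python
from typing import Callable, Dict, List

# The accession and sequence prefix are the ones shown on the slide.
SEQUENCES: Dict[str, str] = {
    "AA045112": "CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCC",
}

# Hypothetical placeholder entries, not real search results.
TOY_DB: Dict[str, str] = {
    "toy-entry-1": "GGGG" + "CACCTGGAGAAACTTCTGCA" + "TTTT",
    "toy-entry-2": "ACGTACGTACGTACGT",
}


def full_seq_wrapper(accession: str) -> str:
    """Stand-in for a wrapper that maps an accession to its sequence."""
    return SEQUENCES[accession]


def blast_wrapper(sequence: str) -> List[str]:
    """Toy similarity search: exact match on the first 20 bases (not real BLAST)."""
    probe = sequence[:20]
    return sorted(name for name, seq in TOY_DB.items() if probe in seq)


def compose(first: Callable, second: Callable) -> Callable:
    """Chain two wrappers so the caller issues a single request."""
    return lambda x: second(first(x))


find_matches = compose(full_seq_wrapper, blast_wrapper)
matches = find_matches("AA045112")
print(matches)
```

The caller supplies only the accession; the intermediate sequence never has to be copied by hand from one page to the next.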
9
XWRAPComposer: Technical Perspective
Example task: given a sequence, list all matching DNAs.
[Figure]
NCBI Blast Site (Web)
Blast Wrapper
  Blast Query Page
  Blast Format Page
  Blast Delay Page
  Blast Summary Page
  Blast Detail Page
Interface/Outerface Specification
Composer Script
Multi-page Control Flow Modeling
Data Extraction Workflow
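Illustration (not from the original slides): multi-page control flow can be sketched as a small state machine that keeps revisiting the delay page until results are ready. The page order (query, format, delay, summary, detail) is an assumption read off the labels above, and `FakeBlastSite` and `run_wrapper` are hypothetical names; no real site is contacted.

```python
from typing import List


class FakeBlastSite:
    """Hypothetical stand-in for the remote site: answers 'delay' a fixed number of times."""

    def __init__(self, delays: int = 2):
        self.remaining_delays = delays

    def next_page(self, current: str) -> str:
        if current == "query":
            return "format"
        if current in ("format", "delay"):
            # Results are not ready yet: the site serves the delay page again.
            if self.remaining_delays > 0:
                self.remaining_delays -= 1
                return "delay"
            return "summary"
        if current == "summary":
            return "detail"
        raise ValueError(f"no page follows {current!r}")


def run_wrapper(site: FakeBlastSite, max_steps: int = 20) -> List[str]:
    """Walk the pages from query to detail, giving up after max_steps pages."""
    visited = ["query"]
    while visited[-1] != "detail":
        if len(visited) >= max_steps:
            raise TimeoutError("results never became ready")
        visited.append(site.next_page(visited[-1]))
    return visited


trace = run_wrapper(FakeBlastSite(delays=2))
print(" -> ".join(trace))
```

The `max_steps` bound is a design choice for the sketch: a wrapper that polls a delay page needs some limit so that a source that never answers does not block the whole workflow.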
10
SDM Center Data Integration Infrastructure
[Architecture diagram; labels recovered from the figure:]
User (Matt)
User Agent
User constraints & parameters
Workflow Agent
Executable Workflow Plan: “Matt’s WF”
Parameterized Workflow Specification (PWS)
Workflow Resolution Service (WRS)
Workflow Instantiation Service (WIS)
WF feasible
WF infeasible: report reason
Source Capabilities (SC)
Binding Patterns
Domain Map/Ontology
Service registry and brokering
Data Registration
Services Registration
DB
Data Integration Agent(s)
Data Mediation
Wrapper based Agents
Other Agents (e.g., VIPAR)
Other I/O Agents
Communication Protocol Gateway
Database Access
External Program
External Interface
Program Interfacing
XML Wrappers
Data Sources
Extraction Rules
Human Knowledge
GUI
Code Generator
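Illustration (not from the original slides): the diagram suggests that a workflow is checked against source capabilities and binding patterns, and is reported as feasible or infeasible with a reason. The Python sketch below shows one possible reading of that check, assuming a binding pattern is a set of inputs that must be bound before a source can be called. The `SourceCapability` class, the `resolve` function, and the attribute names are hypothetical and do not describe the actual WRS design.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class SourceCapability:
    """Hypothetical binding pattern: inputs that must be bound, outputs produced."""
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def resolve(steps: List[SourceCapability], user_params: Set[str]) -> Tuple[bool, str]:
    """Check each step in order; report the first step whose inputs are unbound."""
    bound = set(user_params)
    for step in steps:
        missing = step.requires - bound
        if missing:
            return False, f"{step.name} needs unbound inputs: {sorted(missing)}"
        bound |= step.provides
    return True, "WF feasible"


full_seq = SourceCapability("FullSeqWrapper", frozenset({"accession"}), frozenset({"sequence"}))
blast = SourceCapability("BlastWrapper", frozenset({"sequence"}), frozenset({"hits"}))

ok = resolve([full_seq, blast], {"accession"})
bad = resolve([blast], {"accession"})
print(ok)
print(bad)
```

With only an accession supplied, the two-step plan is feasible because the first wrapper produces the sequence the second one needs; the one-step plan is rejected with the reason stated.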
11
Progress Report
Status
  Produced three deliverables:
    Composer Interface/Outerface Specification
    Five Java Wrappers for Pilot Scenario
    Composer Script Examples for Pilot Scenario
  XWRAPComposer design and development
Near Term Plan
  Finish the design of the XWRAP Composer scripting language (Nov. 2002)
  Develop the first prototype of the XWRAP Composer system (Jan. 2003)
  Performance Evaluation (March 2003)
12
Related Long Term Research
Semantic Web and Semantic Data Integration
Service Discovery
  Dynamic content crawler
Service Selection
  Adaptive query routing
Service Composition
  Infopipe Technology