Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
Three perspectives to GlobIS
Information Integration Perspective: distribution, autonomy, heterogeneity
Information Brokering Perspective: data, meta-data, semantic (terminological, contextual)
"Vision" Perspective: data, connectivity, computing, information, knowledge
Evolving targets and approaches in integrating data and information (a personal perspective)
Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
Generation II (1990s): InfoHarness, VisualHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
Generation III (1997...): DL-II projects, ADEPT, InfoQuilt
a society for ubiquitous exchange of (tradeable) information in all digital forms of representation;
information anywhere, anytime, in any form
Generation I
• Data recognized as corporate resource — leverage it!
• Data predominantly in structured databases, different data models,
transitioning from network and hierarchical to relational DBMSs
• Heterogeneity (system, modeling and schematic) as well as need to
support autonomy posed main challenges;
major issues were data access and connectivity
• Information integration through Federated architecture
• Support for corporate IS applications as the primary objective,
update often required, data integrity important
Generation I (heterogeneity in FDBMSs)
Timeline: 1970s, 1980s
Communication
Hardware/System
• instruction set
• data representation/coding
• configuration
Operating System
• file system
• naming, file types, operation
• transaction support
• IPC
Database System
• Semantic Heterogeneity
• Differences in DBMS
– data models (abstractions, constraints, query languages)
– System level support (concurrency control, commit, recovery)
Generation I (Federated Database Systems: Schema Architecture)
[Figure: schema architecture. For each Component DBS: Local Schema, (schema translation), Component Schema, Export Schema. Export Schemas, (schema integration), Federated Schema, External Schemas.]
• Model Heterogeneity: Common/Canonical Data Model, Schema Translation
• Information sharing while preserving autonomy
• Dimensions for interoperability and integration: distribution, autonomy and heterogeneity
Generation I (characterization of schematic conflicts in multidatabase systems)
Schematic Conflicts (Sheth & Kashyap, Kim & Seo)
Domain Definition Incompatibility
• Naming Conflicts
• Data Representation Conflicts
• Data Scaling Conflicts
• Data Precision Conflicts
• Default Value Conflicts
• Attribute Integrity Constraint Conflicts
Entity Definition Incompatibility
• Naming Conflicts
• Database Identifier Conflicts
• Schema Isomorphism Conflicts
• Missing Data Items Conflicts
Data Value Incompatibility
• Known Inconsistency
• Temporal Inconsistency
• Acceptable Inconsistency
Abstraction Level Incompatibility
• Generalization Conflicts
• Aggregation Conflicts
Schematic Discrepancies
• Data Value Attribute Conflict
• Entity Attribute Conflict
• Data Value Entity Conflict
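To make two of the conflict types above concrete (a naming conflict and a data scaling conflict), here is a minimal Python sketch that maps records from two component databases into one common schema. The schemas, attribute names, and the scaling factor are hypothetical and chosen only for illustration; they do not come from the talk.

```python
# Hypothetical component schemas: both describe employees, but they use
# different attribute names (naming conflict) and different salary units
# (data scaling conflict). All names and values are illustrative.

# attribute in a component schema -> attribute in the common schema
NAME_MAP = {
    "db1": {"emp_name": "name", "salary_usd": "salary"},
    "db2": {"worker": "name", "pay_kusd": "salary"},
}

# factor that brings a numeric value into the common unit (USD)
SCALE = {
    "db1": {},
    "db2": {"salary": 1000},  # db2 stores salary in thousands of USD
}


def to_common(source, record):
    """Translate one record from a component schema to the common schema."""
    out = {}
    for attr, value in record.items():
        common_attr = NAME_MAP[source][attr]       # resolve the naming conflict
        factor = SCALE[source].get(common_attr)    # resolve the scaling conflict
        out[common_attr] = value * factor if factor is not None else value
    return out


r1 = to_common("db1", {"emp_name": "Ann", "salary_usd": 52000})
r2 = to_common("db2", {"worker": "Ann", "pay_kusd": 52})
```

Once both records are in the common schema they compare equal, which is what schema translation into a canonical model is meant to achieve for this class of conflicts.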
BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with the much larger variety of heterogeneous media
Generation II
• Significant improvements in computing and connectivity (standardization
of protocol, public network, Internet/Web); remote data access as given;
• Increasing diversity in data formats, with focus on variety of textual data
and semi-structured documents
• Many more data sources, heterogeneous information sources,
but not necessarily better understanding of data
• Use of data beyond traditional business applications:
mining + warehousing, marketing, e-commerce
• Web search engines for keyword based querying against HTML pages;
attribute-based querying available in a few search systems
• Use of metadata for information access; early work on ontology support
distribution applied to metadata in some cases
• Mediator architecture for information management
(limited types of metadata, extractors, mappers, wrappers)
Generation II
[Figure: METADATA EXTRACTORS over Global/Enterprise Web Repositories: Digital Maps, Nexis, UPI, AP, Documents, Digital Audios, Data Stores, Digital Videos, Digital Images, ...]
Example query: "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years" (Junglee, SIGMOD Record, Dec. 1997)
Generation II (a metadata classification: the information pyramid)
Data (Heterogeneous Types/Media)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Content Dependent Metadata (size, max colors, rows, columns...)
Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Domain Specific Metadata (area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies)
Ontologies, Classifications, Domain Models
User
Move in this direction to tackle information overload!!
METADATA STANDARDS
General Purpose: Dublin Core, MCF
Domain/industry specific: Geographic (FGDC, UDK, …), Library (MARC, …)
VisualHarness – an example
What's next (after comprehensive use of metadata)?
Query processing and information requests
NOW
• traditional queries based on keywords
• attribute based queries
• content-based queries
NEXT
• 'high level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
• user selected ontology, use of profiles
GIS Data Representation – Example
multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
Kansas State
FGDC Metadata Model
Theme keywords: digital line graph, hydrography, transportation...
Title: Dakota Aquifer
Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
Direct Spatial Reference Method: Vector
Horizontal Coordinate System Definition: Universal Transverse Mercator
...
UDK Metadata Model
Search terms: digital line graph, hydrography, transportation...
Topic: Dakota Aquifer
Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
Measuring Techniques: Vector
Co-ordinate System: Universal Transverse Mercator
...
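The tag-name mismatch in this example can be sketched in a few lines of Python. The FGDC and UDK tag names and values are the ones shown on the slide; the shared vocabulary they are mapped to (`keywords`, `title`, and so on) is a hypothetical choice made only for this sketch.

```python
# Tag names are copied from the slide; the common vocabulary on the
# right-hand side is an assumption introduced for this illustration.
FGDC_TO_COMMON = {
    "Theme keywords": "keywords",
    "Title": "title",
    "Online linkage": "url",
    "Direct Spatial Reference Method": "spatial_method",
    "Horizontal Coordinate System Definition": "coordinate_system",
}
UDK_TO_COMMON = {
    "Search terms": "keywords",
    "Topic": "title",
    "Adress Id": "url",
    "Measuring Techniques": "spatial_method",
    "Co-ordinate System": "coordinate_system",
}


def normalize(record, tag_map):
    """Rename every tag of a metadata record to the common vocabulary."""
    return {tag_map[tag]: value for tag, value in record.items()}


fgdc_record = {
    "Theme keywords": "digital line graph, hydrography, transportation",
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Search terms": "digital line graph, hydrography, transportation",
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

fgdc_common = normalize(fgdc_record, FGDC_TO_COMMON)
udk_common = normalize(udk_record, UDK_TO_COMMON)
same_asset = fgdc_common == udk_common
```

A flat renaming table only handles this simple case; where the two models differ in structure or meaning, the mapping would need to go through ontology terms rather than tag names.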
Generation III
• Increasing information overload and broader variety of information
content (video content, audio clips etc) with increasing amount of visual
information, scientific/engineering data
• Continued standardization related to Web for representational and metadata
issues (MCF, RDF, XML)
• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• Web is no longer just another information source, but decision support through
“data mining and information discovery, information fusion, information
dissemination, knowledge creation and management”, “information management
complemented by cooperation between the information system and humans”
• Information Brokering Architecture proposed for information management
Information Brokering: An Enabler for the Infocosm
INFORMATION/DATA OVERLOAD
INFORMATION PROVIDERS: Newswires, Universities, Corporations, Research Labs (Information Systems, Data Repositories)
INFORMATION CONSUMERS: Corporations, Universities, People, Government, Programs (User Queries, Information Requests)
INFORMATION BROKERING
• arbitration between information consumers and providers for resolving information impedance
• dynamic reinterpretation of information requests for determination of relevant information services and products
• dynamic creation and composition of information products
Information Brokering: Three Dimensions
[Figure: THREE DIMENSIONS. Levels: SYSTEM, SYNTAX, STRUCTURE, SEMANTICS. Participants: CONSUMERS, BROKERS, PROVIDERS. Objects: DATA, METADATA, VOCABULARY.]
Objective: Reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies
WWW
• a confusing heterogeneity of media, formats (Tower of Babel)
• information correlation using physical (HREF) links at the extensional data level
• location dependent browsing of information using physical (HREF) links
• user has to keep track of information content!!
WWW + Information Brokering
• Domain Specific Ontologies as "semantic conceptual views"
• Information correlation using concept mappings at the intensional concept level
• Browsing of information using terminological relationships across ontologies
• Higher level of abstraction, closer to user view of information!!
What else can Information Brokering do?
Concepts, tools and techniques to support semantics
[Figure labels: context; media-independent information correlations; semantic proximity; inter-ontological relations; ontologies (esp. domain-specific); profiles; domain-specific metadata]
Tools to support semantics
• Context, context, context
• Media-independent information correlations
• Multiple ontologies
– Semantic Proximity (relationships between concepts within
and across ontologies) using domain, context,
modeling/abstraction/representation, state
– Characterizing Loss of Information incurred due to
differences in vocabulary
BIG challenge: identifying relationship or similarity between objects of different media, developed and managed by different persons and systems
We shall focus on these!
Information Brokering over Heterogeneous Digital Data: A Metadata-based Approach
INFORMATION OVERLOAD = HETEROGENEITY + GLOBALIZATION
Systems Heterogeneity: information system heterogeneity (DBMSs, concurrency control); platform heterogeneity (operating systems, hardware)
Syntactic Heterogeneity: different formats and storage for digital media; machine readable aspects of data representation
Structural Heterogeneity: heterogeneity in data model constructs; schematic/representational heterogeneity
Semantic Heterogeneity: terminological/vocabulary heterogeneity; contextual heterogeneity
Information Resource Discovery
– which/where are the relevant information sources?
Modeling of Information Content
– increasing number of modeling possibilities
Querying of Information Content
– Information Focusing
– Information Correlation
– combinatorially many ways of combining/subsetting information
Heterogeneity... is a Babel Tower!!
SEMANTIC INTEROPERABILITY
metadata
ontologies
contexts
SEMANTIC HETEROGENEITY
The InfoQuilt Project
THE INFOQUILT VISION
• Semantic interoperability between systems, sharing knowledge using multiple ontologies
• Logical correlation of information
• Media independent information processing
REALIZATION OF THE VISION
• fully distributed, adaptable, agent-based system
• information/knowledge management supported by collaborative processes
http://lsdis.cs.uga.edu/proj/iq/iq.html
InfoQuilt Project: using the Metadata REFerence link (MREF)
MREF
• Complements HREF, creating a "logical web" through media independent ontology & metadata based correlation
• It is a description of the information asset we want to retrieve
[Figure: an MREF draws on domain ontologies and the IQ_Asset ontology + extension ontologies; it carries attributes, relations, constraints, keywords, and content attributes (color, scene cuts, …)]
Semantic Correlation using MREF
• MREF Concept: model for logical correlation using ontological terms and metadata
• RDF: framework for representing MREFs
• XML: serialization (one implementation choice)
Domain Specific Correlation – example
domain specific metadata: terms chosen from domain specific ontologies
Potential locations for a future shopping mall identified by all regions having a population greater than 5000, and area greater than 50 sq. ft. having an urban land cover and moderate relief
<A MREF ATTRIBUTES(population > 5000; area > 50; region-type = ‘block’; land-cover = ‘urban’; relief = ‘moderate’)>can be viewed here</A>
[Figure: Population and Area from the Census DB (Regions, SQL); Boundaries from the TIGER/Line DB; Land cover and Relief from US Geological Survey image features (image processing routines)]
=> media-independent relationships between domain specific metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher domain specific level as opposed to physical "link-chasing" in the WWW
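A minimal Python sketch of how the attribute constraints in this MREF could be checked against domain specific metadata. The constraints are the ones in the MREF above; the region records, their identifiers, and the assumption that metadata from all sources has already been merged into one record per region are hypothetical.

```python
import operator

# Constraints from the MREF on the slide, as (attribute, operator, value).
MREF_ATTRIBUTES = [
    ("population", ">", 5000),
    ("area", ">", 50),
    ("region-type", "=", "block"),
    ("land-cover", "=", "urban"),
    ("relief", "=", "moderate"),
]

OPS = {">": operator.gt, "=": operator.eq}


def satisfies(metadata, constraints):
    """True if the metadata record meets every constraint of the MREF."""
    return all(
        attr in metadata and OPS[op](metadata[attr], value)
        for attr, op, value in constraints
    )


# Hypothetical metadata records. In the scenario on the slide, population and
# area would come from structured data and land cover and relief from image
# features; here they are assumed to be merged already.
regions = [
    {"id": "r1", "population": 8200, "area": 75, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r2", "population": 3100, "area": 90, "region-type": "block",
     "land-cover": "urban", "relief": "moderate"},
    {"id": "r3", "population": 9000, "area": 120, "region-type": "block",
     "land-cover": "forest", "relief": "moderate"},
]

matches = [r["id"] for r in regions if satisfies(r, MREF_ATTRIBUTES)]
```

The check never looks at where a value came from, which is the sense in which the correlation is media-independent.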
A DL II approach for Information Brokering
[Figure: three layers.
DISCOVERING COLLECTIONS OF HETEROGENEOUS INFORMATION AND META-INFORMATION RESOURCES: Physical/Simulation World, Images, Data Stores, Documents, Digital Media.
CONSTRUCTING ADDITIONAL META-INFORMATION RESOURCES: Domain Specific Ontologies, Domain Independent Ontologies.
CONSTRUCTING APPROPRIATE INFORMATION LANDSCAPES: Iscape 1 ... Iscape N.]
ADEPT Information Landscape Concept Prototype
(a scenario for Digital Earth: learning in the context of the “El Niño” phenomenon)
Sample Iscapes Requests:
– How does El Niño affect sea animals? Look for
broadcast videos of less than 2 minutes.
– How are some regions affected by El Niño? Look at
East/West Pacific regions.
– What disasters have been related to El Niño?
– What storm occurrences are attributed to El Niño?
– Show reports related to El Niño that contain Clinton.
TRY ISCAPE CONCEPT DEMO
request information using
• keywords
• domain-specific attributes
• domain-independent attributes
Putting MREFs to work
[Figure: agent architecture. Components: User, User Agent, Profile Manager, MREF Builder, Broker Agent, MREF repositories, User profiles. Labeled flows: user information, MREF request, retrieve profile, change profile, display results, design MREF, construct new MREF, send MREF, send results, retrieve MREF. Resources: domain ontologies, IQ_Asset ontology + extension ontologies.]
Context: the lynchpin of semantics
“For instance, if you were to use Yahoo! or Infoseek to
search the web for pizza, your results would probably
be hundreds of matches for the word pizza. Many of
these could be pizza parlors around the world. Yet if
you run the same search within NeighborNet, you will
allows you to order pizza to be delivered instead of
shipped.”
From a Press Release of FutureOne, Inc. March 24, 1999
http://home.futureone.com/about/pr/021699.asp
Cricket
Constructing c-contexts from ontological terms
Advantages:
• Use of ontologies for an intensional domain specific description of data
• Representation of extra information
• Relationships between objects not represented in the database schema
• Using terminological relationships in the ontology
C-CONTEXT: a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions)
C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
“All documents stored in the database have been published by some agency”
=> Cdef(DOC) = <(hasOrganization, AgencyConcept)>
ONTOLOGICAL TERMS: DocumentConcept, hasOrganization, AgencyConcept
DATABASE OBJECTS: AGENCY(RegNo, Name, Affiliation), DOC(Id, Title, Agency)
Using c-contexts to reason about information in database
Cdef(DOC) = <(hasOrganization, AgencyConcept)>
CQ = <(hasOrganization, { “USGS” })>
- Reasoning with c-contexts: glb(Cdef(DOC), CQ)
- Ontological Inferences:
- DocumentConcept
- (hasOrganization, { “USGS” })
EXAMPLE
glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, { “USGS” })>
Challenge 1: use of multiple ontologies
Challenge 2: estimating the loss of information
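The glb computation in this example can be approximated with a small Python sketch. It is not the inference procedure used in the actual system: a c-context is modeled as a plain dictionary from contextual coordinates to values, a value is either a concept name or a set of instances, and the table of concept instances is a hypothetical stand-in for the ontology ("USGS" is from the slide, the other member is invented).

```python
# Hypothetical extension of the ontology concept. A real system would ask
# the ontology; this table is an assumption made for the sketch.
INSTANCES = {"AgencyConcept": {"USGS", "NASA"}}


def meet(v1, v2):
    """Return the more specific of two coordinate values."""
    if v1 == v2:
        return v1
    if isinstance(v1, str) and isinstance(v2, str):
        raise ValueError("concept subsumption is not modelled in this sketch")
    s1 = INSTANCES[v1] if isinstance(v1, str) else v1
    s2 = INSTANCES[v2] if isinstance(v2, str) else v2
    return frozenset(s1 & s2)


def glb(c1, c2):
    """Greatest lower bound of two c-contexts given as {coordinate: value}."""
    result = dict(c1)
    for coord, value in c2.items():
        result[coord] = meet(result[coord], value) if coord in result else value
    return result


# The (self, DocumentConcept) coordinate is written out explicitly here;
# on the slide it appears only in the result of the glb.
cdef_doc = {"self": "DocumentConcept", "hasOrganization": "AgencyConcept"}
cq = {"hasOrganization": frozenset({"USGS"})}

result = glb(cdef_doc, cq)
```

Under these assumptions the result carries the same two coordinates as the slide's example: the document concept, and the organization narrowed from any agency to "USGS".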
Estimating information loss for multi-ontology based query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture
Eduardo Mena (III’98)
[Figure: a USER NODE (User Query, Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories); COMPONENT NODEs (each with Query Processor, Ontology Server, Ontologies, Mappings, Data Repositories); an IRM NODE (IRM: Interontologies Terminological Relationships)]
Query construction - Example
“Get title and number of pages of books written by Carl Sagan”
User ontology: WN
[name pages] for (AND book (FILLS creator “Carl Sagan”))
Target ontology: Stanford-I
Integrated ontology WN-Stanford-I
[title number-of-pages] for (AND book (FILLS doc-author-name “Carl Sagan”))
Ontologies sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
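The rewriting from the user ontology to the target ontology can be sketched as a term-by-term substitution. The synonym table below is read off the two queries in this example; treating the translation as a flat dictionary lookup, and the tuple representation of a query, are simplifying assumptions and not a description of how OBSERVER stores inter-ontology relationships.

```python
# Synonym relationships between WN (user ontology) and Stanford-I (target
# ontology), as implied by the two queries on the slide.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",
}


def translate(projections, concept, role, filler, synonyms):
    """Rewrite every ontology term of the query; the filler is a data value."""
    return (
        [synonyms[p] for p in projections],
        synonyms[concept],
        synonyms[role],
        filler,
    )


def render(projections, concept, role, filler):
    """Print a query in the notation used on the slide."""
    return f'[{" ".join(projections)}] for (AND {concept} (FILLS {role} "{filler}"))'


user_query = (["name", "pages"], "book", "creator", "Carl Sagan")
target_query = translate(*user_query, WN_TO_STANFORD)
text = render(*target_query)
```

When a term has no synonym in the target ontology, a lookup like this fails, and that is the point at which substitution by more general or more specific terms, with an estimate of the resulting loss, becomes necessary.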
Re-use of Knowledge: Bibliography Data Ontology (Stanford-I)
[Figure: concept hierarchy. Concepts shown: Biblio-Thing, Document, Book, Edited-Book, Technical-Report, Periodical-Publication, Journal, Magazine, Newspaper, Miscellaneous-Publication, Technical-Manual, Computer-Program, Multimedia-Document, Artwork, Cartographic-Map, Thesis, Doctoral-Thesis, Master-Thesis, Proceedings, Conference, Agent, Person, Author, Organization, Publisher, University]
Re-use of Knowledge: A subset of WordNet 1.5
[Figure: concept hierarchy. Concepts shown: Print-Media, Press, Publication, Journalism, Newspaper, Magazine, Book, Periodical, Trade-Book, Brochure, TextBook, Reference-Book, SongBook, PrayerBook, Pictorial, Series, Journals, CookBook, Instruction-Book, WordBook, HandBook, Directory, Annual, Encyclopedia, Manual, Bible, GuideBook, Instructions, Reference-Manual]
WN ontology and user query
Estimating the loss of information
• To choose the plan with the least loss
• To present a level of confidence in the answer
• Based on intensional information (terminological difference)
• Based on extensional information (precision and recall)
Plans in the example
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”))
Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”))
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”))
Loss of information based on intensional information
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”))
book := (AND publication (AT-LEAST 1 ISBN))
publication := (AND document (AT-LEAST 1 place-of-publication))
Loss: “Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)”
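The intensional loss statement for Plan 1 can be derived mechanically by expanding the two definitions until the substituted term is reached and collecting the constraints dropped along the way. The Python sketch below assumes each defined term has exactly one parent and one added constraint, which holds for the two definitions on the slide but is a simplification in general.

```python
# Definitions from the slide: term -> (parent term, constraint it adds).
DEFINITIONS = {
    "book": ("publication", "AT-LEAST 1 ISBN"),
    "publication": ("document", "AT-LEAST 1 place-of-publication"),
}


def dropped_constraints(term, substitute):
    """Constraints lost when `term` is replaced by its ancestor `substitute`."""
    dropped = []
    current = term
    while current != substitute:
        if current not in DEFINITIONS:
            raise ValueError(f"{substitute!r} is not an ancestor of {term!r}")
        parent, constraint = DEFINITIONS[current]
        dropped.append(constraint)
        current = parent
    return dropped


lost = dropped_constraints("book", "document")
```

The two collected constraints are exactly the ones named in the loss message: an ISBN and a place of publication.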
Example: loss for the plans
Plan 1: (AND document (FILLS doc-author-name “Carl Sagan”)) [case 2]
91.57% < (1-Loss) < 91.75%
Plan 2: (AND periodical-publication (FILLS doc-author-name “Carl Sagan”)) [case 3]
94.03% < (1-Loss) < 100%
Plan 3: (AND journal (FILLS doc-author-name “Carl Sagan”)) [case 3]
98.56% < (1-Loss) < 100%
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name “Carl Sagan”)) [case 1]
0% < (1-Loss) < 7.22%
Summary
Text, Structured Databases | Data | Syntax, System | Federated DB
Semi-structured | Metadata | Structural, Schematic | Mediator, Federated IS
Visual, Scientific/Eng. | Knowledge | Semantic | Knowledge Mgmt., Information Brokering, Cooperative IS
Agenda for research
Interoperation not at systems level, but at informational and possibly knowledge level
– traditional database and information retrieval solutions do not suffice
– need to understand context; measures of similarities
Need to increase impetus on semantic level issues involving terminological and contextual differences, and possibly perceptual or cognitive differences in future
– information systems and humans need to cooperate, possibly involving coordination and collaborative processes
http://lsdis.cs.uga.edu
[See publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
[email protected]
Acknowledgements: Tarcisio Lima, Vipul Kashyap
Related Reading
Books:
Information Brokering for Digital Media, Kashyap and Sheth, Kluwer, 1999 (to appear)
Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Sheth and Klas Eds., McGraw-Hill, 1998
Cooperative Information Systems, Papazoglou and Schlageter Eds., Academic Press, 1998
Management of Heterogeneous and Autonomous Database Systems, Elmagarmid, Rusinkiewicz, Sheth Eds., Morgan Kaufmann, 1998
Special Issues and Proceedings:
Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
Semantic Interoperability in Global Information Systems, Ouksel and Sheth, SIGMOD Record, March 1999