Planning and Optimizing Semantic Information Requests Using Domain Modeling and Resource Characteristics

by Shuchi Patel


Outline

Motivation
InfoQuilt Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work

Motivation

Explosion of data
Heterogeneities between sources
Limitations of web search techniques
Limitations of database search techniques
Need to manually integrate data

Related Work

SIMS (USC)
TSIMMIS (Stanford, IBM Almaden)
Information Manifold (AT&T)
OBSERVER
Infomaster (Stanford)

InfoQuilt - Goal

To provide an environment that allows users to query and analyze data available from a multitude of diverse autonomous sources (including web-based sources), gain a better understanding of the domains and their interactions, and study hypothetical relationships in order to establish or disprove them.

Important Building Blocks

Semantic domain modeling
Semantic inter-domain relationship modeling
Resource characteristics modeling
Complex operations
Semantic information request modeling
Learning paradigm

Ontologies

[Figure: Disaster ontology diagram. Labels recovered from the figure:]

Hierarchies: Disaster, Natural Disaster, Man-made Disaster, Volcano, Earthquake, NuclearTest
Terms/Concepts (Attributes): eventDate, description, region, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield
Functional Dependencies (FDs): region => latitude, longitude
Domain Rules: bodyWaveMagnitude > 0, bodyWaveMagnitude < 10, magnitude > 0, magnitude < 10

Ontologies...

NuclearTest ( testSite, explosiveYield, bodyWaveMagnitude,
              testType, eventDate, conductedBy,
              latitude, longitude,
              bodyWaveMagnitude > 0,
              bodyWaveMagnitude < 10,
              testSite -> latitude longitude );
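The NuclearTest definition above bundles attributes, domain rules, and a functional dependency. A minimal Python sketch of one way to hold such a definition in memory follows. The `Ontology` class, its field names, and `violates_rules` are hypothetical illustrations, not InfoQuilt's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Ontology:
    """Hypothetical in-memory form of an ontology definition."""
    name: str
    attributes: List[str]
    # Domain rules: predicates that every valid instance must satisfy.
    domain_rules: List[Callable[[Dict], bool]] = field(default_factory=list)
    # Functional dependencies as (LHS attributes, RHS attributes).
    fds: List[Tuple[Tuple[str, ...], Tuple[str, ...]]] = field(default_factory=list)

    def violates_rules(self, instance: Dict) -> bool:
        """Return True if the instance breaks at least one domain rule."""
        return any(not rule(instance) for rule in self.domain_rules)


nuclear_test = Ontology(
    name="NuclearTest",
    attributes=["testSite", "explosiveYield", "bodyWaveMagnitude", "testType",
                "eventDate", "conductedBy", "latitude", "longitude"],
    domain_rules=[
        lambda t: t["bodyWaveMagnitude"] > 0,
        lambda t: t["bodyWaveMagnitude"] < 10,
    ],
    fds=[(("testSite",), ("latitude", "longitude"))],
)
```

With this sketch, `nuclear_test.violates_rules({"bodyWaveMagnitude": 11})` flags an instance that breaks the upper bound.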

Operations

Complex operators
Post-processing data
Simulations
Clarke Urban Growth Model: modeling urban growth and land use change

Clarke Urban Growth Model (UGM)

Source: http://edcdgs9.cr.usgs.gov/urban/factsht.pdf

Inter-Ontological Relationships

A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.

NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate,
                   Earthquake.eventDate ) < 30
   AND distance( NuclearTest.latitude,
                 NuclearTest.longitude,
                 Earthquake.latitude,
                 Earthquake.longitude ) < 10000
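The "Causes" relationship can be read as a predicate over one NuclearTest and one Earthquake. The sketch below is one possible reading in Python. The slides do not state units, so days and kilometers are assumptions, as is the great-circle (haversine) formula used for `distance`. The `0 <= days` guard is also an assumption, added to reflect the prose statement that the earthquake occurs after the test.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: distances are measured in kilometers


def date_difference(earlier: date, later: date) -> int:
    """Days from `earlier` to `later` (negative if `later` comes first)."""
    return (later - earlier).days


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km via the haversine formula (assumed metric)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def causes(test: dict, quake: dict) -> bool:
    """Evaluate 'NuclearTest Causes Earthquake' for one pair of instances."""
    days = date_difference(test["eventDate"], quake["eventDate"])
    near = distance(test["latitude"], test["longitude"],
                    quake["latitude"], quake["longitude"]) < 10000
    # 0 <= days is an added assumption: the quake must not precede the test.
    return 0 <= days < 30 and near
```

The coordinates and dates passed to `causes` in any trial run are illustrative inputs only.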

Resource Modeling

Attributes available
Data Characteristic (DC) rules

Example: a web site on earthquakes that covers only earthquakes after January 1, 1990
eventDate > “January 1, 1990”

Resource Modeling...

Local Completeness

Example: Hartsfield International Airport (http://atlanta-airport.com/), flights to and from Atlanta airport
toCity = “Atlanta”
fromCity = “Atlanta”

Resource Modeling...

Binding Pattern

Example: AirTran Airways (www.airtran.com)
[toCity, toState, fromCity, fromState, departureMonth, departureDay]

Information Scape (IScape)

Specified in terms of components of the knowledge base
The system “understands” the user’s information request

“Find all earthquakes with epicenter less than 5000 miles from the location at latitude 60.790 North and longitude 97.570 East and find all tsunamis that they might have caused”

Learning Paradigm

Understand domains and relationships between them
Query and analyze data from multiple autonomous and heterogeneous sources
Explore potential relationships
Analyze data to support or disprove potential relationships

Where we are...

Motivation
ADEPT Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work

Planning and Optimization

IScapes are specified in terms of ontologies
Source selection
Execution plans that are executable
Plans that retrieve more complete information
Integrate data from sources
Optimization using domain and resource characteristics

IScape 1

Resource: NuclearTestsDB( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, [dc] waveMagnitude > 3, [dc] eventDate > “January 1, 1985” );

Resource: NuclearTestSites( testSite, latitude, longitude );

Resource: SignificantEarthquakesDB( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, [dc] eventDate > “January 1, 1970” );

Ontology: NuclearTest( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, latitude, longitude, waveMagnitude > 0, waveMagnitude < 10, testSite -> latitude longitude );

Ontology: Earthquake( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, magnitude > 0 );

Relationship: NuclearTest Causes Earthquake <= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000

IScape: “Find all nuclear tests conducted by India or Pakistan after January 1, 1995 with seismic body wave magnitude > 4.5 and find all earthquakes that could have been caused due to these tests.”

Semantic Check

Apply domain rules
Check if IScape is semantically correct
Constraint reduction
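One simple way to picture the semantic check is to merge the ontology's domain rules with the IScape's constraints and keep the tightest bound per attribute; an empty range means the request is semantically incorrect. The Python sketch below does this for strict numeric bounds only. `reduce_constraints` and the tuple encoding of constraints are hypothetical, and the real check presumably handles richer constraints.

```python
import math
from typing import Dict, List, Optional, Tuple

Constraint = Tuple[str, str, float]  # (attribute, ">" or "<", value)


def reduce_constraints(
    constraints: List[Constraint],
) -> Optional[Dict[str, Tuple[float, float]]]:
    """Return the tightest open interval per attribute, or None if contradictory."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for attr, op, value in constraints:
        lo, hi = bounds.get(attr, (-math.inf, math.inf))
        if op == ">":
            lo = max(lo, value)
        elif op == "<":
            hi = min(hi, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if lo >= hi:  # the open interval (lo, hi) is empty
            return None
        bounds[attr] = (lo, hi)
    return bounds


# Domain rules of NuclearTest as used in IScape 1.
domain_rules = [("waveMagnitude", ">", 0), ("waveMagnitude", "<", 10)]
```

For IScape 1, combining `domain_rules` with the request's `waveMagnitude > 4.5` leaves the single range (4.5, 10), so the rule `waveMagnitude > 0` becomes redundant. A request for `waveMagnitude > 12` would return `None`.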


Source Selection

One ontology at a time
First check locally complete sources
  The DC rules of the resource should not falsify the IScape constraint
If none found, select all sources
  The DC rules of a resource should not falsify the IScape’s constraint
Binding Patterns on resources should be respected
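The selection steps above can be sketched in Python as follows. This is a simplified illustration, not the planner's actual algorithm: local completeness is reduced to a boolean flag, only strict `>` and `<` rules on a single attribute are compared, and binding patterns are ignored. `Resource`, `falsifies`, and `select_sources` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, List, Tuple

Rule = Tuple[str, str, Any]  # (attribute, ">" or "<", value)


def falsifies(rule: Rule, constraint: Rule) -> bool:
    """True if no value can satisfy both (strict > and < only)."""
    (a1, op1, v1), (a2, op2, v2) = rule, constraint
    if a1 != a2 or op1 == op2:
        return False  # different attributes or same direction: always compatible
    lower, upper = (v1, v2) if op1 == ">" else (v2, v1)
    return lower >= upper


@dataclass
class Resource:
    name: str
    dc_rules: List[Rule] = field(default_factory=list)
    locally_complete: bool = False  # simplification of a local completeness rule


def select_sources(resources: List[Resource], constraints: List[Rule]) -> List[Resource]:
    """Prefer usable locally complete sources; otherwise return all usable sources."""
    usable = [
        r for r in resources
        if not any(falsifies(dc, c) for dc in r.dc_rules for c in constraints)
    ]
    complete = [r for r in usable if r.locally_complete]
    return complete if complete else usable
```

For example, a source whose DC rule is `eventDate < January 1, 1990` cannot contribute to a request for `eventDate > January 1, 1995` and is dropped, while NuclearTestsDB with `eventDate > January 1, 1985` is kept.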


Missing Attributes

Use functional dependencies (FD)
  <attribute>+ -> <missing attributes> <attribute>*
Couple with associate resource

[Figure: primary resource joined with an associate resource]
Primary resource (A, B, C, D): all available attributes (A, B, C, D)
Associate resource (A, B, C, E, F): LHS attributes + missing attributes (B, C, E, F)
FD: BC -> DEF
Join (using LHS attributes): Primary.B = Associate.B AND Primary.C = Associate.C

Missing Attributes…

Criteria for FD
  All the missing attributes should be in the RHS of the FD
  All attributes in the LHS of the FD should be available from the primary resource
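The two FD criteria, followed by the join on the LHS attributes, can be sketched in Python as below. `pick_fd` and `join_on` are hypothetical helper names, and rows are modeled as plain dictionaries for illustration.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (LHS attributes, RHS attributes)


def pick_fd(fds: Sequence[FD], available: Set[str], missing: Set[str]) -> Optional[FD]:
    """Return the first FD whose RHS covers all missing attributes and
    whose LHS is fully available from the primary resource."""
    for lhs, rhs in fds:
        if missing <= set(rhs) and set(lhs) <= available:
            return (lhs, rhs)
    return None


def join_on(primary: List[Dict], associate: List[Dict], lhs: Sequence[str]) -> List[Dict]:
    """Equi-join primary and associate rows on the FD's LHS attributes."""
    index: Dict[tuple, List[Dict]] = {}
    for row in associate:
        index.setdefault(tuple(row[a] for a in lhs), []).append(row)
    joined = []
    for row in primary:
        for match in index.get(tuple(row[a] for a in lhs), []):
            joined.append({**match, **row})  # primary values win on overlap
    return joined
```

In IScape 1, NuclearTestsDB lacks latitude and longitude; the FD `testSite -> latitude longitude` passes both criteria, so NuclearTestSites can be joined in on `testSite`.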


Missing Attributes…

Criteria for associate resource
  Provide missing attributes + attributes in LHS of FD
  Resource rules should not falsify query constraint
  Resource rules should not falsify resource rules on primary resource
  BP can be supplied by primary resource


BPSupplier and Join


Sources Selected

Resource Access: NuclearTestsDB (testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy)
Resource Access: NuclearTestSites (testSite, latitude, longitude)
Resource Access: SignificantEarthquakesDB (eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto)
Join: testSiteEquals ( NuclearTestsDB.testSite, NuclearTestSites.testSite )
Select: NuclearTest.waveMagnitude > 4.5 AND ( NuclearTest.conductedBy = “India” OR NuclearTest.conductedBy = “Pakistan” )

Notes:
Use of FD to retrieve missing attributes
Use of function to resolve syntactic heterogeneity

Data Integration

Integrate data from all sources
  Union
  Function Evaluations
  Constraint Checking
  Relationship Evaluation
  Aggregations
  Projection

Data Integration

Resource Access: NuclearTestsDB
Resource Access: NuclearTestSites
Resource Access: SignificantEarthquakesDB
Join: testSiteEquals ( NuclearTestsDB.testSite, NuclearTestSites.testSite )
Select: NuclearTest.waveMagnitude > 4.5 AND ( NuclearTest.conductedBy = “India” OR NuclearTest.conductedBy = “Pakistan” )
Union * (two nodes)

Data Integration

Union * (NuclearTest), Union * (Earthquake)
Function Evaluator: dateDifference ( “January 1, 1995”, NuclearTest.eventDate )
Select: dateDifference ( “January 1, 1995”, NuclearTest.eventDate ) > 0
Relationship Evaluator: NuclearTest Causes Earthquake
  dateDifference ( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance ( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000

Data Integration…

Relationship Evaluator: NuclearTest Causes Earthquake
Project: N.testSite, N.eventDate, N.testType, N.explosiveYield, N.waveMagnitude, N.conductedBy, E.eventDate, E.region, E.description, E.magnitude, E.numberOfDeaths, E.damagePhoto, dateDifference( N.eventDate, E.eventDate ), distance( N.latitude, N.longitude, E.latitude, E.longitude )

N = NuclearTest, E = Earthquake

Results

IScape 2

Resource: YahooTravel ( airlineCompany, flightNo, aircraft, fromCity, fromState, toCity, toState, departureDate, meals, departureTime, arrivalTime, [fromCity, fromState, toCity, toState, departureDate] );

Resource: AirlineLogos ( airlineCompany, airlineLogo );

Resource: WeatherChannel ( date, city, state, description, icon, hiTemp, loTemp, [city, state] );

Resource: AirTranAirways( airlineCompany, flightNo, fromCity, fromState, toCity, toState, departureDate, fare, departureTime, arrivalTime, [dc] airlineCompany = “AirTran Airways”, [lc] airlineCompany = “AirTran Airways”, [fromCity, fromState, toCity, toState, departureDate]);

Resource URLs: http://travel.yahoo.com, http://www.airtran.com, http://www.weather.com

Ontology: DirectFlight ( airlineCompany, airlineLogo, flightNo, aircraft, fromCity, fromState, toCity, toState, departureDate, fare, meals, departureTime, arrivalTime, airlineCompany -> airlineLogo );

Ontology: DailyWeather ( date, city, state, description, icon, hiTemp, loTemp );

IScape: “Find all direct flights from Atlanta, GA to Boston, MA for March 23, 2001 and show the weather in the destination city on that day.”

Plan Generation

IScape is semantically correct
Locally complete sources available but not applicable


Binding Patterns

The execution plan should specify:
  Which BP is to be used
  How values for the BP attributes should be supplied
Values can be supplied in the following ways:
  Query constraint
  Attributes in other ontologies
  Associate resource

Binding Pattern Using Associated Resource

Criteria for associate resource
  Should supply all attributes needed for BP
  Values for its BP, if any, should be supplied from IScape’s constraint only
  Resource rules involving only BP attributes being retrieved should not falsify IScape’s constraint

[Figure: associate resource supplying BP values to the primary resource]
Primary resource: A, B, C, D [A, B]; all attributes (A, B, C, D)
Associate resource: A, B, C, E, F; BP attributes (A, B)
BP Supplier: BP attributes (A, B)

Binding Patterns…

BP of AirTranAirways and YahooTravel can be supplied from query constraint

DirectFlight.fromCity = “Atlanta” AND DirectFlight.fromState = “GA” AND DirectFlight.toCity = “Boston” AND DirectFlight.toState = “MA” AND DirectFlight.departureDate = “March 23, 2001” AND DailyWeather.city = DirectFlight.toCity AND DailyWeather.state = DirectFlight.toState AND DailyWeather.date = DirectFlight.departureDate

[ fromCity: (“Atlanta”), fromState: (“GA”), toCity: (“Boston”), toState: (“MA”), departureDate: (“March 23, 2001”) ]
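Supplying a binding pattern from the query constraint amounts to looking up a constant for each BP attribute. The Python sketch below shows that lookup. `bind_from_constraints` is a hypothetical helper, and the constraint is assumed to be already parsed into a dictionary of `attribute -> constant` equalities.

```python
from typing import Dict, Optional, Sequence


def bind_from_constraints(
    binding_pattern: Sequence[str],
    equalities: Dict[str, str],
) -> Optional[Dict[str, str]]:
    """Return a value for every BP attribute, or None if any is unbound."""
    values: Dict[str, str] = {}
    for attr in binding_pattern:
        if attr not in equalities:
            return None  # this BP cannot be satisfied from the constraint alone
        values[attr] = equalities[attr]
    return values


# Constant equalities on DirectFlight taken from the IScape 2 constraint.
direct_flight_equalities = {
    "fromCity": "Atlanta",
    "fromState": "GA",
    "toCity": "Boston",
    "toState": "MA",
    "departureDate": "March 23, 2001",
}

flight_bp = ["fromCity", "fromState", "toCity", "toState", "departureDate"]
```

Calling `bind_from_constraints(flight_bp, direct_flight_equalities)` yields a complete binding, while a pattern such as `["city", "state"]` returns `None` because the constraint holds no constants for those attributes, so their values must come from elsewhere.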

Binding Patterns…

BP of WeatherChannel to be supplied using attributes from DirectFlight

DirectFlight.fromCity = “Atlanta” AND DirectFlight.fromState = “GA” AND DirectFlight.toCity = “Boston” AND DirectFlight.toState = “MA” AND DirectFlight.departureDate = “March 23, 2001” AND DailyWeather.city = DirectFlight.toCity AND DailyWeather.state = DirectFlight.toState AND DailyWeather.date = DirectFlight.departureDate

Sources Selected

Resource Access: YahooTravel
  Attributes: airlineCompany, flightNo, aircraft, departureDate, departureTime, …
  BP values: fromCity = “Atlanta”, fromState = “GA”, toCity = “Boston”, toState = “MA”, departureDate = “March 23, 2001”
Resource Access: AirlineLogos
  Attributes: airlineCompany, airlineLogo
Resource Access: AirTranAirways
  Attributes: airlineCompany, flightNo, aircraft, departureDate, departureTime, …
  BP values: fromCity = “Atlanta”, fromState = “GA”, toCity = “Boston”, toState = “MA”, departureDate = “March 23, 2001”
Resource Access: WeatherChannel
  Attributes: description, icon, hiTemp, loTemp, city, state, date
Join: YahooTravel.airlineCompany = AirlineLogos.airlineCompany
Join: AirTranAirways.airlineCompany = AirlineLogos.airlineCompany
BPSupplier: W.city = F.toCity, W.state = F.toState, W.date = F.departureDate (W = DailyWeather, F = DirectFlight)

Notes:
Use of FD to retrieve missing attributes
BP supplied using IScape constraint
Only one Resource Access node for one resource if possible
Use BPSupplier node when BP values are retrieved from another ontology

Data Integration

Resource Access: YahooTravel (fromCity = “Atlanta”, fromState = “GA”, toCity = “Boston”, toState = “MA”, departureDate = “March 23, 2001”)
Resource Access: AirlineLogos (airlineCompany, airlineLogo)
Resource Access: AirTranAirways (fromCity = “Atlanta”, fromState = “GA”, toCity = “Boston”, toState = “MA”, departureDate = “March 23, 2001”)
Resource Access: WeatherChannel (description, icon, hiTemp, loTemp, city, state, date)
Join: YahooTravel.airlineCompany = AirlineLogos.airlineCompany
Join: AirTranAirways.airlineCompany = AirlineLogos.airlineCompany
Union * (intermediate union)
BPSupplier: W.city = F.toCity, W.state = F.toState, W.date = F.departureDate (W = DailyWeather, F = DirectFlight)

Note: BP values are retrieved from the intermediate union

Data Integration…

Resource Access: YahooTravel
Resource Access: AirlineLogos
Resource Access: AirTranAirways
Resource Access: WeatherChannel
Join (YahooTravel with AirlineLogos)
Join (AirTranAirways with AirlineLogos)
Union *
BPSupplier
Join: W.city = F.toCity AND W.state = F.toState AND W.day = F.departureDay AND W.month = F.departureMonth
Project: F.airlineCompany, F.airlineLogo, F.flightNo, F.aircraft, …, W.description, W.icon, W.date, W.loTemp, W.hiTemp,…

W = DailyWeather, F = DirectFlight

Results

IScape Execution

[Figure: IScape execution flow. Labels: IScape, Knowledge, Plan, Query, Data retrieved, Final Results]

Where we are...

Motivation
ADEPT Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work

Execution Monitoring

IScape Processing Monitor (IPM)
  GUI
  High-level debugger
  Allows monitoring how much time each phase of IScape processing takes
  Allows localizing errors
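Per-phase timing and error localization of the kind listed above can be sketched with a small context manager. This is only an illustration of the idea in Python, not the IPM itself; `PhaseTimer` and the phase names used with it are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class PhaseTimer:
    """Records elapsed time per named phase and remembers which phase failed."""

    def __init__(self) -> None:
        self.elapsed: Dict[str, float] = {}
        self.failed_phase: Optional[str] = None

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.failed_phase = name  # localize the error to this phase
            raise
        finally:
            # Elapsed time is recorded whether or not the phase succeeded.
            self.elapsed[name] = time.perf_counter() - start
```

A caller would wrap each stage, for example `with timer.phase("planning"): ...`, and then inspect `timer.elapsed` and `timer.failed_phase` afterwards.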

IScape Processing Monitor (IScape 1)

IScape Processing Monitor (IScape 2)

Related Work

Features of InfoQuilt not supported by any other system
  Ability to assist in learning about domains and complex inter-domain relationships
  Support for use of functions and simulations to post-process data
  Support for complex relationships and constraints that can use functions as special operators
  Powerful semantic query interface (IScapes)

Related Work…

SIMS
  Mediator specialized to one domain
  Cannot use local completeness information about sources
  One BP per resource
OBSERVER
  Limited to basic relationships
  Resource models are not as rich

Related Work…

TSIMMIS
  Mediators defined using MSL
  Adding or removing sources is difficult
  Query-centric (uses pre-defined query templates)
  Can answer a restricted set of queries
Information Manifold
  No domain rules, FDs
  Local completeness cannot be modeled precisely
  Capability records cannot model query capability limitations precisely

Contributions

Planning and Optimization Algorithm
  Efficient source selection
  Ability to use sources in conjunction to retrieve more complete information
  Generation of executable plans
  Integration of information retrieved from the sources selected
Multi-threaded IScape execution
IScape Processing Monitor
Framework for functions and simulations

Future Work

The Planning Agent could create backup plans that the Correlation Agent can switch to on failure

More precise specification of query capabilities of the resource

Better framework for simulations

Thank You!