The Identification of Missing Information Resources through the Query Difference Operator

Michael Minock, 08/26/22



Page 1: The Identification of Missing Information Resources through the Query Difference Operator

04/19/23 Michael Minock

Page 2: Global Schema

MOVIE
Title      Year  Type
Star Wars  77    SciFi
Empire SB  80    SciFi
Rtrn Jedi  83    SciFi
Anne Hall  77    Comedy

SHOW
Title       Theater  Time  City
Star Wars   Gateway  10pm  Austin
Empire SB   Gateway  11pm  Austin
Empire SB   Bruin    9pm   LA
God Father  Gateway  9pm   Austin

REVIEW
Title         Source  Eval
Rtrn Jedi     Siskel  Up
Clock Orange  Ebert   Up

• Agree on a non-cyclic set of equi-joins over a set of relations (here MOVIE joins SHOW and REVIEW on Title)

• Virtual outer-join into single relation ("-" marks a missing value)

MOVIE                   | REVIEW                      | SHOW
Title      Year  Type   | Title         Source  Eval  | Title       Theater  Time  City
Star Wars  77    SciFi  | -             -       -     | Star Wars   Gateway  10pm  Austin
Empire SB  80    SciFi  | -             -       -     | Empire SB   Gateway  11pm  Austin
Empire SB  80    SciFi  | -             -       -     | Empire SB   Bruin    9pm   LA
Rtrn Jedi  83    SciFi  | Rtrn Jedi     Siskel  Up    | -           -        -     -
Anne Hall  77    Comedy | -             -       -     | -           -        -     -
-          -     -      | Clock Orange  Ebert   Up    | -           -        -     -
-          -     -      | -             -       -     | God Father  Gateway  9pm   Austin
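A minimal sketch of the virtual outer-join over the three relations above. The slide does not name the join attributes, so this assumes the agreed equi-joins are MOVIE.Title = SHOW.Title and MOVIE.Title = REVIEW.Title; `qualify` and `virtual_outer_join` are hypothetical names.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Base relations as listed on the Global Schema slide (two-digit years as shown).
MOVIE: List[Row] = [
    {"Title": "Star Wars", "Year": "77", "Type": "SciFi"},
    {"Title": "Empire SB", "Year": "80", "Type": "SciFi"},
    {"Title": "Rtrn Jedi", "Year": "83", "Type": "SciFi"},
    {"Title": "Anne Hall", "Year": "77", "Type": "Comedy"},
]
SHOW: List[Row] = [
    {"Title": "Star Wars", "Theater": "Gateway", "Time": "10pm", "City": "Austin"},
    {"Title": "Empire SB", "Theater": "Gateway", "Time": "11pm", "City": "Austin"},
    {"Title": "Empire SB", "Theater": "Bruin", "Time": "9pm", "City": "LA"},
    {"Title": "God Father", "Theater": "Gateway", "Time": "9pm", "City": "Austin"},
]
REVIEW: List[Row] = [
    {"Title": "Rtrn Jedi", "Source": "Siskel", "Eval": "Up"},
    {"Title": "Clock Orange", "Source": "Ebert", "Eval": "Up"},
]
ATTRS = {
    "MOVIE": ["Title", "Year", "Type"],
    "SHOW": ["Title", "Theater", "Time", "City"],
    "REVIEW": ["Title", "Source", "Eval"],
}


def qualify(rel: str, row: Optional[Row]) -> Row:
    """Prefix attribute names with the relation; a missing side becomes all None (NULL)."""
    return {f"{rel}.{a}": (row[a] if row is not None else None) for a in ATTRS[rel]}


def virtual_outer_join() -> List[Row]:
    """Full outer join along MOVIE.Title = SHOW.Title and MOVIE.Title = REVIEW.Title."""
    joined: List[Row] = []
    for m in MOVIE:
        shows: List[Optional[Row]] = [s for s in SHOW if s["Title"] == m["Title"]] or [None]
        reviews: List[Optional[Row]] = [r for r in REVIEW if r["Title"] == m["Title"]] or [None]
        for s in shows:
            for r in reviews:
                joined.append({**qualify("MOVIE", m), **qualify("SHOW", s), **qualify("REVIEW", r)})
    movie_titles = {m["Title"] for m in MOVIE}
    # Dangling SHOW and REVIEW tuples survive the outer join with NULLs elsewhere.
    for s in SHOW:
        if s["Title"] not in movie_titles:
            joined.append({**qualify("MOVIE", None), **qualify("SHOW", s), **qualify("REVIEW", None)})
    for r in REVIEW:
        if r["Title"] not in movie_titles:
            joined.append({**qualify("MOVIE", None), **qualify("SHOW", None), **qualify("REVIEW", r)})
    return joined


JOINED = virtual_outer_join()
for row in JOINED:
    print(row["MOVIE.Title"], row["SHOW.Theater"], row["REVIEW.Source"])
```

Titles that occur only in SHOW or REVIEW (God Father, Clock Orange) still appear as rows with NULLs in the other columns, which is what makes the single virtual relation lossless.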

Page 3: Queries

Simple queries have projection (entire relations) and selection:

$$X_{c_1, \ldots, c_n}$$

where X is a set of entire relations and $c_1, \ldots, c_n$ is a conjunction of simple conditions.

• Despite the similar appearance, this is not (exactly) relational algebra.

Example: $\mathit{Movie}_{\mathit{type} = \mathit{sciFi}}$

Title      Year  Type
Star Wars  77    SciFi
Empire SB  80    SciFi
Rtrn Jedi  83    SciFi

The query superimposition operator (written $\oplus$) gives a “union” over non-union-compatible projections. Such queries are termed compound queries.

Example: $\mathit{Movie}_{\mathit{type} \in \{\mathit{sciFi}, \mathit{horror}\}} \oplus \mathit{Review}_{\mathit{type} = \mathit{sciFi}}$

Title      Year  Type
Star Wars  77    SciFi
Empire SB  80    SciFi
Rtrn Jedi  83    SciFi

Title      Source  Eval
Rtrn Jedi  Siskel  Up
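The sketch below evaluates simple and compound queries over the example data. It assumes that a simple query returns, for each projected relation, the tuples that appear in some joined row satisfying every condition, and that superimposition collects those tuples per relation; this reading of the slide is an assumption, and `joined_rows`, `type_in` and `answer` are hypothetical names. SHOW is omitted to keep it short.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Rec = Dict[str, object]                 # one tuple of a base relation
JoinedRow = Dict[str, Optional[Rec]]    # relation name -> its tuple in the joined row (or None)
Condition = Callable[[JoinedRow], bool]
SimpleQuery = Tuple[FrozenSet[str], List[Condition]]  # (projected relations, conjunction)

MOVIE: List[Rec] = [
    {"Title": "Star Wars", "Year": 77, "Type": "SciFi"},
    {"Title": "Empire SB", "Year": 80, "Type": "SciFi"},
    {"Title": "Rtrn Jedi", "Year": 83, "Type": "SciFi"},
    {"Title": "Anne Hall", "Year": 77, "Type": "Comedy"},
]
REVIEW: List[Rec] = [
    {"Title": "Rtrn Jedi", "Source": "Siskel", "Eval": "Up"},
    {"Title": "Clock Orange", "Source": "Ebert", "Eval": "Up"},
]


def joined_rows() -> List[JoinedRow]:
    """Outer join of MOVIE and REVIEW on Title."""
    rows: List[JoinedRow] = []
    for m in MOVIE:
        matches: List[Optional[Rec]] = [r for r in REVIEW if r["Title"] == m["Title"]] or [None]
        rows.extend({"Movie": m, "Review": r} for r in matches)
    titles = {m["Title"] for m in MOVIE}
    rows.extend({"Movie": None, "Review": r} for r in REVIEW if r["Title"] not in titles)
    return rows


def type_in(*types: str) -> Condition:
    """Simple condition on the Movie part of a joined row: type in {types}."""
    return lambda row: row["Movie"] is not None and row["Movie"]["Type"] in types


def answer(compound: List[SimpleQuery]) -> Dict[str, List[Rec]]:
    """Superimpose simple queries: per relation, collect the tuples any query selects."""
    result: Dict[str, List[Rec]] = {}
    for projection, conditions in compound:
        for row in joined_rows():
            if not all(cond(row) for cond in conditions):  # conjunction of simple conditions
                continue
            for rel in sorted(projection):
                rec = row[rel]
                bucket = result.setdefault(rel, [])
                if rec is not None and rec not in bucket:
                    bucket.append(rec)
    return result


sci_fi = [(frozenset({"Movie"}), [type_in("SciFi")])]
compound = [(frozenset({"Movie"}), [type_in("SciFi", "Horror")]),
            (frozenset({"Review"}), [type_in("SciFi")])]
print([m["Title"] for m in answer(sci_fi)["Movie"]])
print({rel: [t["Title"] for t in recs] for rel, recs in answer(compound).items()})
```

The Review part of the compound query is selected through the implicit Title join, so a review qualifies only when the movie it refers to is SciFi; the Clock Orange review has no MOVIE tuple and is never returned.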

Page 4: The Query Difference Operator

$$X_{c_1, \ldots, c_n} - Y_{c'_1, \ldots, c'_m} = (X \setminus Y)_{c_1, \ldots, c_n} \oplus (X \cap Y)_{c_1, \ldots, c_n, \neg c'_1} \oplus \cdots \oplus (X \cap Y)_{c_1, \ldots, c_n, \neg c'_m}$$

Theorem 1:
• May compute query intersection, subsumption, and equivalence.

Theorem 2:
• Query difference is distributive over compound queries:

$$(q_1 \oplus q_2) - (q_3 \oplus q_4) = ((q_1 - q_3) - q_4) \oplus ((q_2 - q_3) - q_4)$$

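$$\mathit{Movie}_{\mathit{year} \ge 1950} - \mathit{Movie}_{\mathit{year} \ge 1960} = \mathit{Movie}_{\mathit{year} \ge 1950,\ \mathit{year} < 1960}$$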
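A minimal sketch of the difference operator as it can be read from the slide's formula and the worked example on Pages 12 and 13: relations projected by X but not by Y keep X's conditions, and each condition of Y contributes one negated term over the shared relations. It assumes conditions are atoms of the form attribute in set, attribute >= n, or attribute < n over integer years (so "between 1927 and 1983" becomes year >= 1927, year < 1984). `Simple`, `normalize`, `difference`, `compound_difference` and `subsumed` are hypothetical names; this is not the LISP prototype.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

# An atomic condition (attribute, operator, value); only set membership and
# half-open integer bounds are supported in this sketch.
Atom = Tuple[str, str, Any]
NEGATE = {"in": "notin", "notin": "in", ">=": "<", "<": ">="}


@dataclass(frozen=True)
class Simple:
    """Simple query X_{c_1..c_n}: projected relations plus a conjunction of atoms."""
    proj: FrozenSet[str]
    conds: Tuple[Atom, ...]


def normalize(conds: Tuple[Atom, ...]) -> Optional[Tuple[Atom, ...]]:
    """Combine atoms per attribute; return None when the conjunction is unsatisfiable."""
    allowed: Dict[str, FrozenSet[str]] = {}
    banned: Dict[str, FrozenSet[str]] = {}
    lo: Dict[str, int] = {}
    hi: Dict[str, int] = {}
    for attr, op, val in conds:
        if op == "in":
            allowed[attr] = allowed[attr] & val if attr in allowed else val
        elif op == "notin":
            banned[attr] = banned.get(attr, frozenset()) | val
        elif op == ">=":
            lo[attr] = max(lo.get(attr, val), val)
        elif op == "<":
            hi[attr] = min(hi.get(attr, val), val)
    out: List[Atom] = []
    for attr, vals in allowed.items():
        vals = vals - banned.pop(attr, frozenset())
        if not vals:
            return None
        out.append((attr, "in", vals))
    out.extend((attr, "notin", vals) for attr, vals in banned.items())
    if any(lo[a] >= hi[a] for a in lo.keys() & hi.keys()):
        return None
    out.extend((a, ">=", v) for a, v in lo.items())
    out.extend((a, "<", v) for a, v in hi.items())
    return tuple(sorted(out, key=lambda atom: (atom[0], atom[1])))


def difference(x: Simple, y: Simple) -> List[Simple]:
    """X_C - Y_C' = (X minus Y)_C (+) (X and Y)_{C, not c'_1} (+) ... (+) (X and Y)_{C, not c'_m}.

    Negating an atom in place is only sound under the negation caveat on the
    Limitations slide (attribute functionally dependent on the relation keys).
    """
    terms: List[Simple] = []
    if x.proj - y.proj:  # relations Y does not project at all stay fully uncovered
        terms.append(Simple(x.proj - y.proj, x.conds))
    common = x.proj & y.proj
    if common:
        for attr, op, val in y.conds:
            conds = normalize(x.conds + ((attr, NEGATE[op], val),))
            if conds is not None:  # unsatisfiable terms denote empty queries
                terms.append(Simple(common, conds))
    return terms


def compound_difference(xs: List[Simple], ys: List[Simple]) -> List[Simple]:
    """Theorem 2: (q1 (+) q2) - (q3 (+) q4) = ((q1 - q3) - q4) (+) ((q2 - q3) - q4)."""
    result = list(xs)
    for y in ys:
        result = [term for q in result for term in difference(q, y)]
    return result


def subsumed(x: Simple, y: Simple) -> bool:
    """Theorem 1 style use: x is subsumed by y when x - y reduces to no terms."""
    return not difference(x, y)


query = Simple(frozenset({"Movie", "Review"}),
               (("type", "in", frozenset({"drama", "documentary"})),
                ("year", ">=", 1950), ("year", "<", 1960)))
textbook = Simple(frozenset({"Movie", "Review"}),
                  (("type", "in", frozenset({"drama", "comedy"})),
                   ("year", ">=", 1927), ("year", "<", 1984)))
catalog = Simple(frozenset({"Movie"}),
                 (("type", "in", frozenset({"drama", "action", "documentary", "comedy"})),
                  ("year", ">=", 1955), ("year", "<", 2000)))

for term in compound_difference([query], [textbook, catalog]):
    print(sorted(term.proj), term.conds)
```

Run on the Query, Textbook and Catalog of the worked example, it leaves two terms: reviews of documentaries made from 1950 to 1959 and movie information for documentaries made from 1950 to 1954, which is the Query’’ the slides derive by hand.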

Page 5: Query Simplification

Horizontal Merge

$$X_C \oplus Y_C = (X \cup Y)_C$$

Vertical Merge

$$X_{C, c_i} \oplus X_{C, c_j} = X_{C, c_k} \quad \text{where } c_k \equiv c_i \vee c_j$$

Absorption

$$X_C \oplus Y_{C'} = Y_{C'} \quad \text{where } C \Rightarrow C' \text{ and } X \subseteq Y$$
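The three rules can be sketched for a restricted case, assuming every simple query carries one set condition on type and one half-open year interval (`Piece`, `split`, `horizontal_merge`, `vertical_merge`, `absorb` and `simplify` are hypothetical names). The order of application below (split every interval at all boundaries, then merge horizontally, merge vertically, and absorb) is an assumed strategy, not necessarily the prototype's.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Piece:
    """Simple query proj_{type in types, lo <= year < hi}."""
    proj: FrozenSet[str]
    types: FrozenSet[str]
    lo: int
    hi: int


def split(pieces: List[Piece]) -> List[Piece]:
    """Vertical merge read right to left: cut each interval at every boundary in use."""
    cuts = sorted({p.lo for p in pieces} | {p.hi for p in pieces})
    out: List[Piece] = []
    for p in pieces:
        bounds = [p.lo] + [c for c in cuts if p.lo < c < p.hi] + [p.hi]
        out += [Piece(p.proj, p.types, a, b) for a, b in zip(bounds, bounds[1:])]
    return out


def horizontal_merge(pieces: List[Piece]) -> List[Piece]:
    """X_C (+) Y_C = (X u Y)_C: union the projections of pieces with identical conditions."""
    groups: Dict[Tuple[FrozenSet[str], int, int], FrozenSet[str]] = {}
    for p in pieces:
        key = (p.types, p.lo, p.hi)
        groups[key] = groups.get(key, frozenset()) | p.proj
    return [Piece(proj, types, lo, hi) for (types, lo, hi), proj in groups.items()]


def vertical_merge(pieces: List[Piece]) -> List[Piece]:
    """X_{C,c_i} (+) X_{C,c_j} = X_{C,c_k}: join adjacent or overlapping year intervals."""
    ordered = sorted(pieces, key=lambda p: (sorted(p.proj), sorted(p.types), p.lo))
    out: List[Piece] = []
    for p in ordered:
        last = out[-1] if out else None
        if last is not None and (last.proj, last.types) == (p.proj, p.types) and p.lo <= last.hi:
            out[-1] = Piece(p.proj, p.types, last.lo, max(last.hi, p.hi))
        else:
            out.append(p)
    return out


def absorb(pieces: List[Piece]) -> List[Piece]:
    """X_C (+) Y_C' = Y_C' when X is a subset of Y and C implies C' (keeps one of equals)."""
    def covered(p: Piece, q: Piece) -> bool:
        return p.proj <= q.proj and p.types <= q.types and q.lo <= p.lo and p.hi <= q.hi

    return [p for i, p in enumerate(pieces)
            if not any(j != i and covered(p, q) and (not covered(q, p) or j < i)
                       for j, q in enumerate(pieces))]


def simplify(pieces: List[Piece]) -> List[Piece]:
    return absorb(vertical_merge(horizontal_merge(split(pieces))))


DOC = frozenset({"documentary"})
query2 = [Piece(frozenset({"Movie"}), DOC, 1950, 1955),
          Piece(frozenset({"Review"}), DOC, 1950, 1960)]
for piece in simplify(query2):
    print(sorted(piece.proj), sorted(piece.types), f"{piece.lo} <= year < {piece.hi}")
```

On the residual from the worked example this yields {Movie, Review} for 1950 to 1954 plus Review for 1955 to 1959, the same answer the slides reach via Vertical and Horizontal Merge. Splitting at every boundary can over-fragment; the vertical merge then rejoins fragments that found no horizontal partner.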

Page 6: Limitations

Caveat: take care in applying negation!
• Simplify when the condition attribute is both:
  1.) functionally dependent on the key of each relation in the projection set, and
  2.) not a proper subset of a primary key.

Schema
• Requires a non-cyclic set of equi-joins to be predetermined among a set of relations

Queries
• Projections are (currently) entire relations
• No self-joins
• Implicit inclusion of predetermined schema equi-joins

[Negation examples: Movie with year conditions (1950, 1960); Person with work = MCC; Person with child.age conditions (10, 18)]
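To see why the caveat matters, the sketch below compares an extensional difference with the rewrite that pushes the negation into the condition. Each movie has one year (titles from the Global Schema slide, years written in full), so year depends on Movie's key and the rewrite is exact. For Person, the sketch assumes hypothetical data in which one person works at several places; reusing the slide's work = MCC condition, work is then not functionally dependent on Person's key and the rewrite over-answers.

```python
from typing import Callable, Dict, List, Set, Tuple

# Single-valued attribute: each Movie key has exactly one year.
MOVIE_YEAR: Dict[str, int] = {"Star Wars": 1977, "Empire SB": 1980,
                              "Rtrn Jedi": 1983, "Anne Hall": 1977}

# Multi-valued attribute (hypothetical data): "ann" works at two places.
WORKS: List[Tuple[str, str]] = [("ann", "MCC"), ("ann", "UT"), ("bob", "UT"), ("cy", "MCC")]
PERSONS: Set[str] = {name for name, _ in WORKS}


def movies_where(pred: Callable[[int], bool]) -> Set[str]:
    """Movie_{pred}: titles whose (single) year satisfies pred."""
    return {title for title, year in MOVIE_YEAR.items() if pred(year)}


def persons_where(pred: Callable[[str], bool]) -> Set[str]:
    """Person_{pred}: persons with at least one joined WORKS tuple satisfying pred."""
    return {name for name, place in WORKS if pred(place)}


# Safe: the extensional difference equals the query with the negated condition.
movie_diff = set(MOVIE_YEAR) - movies_where(lambda y: y >= 1980)
movie_negated = movies_where(lambda y: y < 1980)

# Unsafe: "ann" satisfies work != MCC through her UT tuple although she works at MCC.
person_diff = PERSONS - persons_where(lambda w: w == "MCC")
person_negated = persons_where(lambda w: w != "MCC")

print(sorted(movie_diff), sorted(movie_negated))
print(sorted(person_diff), sorted(person_negated))
```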

Page 7: Internet Distributed Conceptual Information Spaces

Where can I see JAWS?

Global schema models domain (e.g. Movies, Social Events, Electronics, etc.)

Data resources (agents) advertise their contents to Broker

User asks conceptual query to Broker

Broker agent knows schema

[Diagram: Broker (B); agents advertise "Jaws Austin" and "Jaws LA" … but no agent knows about A]

Page 8: Problems

Complex Semantics of Global Schema
• Agreement
• Common Understanding

Quality of Data
• Completeness
• Consistency

Quality of Access
• Novice query construction
• Non-misleading answers
– (identification of missing resources, conceptual answers, etc.)

Page 9: Approach

Use the Query Difference Operator (defined here):
• Valid only over a restricted class of schemas
• Defines a syntactic method of computing query (concept) difference

Applied here to:
• Identify the exact portion of a user’s query that is not covered by any agent in the information space
• Relevant to other problems as well...

Page 10: Movie Example

Movie(title, year, type)

Show(title, theater, time, city)

Review(title, source, evaluation)

...

“All films made after 1927”

Textbook: Movie information and reviews for all the drama and comedy films made between 1927 and 1983

Catalog: Movie information for all the action, documentary, drama, and comedy films made between 1955 and 1999

Page 11: Example Queries/Responses

Query 1: “Give movie information and reviews for all dramas.”

Response: “…, but no reviews for dramas made after 1983.”

Query 2: “Give movie information and reviews for all dramas and documentaries made in the 1950s.”

Response: “…, but no movie information or reviews for documentaries made between 1950 and 1954. Also no review information for documentaries made between 1955 and 1959.”

Query 3: “Give movie information for all films made in the thirties, forties, or fifties.”

Response: “…, but no movie information for non-drama and non-comedy films made between 1930 and 1954. Also no movie information for non-drama, non-action, non-comedy and non-documentary films made between 1930 and 1959.”

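Page 12: Example Calculation (1 of 3)

“Give movie information and reviews for all dramas and documentaries made in the 1950s.”

Query:

$$\{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{drama}, \mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}$$

Textbook:

$$\{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{drama}, \mathit{comedy}\},\ \mathit{year} \ge 1927,\ \mathit{year} \le 1983}$$

Query’ = Query - Textbook

$$\begin{aligned}
\text{Query}' =\ & \{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{drama}, \mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \neg(\mathit{type} \in \{\mathit{drama}, \mathit{comedy}\})} \\
\oplus\ & \{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{drama}, \mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \mathit{year} < 1927} \\
\oplus\ & \{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{drama}, \mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \mathit{year} > 1983} \\
=\ & \{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}
\end{aligned}$$

The terms with year < 1927 and year > 1983 contradict year ≥ 1950, year < 1960 and are empty.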

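Page 13: Example Calculation (2 of 3)

Query’ = $\{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}$

Catalog = $\mathit{Movie}_{\mathit{type} \in \{\mathit{drama}, \mathit{action}, \mathit{documentary}, \mathit{comedy}\},\ \mathit{year} \ge 1955,\ \mathit{year} \le 1999}$

Query’’ = Query’ - Catalog

$$\begin{aligned}
\text{Query}'' =\ & \mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \neg(\mathit{type} \in \{\mathit{drama}, \mathit{action}, \mathit{documentary}, \mathit{comedy}\})} \\
\oplus\ & \mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \mathit{year} < 1955} \\
\oplus\ & \mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960,\ \mathit{year} > 1999} \\
\oplus\ & \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}
\end{aligned}$$

Query’’ = $\mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1955} \oplus \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}$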

Page 14: Example Calculation (3 of 3)

Query’’ = $\mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1955} \oplus \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1960}$

Via Vertical Merge = $\mathit{Movie}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1955} \oplus \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1955} \oplus \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1955,\ \mathit{year} < 1960}$

Via Horizontal Merge = $\{\mathit{Movie}, \mathit{Review}\}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1950,\ \mathit{year} < 1955} \oplus \mathit{Review}_{\mathit{type} \in \{\mathit{documentary}\},\ \mathit{year} \ge 1955,\ \mathit{year} < 1960}$

“No movie information or reviews for documentaries made between 1950 and 1954. Also no review information for documentaries made between 1955 and 1959.”

Page 15: Prototype

Proof of concept in LISP

Condition types: …

On a PC, calculates a query plan over 1000 resources in under 10 seconds

On a PC, calculates a query residual over 1000 resources in under 30 seconds

Worst-case query lengths are governed by the number of attributes

Planning a straight C (or Java) implementation.

Page 16: Formal Extensions

Materialized values, open and closed-world modifiers in language
• Mixed Intensional/Extensional responses
• Summarization

Encompassing less-restrictive schemas and queries
• Inheritance, Cycles through Cliques
• Spatial and Temporal conditions
• Meta-schema (Meta-Meta-schema, …)
• Row types and Semi-structured Data (XML)

Agents
• Knowledge and Belief Operators

Page 17: Applications

Conversational (mixed intensional/extensional) access to distributed information spaces

‘Perfect’ semantic caching over distributed agent systems

Reasoning over contracts, access restrictions, and regulations for electronic commerce among competitive agents

And more...

Page 18: Conclusions

The Query Difference Operator solves conceptual equations over schemas of non-cyclic equi-joins

Applied to the problem of identifying missing information resources

Prototype proves the concept and performs well (query residuals over 1000 resources in under 30 seconds on a PC)

Set of formal extensions and new application ideas proposed